* [Patch 0/2] UefiCpuPkg: Default avoid print. @ 2019-07-31 7:35 Dong, Eric 2019-07-31 7:35 ` [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: " Dong, Eric ` (3 more replies) 0 siblings, 4 replies; 14+ messages in thread From: Dong, Eric @ 2019-07-31 7:35 UTC (permalink / raw) To: devel; +Cc: Ray Ni, Laszlo Ersek REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 Current debug message brings much restriction for the platform which use this driver. For PEI and DXE phase, platform mush link base DebugLib (without using any pei/dxe services, even for its dependent libraries). This patch default disable this debug message, only open it when need to debug the related code. Signed-off-by: Eric Dong <eric.dong@intel.com> Cc: Ray Ni <ray.ni@intel.com> Cc: Laszlo Ersek <lersek@redhat.com> Eric Dong (2): UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) -- 2.21.0.windows.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. 2019-07-31 7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric @ 2019-07-31 7:35 ` Dong, Eric 2019-07-31 7:35 ` [Patch 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Dong, Eric ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: Dong, Eric @ 2019-07-31 7:35 UTC (permalink / raw) To: devel; +Cc: Ray Ni, Laszlo Ersek REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 Current debug message brings much restriction for the platform which use this driver. For PEI and DXE phase, platform mush link base DebugLib (without using any pei/dxe services, even for its dependent libraries). This patch default disable this debug message, only open it when need to debug the related code. Signed-off-by: Eric Dong <eric.dong@intel.com> Cc: Ray Ni <ray.ni@intel.com> Cc: Laszlo Ersek <lersek@redhat.com> --- .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c index 4e97e863c7..48fbd58c39 100644 --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c @@ -9,7 +9,9 @@ #include "RegisterCpuFeatures.h" CHAR16 *mDependTypeStr[] = {L"None", L"Thread", L"Core", L"Package", L"Invalid" }; +GLOBAL_REMOVE_IF_UNREFERENCED CHAR16 *mRegisterTypeStr[] = {L"MSR", L"CR", L"MMIO", L"CACHE", L"SEMAP", L"INVALID" }; +#define DEBUG_CPU_MSG 0 /** Worker function to save PcdCpuFeaturesCapability. @@ -796,7 +798,7 @@ ProgramProcessorRegister ( ApLocation->Core * CpuStatus->MaxThreadCount + ApLocation->Thread; DEBUG (( - DEBUG_INFO, + DEBUG_CPU_MSG, "Processor = %08lu, Index %08lu, Type = %s!\n", (UINT64)ThreadIndex, (UINT64)Index, -- 2.21.0.windows.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Patch 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. 2019-07-31 7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric 2019-07-31 7:35 ` [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: " Dong, Eric @ 2019-07-31 7:35 ` Dong, Eric 2019-07-31 12:43 ` [Patch 0/2] UefiCpuPkg: " Laszlo Ersek 2019-08-01 7:51 ` Ni, Ray 3 siblings, 0 replies; 14+ messages in thread From: Dong, Eric @ 2019-07-31 7:35 UTC (permalink / raw) To: devel; +Cc: Ray Ni, Laszlo Ersek REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 Current debug message brings much restriction for the platform which use this driver. For PEI and DXE phase, platform mush link base DebugLib (without using any pei/dxe services, even for its dependent libraries). This patch default disable this debug message, only open it when need to debug the related code. Signed-off-by: Eric Dong <eric.dong@intel.com> Cc: Ray Ni <ray.ni@intel.com> Cc: Laszlo Ersek <lersek@redhat.com> --- UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c index 2cfc61b2c6..b8cbc3a9d1 100644 --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c @@ -1,7 +1,7 @@ /** @file Code for Processor S3 restoration -Copyright (c) 2006 - 2018, Intel Corporation. All rights reserved.<BR> +Copyright (c) 2006 - 2019, Intel Corporation. All rights reserved.<BR> SPDX-License-Identifier: BSD-2-Clause-Patent **/ @@ -90,7 +90,9 @@ UINT8 mApHltLoopCodeTemplate[] = { 0xEB, 0xFC // jmp $-2 }; +GLOBAL_REMOVE_IF_UNREFERENCED CHAR16 *mRegisterTypeStr[] = {L"MSR", L"CR", L"MMIO", L"CACHE", L"SEMAP", L"INVALID" }; +#define DEBUG_AP_MSG 0 /** Sync up the MTRR values for all processors. @@ -209,7 +211,7 @@ ProgramProcessorRegister ( ApLocation->Core * CpuStatus->MaxThreadCount + ApLocation->Thread; DEBUG (( - DEBUG_INFO, + DEBUG_AP_MSG, "Processor = %lu, Entry Index %lu, Type = %s!\n", (UINT64)ThreadIndex, (UINT64)Index, -- 2.21.0.windows.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric 2019-07-31 7:35 ` [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: " Dong, Eric 2019-07-31 7:35 ` [Patch 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Dong, Eric @ 2019-07-31 12:43 ` Laszlo Ersek 2019-07-31 16:34 ` [edk2-devel] " Brian J. Johnson 2019-08-01 7:51 ` Ni, Ray 3 siblings, 1 reply; 14+ messages in thread From: Laszlo Ersek @ 2019-07-31 12:43 UTC (permalink / raw) To: Eric Dong, devel; +Cc: Ray Ni, Michael Kinney (adding Mike) On 07/31/19 09:35, Eric Dong wrote: > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 > > Current debug message brings much restriction for the platform > which use this driver. > > For PEI and DXE phase, platform mush link base DebugLib (without > using any pei/dxe services, even for its dependent libraries). > > This patch default disable this debug message, only open it when > need to debug the related code. > > Signed-off-by: Eric Dong <eric.dong@intel.com> > Cc: Ray Ni <ray.ni@intel.com> > Cc: Laszlo Ersek <lersek@redhat.com> > > Eric Dong (2): > UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. > UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. > > .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 +++- > 2 files changed, 6 insertions(+), 2 deletions(-) > The basic problem seems to be that APs should not use "thick" services that might underlie the DebugLib instance that is picked by the platform. That requirement appears sane to me. I think I disagree with the proposed mitigation though. Reasons: (a) The mitigation is duplicated to independent modules. (b) It is not possible to change the debug mask without modifying C language source code. (c) Passing a zero log mask to DEBUG() on the APs does not guarantee thread safety: - The DEBUG() macro calls DebugPrintEnabled() regardless of the log mask passed to DEBUG(). - The DEBUG() macro may or may not call DebugPrintLevelEnabled(), dependent on architecture & toolchain. - Both DebugPrintEnabled() and DebugPrintLevelEnabled() are DebugLib interfaces. The library instance may implement them unsafely for APs, and a zero log mask at the DEBUG call site could not prevent that. - Finally, DebugPrint() itself could invoke thread-unsafe logic, before consulting the log mask. I would propose the following, instead: (i) Introduce BIT6 for PcdDebugPropertyMask in "MdePkg.dec". The default value should be zero. The bit stands for "DEBUG is safe to call on APs". (ii) Add a macro called AP_DEBUG to <DebugLib.h>. This macro should work the same as DEBUG, except it should do nothing if BIT6 in PcdDebugProperyMask is clear. Fetching PcdDebugPropertyMask inside AP_DEBUG() is safe, because: - the PCD can only be fixed-at-build or patchable-in-module (therefore it is safe to read on APs -- no PCD PPI or PCD Protocol is needed); - PcdDebugPropertyMask is a preexistent PCD that *all* existent DebugLib instances are expected to consume -- per the API specifications in <DebugLib.h> --, therefore no new PCD dependency would be introduced to DebugLib instances. (iii) Modules that call DEBUG on APs should replace those calls with AP_DEBUG. Code that currently calls DEBUG while running on either BSP or APs should discriminate those cases from each other, and use AP_DEBUG explicitly, when it runs on APs. As a further refinement, a macro called MP_DEBUG could be introduced too, with a new initial parameter called "Bsp". If the Bsp parameter is TRUE, then MP_DEBUG is identical to DEBUG. Otherwise, MP_DEBUG is identical to AP_DEBUG. This way, DEBUG() calls such as described above wouldn't have to be split into DEBUG / AP_DEBUG calls; they could be changed into MP_DEBUG calls (with an extra parameter in the front). (iv) platforms can set BIT6 in PcdDebugPropertyMask in DSC files. This need not be a full platform-level setting: the PCD can be overridden in module scope, just like the DebugLib resolution can be module-scoped. As an end result, AP_DEBUG messages will disappear by default (safely), and platforms will have to do extra work only if they want AP_DEBUG messages to appear. Otherwise the change is transparent to platforms. And, I think that AP_DEBUG belongs in MdePkg (and not UefiCpuPkg) because both DebugLib and EFI_MP_SERVICES_PROTOCOL are declared in MdePkg. While UefiCpuPkg provides the multiprocessing implementation for IA32 and X64, the problem is architecture-independent. Furthermore, the problem is a long-standing and recurrent one -- please refer to commit 81f560498bf1, for example --, so it makes sense to solve it once and for all. Thanks Laszlo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 12:43 ` [Patch 0/2] UefiCpuPkg: " Laszlo Ersek @ 2019-07-31 16:34 ` Brian J. Johnson 2019-07-31 18:06 ` Andrew Fish 2019-07-31 18:58 ` Laszlo Ersek 0 siblings, 2 replies; 14+ messages in thread From: Brian J. Johnson @ 2019-07-31 16:34 UTC (permalink / raw) To: devel, lersek, Eric Dong; +Cc: Ray Ni, Michael Kinney On 7/31/19 7:43 AM, Laszlo Ersek wrote: > (adding Mike) > > On 07/31/19 09:35, Eric Dong wrote: >> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 >> >> Current debug message brings much restriction for the platform >> which use this driver. >> >> For PEI and DXE phase, platform mush link base DebugLib (without >> using any pei/dxe services, even for its dependent libraries). >> >> This patch default disable this debug message, only open it when >> need to debug the related code. >> >> Signed-off-by: Eric Dong <eric.dong@intel.com> >> Cc: Ray Ni <ray.ni@intel.com> >> Cc: Laszlo Ersek <lersek@redhat.com> >> >> Eric Dong (2): >> UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. >> UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. >> >> .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- >> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 +++- >> 2 files changed, 6 insertions(+), 2 deletions(-) >> > > The basic problem seems to be that APs should not use "thick" services > that might underlie the DebugLib instance that is picked by the > platform. That requirement appears sane to me. > > I think I disagree with the proposed mitigation though. Reasons: > > (a) The mitigation is duplicated to independent modules. > > (b) It is not possible to change the debug mask without modifying C > language source code. > > (c) Passing a zero log mask to DEBUG() on the APs does not guarantee > thread safety: > > - The DEBUG() macro calls DebugPrintEnabled() regardless of the log mask > passed to DEBUG(). > > - The DEBUG() macro may or may not call DebugPrintLevelEnabled(), > dependent on architecture & toolchain. > > - Both DebugPrintEnabled() and DebugPrintLevelEnabled() are DebugLib > interfaces. The library instance may implement them unsafely for APs, > and a zero log mask at the DEBUG call site could not prevent that. > > - Finally, DebugPrint() itself could invoke thread-unsafe logic, before > consulting the log mask. > > > I would propose the following, instead: > > (i) Introduce BIT6 for PcdDebugPropertyMask in "MdePkg.dec". The default > value should be zero. The bit stands for "DEBUG is safe to call on APs". > > (ii) Add a macro called AP_DEBUG to <DebugLib.h>. > > This macro should work the same as DEBUG, except it should do nothing if > BIT6 in PcdDebugProperyMask is clear. > > Fetching PcdDebugPropertyMask inside AP_DEBUG() is safe, because: > > - the PCD can only be fixed-at-build or patchable-in-module (therefore > it is safe to read on APs -- no PCD PPI or PCD Protocol is needed); > > - PcdDebugPropertyMask is a preexistent PCD that *all* existent DebugLib > instances are expected to consume -- per the API specifications in > <DebugLib.h> --, therefore no new PCD dependency would be introduced to > DebugLib instances. > > (iii) Modules that call DEBUG on APs should replace those calls with > AP_DEBUG. Code that currently calls DEBUG while running on either BSP or > APs should discriminate those cases from each other, and use AP_DEBUG > explicitly, when it runs on APs. > > As a further refinement, a macro called MP_DEBUG could be introduced > too, with a new initial parameter called "Bsp". If the Bsp parameter is > TRUE, then MP_DEBUG is identical to DEBUG. Otherwise, MP_DEBUG is > identical to AP_DEBUG. This way, DEBUG() calls such as described above > wouldn't have to be split into DEBUG / AP_DEBUG calls; they could be > changed into MP_DEBUG calls (with an extra parameter in the front). > > (iv) platforms can set BIT6 in PcdDebugPropertyMask in DSC files. This > need not be a full platform-level setting: the PCD can be overridden in > module scope, just like the DebugLib resolution can be module-scoped. > > > As an end result, AP_DEBUG messages will disappear by default (safely), > and platforms will have to do extra work only if they want AP_DEBUG > messages to appear. Otherwise the change is transparent to platforms. > > And, I think that AP_DEBUG belongs in MdePkg (and not UefiCpuPkg) > because both DebugLib and EFI_MP_SERVICES_PROTOCOL are declared in > MdePkg. While UefiCpuPkg provides the multiprocessing implementation for > IA32 and X64, the problem is architecture-independent. Furthermore, the > problem is a long-standing and recurrent one -- please refer to commit > 81f560498bf1, for example --, so it makes sense to solve it once and for > all. > > Thanks > Laszlo Laszlo, Defining a PCD bit for DEBUG() AP safety is an excellent suggestion. As you said, this is a long-standing, recurrent problem which keeps biting real platforms, and it would be good to have a solid, platform-independent solution. I do wonder if there would be a clean way to let a DebugLib instance itself declare that AP_DEBUG() is safe. That way a platform would only need to override the DebugLib instance in the DSC file, rather than both the instance and the PCD. (I know, I'm nitpicking.) A library can't override PCDs in its calling modules, of course. I suppose the AP_DEBUG() macro could call a new DebugLib entry point to test for AP safety before doing anything else, say DebugPrintOnApIsSafe(). Or it could even be a global CONST BOOLEAN defined by the library. But that would require all DebugLib instances to change, which is something you were trying to avoid. However, it's not always practical to track down all uses of DEBUG(). An AP can easily call a library routine which uses DEBUG() rather than AP_DEBUG(), buried under several layers of transitive library dependencies. In other words, it's not always practical to determine ahead of time if a given DEBUG() call may be done on an AP. I know that AP code runs in a very restricted environment and that people who use MpServices are supposed to understand the repercussions, but it gets very difficult when libraries are involved. :( So would a better solution be to modify the common unsafe DebugLib instances to have DebugPrintEnabled() return FALSE on APs? That would probably require a new BaseLib interface to determine if the caller is running on the BSP or an AP. (For IA32/X64 this isn't too hard -- it just needs to check a bit in the local APIC. I have no idea about other architectures.) That wouldn't solve the problem everywhere -- anyone using a custom DebugLib would have to update it themselves. But it would solve it solidly in the majority of cases. Thoughts? -- Brian J. Johnson Enterprise X86 Lab Hewlett Packard Enterprise ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 16:34 ` [edk2-devel] " Brian J. Johnson @ 2019-07-31 18:06 ` Andrew Fish 2019-07-31 18:58 ` Laszlo Ersek 1 sibling, 0 replies; 14+ messages in thread From: Andrew Fish @ 2019-07-31 18:06 UTC (permalink / raw) To: devel, brian.johnson; +Cc: Laszlo Ersek, Eric Dong, Ray Ni, Mike Kinney [-- Attachment #1: Type: text/plain, Size: 9272 bytes --] > On Jul 31, 2019, at 9:34 AM, Brian J. Johnson <brian.johnson@hpe.com> wrote: > > On 7/31/19 7:43 AM, Laszlo Ersek wrote: >> (adding Mike) >> On 07/31/19 09:35, Eric Dong wrote: >>> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 >>> >>> Current debug message brings much restriction for the platform >>> which use this driver. >>> >>> For PEI and DXE phase, platform mush link base DebugLib (without >>> using any pei/dxe services, even for its dependent libraries). >>> >>> This patch default disable this debug message, only open it when >>> need to debug the related code. >>> >>> Signed-off-by: Eric Dong <eric.dong@intel.com> >>> Cc: Ray Ni <ray.ni@intel.com> >>> Cc: Laszlo Ersek <lersek@redhat.com> >>> >>> Eric Dong (2): >>> UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. >>> UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. >>> >>> .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- >>> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 +++- >>> 2 files changed, 6 insertions(+), 2 deletions(-) >>> >> The basic problem seems to be that APs should not use "thick" services >> that might underlie the DebugLib instance that is picked by the >> platform. That requirement appears sane to me. >> I think I disagree with the proposed mitigation though. Reasons: >> (a) The mitigation is duplicated to independent modules. >> (b) It is not possible to change the debug mask without modifying C >> language source code. >> (c) Passing a zero log mask to DEBUG() on the APs does not guarantee >> thread safety: >> - The DEBUG() macro calls DebugPrintEnabled() regardless of the log mask >> passed to DEBUG(). >> - The DEBUG() macro may or may not call DebugPrintLevelEnabled(), >> dependent on architecture & toolchain. >> - Both DebugPrintEnabled() and DebugPrintLevelEnabled() are DebugLib >> interfaces. The library instance may implement them unsafely for APs, >> and a zero log mask at the DEBUG call site could not prevent that. >> - Finally, DebugPrint() itself could invoke thread-unsafe logic, before >> consulting the log mask. >> I would propose the following, instead: >> (i) Introduce BIT6 for PcdDebugPropertyMask in "MdePkg.dec". The default >> value should be zero. The bit stands for "DEBUG is safe to call on APs". >> (ii) Add a macro called AP_DEBUG to <DebugLib.h>. >> This macro should work the same as DEBUG, except it should do nothing if >> BIT6 in PcdDebugProperyMask is clear. >> Fetching PcdDebugPropertyMask inside AP_DEBUG() is safe, because: >> - the PCD can only be fixed-at-build or patchable-in-module (therefore >> it is safe to read on APs -- no PCD PPI or PCD Protocol is needed); >> - PcdDebugPropertyMask is a preexistent PCD that *all* existent DebugLib >> instances are expected to consume -- per the API specifications in >> <DebugLib.h> --, therefore no new PCD dependency would be introduced to >> DebugLib instances. >> (iii) Modules that call DEBUG on APs should replace those calls with >> AP_DEBUG. Code that currently calls DEBUG while running on either BSP or >> APs should discriminate those cases from each other, and use AP_DEBUG >> explicitly, when it runs on APs. >> As a further refinement, a macro called MP_DEBUG could be introduced >> too, with a new initial parameter called "Bsp". If the Bsp parameter is >> TRUE, then MP_DEBUG is identical to DEBUG. Otherwise, MP_DEBUG is >> identical to AP_DEBUG. This way, DEBUG() calls such as described above >> wouldn't have to be split into DEBUG / AP_DEBUG calls; they could be >> changed into MP_DEBUG calls (with an extra parameter in the front). >> (iv) platforms can set BIT6 in PcdDebugPropertyMask in DSC files. This >> need not be a full platform-level setting: the PCD can be overridden in >> module scope, just like the DebugLib resolution can be module-scoped. >> As an end result, AP_DEBUG messages will disappear by default (safely), >> and platforms will have to do extra work only if they want AP_DEBUG >> messages to appear. Otherwise the change is transparent to platforms. >> And, I think that AP_DEBUG belongs in MdePkg (and not UefiCpuPkg) >> because both DebugLib and EFI_MP_SERVICES_PROTOCOL are declared in >> MdePkg. While UefiCpuPkg provides the multiprocessing implementation for >> IA32 and X64, the problem is architecture-independent. Furthermore, the >> problem is a long-standing and recurrent one -- please refer to commit >> 81f560498bf1, for example --, so it makes sense to solve it once and for >> all. >> Thanks >> Laszlo > > Laszlo, > > Defining a PCD bit for DEBUG() AP safety is an excellent suggestion. As you said, this is a long-standing, recurrent problem which keeps biting real platforms, and it would be good to have a solid, platform-independent solution. > > I do wonder if there would be a clean way to let a DebugLib instance itself declare that AP_DEBUG() is safe. That way a platform would only need to override the DebugLib instance in the DSC file, rather than both the instance and the PCD. (I know, I'm nitpicking.) A library can't override PCDs in its calling modules, of course. I suppose the AP_DEBUG() macro could call a new DebugLib entry point to test for AP safety before doing anything else, say DebugPrintOnApIsSafe(). Or it could even be a global CONST BOOLEAN defined by the library. But that would require all DebugLib instances to change, which is something you were trying to avoid. > > However, it's not always practical to track down all uses of DEBUG(). An AP can easily call a library routine which uses DEBUG() rather than AP_DEBUG(), buried under several layers of transitive library dependencies. In other words, it's not always practical to determine ahead of time if a given DEBUG() call may be done on an AP. I know that AP code runs in a very restricted environment and that people who use MpServices are supposed to understand the repercussions, but it gets very difficult when libraries are involved. :( > > So would a better solution be to modify the common unsafe DebugLib instances to have DebugPrintEnabled() return FALSE on APs? That would probably require a new BaseLib interface to determine if the caller is running on the BSP or an AP. (For IA32/X64 this isn't too hard -- it just needs to check a bit in the local APIC. I have no idea about other architectures.) That wouldn't solve the problem everywhere -- anyone using a custom DebugLib would have to update it themselves. But it would solve it solidly in the majority of cases. > > Thoughts? For DXE you can use the MpServices protocol to tell if you are on the BSP or not. https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Protocol/MpService.h EFI_MP_SERVICES_PROTOCOL.WhoAmI() gets you your ProcessorNumber. EFI_MP_SERVICES_PROTOCOL.GetProcessorInfo() returns the ProcessInfoBuffer that lets you know if you are the BSP. You would want to grab the MpServices protocol pointer early in the driver prior to doing any work on the AP. typedef struct { /// /// The unique processor ID determined by system hardware. For IA32 and X64, /// the processor ID is the same as the Local APIC ID. Only the lower 8 bits /// are used, and higher bits are reserved. For IPF, the lower 16 bits contains /// id/eid, and higher bits are reserved. /// UINT64 ProcessorId; /// /// Flags indicating if the processor is BSP or AP, if the processor is enabled /// or disabled, and if the processor is healthy. Bits 3..31 are reserved and /// must be 0. /// /// <pre> /// BSP ENABLED HEALTH Description /// === ======= ====== =================================================== /// 0 0 0 Unhealthy Disabled AP. /// 0 0 1 Healthy Disabled AP. /// 0 1 0 Unhealthy Enabled AP. /// 0 1 1 Healthy Enabled AP. /// 1 0 0 Invalid. The BSP can never be in the disabled state. /// 1 0 1 Invalid. The BSP can never be in the disabled state. /// 1 1 0 Unhealthy Enabled BSP. /// 1 1 1 Healthy Enabled BSP. /// </pre> /// UINT32 StatusFlag; /// /// The physical location of the processor, including the physical package number /// that identifies the cartridge, the physical core number within package, and /// logical thread number within core. /// EFI_CPU_PHYSICAL_LOCATION Location; } EFI_PROCESSOR_INFORMATION; DebugPrinttEnabled() and DebugAssertEnabled() would be needed at a minimum as I don't think you really want to be ASSERTing on the AP, unless that is your intent. I think on a lot of platforms end up having these functions be based on FixedAtBuild PCDs and the function calls get optimized away. Thus it seems like you would want an MP safe version of a given library. This same problem kind of exists for Runtime drivers too. You would not want to ASSERT or DEBUG print at runtime in a Lib that was not Runtime safe. Thanks, Andrew Fish > -- > Brian J. Johnson > Enterprise X86 Lab > > Hewlett Packard Enterprise > > [-- Attachment #2: Type: text/html, Size: 55399 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 16:34 ` [edk2-devel] " Brian J. Johnson 2019-07-31 18:06 ` Andrew Fish @ 2019-07-31 18:58 ` Laszlo Ersek 2019-07-31 22:04 ` Johnson, Michael 2019-08-01 20:20 ` Brian J. Johnson 1 sibling, 2 replies; 14+ messages in thread From: Laszlo Ersek @ 2019-07-31 18:58 UTC (permalink / raw) To: Brian J. Johnson, devel, Eric Dong; +Cc: Ray Ni, Michael Kinney On 07/31/19 18:34, Brian J. Johnson wrote: > I do wonder if there would be a clean way to let a DebugLib instance > itself declare that AP_DEBUG() is safe. That way a platform would > only need to override the DebugLib instance in the DSC file, rather > than both the instance and the PCD. (I know, I'm nitpicking.) A > library can't override PCDs in its calling modules, of course. I > suppose the AP_DEBUG() macro could call a new DebugLib entry point to > test for AP safety before doing anything else, say > DebugPrintOnApIsSafe(). Or it could even be a global CONST BOOLEAN > defined by the library. But that would require all DebugLib instances > to change, which is something you were trying to avoid. That's right -- I tried to imagine some other approaches, but they'd need new DebugLib functions, and likely force platforms to write new code. > However, it's not always practical to track down all uses of DEBUG(). > An AP can easily call a library routine which uses DEBUG() rather than > AP_DEBUG(), buried under several layers of transitive library > dependencies. In other words, it's not always practical to determine > ahead of time if a given DEBUG() call may be done on an AP. This problem is valid IMO, but I think its scope is a lot wider than just DebugLib. Assume the programmer is looking at a function that may be invoked on an AP, and they are about to call a function for taking care of a specific sub-task. If the programmer cannot prove the thread-safety of the *entire* call tree underneath the function call they are about to add, they simply must not add the call. The thread-safety of the DebugLib instance in use is just a part of the thread-safety of said call tree. Put differently, code that runs APs must be extremely self-contained; I'd rule out any and all lib classes from direct use unless a specific library instance, advertizing thread safety, would be chosen in the platform DSC file. But, if we adopted this approach, we could even introduce a new AP-oriented library *class* for debug messages, offer a Null implementation in edk2, and ask platforms to bring their own. > I know that AP code runs in a very restricted environment and that > people who use MpServices are supposed to understand the > repercussions, but it gets very difficult when libraries are > involved. :( Exactly -- the first restriction people should understand is, "stay away from libraries as much as you can". > So would a better solution be to modify the common unsafe DebugLib > instances to have DebugPrintEnabled() return FALSE on APs? That would > probably require a new BaseLib interface to determine if the caller is > running on the BSP or an AP. I agree that "AmIAnAP()" would be a pre-requisite. > (For IA32/X64 this isn't too hard -- it just needs to check a bit in > the local APIC. Still not trivial, as some DebugLib instances might want to target runtime drivers (or even SMM drivers). For runtime drivers the complication is that a runtime (virtual address) mapping for the LAPIC MMIO range would be needed (if I understand correctly anyway). And for both runtime and SMM drivers, it could be a problem that on physical hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a different base address, possibly by the OS too. I could be quite confused about this, of course; I don't eat LAPICs for breakfast :) I just recall an SMM firmware vulnerability that was in part based on moving the LAPIC base address elsewhere. Hm... googling suggests the attack was called "The Memory Sinkhole". > I have no idea about other architectures.) That wouldn't solve the > problem everywhere -- anyone using a custom DebugLib would have to > update it themselves. But it would solve it solidly in the majority > of cases. > > Thoughts? My fear might not be reasonable, but I feel quite uncomfortable about LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer to determine at the call site, I think, even if it takes more human work. I could very well be biased. In OvmfPkg we have a minuscule amount of code that runs on APs, and even that code is written with total minimalism in mind. Leaping to a different topic... Years ago I was tracking down an MTRR setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5). It is necessary to setup MTRRs identically on all CPUs, plus it has to be done while all CPUs are in a "pen" doing nothing but setting up MTRRs. The bug was exactly in that part of the code (running simultaneously on more than a hundred CPUs). It was impossible to print anything to the serial console -- first because it would be unreadable for humans, and second because the delays would perturb the buggy behavior. In the end I had to introduce an array of per-CPU debug structures where each CPU would record its own view (snapshot) of a shared resource -- a shared resource that should have been protected by mutual exclusion between the CPUs. After the CPUs left the "pen" (with the invalid MTRR configuration established), I'd use the BSP to dump the array. That showed me that some CPUs had overlapping / inconsistent views of the shared resource between each other. This proved that the mutual exclusion primitive didn't work as expected -- it turns out that the semaphore (or spinlock, not sure) in question used an INT8 counter, which overflowed when more than 127 CPUs contended for the resource. I'm sure I'm misremembering parts of this story (from several years distance), the moral is that debugging in a multiprocessing environment may easily require its own dedicated infrastructure. In edk2, we don't have anything like that, I think. Could we build it, sufficiently generally? Like, prepare a log buffer for each CPU before calling StartupAllAps(), log only to that buffer during the concurrent execution, and finally dump the buffers? I guess if we don't *reach* the "finally" part, we could still dump the RAM and investigate the log buffers that way... Dumping the RAM is certainly an option for virtual machines, but it might be viable for physical setups too (JTAG, debug agent... dunno). Sorry about the wild speculation :) Thanks Laszlo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 18:58 ` Laszlo Ersek @ 2019-07-31 22:04 ` Johnson, Michael 2019-08-02 0:12 ` Laszlo Ersek 2019-08-01 20:20 ` Brian J. Johnson 1 sibling, 1 reply; 14+ messages in thread From: Johnson, Michael @ 2019-07-31 22:04 UTC (permalink / raw) To: Laszlo Ersek, devel [-- Attachment #1: Type: text/plain, Size: 2707 bytes --] On Wed, Jul 31, 2019 at 11:58 AM, Laszlo Ersek wrote: > > I'm sure I'm misremembering parts of this story (from several years > distance), the moral is that debugging in a multiprocessing environment > may easily require its own dedicated infrastructure. In edk2, we don't > have anything like that, I think. Could we build it, sufficiently > generally? Like, prepare a log buffer for each CPU before calling > StartupAllAps(), log only to that buffer during the concurrent > execution, and finally dump the buffers? I guess if we don't *reach* the > "finally" part, we could still dump the RAM and investigate the log > buffers that way... Dumping the RAM is certainly an option for virtual > machines, but it might be viable for physical setups too (JTAG, debug > agent... dunno). > > Sorry about the wild speculation :) Laszlo, The wild speculation is good. In fact, I would say it should get wilder still. Consider systems with multiple CPU sockets contending for control of a single debug channel. There is no edk2 solution for granting and revoking control of debug between sockets. Worse still, early enough in boot there may be no coherent memory where the sockets could put log buffers, so we can't charge one of them with dumping all the output. Instead it becomes necessary to solve the coordination problem more directly. I actually have this problem, and a functional-but-not-pretty solution. I'd like to see an edk2 supported way to handle it. What I've got now funnels all such debug through the report status code infrastructure and replaces the serial port status code handler with an implementation that's gated by a hardware-backed coordination primitive. I don't see a way around putting a hardware-backed primitive in there somewhere if sockets without shared memory are to be supported, but the rest of the details are open to alternatives. A solution would need to define one-or-more coordination primitives via library class and create some way to enable those libraries to gate debug. This can work for sockets or APs at any time, while buffer-and-dump can only work on sockets if they've already got coherent memory. Of course buffer-and-dump has other desirable properties and doesn't require anything from hardware, so maybe we want to enable both things at the levels they work/make sense. While I'm blathering about debug, I do not see a downside to making MAX_DEBUG_MESSAGE_LENGTH a fixed at build PCD. It's currently #define'd in a variety of places, which in my opinion is just asking for trouble. Does anybody have any other cans of worms to open on the broadening-debug-infrastructure front? Thanks, -Michael [-- Attachment #2: Type: text/html, Size: 3163 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 22:04 ` Johnson, Michael @ 2019-08-02 0:12 ` Laszlo Ersek 0 siblings, 0 replies; 14+ messages in thread From: Laszlo Ersek @ 2019-08-02 0:12 UTC (permalink / raw) To: Johnson, Michael, devel On 08/01/19 00:04, Johnson, Michael wrote: > I do not see a downside to making MAX_DEBUG_MESSAGE_LENGTH a fixed at build PCD. It's currently #define'd in a variety of places, which in my opinion is just asking for trouble. A fixed-at-build PCD sounds good to me. (Not ignoring the rest, just got nothing to add :)) Thanks! Laszlo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 18:58 ` Laszlo Ersek 2019-07-31 22:04 ` Johnson, Michael @ 2019-08-01 20:20 ` Brian J. Johnson 2019-08-01 21:14 ` Andrew Fish 1 sibling, 1 reply; 14+ messages in thread From: Brian J. Johnson @ 2019-08-01 20:20 UTC (permalink / raw) To: Laszlo Ersek, devel, Eric Dong; +Cc: Ray Ni, Michael Kinney On 7/31/19 1:58 PM, Laszlo Ersek wrote: > On 07/31/19 18:34, Brian J. Johnson wrote: > >> I do wonder if there would be a clean way to let a DebugLib instance >> itself declare that AP_DEBUG() is safe. That way a platform would >> only need to override the DebugLib instance in the DSC file, rather >> than both the instance and the PCD. (I know, I'm nitpicking.) A >> library can't override PCDs in its calling modules, of course. I >> suppose the AP_DEBUG() macro could call a new DebugLib entry point to >> test for AP safety before doing anything else, say >> DebugPrintOnApIsSafe(). Or it could even be a global CONST BOOLEAN >> defined by the library. But that would require all DebugLib instances >> to change, which is something you were trying to avoid. > > That's right -- I tried to imagine some other approaches, but they'd > need new DebugLib functions, and likely force platforms to write new > code. > > >> However, it's not always practical to track down all uses of DEBUG(). >> An AP can easily call a library routine which uses DEBUG() rather than >> AP_DEBUG(), buried under several layers of transitive library >> dependencies. In other words, it's not always practical to determine >> ahead of time if a given DEBUG() call may be done on an AP. > > This problem is valid IMO, but I think its scope is a lot wider than > just DebugLib. Assume the programmer is looking at a function that may > be invoked on an AP, and they are about to call a function for taking > care of a specific sub-task. If the programmer cannot prove the > thread-safety of the *entire* call tree underneath the function call > they are about to add, they simply must not add the call. The > thread-safety of the DebugLib instance in use is just a part of the > thread-safety of said call tree. > > Put differently, code that runs APs must be extremely self-contained; > I'd rule out any and all lib classes from direct use unless a specific > library instance, advertizing thread safety, would be chosen in the > platform DSC file. But, if we adopted this approach, we could even > introduce a new AP-oriented library *class* for debug messages, offer a > Null implementation in edk2, and ask platforms to bring their own. > > >> I know that AP code runs in a very restricted environment and that >> people who use MpServices are supposed to understand the >> repercussions, but it gets very difficult when libraries are >> involved. :( > > Exactly -- the first restriction people should understand is, "stay away > from libraries as much as you can". > > >> So would a better solution be to modify the common unsafe DebugLib >> instances to have DebugPrintEnabled() return FALSE on APs? That would >> probably require a new BaseLib interface to determine if the caller is >> running on the BSP or an AP. > > I agree that "AmIAnAP()" would be a pre-requisite. > > >> (For IA32/X64 this isn't too hard -- it just needs to check a bit in >> the local APIC. > > Still not trivial, as some DebugLib instances might want to target > runtime drivers (or even SMM drivers). For runtime drivers the > complication is that a runtime (virtual address) mapping for the LAPIC > MMIO range would be needed (if I understand correctly anyway). And for > both runtime and SMM drivers, it could be a problem that on physical > hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a > different base address, possibly by the OS too. > > I could be quite confused about this, of course; I don't eat LAPICs for > breakfast :) I just recall an SMM firmware vulnerability that was in > part based on moving the LAPIC base address elsewhere. Hm... googling > suggests the attack was called "The Memory Sinkhole". > I finally looked at the documentation (imagine!), and the BSP flag is bit 8 in the IA32_APIC_BASE (0x1B) MSR. So no local APIC access is necessary, only reading an MSR. This MSR has been present since the Pentium Pro (family/model 06_01H), and is considered "architectural." > >> I have no idea about other architectures.) That wouldn't solve the >> problem everywhere -- anyone using a custom DebugLib would have to >> update it themselves. But it would solve it solidly in the majority >> of cases. >> >> Thoughts? > > My fear might not be reasonable, but I feel quite uncomfortable about > LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer > to determine at the call site, I think, even if it takes more human > work. > I understand the concern. DebugLib is exposed in a lot of awkward end cases and odd environments. Caution is warranted. But checking a bit in a MSR is about as simple a test as you can do. It sounds reasonable to me. Actually reading the local APIC would, as you said, be a lot more problematic. The observation remains that the problem is a lot bigger than DebugLib. We can't add AmIAnAP() calls on every single library interface. Are there any other spots people know of which tend to cause a lot of grief for APs? Maybe the memory allocation boot services? At least if there's a simple test for AP-ness which doesn't have additional dependencies, it's easy to add checks as needed. I don't know if this discussion will go anywhere... but it has been an interesting one to have. > I could very well be biased. In OvmfPkg we have a minuscule amount of > code that runs on APs, and even that code is written with total > minimalism in mind. > > Leaping to a different topic... Years ago I was tracking down an MTRR > setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5). > It is necessary to setup MTRRs identically on all CPUs, plus it has to > be done while all CPUs are in a "pen" doing nothing but setting up > MTRRs. The bug was exactly in that part of the code (running > simultaneously on more than a hundred CPUs). It was impossible to print > anything to the serial console -- first because it would be unreadable > for humans, and second because the delays would perturb the buggy > behavior. In the end I had to introduce an array of per-CPU debug > structures where each CPU would record its own view (snapshot) of a > shared resource -- a shared resource that should have been protected by > mutual exclusion between the CPUs. After the CPUs left the "pen" (with > the invalid MTRR configuration established), I'd use the BSP to dump the > array. That showed me that some CPUs had overlapping / inconsistent > views of the shared resource between each other. This proved that the > mutual exclusion primitive didn't work as expected -- it turns out that > the semaphore (or spinlock, not sure) in question used an INT8 counter, > which overflowed when more than 127 CPUs contended for the resource. > I too have done battle with MTRR initialization, with over a thousand processors. I feel your pain. > I'm sure I'm misremembering parts of this story (from several years > distance), the moral is that debugging in a multiprocessing environment > may easily require its own dedicated infrastructure. In edk2, we don't > have anything like that, I think. Could we build it, sufficiently > generally? Like, prepare a log buffer for each CPU before calling > StartupAllAps(), log only to that buffer during the concurrent > execution, and finally dump the buffers? I guess if we don't *reach* the > "finally" part, we could still dump the RAM and investigate the log > buffers that way... Dumping the RAM is certainly an option for virtual > machines, but it might be viable for physical setups too (JTAG, debug > agent... dunno). > > Sorry about the wild speculation :) > > Thanks > Laszlo > -- Brian -------------------------------------------------------------------- "Are we going to push the edge of the envelope, Brain?" "No, Pinky, but we may get to the sticky part." -- Quoted on the Net ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-08-01 20:20 ` Brian J. Johnson @ 2019-08-01 21:14 ` Andrew Fish 2019-08-02 21:45 ` Laszlo Ersek 0 siblings, 1 reply; 14+ messages in thread From: Andrew Fish @ 2019-08-01 21:14 UTC (permalink / raw) To: devel, brian.johnson; +Cc: Laszlo Ersek, Eric Dong, Ray Ni, Mike Kinney [-- Attachment #1: Type: text/plain, Size: 9274 bytes --] > On Aug 1, 2019, at 1:20 PM, Brian J. Johnson <brian.johnson@hpe.com> wrote: > > On 7/31/19 1:58 PM, Laszlo Ersek wrote: >> On 07/31/19 18:34, Brian J. Johnson wrote: >>> I do wonder if there would be a clean way to let a DebugLib instance >>> itself declare that AP_DEBUG() is safe. That way a platform would >>> only need to override the DebugLib instance in the DSC file, rather >>> than both the instance and the PCD. (I know, I'm nitpicking.) A >>> library can't override PCDs in its calling modules, of course. I >>> suppose the AP_DEBUG() macro could call a new DebugLib entry point to >>> test for AP safety before doing anything else, say >>> DebugPrintOnApIsSafe(). Or it could even be a global CONST BOOLEAN >>> defined by the library. But that would require all DebugLib instances >>> to change, which is something you were trying to avoid. >> That's right -- I tried to imagine some other approaches, but they'd >> need new DebugLib functions, and likely force platforms to write new >> code. >>> However, it's not always practical to track down all uses of DEBUG(). >>> An AP can easily call a library routine which uses DEBUG() rather than >>> AP_DEBUG(), buried under several layers of transitive library >>> dependencies. In other words, it's not always practical to determine >>> ahead of time if a given DEBUG() call may be done on an AP. >> This problem is valid IMO, but I think its scope is a lot wider than >> just DebugLib. Assume the programmer is looking at a function that may >> be invoked on an AP, and they are about to call a function for taking >> care of a specific sub-task. If the programmer cannot prove the >> thread-safety of the *entire* call tree underneath the function call >> they are about to add, they simply must not add the call. The >> thread-safety of the DebugLib instance in use is just a part of the >> thread-safety of said call tree. >> Put differently, code that runs APs must be extremely self-contained; >> I'd rule out any and all lib classes from direct use unless a specific >> library instance, advertizing thread safety, would be chosen in the >> platform DSC file. But, if we adopted this approach, we could even >> introduce a new AP-oriented library *class* for debug messages, offer a >> Null implementation in edk2, and ask platforms to bring their own. >>> I know that AP code runs in a very restricted environment and that >>> people who use MpServices are supposed to understand the >>> repercussions, but it gets very difficult when libraries are >>> involved. :( >> Exactly -- the first restriction people should understand is, "stay away >> from libraries as much as you can". >>> So would a better solution be to modify the common unsafe DebugLib >>> instances to have DebugPrintEnabled() return FALSE on APs? That would >>> probably require a new BaseLib interface to determine if the caller is >>> running on the BSP or an AP. >> I agree that "AmIAnAP()" would be a pre-requisite. >>> (For IA32/X64 this isn't too hard -- it just needs to check a bit in >>> the local APIC. >> Still not trivial, as some DebugLib instances might want to target >> runtime drivers (or even SMM drivers). For runtime drivers the >> complication is that a runtime (virtual address) mapping for the LAPIC >> MMIO range would be needed (if I understand correctly anyway). And for >> both runtime and SMM drivers, it could be a problem that on physical >> hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a >> different base address, possibly by the OS too. >> I could be quite confused about this, of course; I don't eat LAPICs for >> breakfast :) I just recall an SMM firmware vulnerability that was in >> part based on moving the LAPIC base address elsewhere. Hm... googling >> suggests the attack was called "The Memory Sinkhole". > > I finally looked at the documentation (imagine!), and the BSP flag is bit 8 in the IA32_APIC_BASE (0x1B) MSR. So no local APIC access is necessary, only reading an MSR. This MSR has been present since the Pentium Pro (family/model 06_01H), and is considered "architectural." > Brian, But this is still only valid for x86 CPUs, so it is not generic to all supported CPUs. I guess we could add a library for "MP" debugging, or a couple of libraries as you may want a simple library that gives you info like are you running on the BSP, and maybe a more complex lib that has AP safe functions. For the check the BSP function you could have a MpServices version, x86 MSR version, and a NULL version. Given I did this once to debug some diagnostics code I'd not be opposed to having the simple lib and actually adding an ASSERT to every EFI Service in the DXE Core if it is called from the AP. People can turn it off via the NULL instance of the lib. I don't like the idea of using the MSR vs. the MpServices protocol, but I have to admit I've used that trick in JTAG debugger scripts to auto find the BSP. The MpServices API and even the BSP don't apply at runtime, but then again you can't run code on the APs at runtime. I'm not sure that SMM/MM really follows the same rules. Thanks, Andrew Fish >>> I have no idea about other architectures.) That wouldn't solve the >>> problem everywhere -- anyone using a custom DebugLib would have to >>> update it themselves. But it would solve it solidly in the majority >>> of cases. >>> >>> Thoughts? >> My fear might not be reasonable, but I feel quite uncomfortable about >> LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer >> to determine at the call site, I think, even if it takes more human >> work. > > I understand the concern. DebugLib is exposed in a lot of awkward end cases and odd environments. Caution is warranted. But checking a bit in a MSR is about as simple a test as you can do. It sounds reasonable to me. Actually reading the local APIC would, as you said, be a lot more problematic. > > The observation remains that the problem is a lot bigger than DebugLib. We can't add AmIAnAP() calls on every single library interface. Are there any other spots people know of which tend to cause a lot of grief for APs? Maybe the memory allocation boot services? At least if there's a simple test for AP-ness which doesn't have additional dependencies, it's easy to add checks as needed. > > I don't know if this discussion will go anywhere... but it has been an interesting one to have. > >> I could very well be biased. In OvmfPkg we have a minuscule amount of >> code that runs on APs, and even that code is written with total >> minimalism in mind. >> Leaping to a different topic... Years ago I was tracking down an MTRR >> setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5). >> It is necessary to setup MTRRs identically on all CPUs, plus it has to >> be done while all CPUs are in a "pen" doing nothing but setting up >> MTRRs. The bug was exactly in that part of the code (running >> simultaneously on more than a hundred CPUs). It was impossible to print >> anything to the serial console -- first because it would be unreadable >> for humans, and second because the delays would perturb the buggy >> behavior. In the end I had to introduce an array of per-CPU debug >> structures where each CPU would record its own view (snapshot) of a >> shared resource -- a shared resource that should have been protected by >> mutual exclusion between the CPUs. After the CPUs left the "pen" (with >> the invalid MTRR configuration established), I'd use the BSP to dump the >> array. That showed me that some CPUs had overlapping / inconsistent >> views of the shared resource between each other. This proved that the >> mutual exclusion primitive didn't work as expected -- it turns out that >> the semaphore (or spinlock, not sure) in question used an INT8 counter, >> which overflowed when more than 127 CPUs contended for the resource. > > I too have done battle with MTRR initialization, with over a thousand processors. I feel your pain. > >> I'm sure I'm misremembering parts of this story (from several years >> distance), the moral is that debugging in a multiprocessing environment >> may easily require its own dedicated infrastructure. In edk2, we don't >> have anything like that, I think. Could we build it, sufficiently >> generally? Like, prepare a log buffer for each CPU before calling >> StartupAllAps(), log only to that buffer during the concurrent >> execution, and finally dump the buffers? I guess if we don't *reach* the >> "finally" part, we could still dump the RAM and investigate the log >> buffers that way... Dumping the RAM is certainly an option for virtual >> machines, but it might be viable for physical setups too (JTAG, debug >> agent... dunno). >> Sorry about the wild speculation :) >> Thanks >> Laszlo > > > -- > > Brian > > -------------------------------------------------------------------- > > "Are we going to push the edge of the envelope, Brain?" > "No, Pinky, but we may get to the sticky part." > -- Quoted on the Net > > [-- Attachment #2: Type: text/html, Size: 26918 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-08-01 21:14 ` Andrew Fish @ 2019-08-02 21:45 ` Laszlo Ersek 0 siblings, 0 replies; 14+ messages in thread From: Laszlo Ersek @ 2019-08-02 21:45 UTC (permalink / raw) To: Andrew Fish, devel, brian.johnson; +Cc: Eric Dong, Ray Ni, Mike Kinney On 08/01/19 23:14, Andrew Fish wrote: > The MpServices API and even the BSP don't apply at runtime, but then > again you can't run code on the APs at runtime. My understanding is that the OS can invoke a runtime service on any processor. Plus, the UEFI spec classifies runtime services into groups, and while in any given group at most one runtime service may be executed at a time, a runtime service from another group can be invoked by the OS on another processor. See "Table 35. Rules for Reentry Into Runtime Services" (and the containing section "8.1 Runtime Services Rules and Restrictions") in UEFI-2.8. As an example, one CPU could activate GetVariable() while another CPU could activate GetTime(). If both services produce debug output (most likely on a device that is excluded from OS control), some coordination could be necessary. That said, a conservative OS would likely avoid calling any runtime service while another was busy (on any processor) :) Thanks Laszlo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-07-31 7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric ` (2 preceding siblings ...) 2019-07-31 12:43 ` [Patch 0/2] UefiCpuPkg: " Laszlo Ersek @ 2019-08-01 7:51 ` Ni, Ray 2019-08-01 9:07 ` Dong, Eric 3 siblings, 1 reply; 14+ messages in thread From: Ni, Ray @ 2019-08-01 7:51 UTC (permalink / raw) To: Dong, Eric, devel@edk2.groups.io; +Cc: Laszlo Ersek Eric, My understanding to your patch is you want to avoid printing from AP because some platforms may choose wrong DebugLib instance for the CpuFeaturePei/Dxe driver, which leads to system hang. To make the platform happy and while the value of printing the debug message is less, the solution in your patch is to suppress the debug message unless a minor code change is made to turn on the debugging. Given that there are many comments/suggestions on how to implement a MP safe debugging infra, I don't think implementing such a MP safe debugging infra can be quickly done. How about directly remove the debug message instead of disabling in your V2 patch? The goal of MP safe debugging infra is to provide a mechanism to print debug message in MP environment that doesn't depend on platform DSC library instance mapping. Can you please submit a BZ to capture that and also remember to attach the URL to this mail thread in BZ? We could use future TianoCore Design Meeting to discuss about this. Thanks, Ray > -----Original Message----- > From: Dong, Eric > Sent: Wednesday, July 31, 2019 3:35 PM > To: devel@edk2.groups.io > Cc: Ni, Ray <ray.ni@intel.com>; Laszlo Ersek <lersek@redhat.com> > Subject: [Patch 0/2] UefiCpuPkg: Default avoid print. > > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 > > Current debug message brings much restriction for the platform which use > this driver. > > For PEI and DXE phase, platform mush link base DebugLib (without using any > pei/dxe services, even for its dependent libraries). > > This patch default disable this debug message, only open it when need to > debug the related code. > > Signed-off-by: Eric Dong <eric.dong@intel.com> > Cc: Ray Ni <ray.ni@intel.com> > Cc: Laszlo Ersek <lersek@redhat.com> > > Eric Dong (2): > UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. > UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. > > .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 +++- > 2 files changed, 6 insertions(+), 2 deletions(-) > > -- > 2.21.0.windows.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch 0/2] UefiCpuPkg: Default avoid print. 2019-08-01 7:51 ` Ni, Ray @ 2019-08-01 9:07 ` Dong, Eric 0 siblings, 0 replies; 14+ messages in thread From: Dong, Eric @ 2019-08-01 9:07 UTC (permalink / raw) To: Ni, Ray, devel@edk2.groups.io; +Cc: Laszlo Ersek Hi Ray, Yes, I will remove debug code in v2 change. I have submitted feature request for new MP safe DebugLib: https://bugzilla.tianocore.org/show_bug.cgi?id=2043. Thanks, Eric > -----Original Message----- > From: Ni, Ray > Sent: Thursday, August 1, 2019 3:52 PM > To: Dong, Eric <eric.dong@intel.com>; devel@edk2.groups.io > Cc: Laszlo Ersek <lersek@redhat.com> > Subject: RE: [Patch 0/2] UefiCpuPkg: Default avoid print. > > Eric, > My understanding to your patch is you want to avoid printing from AP > because some platforms may choose wrong DebugLib instance for the > CpuFeaturePei/Dxe driver, which leads to system hang. > To make the platform happy and while the value of printing the debug > message is less, the solution in your patch is to suppress the debug message > unless a minor code change is made to turn on the debugging. > > Given that there are many comments/suggestions on how to implement a > MP safe debugging infra, I don't think implementing such a MP safe > debugging infra can be quickly done. > > How about directly remove the debug message instead of disabling in your > V2 patch? > > The goal of MP safe debugging infra is to provide a mechanism to print debug > message in MP environment that doesn't depend on platform DSC library > instance mapping. > > Can you please submit a BZ to capture that and also remember to attach the > URL to this mail thread in BZ? > > We could use future TianoCore Design Meeting to discuss about this. > > Thanks, > Ray > > > > -----Original Message----- > > From: Dong, Eric > > Sent: Wednesday, July 31, 2019 3:35 PM > > To: devel@edk2.groups.io > > Cc: Ni, Ray <ray.ni@intel.com>; Laszlo Ersek <lersek@redhat.com> > > Subject: [Patch 0/2] UefiCpuPkg: Default avoid print. > > > > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1984 > > > > Current debug message brings much restriction for the platform which > > use this driver. > > > > For PEI and DXE phase, platform mush link base DebugLib (without using > > any pei/dxe services, even for its dependent libraries). > > > > This patch default disable this debug message, only open it when need > > to debug the related code. > > > > Signed-off-by: Eric Dong <eric.dong@intel.com> > > Cc: Ray Ni <ray.ni@intel.com> > > Cc: Laszlo Ersek <lersek@redhat.com> > > > > Eric Dong (2): > > UefiCpuPkg/RegisterCpuFeaturesLib: Default avoid print. > > UefiCpuPkg/PiSmmCpuDxeSmm: Default avoid print. > > > > .../Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 4 +++- > > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 +++- > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > -- > > 2.21.0.windows.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-08-02 21:45 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-07-31 7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric 2019-07-31 7:35 ` [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: " Dong, Eric 2019-07-31 7:35 ` [Patch 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Dong, Eric 2019-07-31 12:43 ` [Patch 0/2] UefiCpuPkg: " Laszlo Ersek 2019-07-31 16:34 ` [edk2-devel] " Brian J. Johnson 2019-07-31 18:06 ` Andrew Fish 2019-07-31 18:58 ` Laszlo Ersek 2019-07-31 22:04 ` Johnson, Michael 2019-08-02 0:12 ` Laszlo Ersek 2019-08-01 20:20 ` Brian J. Johnson 2019-08-01 21:14 ` Andrew Fish 2019-08-02 21:45 ` Laszlo Ersek 2019-08-01 7:51 ` Ni, Ray 2019-08-01 9:07 ` Dong, Eric
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox