On Aug 1, 2019, at 1:20 PM, Brian J. Johnson <brian.johnson@hpe.com> wrote:

On 7/31/19 1:58 PM, Laszlo Ersek wrote:
On 07/31/19 18:34, Brian J. Johnson wrote:
I do wonder if there would be a clean way to let a DebugLib instance
itself declare that AP_DEBUG() is safe.  That way a platform would
only need to override the DebugLib instance in the DSC file, rather
than both the instance and the PCD.  (I know, I'm nitpicking.)  A
library can't override PCDs in its calling modules, of course.  I
suppose the AP_DEBUG() macro could call a new DebugLib entry point to
test for AP safety before doing anything else, say
DebugPrintOnApIsSafe().  Or it could even be a global CONST BOOLEAN
defined by the library.  But that would require all DebugLib instances
to change, which is something you were trying to avoid.
That's right -- I tried to imagine some other approaches, but they'd
need new DebugLib functions, and likely force platforms to write new
code.
However, it's not always practical to track down all uses of DEBUG().
An AP can easily call a library routine which uses DEBUG() rather than
AP_DEBUG(), buried under several layers of transitive library
dependencies.  In other words, it's not always practical to determine
ahead of time if a given DEBUG() call may be done on an AP.
This problem is valid IMO, but I think its scope is a lot wider than
just DebugLib. Assume the programmer is looking at a function that may
be invoked on an AP, and they are about to call a function for taking
care of a specific sub-task. If the programmer cannot prove the
thread-safety of the *entire* call tree underneath the function call
they are about to add, they simply must not add the call. The
thread-safety of the DebugLib instance in use is just a part of the
thread-safety of said call tree.
Put differently, code that runs APs must be extremely self-contained;
I'd rule out any and all lib classes from direct use unless a specific
library instance, advertizing thread safety, would be chosen in the
platform DSC file. But, if we adopted this approach, we could even
introduce a new AP-oriented library *class* for debug messages, offer a
Null implementation in edk2, and ask platforms to bring their own.
I know that AP code runs in a very restricted environment and that
people who use MpServices are supposed to understand the
repercussions, but it gets very difficult when libraries are
involved.  :(
Exactly -- the first restriction people should understand is, "stay away
from libraries as much as you can".
So would a better solution be to modify the common unsafe DebugLib
instances to have DebugPrintEnabled() return FALSE on APs?  That would
probably require a new BaseLib interface to determine if the caller is
running on the BSP or an AP.
I agree that "AmIAnAP()" would be a pre-requisite.
(For IA32/X64 this isn't too hard -- it just needs to check a bit in
the local APIC.
Still not trivial, as some DebugLib instances might want to target
runtime drivers (or even SMM drivers). For runtime drivers the
complication is that a runtime (virtual address) mapping for the LAPIC
MMIO range would be needed (if I understand correctly anyway). And for
both runtime and SMM drivers, it could be a problem that on physical
hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a
different base address, possibly by the OS too.
I could be quite confused about this, of course; I don't eat LAPICs for
breakfast :) I just recall an SMM firmware vulnerability that was in
part based on moving the LAPIC base address elsewhere. Hm... googling
suggests the attack was called "The Memory Sinkhole".

I finally looked at the documentation (imagine!), and the BSP flag is bit 8 in the IA32_APIC_BASE (0x1B) MSR.  So no local APIC access is necessary, only reading an MSR.  This MSR has been present since the Pentium Pro (family/model 06_01H), and is considered "architectural."


Brian,

But this  is  still  only valid for x86 CPUs, so it is not generic to all supported CPUs. I guess we could add a library for "MP" debugging, or a couple of libraries as you may want a simple library that gives you info like are you running on the BSP, and maybe a more complex lib that has AP safe functions. For the check the BSP function you could have a MpServices version, x86 MSR version, and a NULL version. 

Given I did this once to debug some diagnostics code I'd not be opposed to having the simple lib and actually adding an ASSERT to every EFI Service in the DXE Core if it is called from the AP. People can turn it off via the NULL instance of the lib. 

I don't like the idea of using the MSR vs. the MpServices protocol, but I have to admit I've used that trick in JTAG debugger scripts to auto find the BSP. 

The MpServices API and even the BSP don't apply at runtime, but then again you can't run code on the APs at runtime. I'm not sure that SMM/MM really follows the same rules. 

Thanks,

Andrew Fish

I have no idea about other architectures.)  That wouldn't solve the
problem everywhere -- anyone using a custom DebugLib would have to
update it themselves.  But it would solve it solidly in the majority
of cases.

Thoughts?
My fear might not be reasonable, but I feel quite uncomfortable about
LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer
to determine at the call site, I think, even if it takes more human
work.

I understand the concern.  DebugLib is exposed in a lot of awkward end cases and odd environments.  Caution is warranted.  But checking a bit in a MSR is about as simple a test as you can do. It sounds reasonable to me.  Actually reading the local APIC would, as you said, be a lot more problematic.

The observation remains that the problem is a lot bigger than DebugLib. We can't add AmIAnAP() calls on every single library interface.  Are there any other spots people know of which tend to cause a lot of grief for APs?  Maybe the memory allocation boot services?  At least if there's a simple test for AP-ness which doesn't have additional dependencies, it's easy to add checks as needed.

I don't know if this discussion will go anywhere... but it has been an interesting one to have.

I could very well be biased. In OvmfPkg we have a minuscule amount of
code that runs on APs, and even that code is written with total
minimalism in mind.
Leaping to a different topic... Years ago I was tracking down an MTRR
setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5).
It is necessary to setup MTRRs identically on all CPUs, plus it has to
be done while all CPUs are in a "pen" doing nothing but setting up
MTRRs. The bug was exactly in that part of the code (running
simultaneously on more than a hundred CPUs). It was impossible to print
anything to the serial console -- first because it would be unreadable
for humans, and second because the delays would perturb the buggy
behavior. In the end I had to introduce an array of per-CPU debug
structures where each CPU would record its own view (snapshot) of a
shared resource -- a shared resource that should have been protected by
mutual exclusion between the CPUs. After the CPUs left the "pen" (with
the invalid MTRR configuration established), I'd use the BSP to dump the
array. That showed me that some CPUs had overlapping / inconsistent
views of the shared resource between each other. This proved that the
mutual exclusion primitive didn't work as expected -- it turns out that
the semaphore (or spinlock, not sure) in question used an INT8 counter,
which overflowed when more than 127 CPUs contended for the resource.

I too have done battle with MTRR initialization, with over a thousand processors.  I feel your pain.

I'm sure I'm misremembering parts of this story (from several years
distance), the moral is that debugging in a multiprocessing environment
may easily require its own dedicated infrastructure. In edk2, we don't
have anything like that, I think. Could we build it, sufficiently
generally? Like, prepare a log buffer for each CPU before calling
StartupAllAps(), log only to that buffer during the concurrent
execution, and finally dump the buffers? I guess if we don't *reach* the
"finally" part,  we could still dump the RAM and investigate the log
buffers that way... Dumping the RAM is certainly an option for virtual
machines, but it might be viable for physical setups too (JTAG, debug
agent... dunno).
Sorry about the wild speculation :)
Thanks
Laszlo


-- 

                                               Brian

--------------------------------------------------------------------

 "Are we going to push the edge of the envelope, Brain?"
 "No, Pinky, but we may get to the sticky part."
                                          -- Quoted on the Net