From: "Brian J. Johnson" <brian.johnson@hpe.com>
To: Laszlo Ersek <lersek@redhat.com>,
devel@edk2.groups.io, Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>, Michael Kinney <michael.d.kinney@intel.com>
Subject: Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print.
Date: Thu, 1 Aug 2019 15:20:25 -0500 [thread overview]
Message-ID: <b91c5f1a-1e05-b9df-c717-138c1e45b1f1@hpe.com> (raw)
In-Reply-To: <93d4a7e7-9b86-b3c9-a476-ac1a40dc4723@redhat.com>
On 7/31/19 1:58 PM, Laszlo Ersek wrote:
> On 07/31/19 18:34, Brian J. Johnson wrote:
>
>> I do wonder if there would be a clean way to let a DebugLib instance
>> itself declare that AP_DEBUG() is safe. That way a platform would
>> only need to override the DebugLib instance in the DSC file, rather
>> than both the instance and the PCD. (I know, I'm nitpicking.) A
>> library can't override PCDs in its calling modules, of course. I
>> suppose the AP_DEBUG() macro could call a new DebugLib entry point to
>> test for AP safety before doing anything else, say
>> DebugPrintOnApIsSafe(). Or it could even be a global CONST BOOLEAN
>> defined by the library. But that would require all DebugLib instances
>> to change, which is something you were trying to avoid.
>
> That's right -- I tried to imagine some other approaches, but they'd
> need new DebugLib functions, and likely force platforms to write new
> code.
>
>
>> However, it's not always practical to track down all uses of DEBUG().
>> An AP can easily call a library routine which uses DEBUG() rather than
>> AP_DEBUG(), buried under several layers of transitive library
>> dependencies. In other words, it's not always practical to determine
>> ahead of time if a given DEBUG() call may be done on an AP.
>
> This problem is valid IMO, but I think its scope is a lot wider than
> just DebugLib. Assume the programmer is looking at a function that may
> be invoked on an AP, and they are about to call a function for taking
> care of a specific sub-task. If the programmer cannot prove the
> thread-safety of the *entire* call tree underneath the function call
> they are about to add, they simply must not add the call. The
> thread-safety of the DebugLib instance in use is just a part of the
> thread-safety of said call tree.
>
> Put differently, code that runs APs must be extremely self-contained;
> I'd rule out any and all lib classes from direct use unless a specific
> library instance, advertizing thread safety, would be chosen in the
> platform DSC file. But, if we adopted this approach, we could even
> introduce a new AP-oriented library *class* for debug messages, offer a
> Null implementation in edk2, and ask platforms to bring their own.
>
>
>> I know that AP code runs in a very restricted environment and that
>> people who use MpServices are supposed to understand the
>> repercussions, but it gets very difficult when libraries are
>> involved. :(
>
> Exactly -- the first restriction people should understand is, "stay away
> from libraries as much as you can".
>
>
>> So would a better solution be to modify the common unsafe DebugLib
>> instances to have DebugPrintEnabled() return FALSE on APs? That would
>> probably require a new BaseLib interface to determine if the caller is
>> running on the BSP or an AP.
>
> I agree that "AmIAnAP()" would be a pre-requisite.
>
>
>> (For IA32/X64 this isn't too hard -- it just needs to check a bit in
>> the local APIC.
>
> Still not trivial, as some DebugLib instances might want to target
> runtime drivers (or even SMM drivers). For runtime drivers the
> complication is that a runtime (virtual address) mapping for the LAPIC
> MMIO range would be needed (if I understand correctly anyway). And for
> both runtime and SMM drivers, it could be a problem that on physical
> hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a
> different base address, possibly by the OS too.
>
> I could be quite confused about this, of course; I don't eat LAPICs for
> breakfast :) I just recall an SMM firmware vulnerability that was in
> part based on moving the LAPIC base address elsewhere. Hm... googling
> suggests the attack was called "The Memory Sinkhole".
>
I finally looked at the documentation (imagine!), and the BSP flag is
bit 8 in the IA32_APIC_BASE (0x1B) MSR. So no local APIC access is
necessary, only reading an MSR. This MSR has been present since the
Pentium Pro (family/model 06_01H), and is considered "architectural."
>
>> I have no idea about other architectures.) That wouldn't solve the
>> problem everywhere -- anyone using a custom DebugLib would have to
>> update it themselves. But it would solve it solidly in the majority
>> of cases.
>>
>> Thoughts?
>
> My fear might not be reasonable, but I feel quite uncomfortable about
> LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer
> to determine at the call site, I think, even if it takes more human
> work.
>
I understand the concern. DebugLib is exposed in a lot of awkward end
cases and odd environments. Caution is warranted. But checking a bit
in a MSR is about as simple a test as you can do. It sounds reasonable
to me. Actually reading the local APIC would, as you said, be a lot
more problematic.
The observation remains that the problem is a lot bigger than DebugLib.
We can't add AmIAnAP() calls on every single library interface. Are
there any other spots people know of which tend to cause a lot of grief
for APs? Maybe the memory allocation boot services? At least if
there's a simple test for AP-ness which doesn't have additional
dependencies, it's easy to add checks as needed.
I don't know if this discussion will go anywhere... but it has been an
interesting one to have.
> I could very well be biased. In OvmfPkg we have a minuscule amount of
> code that runs on APs, and even that code is written with total
> minimalism in mind.
>
> Leaping to a different topic... Years ago I was tracking down an MTRR
> setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5).
> It is necessary to setup MTRRs identically on all CPUs, plus it has to
> be done while all CPUs are in a "pen" doing nothing but setting up
> MTRRs. The bug was exactly in that part of the code (running
> simultaneously on more than a hundred CPUs). It was impossible to print
> anything to the serial console -- first because it would be unreadable
> for humans, and second because the delays would perturb the buggy
> behavior. In the end I had to introduce an array of per-CPU debug
> structures where each CPU would record its own view (snapshot) of a
> shared resource -- a shared resource that should have been protected by
> mutual exclusion between the CPUs. After the CPUs left the "pen" (with
> the invalid MTRR configuration established), I'd use the BSP to dump the
> array. That showed me that some CPUs had overlapping / inconsistent
> views of the shared resource between each other. This proved that the
> mutual exclusion primitive didn't work as expected -- it turns out that
> the semaphore (or spinlock, not sure) in question used an INT8 counter,
> which overflowed when more than 127 CPUs contended for the resource.
>
I too have done battle with MTRR initialization, with over a thousand
processors. I feel your pain.
> I'm sure I'm misremembering parts of this story (from several years
> distance), the moral is that debugging in a multiprocessing environment
> may easily require its own dedicated infrastructure. In edk2, we don't
> have anything like that, I think. Could we build it, sufficiently
> generally? Like, prepare a log buffer for each CPU before calling
> StartupAllAps(), log only to that buffer during the concurrent
> execution, and finally dump the buffers? I guess if we don't *reach* the
> "finally" part, we could still dump the RAM and investigate the log
> buffers that way... Dumping the RAM is certainly an option for virtual
> machines, but it might be viable for physical setups too (JTAG, debug
> agent... dunno).
>
> Sorry about the wild speculation :)
>
> Thanks
> Laszlo
>
--
Brian
--------------------------------------------------------------------
"Are we going to push the edge of the envelope, Brain?"
"No, Pinky, but we may get to the sticky part."
-- Quoted on the Net
next prev parent reply other threads:[~2019-08-01 20:20 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-31 7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric
2019-07-31 7:35 ` [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: " Dong, Eric
2019-07-31 7:35 ` [Patch 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Dong, Eric
2019-07-31 12:43 ` [Patch 0/2] UefiCpuPkg: " Laszlo Ersek
2019-07-31 16:34 ` [edk2-devel] " Brian J. Johnson
2019-07-31 18:06 ` Andrew Fish
2019-07-31 18:58 ` Laszlo Ersek
2019-07-31 22:04 ` Johnson, Michael
2019-08-02 0:12 ` Laszlo Ersek
2019-08-01 20:20 ` Brian J. Johnson [this message]
2019-08-01 21:14 ` Andrew Fish
2019-08-02 21:45 ` Laszlo Ersek
2019-08-01 7:51 ` Ni, Ray
2019-08-01 9:07 ` Dong, Eric
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b91c5f1a-1e05-b9df-c717-138c1e45b1f1@hpe.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox