public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Brian J. Johnson" <brian.johnson@hpe.com>
To: Laszlo Ersek <lersek@redhat.com>,
	devel@edk2.groups.io, Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>, Michael Kinney <michael.d.kinney@intel.com>
Subject: Re: [edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print.
Date: Thu, 1 Aug 2019 15:20:25 -0500	[thread overview]
Message-ID: <b91c5f1a-1e05-b9df-c717-138c1e45b1f1@hpe.com> (raw)
In-Reply-To: <93d4a7e7-9b86-b3c9-a476-ac1a40dc4723@redhat.com>

On 7/31/19 1:58 PM, Laszlo Ersek wrote:
> On 07/31/19 18:34, Brian J. Johnson wrote:
> 
>> I do wonder if there would be a clean way to let a DebugLib instance
>> itself declare that AP_DEBUG() is safe.  That way a platform would
>> only need to override the DebugLib instance in the DSC file, rather
>> than both the instance and the PCD.  (I know, I'm nitpicking.)  A
>> library can't override PCDs in its calling modules, of course.  I
>> suppose the AP_DEBUG() macro could call a new DebugLib entry point to
>> test for AP safety before doing anything else, say
>> DebugPrintOnApIsSafe().  Or it could even be a global CONST BOOLEAN
>> defined by the library.  But that would require all DebugLib instances
>> to change, which is something you were trying to avoid.
> 
> That's right -- I tried to imagine some other approaches, but they'd
> need new DebugLib functions, and likely force platforms to write new
> code.
> 
> 
>> However, it's not always practical to track down all uses of DEBUG().
>> An AP can easily call a library routine which uses DEBUG() rather than
>> AP_DEBUG(), buried under several layers of transitive library
>> dependencies.  In other words, it's not always practical to determine
>> ahead of time if a given DEBUG() call may be done on an AP.
> 
> This problem is valid IMO, but I think its scope is a lot wider than
> just DebugLib. Assume the programmer is looking at a function that may
> be invoked on an AP, and they are about to call a function for taking
> care of a specific sub-task. If the programmer cannot prove the
> thread-safety of the *entire* call tree underneath the function call
> they are about to add, they simply must not add the call. The
> thread-safety of the DebugLib instance in use is just a part of the
> thread-safety of said call tree.
> 
> Put differently, code that runs APs must be extremely self-contained;
> I'd rule out any and all lib classes from direct use unless a specific
> library instance, advertizing thread safety, would be chosen in the
> platform DSC file. But, if we adopted this approach, we could even
> introduce a new AP-oriented library *class* for debug messages, offer a
> Null implementation in edk2, and ask platforms to bring their own.
> 
> 
>> I know that AP code runs in a very restricted environment and that
>> people who use MpServices are supposed to understand the
>> repercussions, but it gets very difficult when libraries are
>> involved.  :(
> 
> Exactly -- the first restriction people should understand is, "stay away
> from libraries as much as you can".
> 
> 
>> So would a better solution be to modify the common unsafe DebugLib
>> instances to have DebugPrintEnabled() return FALSE on APs?  That would
>> probably require a new BaseLib interface to determine if the caller is
>> running on the BSP or an AP.
> 
> I agree that "AmIAnAP()" would be a pre-requisite.
> 
> 
>> (For IA32/X64 this isn't too hard -- it just needs to check a bit in
>> the local APIC.
> 
> Still not trivial, as some DebugLib instances might want to target
> runtime drivers (or even SMM drivers). For runtime drivers the
> complication is that a runtime (virtual address) mapping for the LAPIC
> MMIO range would be needed (if I understand correctly anyway). And for
> both runtime and SMM drivers, it could be a problem that on physical
> hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a
> different base address, possibly by the OS too.
> 
> I could be quite confused about this, of course; I don't eat LAPICs for
> breakfast :) I just recall an SMM firmware vulnerability that was in
> part based on moving the LAPIC base address elsewhere. Hm... googling
> suggests the attack was called "The Memory Sinkhole".
> 

I finally looked at the documentation (imagine!), and the BSP flag is 
bit 8 in the IA32_APIC_BASE (0x1B) MSR.  So no local APIC access is 
necessary, only reading an MSR.  This MSR has been present since the 
Pentium Pro (family/model 06_01H), and is considered "architectural."

> 
>> I have no idea about other architectures.)  That wouldn't solve the
>> problem everywhere -- anyone using a custom DebugLib would have to
>> update it themselves.  But it would solve it solidly in the majority
>> of cases.
>>
>> Thoughts?
> 
> My fear might not be reasonable, but I feel quite uncomfortable about
> LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer
> to determine at the call site, I think, even if it takes more human
> work.
> 

I understand the concern.  DebugLib is exposed in a lot of awkward end 
cases and odd environments.  Caution is warranted.  But checking a bit 
in a MSR is about as simple a test as you can do.  It sounds reasonable 
to me.  Actually reading the local APIC would, as you said, be a lot 
more problematic.

The observation remains that the problem is a lot bigger than DebugLib. 
We can't add AmIAnAP() calls on every single library interface.  Are 
there any other spots people know of which tend to cause a lot of grief 
for APs?  Maybe the memory allocation boot services?  At least if 
there's a simple test for AP-ness which doesn't have additional 
dependencies, it's easy to add checks as needed.

I don't know if this discussion will go anywhere... but it has been an 
interesting one to have.

> I could very well be biased. In OvmfPkg we have a minuscule amount of
> code that runs on APs, and even that code is written with total
> minimalism in mind.
> 
> Leaping to a different topic... Years ago I was tracking down an MTRR
> setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5).
> It is necessary to setup MTRRs identically on all CPUs, plus it has to
> be done while all CPUs are in a "pen" doing nothing but setting up
> MTRRs. The bug was exactly in that part of the code (running
> simultaneously on more than a hundred CPUs). It was impossible to print
> anything to the serial console -- first because it would be unreadable
> for humans, and second because the delays would perturb the buggy
> behavior. In the end I had to introduce an array of per-CPU debug
> structures where each CPU would record its own view (snapshot) of a
> shared resource -- a shared resource that should have been protected by
> mutual exclusion between the CPUs. After the CPUs left the "pen" (with
> the invalid MTRR configuration established), I'd use the BSP to dump the
> array. That showed me that some CPUs had overlapping / inconsistent
> views of the shared resource between each other. This proved that the
> mutual exclusion primitive didn't work as expected -- it turns out that
> the semaphore (or spinlock, not sure) in question used an INT8 counter,
> which overflowed when more than 127 CPUs contended for the resource.
> 

I too have done battle with MTRR initialization, with over a thousand 
processors.  I feel your pain.

> I'm sure I'm misremembering parts of this story (from several years
> distance), the moral is that debugging in a multiprocessing environment
> may easily require its own dedicated infrastructure. In edk2, we don't
> have anything like that, I think. Could we build it, sufficiently
> generally? Like, prepare a log buffer for each CPU before calling
> StartupAllAps(), log only to that buffer during the concurrent
> execution, and finally dump the buffers? I guess if we don't *reach* the
> "finally" part,  we could still dump the RAM and investigate the log
> buffers that way... Dumping the RAM is certainly an option for virtual
> machines, but it might be viable for physical setups too (JTAG, debug
> agent... dunno).
> 
> Sorry about the wild speculation :)
> 
> Thanks
> Laszlo
> 


-- 

                                                 Brian

--------------------------------------------------------------------

   "Are we going to push the edge of the envelope, Brain?"
   "No, Pinky, but we may get to the sticky part."
                                            -- Quoted on the Net

  parent reply	other threads:[~2019-08-01 20:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-31  7:35 [Patch 0/2] UefiCpuPkg: Default avoid print Dong, Eric
2019-07-31  7:35 ` [Patch 1/2] UefiCpuPkg/RegisterCpuFeaturesLib: " Dong, Eric
2019-07-31  7:35 ` [Patch 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Dong, Eric
2019-07-31 12:43 ` [Patch 0/2] UefiCpuPkg: " Laszlo Ersek
2019-07-31 16:34   ` [edk2-devel] " Brian J. Johnson
2019-07-31 18:06     ` Andrew Fish
2019-07-31 18:58     ` Laszlo Ersek
2019-07-31 22:04       ` Johnson, Michael
2019-08-02  0:12         ` Laszlo Ersek
2019-08-01 20:20       ` Brian J. Johnson [this message]
2019-08-01 21:14         ` Andrew Fish
2019-08-02 21:45           ` Laszlo Ersek
2019-08-01  7:51 ` Ni, Ray
2019-08-01  9:07   ` Dong, Eric

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b91c5f1a-1e05-b9df-c717-138c1e45b1f1@hpe.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox