public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: "Dong, Eric" <eric.dong@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "Ni, Ruiyu" <ruiyu.ni@intel.com>
Subject: Re: [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Date: Wed, 25 Jul 2018 17:35:26 +0200	[thread overview]
Message-ID: <9082dc2e-a239-18c5-14a7-e2bb18c0985b@redhat.com> (raw)
In-Reply-To: <ED077930C258884BBCB450DB737E66224AC5A5CE@shsmsx102.ccr.corp.intel.com>

On 07/25/18 13:35, Dong, Eric wrote:

>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: Wednesday, July 25, 2018 6:14 PM
>> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
>> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>

>> On 07/25/18 05:50, Dong, Eric wrote:

>>> But some AP may wake up later and it failed to return from the
>>> procedure.
>>
>> Ah! So that explains another symptom I've since seen as well --
>> although *very* rarely. Namely, if an AP wakes up *after*
>> PiSmmCpuDxeSmm moves on, thinking that all APs are finished, the AP
>> can execute garbage in "no man's land" -- and that crashes the guest.
>> Basically, QEMU/KVM pause the guest with "emulation failure", and
>> QEMU dumps the VCPU register state to the standard error on the host
>> side. In particular, the register state indicates that the crashed
>> VCPU is *not* in SMM.
>
> Where to get these QEMU dumps info?

QEMU simply writes the register dump to stderr. If you use QEMU together
with libvirt, then QEMU's stderr is saved in:

  /var/log/libvirt/qemu/$DOMAIN.log

Here's an example:

  KVM internal error. Suberror: 1
  emulation failure
  RAX=0000000000000000 RBX=0000000000000005 RCX=000000007eab9828 RDX=0000000000000002
  RSI=000000000009f000 RDI=000000007eef2ae0 RBP=000000007eef2e40 RSP=000000007eef2a08
  R8 =0000000000000002 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
  R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
  RIP=000000007ea9cdb2 RFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
  SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
  TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
  GDT=     000000007ebed998 00000047
  IDT=     000000007e237280 00000fff
  CR0=c0010033 CR2=0000000000000000 CR3=000000007ec01000 CR4=00000668
  DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
  DR6=00000000ffff0ff0 DR7=0000000000000400
  EFER=0000000000000500
  Code=89 d0 5f c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 90 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc 53 89 c8 89 d1 50 0f a2 4c 8b 54 24 38 4d 85 d2

(Anyway, this is just for general discussion now -- your v3 series works
perfectly for me.)

> I'm not clear who calls StartAllAPs function when I send this mail, I
> just based on the debug message found StartAllAps trigged right after
> PiSmmCpuDxeSmm driver. but I think this not blocks my patches for this
> issue. So I sent the patches and back to dig it more.

Sure, I think your analysis was spot-on, regardless of how you produced
it.

The register dumps / emulation failures that I mentioned in my previous
email in this thread are not a new issue with your v3 posting. The v3
series is fine. Instead, these failures were a *further* (but less
frequent) symptom with the *original* series, and I encountered them
separately from the boot hangs that I reported at first.

> Now I found it's PiSmmIpl driver calls gDS->SetMemorySpaceAttributes
> in SmmIplDxeDispatchEventNotify function which finally calls the
> StartAllAps.
> The Ap procedure is SetMtrrsFromBuffer function in CpuDxe driver.  In
> SetMtrrsFromBuffer function, it just call MtrrSetAllMtrrs to update
> the MTRR, and it use local variable to save the MTRR settings.
>
> When the hang issue raised, in the time the AP begin to run the
> procedure, the BSP has leave current function. So the saved MTRR
> setting in the local variable is not valid anymore. So the
> MtrrSetAllMtrrs function use an invalid Mtrr Setttings to set mtrrs
> and never exit the procedure. Do you think it's reasonable?

Absolutely. Basically, if an AP is allowed to continue running the user
procedure after the BSP in MpInitLib thinks that the AP is finished,
anything at all can happen -- the escapes control, it may access memory
that has been freed, and perhaps it might even execute code that has
been freed and overwritten. The actual symptoms can vary wildly.

> I found till the ExitBootService point, the AP still in busy state. I
> check the ApWakeupFunction function, found between the AP procedure
> and set AP state to Idle, no other possible code exist. So I think the
> AP should not return back from procedure. I try to add debug message
> to know where the AP is stopped at. But after I add debug message in
> MtrrSetAllMtrrs, the issue can't reproduce anymore. I tried about 30
> times. Do you have other message to get to know where the AP is?

No, I don't have any other ideas at hand to deduce where the AP was
stuck -- as far as I'm concerned, it had just wandered off into the
weeds :)

I think your idea to check the AP state at the time of the
ExitBootServices handler hang was great.

So, again, in this thread I didn't mean to report a new issue, I just
meant to report another -- less frequent -- symptom of the same "ABA"
issue.

Everything is fine with your v3 posting as much as I can tell (just
please update the commit messages before you push).

Thanks!
Laszlo


      reply	other threads:[~2018-07-25 15:35 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-29  3:20 [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter Eric Dong
2018-06-29 12:14 ` Laszlo Ersek
2018-07-18 12:59   ` Dong, Eric
2018-07-19 17:01     ` Laszlo Ersek
2018-07-20  6:53       ` Dong, Eric
2018-07-20 16:30         ` Laszlo Ersek
2018-07-25  3:50           ` Dong, Eric
2018-07-25 10:13             ` Laszlo Ersek
2018-07-25 11:35               ` Dong, Eric
2018-07-25 15:35                 ` Laszlo Ersek [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9082dc2e-a239-18c5-14a7-e2bb18c0985b@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox