public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Dong, Eric" <eric.dong@intel.com>
To: Laszlo Ersek <lersek@redhat.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "Ni, Ruiyu" <ruiyu.ni@intel.com>
Subject: Re: [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Date: Wed, 25 Jul 2018 11:35:58 +0000	[thread overview]
Message-ID: <ED077930C258884BBCB450DB737E66224AC5A5CE@shsmsx102.ccr.corp.intel.com> (raw)
In-Reply-To: <bfa90459-fed3-cd1c-695e-ee6894df5164@redhat.com>

Hi Laszlo,


> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Wednesday, July 25, 2018 6:14 PM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant
> parameter.
> 
> On 07/25/18 05:50, Dong, Eric wrote:
> > Hi Laszlo,
> >
> > I have root cause this issue, the AP hangs in the procedure when
> > PiSmmCpuDxeSmm driver start up trigged this issue.
> >
> > When PiSmmCpuDxeSmm driver start up, it will call StartAllAps to set
> > memory attribute.  In StartAllAps function, after call WakeUpAp to
> > start Aps, it calls CheckAllAps to wait all Aps finished the task. In
> > CheckAllAps function, it detect AP state to know whether the AP has
> > finished its task. In old code, it check whether the AP state is
> > CpuStateFinished to know whether AP has finished tasks. This state is
> > only set by AP when it truly finished task. In new logic,
> > CpuStateFinished been replace with CpuStateIdle. And CpuStateIdle is
> > also the begin state of the AP. AP will change state from CpuStateIdle
> > to CpuStateBusy when it start execute the procedure. And after it
> > finished the procedure, it will change state back to CpuStateIdle.
> >
> > So when the hang issue raised, AP state is not been changed to
> > CpuStateBusy when BSP calls CheckAllAps to check whether the AP has
> > finished its task. So the state for the AP still in CpuStateIdle, but
> > BSP think AP has finished its task. In this case, BSP think all the
> > Aps has finished their tasks and it continues boot.
> 
> Awesome analysis!
> 
> So, this looks like an "inverse" variant of the classic "ABA problem":
> 
> https://en.wikipedia.org/wiki/ABA_problem

Yes, it's an ABA problem. New patch keep the CpuStateReady to distinguish from the CpuStateIdle state.

> 
> > But some AP may wake up later and it failed to return from the
> > procedure.
> 
> Ah! So that explains another symptom I've since seen as well -- although
> *very* rarely. Namely, if an AP wakes up *after* PiSmmCpuDxeSmm moves
> on, thinking that all APs are finished, the AP can execute garbage in "no man's
> land" -- and that crashes the guest. Basically, QEMU/KVM pause the guest
> with "emulation failure", and QEMU dumps the VCPU register state to the
> standard error on the host side. In particular, the register state indicates that
> the crashed VCPU is *not* in SMM.

Where to get these QEMU dumps info?

I'm not clear who calls StartAllAPs function when I send this mail, I just based on the debug message found StartAllAps trigged right after PiSmmCpuDxeSmm driver. but I think this not blocks my patches for this issue. So I sent the patches and back to dig it more.

Now I found it's PiSmmIpl driver calls gDS->SetMemorySpaceAttributes in SmmIplDxeDispatchEventNotify function which finally calls the StartAllAps.
The Ap procedure is SetMtrrsFromBuffer function in CpuDxe driver.  In SetMtrrsFromBuffer function, it just call MtrrSetAllMtrrs to update the MTRR, and it use local variable to save the MTRR settings. 

When the hang issue raised, in the time the AP begin to run the procedure, the BSP has leave current function. So the saved MTRR setting in the local variable is not valid anymore. So the MtrrSetAllMtrrs function use an invalid Mtrr Setttings to set mtrrs and never exit the procedure. Do you think it's reasonable?

I found till the ExitBootService point, the AP still in busy state. I check the ApWakeupFunction function, found between the AP procedure and set AP state to Idle, no other possible code exist. So I think the AP should not return back from procedure. I try to add debug message to know where the AP is stopped at. But after I add debug message in MtrrSetAllMtrrs, the issue can't reproduce anymore. I tried about 30 times. Do you have other message to get to know where the AP is?

Thanks,
Eric

> 
> When I first encountered this symptom now, while playing some more with
> your patches, it reminded me of earlier problems with MpInitLib. And now
> your analysis makes perfect sense of this additional symptom!
> 
> > In this case, the AP state keeps at CpuStateBusy. So later in
> > ChangeApLoopCallback function, because this AP state still in
> > CpuStateBusy, this AP will not trig the procedure. But BSP wait all
> > APs to trig the procedure(BSP wait the Aps to reduce the
> > mNumberToFinish value in procedure to continue boot) to continue the
> > boot, so the hang occurred.
> 
> This completes the explanation.
> 
> > I think we should keep a middle state to let us know whether the AP
> > truly finished its task. I will send  another serial patch for this
> > issue. Please help to check the new patches.
> 
> Yes, I'll test them too.
> 
> Thanks!
> Laszlo

  reply	other threads:[~2018-07-25 11:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-29  3:20 [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter Eric Dong
2018-06-29 12:14 ` Laszlo Ersek
2018-07-18 12:59   ` Dong, Eric
2018-07-19 17:01     ` Laszlo Ersek
2018-07-20  6:53       ` Dong, Eric
2018-07-20 16:30         ` Laszlo Ersek
2018-07-25  3:50           ` Dong, Eric
2018-07-25 10:13             ` Laszlo Ersek
2018-07-25 11:35               ` Dong, Eric [this message]
2018-07-25 15:35                 ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ED077930C258884BBCB450DB737E66224AC5A5CE@shsmsx102.ccr.corp.intel.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox