public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: "Dong, Eric" <eric.dong@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "Ni, Ruiyu" <ruiyu.ni@intel.com>
Subject: Re: [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Date: Wed, 25 Jul 2018 12:13:35 +0200	[thread overview]
Message-ID: <bfa90459-fed3-cd1c-695e-ee6894df5164@redhat.com> (raw)
In-Reply-To: <ED077930C258884BBCB450DB737E66224AC5A453@shsmsx102.ccr.corp.intel.com>

On 07/25/18 05:50, Dong, Eric wrote:
> Hi Laszlo,
>
> I have root cause this issue, the AP hangs in the procedure when
> PiSmmCpuDxeSmm driver start up trigged this issue.
>
> When PiSmmCpuDxeSmm driver start up, it will call StartAllAps to set
> memory attribute.  In StartAllAps function, after call WakeUpAp to
> start Aps, it calls CheckAllAps to wait all Aps finished the task. In
> CheckAllAps function, it detect AP state to know whether the AP has
> finished its task. In old code, it check whether the AP state is
> CpuStateFinished to know whether AP has finished tasks. This state is
> only set by AP when it truly finished task. In new logic,
> CpuStateFinished been replace with CpuStateIdle. And CpuStateIdle is
> also the begin state of the AP. AP will change state from CpuStateIdle
> to CpuStateBusy when it start execute the procedure. And after it
> finished the procedure, it will change state back to CpuStateIdle.
>
> So when the hang issue raised, AP state is not been changed to
> CpuStateBusy when BSP calls CheckAllAps to check whether the AP has
> finished its task. So the state for the AP still in CpuStateIdle, but
> BSP think AP has finished its task. In this case, BSP think all the
> Aps has finished their tasks and it continues boot.

Awesome analysis!

So, this looks like an "inverse" variant of the classic "ABA problem":

https://en.wikipedia.org/wiki/ABA_problem

> But some AP may wake up later and it failed to return from the
> procedure.

Ah! So that explains another symptom I've since seen as well -- although
*very* rarely. Namely, if an AP wakes up *after* PiSmmCpuDxeSmm moves
on, thinking that all APs are finished, the AP can execute garbage in
"no man's land" -- and that crashes the guest. Basically, QEMU/KVM pause
the guest with "emulation failure", and QEMU dumps the VCPU register
state to the standard error on the host side. In particular, the
register state indicates that the crashed VCPU is *not* in SMM.

When I first encountered this symptom now, while playing some more with
your patches, it reminded me of earlier problems with MpInitLib. And now
your analysis makes perfect sense of this additional symptom!

> In this case, the AP state keeps at CpuStateBusy. So later in
> ChangeApLoopCallback function, because this AP state still in
> CpuStateBusy, this AP will not trig the procedure. But BSP wait all
> APs to trig the procedure(BSP wait the Aps to reduce the
> mNumberToFinish value in procedure to continue boot) to continue the
> boot, so the hang occurred.

This completes the explanation.

> I think we should keep a middle state to let us know whether the AP
> truly finished its task. I will send  another serial patch for this
> issue. Please help to check the new patches.

Yes, I'll test them too.

Thanks!
Laszlo


  reply	other threads:[~2018-07-25 10:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-29  3:20 [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter Eric Dong
2018-06-29 12:14 ` Laszlo Ersek
2018-07-18 12:59   ` Dong, Eric
2018-07-19 17:01     ` Laszlo Ersek
2018-07-20  6:53       ` Dong, Eric
2018-07-20 16:30         ` Laszlo Ersek
2018-07-25  3:50           ` Dong, Eric
2018-07-25 10:13             ` Laszlo Ersek [this message]
2018-07-25 11:35               ` Dong, Eric
2018-07-25 15:35                 ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bfa90459-fed3-cd1c-695e-ee6894df5164@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox