public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Brian J. Johnson" <brian.johnson@hpe.com>
To: "Dong, Eric" <eric.dong@intel.com>,
	Laszlo Ersek <lersek@redhat.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "Ni, Ruiyu" <ruiyu.ni@intel.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic.
Date: Thu, 26 Oct 2017 15:48:01 -0500	[thread overview]
Message-ID: <7c0023c7-01cb-e4a2-2c6a-c93372e651a4@hpe.com> (raw)
In-Reply-To: <ED077930C258884BBCB450DB737E66224AA3E322@shsmsx102.ccr.corp.intel.com>

On 10/25/2017 08:13 PM, Dong, Eric wrote:
> Laszlo,
> 
> 
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>> Laszlo Ersek
>> Sent: Wednesday, October 25, 2017 11:08 PM
>> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
>> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>; Paolo Bonzini <pbonzini@redhat.com>
>> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
>> AP initialization logic.
>>
>> Hi Eric,
>>
>> On 10/25/17 07:42, Dong, Eric wrote:
>>> Hi Laszlo,
>>>
>>> I think I have clear about your raised issues for Ovmf platform. In this case, I
>> think your platform not suit for this code change.  How about I do below
>> change based on the new code:
>>>
>>> -      if (CpuMpData->CpuCount == 0) {
>>>          TimedWaitForApFinish (
>>>            CpuMpData,
>>>            PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
>>>            PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
>>>            );
>>> -      }
>>>
>>> After this change, for Ovmf can still set
>>> PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old
>> solution.
>>> For the real platform, it can set a small value to follow the new
>>> solution. For this new change, it just wait one more
>>> PcdCpuApInitTimeOutInMicroSeconds timeout, should not have
>> performance
>>> impact (This time may also been cost in later while
>>> (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop) or have
>>> litter performance impact (not cover by the later loop).
>> The suggested change will work for OVMF, thanks!
>>
>> (Just please don't forget to un-indent the TimedWaitForApFinish() call that
>> will become unconditional again.)
>>
>> --o--
>>
>> Do you think Brian's concern should be investigated as well (perhaps in a
>> separate Bugzilla entry)?
> 
> As Jeff has mentioned, only Ovmf can know the exist Ap numbers before
> send the Init-sipi-sipi.  So we don't know how many bits needed for this bitmap.
> In case we can create the bitmap, we still don't know when all Aps have been
>   found(If the begin map bit value same as finish map bit value, we still can't get
> the conclusion that all Aps have been found, because maybe other Aps not
> started yet).
> 

Right, physical platforms don't usually know the number of CPUs up 
front, so they still need a timeout.  That's unavoidable.  But we've 
seen cases where interrupts aren't getting delivered reliably, so not 
all the expected CPUs start up (based on the core counts in the physical 
sockets, as known by the developing engineer, not the firmware.)  To 
debug this failure, it's very useful to have a list of exactly which 
CPUs did start.

Similarly, we've seen cases where a CPU starts, but crashes due to h/w 
or s/w bugs before signaling the BSP that it has finished.  With only a 
counter to indicate how many CPUs are still running, it's impossible to 
tell which CPUs failed.  That makes debugging much more difficult.

The bitmaps would need to be sized according to the maximum number of 
CPUs supported by the platform (or the maximum APIC ID, depending on how 
they are indexed), like other data structures in the MpCpu code.

Bitmaps aren't the only possible implementation of course....  My point 
is that it's useful to have a list of which CPUs started executing, and 
which finished.

> Thanks,
> Eric
>>
>> Because, I believe, the scheduling pattern that I described earlier could be
>> possible on physical platforms as well, at least in theory:
>>
>>>> (2) After at least one AP has started running, the logic expects
>>>> "NumApsExecuting" to monotonically grow for a while, and then to
>>>> monotonically decrease, back to zero. For example, if we have 1 BSP
>>>> and
>>>> 7 APs, the BSP logic more or less expects the following values in
>>>> "NumApsExecuting":
>>>>
>>>> 1; 2; 3; 4; 5; 6; 7;
>>>> 6; 5; 4; 3; 2; 1; 0
>>>>
>>>>
>>>> While this may be a valid expectation for physical processors (which
>>>> actually run in parallel, in the physical world), in a virtual machine, it is not
>> guaranteed.
>>>> Dependent on hypervisor scheduling artifacts, it is possible that,
>>>> say, three APs start up *and finish* before the remaining four APs
>>>> start up *at all*. The INIT-SIPI-SIPI marks all APs for execution /
>>>> scheduling in the hypervisor, yes, but the actual code execution may
>>>> commence a lot later. For example, the BSP may witness the following
>> series of values in "NumApsExecuting":
>>>>
>>>> 1; 2; 3;
>>>> 2; 1; 0;
>>>> 1; 2; 3; 4;
>>>> 3; 2; 1; 0
>>>>
>>>> and the BSP could think that there are 3 APs only, when it sees the
>>>> first 0 value.
>>
>> I don't know if such a pattern is "likely", "unlikely", or "extremely unlikely" on
>> physical platforms. I think the pattern is "theoretically possible" at least.
>>
>> I suggest that we go ahead with the small patch for the OVMF use case first,
>> and then open a new BZ if Brian thinks there's a real safety problem with the
>> code.
>>
> 
> We also notice this theoretical problem when we implement this change, but
> We think this is rarely to happen on physical platforms. We think the value for
> the change is much bigger than not do this change because of this theoretical
> problem.

I strongly disagree.  Any chance for it to happen on physical platforms 
is too large of a chance.  There's simply too much variance between 
physical platforms to make assumptions about interrupt delivery and the 
interleaving of atomic operations among dozens (or thousands!) of CPUs.

With the latest change from Eric above, we can restore the old behavior 
by setting PcdCpuApInitTimeOutInMicroSeconds large enough to cover all 
APs.  So the new behavior is just an optimization which a platform can 
enable if its timing characteristics are suitable for it.  That should work.

Thanks,
Brian

> 
> Thanks,
> Eric
>> ... I should note that, with the small patch that's going to fix the OVMF use
>> case, the previous behavior will be restored for *all* platforms that continue
>> using their previous PCD values.
>>
>> The cumulative effect of commit 0594ec417c89 and the upcoming patch will
>> be that a platform may significantly decrease
>> "PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the
>> "NumApsExecuting" loop will take the role of the preexisting
>> TimedWaitForApFinish() call. If a platform doesn't want that, then it should
>> keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before, and then
>> any implementation details in the "NumApsExecuting" loop will be irrelevant.
>>
>> Thanks!
>> Laszlo
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 


-- 

                                                 Brian

--------------------------------------------------------------------

   "This isn't a design, it's a hack."
                                            -- Stephen Gunn


  reply	other threads:[~2017-10-26 20:45 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-23  7:22 [Patch 0/2] Enhance collect AP Count logic Eric Dong
2017-10-23  7:22 ` [Patch 1/2] UefiCpuPkg/MpInitLib: Change AP Index variable name Eric Dong
2017-10-24  6:00   ` Ni, Ruiyu
2017-10-23  7:22 ` [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic Eric Dong
2017-10-24  6:02   ` Ni, Ruiyu
2017-10-24  6:27     ` 答复: " Fan Jeff
2017-10-24  7:18       ` Ni, Ruiyu
2017-10-24  7:32         ` 答复: " Fan Jeff
2017-10-24 10:15   ` Laszlo Ersek
2017-10-24 14:24     ` 答复: " Fan Jeff
2017-10-24 16:29       ` Laszlo Ersek
2017-10-24 15:23     ` Dong, Eric
2017-10-24 15:40       ` Dong, Eric
2017-10-24 17:40       ` Laszlo Ersek
2017-10-24 22:30         ` Brian J. Johnson
2017-10-25  5:35           ` 答复: " Fan Jeff
2017-10-25  5:32         ` Fan Jeff
2017-10-25  5:42         ` Dong, Eric
2017-10-25 15:07           ` Laszlo Ersek
2017-10-26  1:13             ` Dong, Eric
2017-10-26 20:48               ` Brian J. Johnson [this message]
2017-10-27  1:31                 ` Dong, Eric
2017-10-24  7:31 ` 答复: [Patch 0/2] Enhance collect AP Count logic Fan Jeff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c0023c7-01cb-e4a2-2c6a-c93372e651a4@hpe.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox