From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.120; helo=mga04.intel.com; envelope-from=eric.dong@intel.com; receiver=edk2-devel@lists.01.org Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id EACFD2034A877 for ; Thu, 26 Oct 2017 18:27:42 -0700 (PDT) Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Oct 2017 18:31:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.44,302,1505804400"; d="scan'208";a="167670466" Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202]) by fmsmga005.fm.intel.com with ESMTP; 26 Oct 2017 18:31:29 -0700 Received: from shsmsx151.ccr.corp.intel.com (10.239.6.50) by fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 26 Oct 2017 18:31:28 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.175]) by SHSMSX151.ccr.corp.intel.com ([169.254.3.218]) with mapi id 14.03.0319.002; Fri, 27 Oct 2017 09:31:26 +0800 From: "Dong, Eric" To: "Brian J. Johnson" , Laszlo Ersek , "edk2-devel@lists.01.org" CC: "Ni, Ruiyu" , Paolo Bonzini Thread-Topic: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic. Thread-Index: AQHTS8/RPtqizoqgTUaTHjY9+C766aLyRGEAgADUE/D//6gkgIABS5EggAAcHwCAASo9QIAAxyuAgADT9MA= Date: Fri, 27 Oct 2017 01:31:26 +0000 Message-ID: References: <1508743358-3640-1-git-send-email-eric.dong@intel.com> <1508743358-3640-3-git-send-email-eric.dong@intel.com> <4fe39a52-0cd3-de2e-84f2-7363823a1b60@redhat.com> <1329f926-4c33-aaca-108a-7d15ae0503bc@redhat.com> <94c27429-4b7d-6c21-c1e5-e07f3e9a5556@redhat.com> <7c0023c7-01cb-e4a2-2c6a-c93372e651a4@hpe.com> In-Reply-To: <7c0023c7-01cb-e4a2-2c6a-c93372e651a4@hpe.com> Accept-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] MIME-Version: 1.0 Subject: Re: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic. X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Oct 2017 01:27:43 -0000 Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Brian, > -----Original Message----- > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of > Brian J. Johnson > Sent: Friday, October 27, 2017 4:48 AM > To: Dong, Eric ; Laszlo Ersek ; > edk2-devel@lists.01.org > Cc: Ni, Ruiyu ; Paolo Bonzini > Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for > AP initialization logic. >=20 > On 10/25/2017 08:13 PM, Dong, Eric wrote: > > Laszlo, > > > > > >> -----Original Message----- > >> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf > >> Of Laszlo Ersek > >> Sent: Wednesday, October 25, 2017 11:08 PM > >> To: Dong, Eric ; edk2-devel@lists.01.org > >> Cc: Ni, Ruiyu ; Paolo Bonzini > >> > >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting > >> for AP initialization logic. > >> > >> Hi Eric, > >> > >> On 10/25/17 07:42, Dong, Eric wrote: > >>> Hi Laszlo, > >>> > >>> I think I have clear about your raised issues for Ovmf platform. In > >>> this case, I > >> think your platform not suit for this code change. How about I do > >> below change based on the new code: > >>> > >>> - if (CpuMpData->CpuCount =3D=3D 0) { > >>> TimedWaitForApFinish ( > >>> CpuMpData, > >>> PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, > >>> PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) > >>> ); > >>> - } > >>> > >>> After this change, for Ovmf can still set > >>> PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep old > >> solution. > >>> For the real platform, it can set a small value to follow the new > >>> solution. For this new change, it just wait one more > >>> PcdCpuApInitTimeOutInMicroSeconds timeout, should not have > >> performance > >>> impact (This time may also been cost in later while > >>> (CpuMpData->MpCpuExchangeInfo->NumApsExecuting !=3D 0) loop) or > have > >>> litter performance impact (not cover by the later loop). > >> The suggested change will work for OVMF, thanks! > >> > >> (Just please don't forget to un-indent the TimedWaitForApFinish() > >> call that will become unconditional again.) > >> > >> --o-- > >> > >> Do you think Brian's concern should be investigated as well (perhaps > >> in a separate Bugzilla entry)? > > > > As Jeff has mentioned, only Ovmf can know the exist Ap numbers before > > send the Init-sipi-sipi. So we don't know how many bits needed for thi= s > bitmap. > > In case we can create the bitmap, we still don't know when all Aps have > been > > found(If the begin map bit value same as finish map bit value, we > > still can't get the conclusion that all Aps have been found, because > > maybe other Aps not started yet). > > >=20 > Right, physical platforms don't usually know the number of CPUs up front,= so > they still need a timeout. That's unavoidable. But we've seen cases whe= re > interrupts aren't getting delivered reliably, so not all the expected CPU= s start > up (based on the core counts in the physical sockets, as known by the > developing engineer, not the firmware.) To debug this failure, it's very > useful to have a list of exactly which CPUs did start. >=20 > Similarly, we've seen cases where a CPU starts, but crashes due to h/w or > s/w bugs before signaling the BSP that it has finished. With only a coun= ter to > indicate how many CPUs are still running, it's impossible to tell which C= PUs > failed. That makes debugging much more difficult. >=20 > The bitmaps would need to be sized according to the maximum number of > CPUs supported by the platform (or the maximum APIC ID, depending on > how they are indexed), like other data structures in the MpCpu code. >=20 > Bitmaps aren't the only possible implementation of course.... My point i= s > that it's useful to have a list of which CPUs started executing, and whic= h > finished. >=20 Seems this is not a must have item for this patch related issue. I think yo= u can submit A bugz for this debug feature. Thanks, Eric > > Thanks, > > Eric > >> > >> Because, I believe, the scheduling pattern that I described earlier > >> could be possible on physical platforms as well, at least in theory: > >> > >>>> (2) After at least one AP has started running, the logic expects > >>>> "NumApsExecuting" to monotonically grow for a while, and then to > >>>> monotonically decrease, back to zero. For example, if we have 1 BSP > >>>> and > >>>> 7 APs, the BSP logic more or less expects the following values in > >>>> "NumApsExecuting": > >>>> > >>>> 1; 2; 3; 4; 5; 6; 7; > >>>> 6; 5; 4; 3; 2; 1; 0 > >>>> > >>>> > >>>> While this may be a valid expectation for physical processors > >>>> (which actually run in parallel, in the physical world), in a > >>>> virtual machine, it is not > >> guaranteed. > >>>> Dependent on hypervisor scheduling artifacts, it is possible that, > >>>> say, three APs start up *and finish* before the remaining four APs > >>>> start up *at all*. The INIT-SIPI-SIPI marks all APs for execution / > >>>> scheduling in the hypervisor, yes, but the actual code execution > >>>> may commence a lot later. For example, the BSP may witness the > >>>> following > >> series of values in "NumApsExecuting": > >>>> > >>>> 1; 2; 3; > >>>> 2; 1; 0; > >>>> 1; 2; 3; 4; > >>>> 3; 2; 1; 0 > >>>> > >>>> and the BSP could think that there are 3 APs only, when it sees the > >>>> first 0 value. > >> > >> I don't know if such a pattern is "likely", "unlikely", or "extremely > >> unlikely" on physical platforms. I think the pattern is "theoretically > possible" at least. > >> > >> I suggest that we go ahead with the small patch for the OVMF use case > >> first, and then open a new BZ if Brian thinks there's a real safety > >> problem with the code. > >> > > > > We also notice this theoretical problem when we implement this change, > > but We think this is rarely to happen on physical platforms. We think > > the value for the change is much bigger than not do this change > > because of this theoretical problem. >=20 > I strongly disagree. Any chance for it to happen on physical platforms i= s too > large of a chance. There's simply too much variance between physical > platforms to make assumptions about interrupt delivery and the interleavi= ng > of atomic operations among dozens (or thousands!) of CPUs. >=20 > With the latest change from Eric above, we can restore the old behavior b= y > setting PcdCpuApInitTimeOutInMicroSeconds large enough to cover all APs. > So the new behavior is just an optimization which a platform can enable i= f its > timing characteristics are suitable for it. That should work. >=20 > Thanks, > Brian >=20 > > > > Thanks, > > Eric > >> ... I should note that, with the small patch that's going to fix the > >> OVMF use case, the previous behavior will be restored for *all* > >> platforms that continue using their previous PCD values. > >> > >> The cumulative effect of commit 0594ec417c89 and the upcoming patch > >> will be that a platform may significantly decrease > >> "PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the > >> "NumApsExecuting" loop will take the role of the preexisting > >> TimedWaitForApFinish() call. If a platform doesn't want that, then it > >> should keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as > >> before, and then any implementation details in the "NumApsExecuting" > loop will be irrelevant. > >> > >> Thanks! > >> Laszlo > >> _______________________________________________ > >> edk2-devel mailing list > >> edk2-devel@lists.01.org > >> https://lists.01.org/mailman/listinfo/edk2-devel > > _______________________________________________ > > edk2-devel mailing list > > edk2-devel@lists.01.org > > https://lists.01.org/mailman/listinfo/edk2-devel > > >=20 >=20 > -- >=20 > Brian >=20 > -------------------------------------------------------------------- >=20 > "This isn't a design, it's a hack." > -- Stephen Gunn > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel