From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by mx.groups.io with SMTP id smtpd.web10.5724.1570632958467507145 for ; Wed, 09 Oct 2019 07:55:58 -0700 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EE37C300DA6E; Wed, 9 Oct 2019 14:55:57 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-11.rdu2.redhat.com [10.10.120.11]) by smtp.corp.redhat.com (Postfix) with ESMTP id C99A15C1D6; Wed, 9 Oct 2019 14:55:56 +0000 (UTC) Subject: Re: [edk2-devel] [PATCH 1/4] UefiCpuPkg/MpInitLib: honor the platform's boot CPU count in AP detection To: "Dong, Eric" , "devel@edk2.groups.io" Cc: Igor Mammedov , "Ni, Ray" References: <20191008112714.6215-1-lersek@redhat.com> <20191008112714.6215-2-lersek@redhat.com> From: "Laszlo Ersek" Message-ID: <3c535d1a-d977-4543-b419-f4442e4a3cff@redhat.com> Date: Wed, 9 Oct 2019 16:55:55 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Wed, 09 Oct 2019 14:55:58 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 10/09/19 02:57, Dong, Eric wrote: > Hi Laszlo, > >> -----Original Message----- >> From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On Behalf Of >> Laszlo Ersek >> Sent: Tuesday, October 8, 2019 7:27 PM >> To: edk2-devel-groups-io >> Cc: Dong, Eric ; Igor Mammedov >> ; Ni, Ray >> Subject: [edk2-devel] [PATCH 1/4] UefiCpuPkg/MpInitLib: honor the platform's >> boot CPU count in AP detection >> >> If a platform boots with a CPU topology that is not fully populated (that >> is, the boot CPU count is smaller than PcdCpuMaxLogicalProcessorNumber), >> then the platform cannot use the "fast AP detection" logic added in commit >> 6e1987f19af7. Said logic depends on the boot CPU count being equal to >> PcdCpuMaxLogicalProcessorNumber. >> >> The platform may not be able to use the variant added in commit >> 0594ec417c89 either, for different reasons; see for example commit >> 861218740d6d. >> >> Allow platforms to specify the exact boot CPU count, independently of >> PcdCpuMaxLogicalProcessorNumber. >> >> Cc: Eric Dong >> Cc: Igor Mammedov >> Cc: Ray Ni >> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515 >> Signed-off-by: Laszlo Ersek >> --- >> UefiCpuPkg/UefiCpuPkg.dec | 11 +++++ >> UefiCpuPkg/UefiCpuPkg.uni | 4 ++ >> UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 + >> UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 3 +- >> UefiCpuPkg/Library/MpInitLib/MpLib.c | 42 ++++++++++++-------- >> 5 files changed, 43 insertions(+), 18 deletions(-) >> >> diff --git a/UefiCpuPkg/UefiCpuPkg.dec b/UefiCpuPkg/UefiCpuPkg.dec >> index 031a2ccd680a..d6b33fd9b465 100644 >> --- a/UefiCpuPkg/UefiCpuPkg.dec >> +++ b/UefiCpuPkg/UefiCpuPkg.dec >> @@ -227,6 +227,17 @@ [PcdsFixedAtBuild, PcdsPatchableInModule, >> PcdsDynamic, PcdsDynamicEx] >> ## Specifies timeout value in microseconds for the BSP to detect all APs for >> the first time. >> # @Prompt Timeout for the BSP to detect all APs for the first time. >> >> gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000|UI >> NT32|0x00000004 >> + ## Specifies the number of Logical Processors that are available in the >> + # preboot environment after platform reset, including BSP and APs. Possible >> + # values:

>> + # zero (default) - This PCD is ignored, and >> + # PcdCpuApInitTimeOutInMicroSeconds limits the initial AP >> + # detection by the BSP.
>> + # nonzero - PcdCpuApInitTimeOutInMicroSeconds is ignored. The initial >> + # AP detection finishes when the detected CPU count (BSP >> + # plus APs) reaches the value of this PCD.
>> + # @Prompt Number of Logical Processors available after platform reset. >> + >> gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0|UINT32 >> |0x00000008 >> ## Specifies the base address of the first microcode Patch in the microcode >> Region. >> # @Prompt Microcode Region base address. >> >> gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchAddress|0x0|UINT64|0x >> 00000005 >> diff --git a/UefiCpuPkg/UefiCpuPkg.uni b/UefiCpuPkg/UefiCpuPkg.uni >> index fbf768072668..a7e279c5cb14 100644 >> --- a/UefiCpuPkg/UefiCpuPkg.uni >> +++ b/UefiCpuPkg/UefiCpuPkg.uni >> @@ -37,6 +37,10 @@ >> >> #string >> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuApInitTimeOutInMicroSeconds_HEL >> P #language en-US "Specifies timeout value in microseconds for the BSP to >> detect all APs for the first time." >> >> +#string >> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuBootLogicalProcessorNumber_PRO >> MPT #language en-US "Number of Logical Processors available after platform >> reset." >> + >> +#string >> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuBootLogicalProcessorNumber_HELP >> #language en-US "Specifies the number of Logical Processors that are available >> in the preboot environment after platform reset, including BSP and APs." >> + >> #string >> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuMicrocodePatchAddress_PROMPT >> #language en-US "Microcode Region base address." >> >> #string >> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuMicrocodePatchAddress_HELP >> #language en-US "Specifies the base address of the first microcode Patch in >> the microcode Region." >> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf >> b/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf >> index 37b3f64e578a..cd912ab0c5ee 100644 >> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf >> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf >> @@ -61,6 +61,7 @@ [Guids] >> >> [Pcd] >> gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber ## >> CONSUMES >> + gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber ## >> CONSUMES >> gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds ## >> SOMETIMES_CONSUMES >> gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize ## CONSUMES >> gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchAddress ## >> CONSUMES >> diff --git a/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf >> b/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf >> index 82b77b63ea87..1538185ef99a 100644 >> --- a/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf >> +++ b/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf >> @@ -53,7 +53,8 @@ [LibraryClasses] >> >> [Pcd] >> gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber ## >> CONSUMES >> - gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds ## >> CONSUMES >> + gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber ## >> CONSUMES >> + gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds ## >> SOMETIMES_CONSUMES >> gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize ## CONSUMES >> gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchAddress ## >> CONSUMES >> gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchRegionSize ## >> CONSUMES >> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c >> b/UefiCpuPkg/Library/MpInitLib/MpLib.c >> index d6f84c6f45c0..f1bf8a7ba7cf 100644 >> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c >> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c >> @@ -1044,24 +1044,32 @@ WakeUpAP ( >> SendInitSipiSipiAllExcludingSelf ((UINT32) ExchangeInfo->BufferStart); >> } >> if (CpuMpData->InitFlag == ApInitConfig) { >> - // >> - // Here support two methods to collect AP count through adjust >> - // PcdCpuApInitTimeOutInMicroSeconds values. >> - // >> - // one way is set a value to just let the first AP to start the >> - // initialization, then through the later while loop to wait all Aps >> - // finsh the initialization. >> - // The other way is set a value to let all APs finished the initialzation. >> - // In this case, the later while loop is useless. >> - // >> - TimedWaitForApFinish ( >> - CpuMpData, >> - PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, >> - PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) >> - ); >> + if (PcdGet32 (PcdCpuBootLogicalProcessorNumber) > 0) { >> + TimedWaitForApFinish ( >> + CpuMpData, >> + PcdGet32 (PcdCpuBootLogicalProcessorNumber) - 1, >> + MAX_UINT32 // approx. 71 minutes > > Do you think if the PcdCpuBootLogicalProcessorNumber is not zero, driver should always wait for all the processors specified by PcdCpuBootLogicalProcessorNumber ready? Yes, it absolutely must. Control must never advance past this point if at least one of the CPUs announced in PcdCpuBootLogicalProcessorNumber fails to check in, regardless of time limit. > I just have little concern about the MAX_UINT32 usage. I not sure whether there has a case that some processors can't wake up? In fact, I would have preferred "absolute infinity" over MAX_UINT32 here; MAX_UINT32 (~71 minutes) is just a substitute that's "good enough" in practice. So, to confirm: if a platform sets PcdCpuBootLogicalProcessorNumber to a nonzero value, then the platform *must not boot* at all, if at least one processor (from the BSP and APs together) fails to wake, from that number. Speaking generally, if you encounter a timeout situation -- you give up waiting for the result of a particular action that you initiated --, you don't know what the end result is going to be. The action may never finish, or it may very well finish *after* you stop waiting. This is a general characteristic of all timeouts. In this specific case, hanging the boot forever (= failing to boot) is *much* better than having a stray processor wake up after we stop waiting, and then run amok in no man's land. It is my explicit goal to remove the timeout condition from this logic, and to make it block *forever* if at least one CPU fails to check in. If this policy is not suitable for a platform, then it should not set the new PCD to a nonzero value. (The default value is zero.) For OVMF running on QEMU/KVM, this is the right policy however. If one of your VCPUs fails to check in, then the virtualization stack (QEMU and/or KVM) under OVMF is busted, and the guest should *not* continue booting. Continuing would only lead to more misery down the line. Thanks, Laszlo > > Thanks, > Eric > >> + ); >> + } else { >> + // >> + // Here support two methods to collect AP count through adjust >> + // PcdCpuApInitTimeOutInMicroSeconds values. >> + // >> + // one way is set a value to just let the first AP to start the >> + // initialization, then through the later while loop to wait all Aps >> + // finsh the initialization. >> + // The other way is set a value to let all APs finished the >> + // initialzation. In this case, the later while loop is useless. >> + // >> + TimedWaitForApFinish ( >> + CpuMpData, >> + PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, >> + PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) >> + ); >> >> - while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) { >> - CpuPause(); >> + while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) { >> + CpuPause(); >> + } >> } >> } else { >> // >> -- >> 2.19.1.3.g30247aa5d201 >> >> >> >> -=-=-=-=-=-= >> Groups.io Links: You receive all messages sent to this group. >> >> View/Reply Online (#48563): https://edk2.groups.io/g/devel/message/48563 >> Mute This Topic: https://groups.io/mt/34441228/1768733 >> Group Owner: devel+owner@edk2.groups.io >> Unsubscribe: https://edk2.groups.io/g/devel/unsub [eric.dong@intel.com] >> -=-=-=-=-=-= >