From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=209.132.183.28; helo=mx1.redhat.com; envelope-from=lersek@redhat.com; receiver=edk2-devel@lists.01.org Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 2DC70202E6160 for ; Tue, 24 Oct 2017 10:36:40 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 24DCEC07015F; Tue, 24 Oct 2017 17:40:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 24DCEC07015F Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lersek@redhat.com Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-52.rdu2.redhat.com [10.10.120.52]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8C6685DA60; Tue, 24 Oct 2017 17:40:22 +0000 (UTC) To: "Dong, Eric" , "edk2-devel@lists.01.org" Cc: "Ni, Ruiyu" , Paolo Bonzini , Jeff Fan References: <1508743358-3640-1-git-send-email-eric.dong@intel.com> <1508743358-3640-3-git-send-email-eric.dong@intel.com> <4fe39a52-0cd3-de2e-84f2-7363823a1b60@redhat.com> From: Laszlo Ersek Message-ID: <1329f926-4c33-aaca-108a-7d15ae0503bc@redhat.com> Date: Tue, 24 Oct 2017 19:40:21 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 24 Oct 2017 17:40:24 +0000 (UTC) Subject: Re: [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for AP initialization logic. X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Oct 2017 17:36:41 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Hi Eric, On 10/24/17 17:23, Dong, Eric wrote: > Laszlo, > >> -----Original Message----- >> From: Laszlo Ersek [mailto:lersek@redhat.com] >> Sent: Tuesday, October 24, 2017 6:16 PM >> To: Dong, Eric ; edk2-devel@lists.01.org >> Cc: Ni, Ruiyu ; Paolo Bonzini >> Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for >> AP initialization logic. >> >> CC Paolo >> >> On 10/23/17 09:22, Eric Dong wrote: >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> index 976af1f..bdfe0d3 100644 >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc >>> @@ -40,4 +40,5 @@ EnableExecuteDisableLocation equ LockLocation >> + 30h >>> Cr3Location equ LockLocation + 34h >>> InitFlagLocation equ LockLocation + 38h >>> CpuInfoLocation equ LockLocation + 3Ch >>> +NumApsExecutingLocation equ LockLocation + 40h >>> >>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> index 1b9c6a6..2b6c27d 100644 >>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm >>> @@ -86,6 +86,12 @@ Flat32Start: ; protected mode >> entry point >>> >>> mov esi, ebx >>> >>> + ; Increment the number of APs executing here as early as possible >>> + ; This is decremented in C code when AP is finished executing >>> + mov edi, esi >>> + add edi, NumApsExecutingLocation >>> + lock inc dword [edi] >>> + >>> mov edi, esi >>> add edi, EnableExecuteDisableLocation >>> cmp byte [edi], 0 >>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> b/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> index db923c9..48f930b 100644 >>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c >>> @@ -662,6 +662,7 @@ ApWakeupFunction ( >>> // AP finished executing C code >>> // >>> InterlockedIncrement ((UINT32 *) &CpuMpData->FinishedCount); >>> + InterlockedDecrement ((UINT32 *) >>> + &CpuMpData->MpCpuExchangeInfo->NumApsExecuting); >>> >>> // >>> // Place AP is specified loop mode @@ -765,6 +766,7 @@ >>> FillExchangeInfoData ( >>> >>> ExchangeInfo->CFunction = (UINTN) ApWakeupFunction; >>> ExchangeInfo->ApIndex = 0; >>> + ExchangeInfo->NumApsExecuting = 0; >>> ExchangeInfo->InitFlag = (UINTN) CpuMpData->InitFlag; >>> ExchangeInfo->CpuInfo = (CPU_INFO_IN_HOB *) (UINTN) >> CpuMpData->CpuInfoInHob; >>> ExchangeInfo->CpuMpData = CpuMpData; >>> @@ -934,13 +936,19 @@ WakeUpAP ( >>> } >>> if (CpuMpData->InitFlag == ApInitConfig) { >>> // >>> - // Wait for all potential APs waken up in one specified period >>> + // Wait for one potential AP waken up in one specified period >>> // >>> - TimedWaitForApFinish ( >>> - CpuMpData, >>> - PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, >>> - PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) >>> - ); >>> + if (CpuMpData->CpuCount == 0) { >>> + TimedWaitForApFinish ( >>> + CpuMpData, >>> + PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1, >>> + PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) >>> + ); >>> + } >> >> I don't understand this change. The new comment says, >> >> Wait for *one* potential AP waken up in one specified period >> >> However, the second parameter of TimedWaitForApFinish(), namely >> "FinishedApLimit", gets the same value as before: >> >> PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1 >> >> It means that all of the (possible) APs are waited-for, just the same >> as before. > > [[Eric]] This patch changes the collect AP count logic, original > solution always waits for a specific time to let all APs start up. If > the input CPU number (PcdGet32(PcdCpuMaxLogicalProcessorNumber) - 1) > have been found or after a specific time(PcdGet32 > (PcdCpuApInitTimeOutInMicroSeconds)). BSP will not wait anymore and use > CpuMpData->CpuCount as the found AP count. > > New logic also wait for a specific time, but this time is smaller than > the original one. It just wait for the first AP(any AP) begin to do the > initialization( do CpuMpData->MpCpuExchangeInfo->NumApsExecuting++ means > it begin to do the initialization). When Ap finishes initialization, it > will do CpuMpData->MpCpuExchangeInfo->NumApsExecuting--. So after BSP > waits for a specific time at first, it just needs to check whether > CpuMpData->MpCpuExchangeInfo->NumApsExecuting == 0 to know whether all > Aps have finished initialization. Here we still use the original PCD > (PcdCpuApInitTimeOutInMicroSeconds) for the new time value. > > When one AP do the initialization, it will also do > CpuMpData->CpuCount++. So here I check whether CpuMpData->CpuCount != 0 > to know whether APs already begin to do the initialization. If yes, I > not need to do the time out waiting anymore, just check the > CpuMpData->MpCpuExchangeInfo->NumApsExecuting to know whether all Aps > have finished initialization. Thanks for the explanation. The "NumApsExecuting" increment / decrement logic in this patch expects that the APs work as follows: (1) After the INIT-SIPI-SIPI, there may be a delay before the APs start running. During this delay, the BSP may or may not reach the code in question. The (CpuMpData->CpuCount != 0) check is supposed to take this into account. (2) After at least one AP has started running, the logic expects "NumApsExecuting" to monotonically grow for a while, and then to monotonically decrease, back to zero. For example, if we have 1 BSP and 7 APs, the BSP logic more or less expects the following values in "NumApsExecuting": 1; 2; 3; 4; 5; 6; 7; 6; 5; 4; 3; 2; 1; 0 While this may be a valid expectation for physical processors (which actually run in parallel, in the physical world), in a virtual machine, it is not guaranteed. Dependent on hypervisor scheduling artifacts, it is possible that, say, three APs start up *and finish* before the remaining four APs start up *at all*. The INIT-SIPI-SIPI marks all APs for execution / scheduling in the hypervisor, yes, but the actual code execution may commence a lot later. For example, the BSP may witness the following series of values in "NumApsExecuting": 1; 2; 3; 2; 1; 0; 1; 2; 3; 4; 3; 2; 1; 0 and the BSP could think that there are 3 APs only, when it sees the first 0 value. Now, let me get back to the use case that actually matters to OVMF and QEMU. As I mentioned earlier, OVMF knows from QEMU the exact number of APs. If there is a way for OVMF to tell MpInitLib to wait for this exact number of APs -- no matter how long it takes --, then that's what I would like to use. Please see the original discussion around OVMF commit 45a70db3c3a59: * In version 1, I introduced a new PCD called PcdCpuKnownLogicalProcessorNumber and I modified MpInitLib to wait for this AP number, ignoring timeout completely, if the PCD was set: https://lists.01.org/pipermail/edk2-devel/2016-November/005076.html However, Jeff suggested to use the preexistent PCD "PcdCpuMaxLogicalProcessorNumber" for this purpose: https://lists.01.org/pipermail/edk2-devel/2016-November/005092.html * In version 2, I used the PCD suggested by Jeff, but I also introduced a new special value for the timeout. Timeout=0 would mean "infinity": https://lists.01.org/pipermail/edk2-devel/2016-November/005209.html Jeff didn't like the special value, and suggested that OVMF simply use MAX_UINT32 microseconds (approx. 71 minutes) instead of "infinity": https://lists.01.org/pipermail/edk2-devel/2016-November/005266.html * In v3, I implemented that, and that was pushed as: - UefiCpuPkg commit 6e1987f19af7 ("UefiCpuPkg/MpInitLib: wait no longer than necessary for initial AP startup", 2016-11-24) - and OvmfPkg commit 45a70db3c3a5 ("OvmfPkg/PlatformPei: take VCPU count from QEMU and configure MpInitLib", 2016-11-24). So, again, the use case that I would like to cover is: * the exact number of APs is known at boot, to OvmfPkg/PlatformPei, * after the very first INIT-SIPI-SIPI, MpInitLib should wait for exactly this number of APs to "report in", regardless of: - how long it takes, - in what order / sequence the APs report in. (Again, please remember that some APs may complete the initialization before other APs execute their very first instruction.) * Preferably, the case should be handled when the processor count grows from cold boot to S3 resume (VCPU hotplug). TianoCore BZ#251 used to track this use case separately. ( Jeff closed BZ#251 today, with the argument that commit 0594ec417c89 (this patch) finds the CPU count dynamically anyway, so a platform can simply set "PcdCpuMaxLogicalProcessorNumber" to the maximum possible value. This argument does not work in a virtual machine, because commit 0594ec417c89 (this patch) may in fact not find the VCPU count correctly -- in a VM, (NumApsExecuting==0) does not imply that all APs have finished. ) Thanks! Laszlo