From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by mx.groups.io with SMTP id smtpd.web12.8541.1570794643149163172 for ; Fri, 11 Oct 2019 04:50:43 -0700 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AD17F309BDA1; Fri, 11 Oct 2019 11:50:42 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-177.rdu2.redhat.com [10.10.120.177]) by smtp.corp.redhat.com (Postfix) with ESMTP id 66EBF5DA8D; Fri, 11 Oct 2019 11:50:41 +0000 (UTC) Subject: Re: [edk2-devel] OVMF is crashing for me in master To: devel@edk2.groups.io, liming.gao@intel.com, "afish@apple.com" , Pete Batard References: <4A89E2EF3DFEDB4C8BFDE51014F606A14E515A05@SHSMSX104.ccr.corp.intel.com> From: "Laszlo Ersek" Message-ID: <0411bb4e-7d90-df7e-8fa0-5072275f9c62@redhat.com> Date: Fri, 11 Oct 2019 13:50:40 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <4A89E2EF3DFEDB4C8BFDE51014F606A14E515A05@SHSMSX104.ccr.corp.intel.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Fri, 11 Oct 2019 11:50:42 +0000 (UTC) Content-Type: text/plain; charset=windows-1252 Content-Language: en-US Content-Transfer-Encoding: 7bit Hi, On 10/11/19 03:19, Liming Gao wrote: > Andrew: > I verify the change (2de1f611be06ded3a59726a4052a9039be7d459b > MdeModulePkg/BdsDxe: Also call PlatformBootManagerWaitCallback on 0) > in Emulator. > It works, because PCD value is set to 10 in Emulator. > > Before this change, if TimeOut PCD is zero, BdsEntry doesn't call > PlatformBootManagerWaitCallback(). > After this change, if TimeOut PCD is zero, BdsEntry still call > PlatformBootManagerWaitCallback(). So, it trigs this issue. I agree > your fix. > > Pete: > Will you contribute the patch to fix this hang issue in OVMF? So, when Pete submitted the BDS patch, I didn't test it, I only looked at the patch email. See my feedback here: http://mid.mail-archive.com/8bacbb53-98df-724e-b704-b5059d11e378@redhat.com https://edk2.groups.io/g/devel/message/48133 It seemed OK to me. In particular, I reasoned about (TimeoutRemain==0), and the patch seemed to do the right thing for that. Unfortunately, what I missed was the following case: - if PcdPlatformBootTimeOut is zero, then the patch by Pete does not simply make one more call to PlatformBootManagerWaitCallback(), with argument zero, but it makes the *ONLY* call to PlatformBootManagerWaitCallback(). I missed this because (like normal) the patch was sent with a 3-line context, and I didn't think it necessary to review the full function context. Here is the commit, with full function context: > diff --git a/MdeModulePkg/Universal/BdsDxe/BdsEntry.c b/MdeModulePkg/Universal/BdsDxe/BdsEntry.c > index f3d5e5ac0615..7968a58f3454 100644 > --- a/MdeModulePkg/Universal/BdsDxe/BdsEntry.c > +++ b/MdeModulePkg/Universal/BdsDxe/BdsEntry.c > @@ -310,47 +310,48 @@ VOID > BdsWait ( > IN EFI_EVENT HotkeyTriggered > ) > { > EFI_STATUS Status; > UINT16 TimeoutRemain; > > DEBUG ((EFI_D_INFO, "[Bds]BdsWait ...Zzzzzzzzzzzz...\n")); > > TimeoutRemain = PcdGet16 (PcdPlatformBootTimeOut); > while (TimeoutRemain != 0) { > DEBUG ((EFI_D_INFO, "[Bds]BdsWait(%d)..Zzzz...\n", (UINTN) TimeoutRemain)); > PlatformBootManagerWaitCallback (TimeoutRemain); > > BdsReadKeys (); // BUGBUG: Only reading can signal HotkeyTriggered > // Can be removed after all keyboard drivers invoke callback in timer callback. > > if (HotkeyTriggered != NULL) { > Status = BdsWaitForSingleEvent (HotkeyTriggered, EFI_TIMER_PERIOD_SECONDS (1)); > if (!EFI_ERROR (Status)) { > break; > } > } else { > gBS->Stall (1000000); > } > > // > // 0xffff means waiting forever > // BDS with no hotkey provided and 0xffff as timeout will "hang" in the loop > // > if (TimeoutRemain != 0xffff) { > TimeoutRemain--; > } > } > + PlatformBootManagerWaitCallback (0); > DEBUG ((EFI_D_INFO, "[Bds]Exit the waiting!\n")); > } We can see that the function used to handle the PcdPlatformBootTimeOut==0 situation well, and make *no* call to PlatformBootManagerWaitCallback(). The patch broke that behavior. Therefore, I disagree with the idea to fix this in OvmfPkg. Any number of platforms may exist that rely on PlatformBootManagerWaitCallback() not being called when PcdPlatformBootTimeOut is zero. (To name just one other example, ArmVirtPkg too is affected.) And for OVMF and ArmVirtPkg, it is definitely correct to *not* display any progress bar if PcdPlatformBootTimeOut is 0. Please consult the definition of the PCD in "MdePkg/MdePkg.dec": > ## The number of seconds that the firmware will wait before initiating the original default boot selection. > # A value of 0 indicates that the default boot selection is to be initiated immediately on boot. > # The value of 0xFFFF then firmware will wait for user input before booting. > # @Prompt Boot Timeout (s) > gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut|0xffff|UINT16|0x0000002c Also see the specification of PlatformBootManagerWaitCallback(), in "MdeModulePkg/Include/Library/PlatformBootManagerLib.h": > /** > This function is called each second during the boot manager waits the timeout. > > @param TimeoutRemain The remaining timeout. > **/ > VOID > EFIAPI > PlatformBootManagerWaitCallback ( > UINT16 TimeoutRemain > ); According to the DEC file, there is no waiting if the PCD is zero. Therefore, if the PCD is zero, the function should not be called at all. There is another complication, namely, if the PCD is 0xFFFF. In this case, the loop goes on until a hotkey is pressed. In every iteration (that is, every second), PlatformBootManagerWaitCallback() is called with argument 0xFFFF, invariably. Clearly, this *cannot* cause a platform to display any progress *changes*. In the most common callback implementation, namely in the formula (Timeout - TimeoutRemain) * 100 / Timeout, it will mean (0xFFFF - 0xFFFF) * 100 / 0xFFFF, that is, "zero percent", in every iteration. Based on that, it looks wrong to suddenly call PlatformBootManagerWaitCallback() with a zero argument ("hundred percent completion"), after it's been called, let's say for a minute or more, with a constant 0xFFFF argument ("zero percent completion"), until the user finally presses a hotkey. Jumping from zero to 100% is discontiguous and doesn't appear helpful to me. Further yet, what if there is an actual progress bar and timeout (like 10 seconds), and the user presses a hotkey after 5 seconds? Should the progress bar jump to 100% at once? I wouldn't think so. Therefore, I claim that the right fix is: > diff --git a/MdeModulePkg/Universal/BdsDxe/BdsEntry.c b/MdeModulePkg/Universal/BdsDxe/BdsEntry.c > index 7968a58f3454..d12829afeb8b 100644 > --- a/MdeModulePkg/Universal/BdsDxe/BdsEntry.c > +++ b/MdeModulePkg/Universal/BdsDxe/BdsEntry.c > @@ -341,7 +341,16 @@ BdsWait ( > TimeoutRemain--; > } > } > - PlatformBootManagerWaitCallback (0); > + // > + // If the platform configured a nonzero and finite time-out, and we have > + // actually reached that, report 100% completion to the platform. > + // > + // Note that the (TimeoutRemain == 0) condition excludes > + // PcdPlatformBootTimeOut=0xFFFF, and that's deliberate. > + // > + if (PcdGet16 (PcdPlatformBootTimeOut) != 0 && TimeoutRemain == 0) { > + PlatformBootManagerWaitCallback (0); > + } > DEBUG ((EFI_D_INFO, "[Bds]Exit the waiting!\n")); > } > Thanks! Laszlo