From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 97A8181D07 for ; Fri, 4 Nov 2016 09:15:48 -0700 (PDT) Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 29E59C0A9CF4; Fri, 4 Nov 2016 16:15:50 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-156.phx2.redhat.com [10.3.116.156]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uA4GFl6k025054; Fri, 4 Nov 2016 12:15:47 -0400 To: "Yao, Jiewen" References: <1478156028-21572-1-git-send-email-jiewen.yao@intel.com> <74D8A39837DF1E4DA445A8C0B3885C50386BDEF5@shsmsx102.ccr.corp.intel.com> <74D8A39837DF1E4DA445A8C0B3885C50386BDFD3@shsmsx102.ccr.corp.intel.com> Cc: "Kinney, Michael D" , "Tian, Feng" , =?UTF-8?B?UmFkaW0gS3I/bcOhPw==?= , "edk2-devel@ml01.01.org" , Paolo Bonzini , "Fan, Jeff" , "Zeng, Star" From: Laszlo Ersek Message-ID: Date: Fri, 4 Nov 2016 17:15:46 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <74D8A39837DF1E4DA445A8C0B3885C50386BDFD3@shsmsx102.ccr.corp.intel.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Fri, 04 Nov 2016 16:15:50 +0000 (UTC) Subject: Re: [PATCH 0/6] Enable SMM page level protection. X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Nov 2016 16:15:48 -0000 Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit On 11/04/16 02:15, Yao, Jiewen wrote: > Thank you, Mike. > Yes, I reproduced the issue and found out what is wrong. Here is detail: > > It seems OVMF never puts AP in SMM mode, even *before my patch series*. > I rollback all my update and use e9d093. > > I add some debug message in PerformRemainingTasks(), I found below: > *(mSmmMpSyncData->CpuData[0].Present) = 1 > *(mSmmMpSyncData->CpuData[1].Present) = 0 > *(mSmmMpSyncData->CpuData[2].Present) = 0 > *(mSmmMpSyncData->CpuData[3].Present) = 0 > *(mSmmMpSyncData->CpuData[4].Present) = 0 > *(mSmmMpSyncData->CpuData[5].Present) = 0 > *(mSmmMpSyncData->CpuData[6].Present) = 0 > *(mSmmMpSyncData->CpuData[7].Present) = 0 > It is never exposed before, because there is no code to call SmmStartupAp. > > I assume all APs should be present, so I added ASSERT. > Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message. > > Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU. > Anyone can confirm that? Thank you very much for narrowing this down. It is a QEMU platform peculiarity that raising an SMI, with EFI_SMM_CONTROL2_PROTOCOL.Trigger(), puts *only* the processor that is executing Trigger() into SMM. The other processors are not affected. In PiSmmCpuDxeSmm, there is BSP code that waits for the APs to show up in SMM, for a specific amount of time. (See PcdCpuSmmSyncMode and PcdCpuSmmApSyncTimeout.) If the APs don't show up in the allotted time, due to the original SMI, then the BSP pulls them in with directed LAPIC writes, if I remember correctly. (We discussed this at great length when we were enabling SMM in OVMF -- I think I may have forgotten most of the details by now.) On the bare metal, the LAPIC writes are just a fallback, and the timeouts are not expected in practice. On QEMU however, those timeouts are the norm (because only the processor calling Trigger() enters SMM). This caused very long waits when using the variable services (with SMM) from the runtime OS, even if all of the related code (OS and firmware alike) were executed only on the BSP (VCPU#0). In order to mitigate this, we set PcdCpuSmmSyncMode to 1 (relaxed), in the following patch: commit 9b1e378811ff730fe4e5bbbba294ea74ef76a915 Author: Paolo Bonzini Date: Mon Nov 30 18:46:46 2015 +0000 OvmfPkg: use relaxed AP SMM synchronization mode Port 0xb2 on QEMU only sends an SMI to the currently executing processor. The SMI handler, however, and in particular SmmWaitForApArrival, currently expects that SmmControl2DxeTrigger triggers an SMI IPI on all processors rather than just the BSP. Thus all SMM invocations loop for a second (the default value of PcdCpuSmmApSyncTimeout) before SmmWaitForApArrival sends another SMI IPI to the APs. With the default SmmCpuFeaturesLib, 32-bit machines must broadcast SMIs because 32-bit machines must reset the MTRRs on each entry to system management modes (they have no SMRRs). However, our virtual platform does not have problems with cacheability of SMRAM, so we can use "directed" SMIs instead. To do this, just set gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode to 1 (aka SmmCpuSyncModeRelaxedAp). This fixes SMM on multiprocessor virtual machines. Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Paolo Bonzini Reviewed-by: Michael Kinney git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19058 6f19259b-4bc3-4df7-8a09-765794883524 Another problem remained though: if the SMI was raised by an AP in the first place -- which is possible when the runtime OS calls the variable services --, then the code still waited for the sync timeout to elapse. This is very-very easy to reproduce with the following command in a Linux guest: taskset -c 1 efibootmgr The "efibootmgr" command issues a big bunch of variable service calls (with the help of the guest kernel, of course), and the "taskset -c 1" command ensures that all of the user space, kernel, and firmware code involved in "efibootmgr" is executed exclusively on VCPU#1 (that is, an AP). (Side remark: invoking efibootmgr without any options causes it to simply print the BootCurrent, Timeout, BootOrder and Boot#### variables. Probably BootNext too, if it exists.) Initially, this command would take up to 35-40 *seconds* to complete on my laptop, even with Paolo's 9b1e378811ff in place. Count - 1 second busy-looping per SMI raised, - possibly several SMIs per variable service call (I don't know how many times the unprivileged part of the variable driver calls into the privileged part, per external service call), - and several variable service calls issued by efibootmgr. This was clearly intolerable, especially because in a multiprocessor linux guest, the efibootmgr program (and the initial variable service calls issued by the kernel during boot) can be scheduled to any VCPU -- the "taskset -c 1" command only synthesizes the worst case. For mitigating *that*, we set PcdCpuSmmApSyncTimeout to 1/10th of a second (from its original 1 second value), in: commit bb0f18b0bce682f6780af997de45bcef146c11ca Author: Laszlo Ersek Date: Mon Nov 30 18:46:50 2015 +0000 OvmfPkg: any AP in SMM should not wait for the BSP for more than 100 ms This patch complements the previous one, "OvmfPkg: use relaxed AP SMM synchronization mode". While that patch focuses on the case when the SMI is raised synchronously by the BSP, on the BSP: BSPHandler() [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] SmmWaitForApArrival() [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] IsSyncTimerTimeout() [UefiCpuPkg/PiSmmCpuDxeSmm/SyncTimer.c] this patch concerns itself with the case when it is one of the APs that raises (and sees delivered) the synchronous SMI: APHandler() [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] IsSyncTimerTimeout() [UefiCpuPkg/PiSmmCpuDxeSmm/SyncTimer.c] Namely, in APHandler() the AP waits for the BSP to enter SMM regardless of PcdCpuSmmSyncMode, for PcdCpuSmmApSyncTimeout microseconds (the default value is 1 second). If the BSP doesn't show up in SMM within that interval, then the AP brings it in with a directed SMI, and waits for the BSP again for PcdCpuSmmApSyncTimeout microseconds. Although during boot services, SmmControl2DxeTrigger() is only called by the BSP, at runtime the OS can invoke runtime services from an AP (it can even be forced with "taskset -c 1 efibootmgr"). Because on QEMU SmmControl2DxeTrigger() only raises the SMI for the calling processor (BSP and AP alike), the first interval above times out invariably in such cases -- the BSP never shows up before the AP calls it in. In order to mitigate the performance penalty, decrease PcdCpuSmmApSyncTimeout to one tenth of its default value: 100 ms. (For comparison, Vlv2TbltDevicePkg sets 1 ms.) NOTE: once QEMU becomes capable of synchronous broadcast SMIs, this patch and the previous one ("OvmfPkg: use relaxed AP SMM synchronization mode") should be reverted, and SmmControl2DxeTrigger() should be adjusted instead. Cc: Paolo Bonzini Cc: Jordan Justen Cc: Michael Kinney Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Laszlo Ersek Reviewed-by: Michael Kinney git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19059 6f19259b-4bc3-4df7-8a09-765794883524 With this patch in place, the "taskset -c 1 efibootmgr" command now "only" takes ~4 seconds when executed in a guest running on my laptop. (Again, for comparison, Vlv2TbltDevicePkg sets a hundred times stricter timeout.) This is the status we have to this day. The NOTE at the end of the commit message, about QEMU gaining the ability to raise synchronous broadcast SMIs, has not realized yet. Prototyping that change for QEMU wasn't hard actually; I did it and posted the patch for review: http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html See especially the second link for a (subjective) summary of the opinions around the change. There was a lot of discussion on edk2-devel too (I don't think I'll try to look up the messages now.) Ultimately, we couldn't agree on a solution, so we continue to have only unicast SMIs available for Trigger(). Regarding your question whether this is a bug or feature in QEMU... hard to say. As long as only SeaBIOS was using QEMU's SMI, I guess it qualified as a platform feature. For edk2's purposes, we could call it a bug. The behavior could be changed in QEMU without regressing SeaBIOS's usage pattern (for example by setting a specific APM_STS value from OVMF that is known not to be used by SeaBIOS, before writing to APM_CNT), but we couldn't agree on a solution. I do see that I ultimately withdrew my QEMU patch: http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html so maybe Paolo or someone else convinced me that the approach was wrong. I don't remember. Thanks Laszlo