From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 9149981E34 for ; Mon, 14 Nov 2016 17:30:40 -0800 (PST) Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D4CAEC056793; Tue, 15 Nov 2016 01:30:44 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-50.phx2.redhat.com [10.3.116.50]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uAF1UggR025011; Mon, 14 Nov 2016 20:30:43 -0500 To: "Fan, Jeff" , Paolo Bonzini References: <20161111054545.19616-1-jeff.fan@intel.com> <542CF652F8836A4AB8DBFAAD40ED192A4A2DB4F5@shsmsx102.ccr.corp.intel.com> <00b6828b-78c5-af4f-ab98-de4460b1b8ec@redhat.com> <4dc14e5c-9b43-4338-c7a5-9750e8a9547a@redhat.com> <3e61ffc4-9eaf-0015-11a7-e2d698624acb@redhat.com> <648314a4-6c17-7c88-7e47-98c4de95fe2d@redhat.com> <7bca5070-be51-aa99-bdb6-9fcc03086430@redhat.com> <542CF652F8836A4AB8DBFAAD40ED192A4A2DD0A3@shsmsx102.ccr.corp.intel.com> Cc: "edk2-devel@ml01.01.org" , "Yao, Jiewen" From: Laszlo Ersek Message-ID: <1afe7e7a-709c-87d4-0664-2b61e8c2110f@redhat.com> Date: Tue, 15 Nov 2016 02:30:42 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <542CF652F8836A4AB8DBFAAD40ED192A4A2DD0A3@shsmsx102.ccr.corp.intel.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 15 Nov 2016 01:30:44 +0000 (UTC) Subject: Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Nov 2016 01:30:40 -0000 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit On 11/15/16 02:19, Fan, Jeff wrote: > Laszlo, > > Have you tried to decrease PcdCpuInitIpiDelayInMicroSeconds? > > For PCD PcdCpuInitIpiDelayInMicroSeconds before two SIPIs, we introduced this PCD is just to do customization. > MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds)); > > Per our experience, we could decrease this PCD value to 10us (microsecond) on some platforms/processors. You may try it. I didn't try to decrease it, but experimenting with it should no longer be necessary. (See my other email.) Moving to the broadcast SMI pattern and the traditional AP sync method *really* fixes things (together with this series of yours). Thanks! Laszlo > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com] > Sent: Tuesday, November 15, 2016 7:56 AM > To: Paolo Bonzini; Fan, Jeff > Cc: edk2-devel@ml01.01.org; Yao, Jiewen > Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path > > On 11/14/16 19:13, Paolo Bonzini wrote: >> >> >> On 14/11/2016 19:07, Laszlo Ersek wrote: >>> On 11/14/16 13:00, Paolo Bonzini wrote: >>>> >>>> >>>> On 14/11/2016 12:27, Laszlo Ersek wrote: >>>>> Well... >>>>> >>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.ht >>>>> ml >>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.ht >>>>> ml >>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.ht >>>>> ml >>>>> >>>>> Are you suggesting that I resurrect this patch? That would be my >>>>> pleasure. Please say yes. >>>> >>>> It's hard to say no when someone has written the code already. :) >>> >>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes >>> just more precise commit messages). Unfortunately, quite a few things >>> seem broken, although these patches worked a year ago. >>> >>> My QEMU base commit is current master 83c83f9a5266. My host kernel is >>> 3.10.0-514.el7.x86_64. >>> >>> *** So, when I test these two patches, based on edk2 master (no >>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in: >>> >>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0 >>> >>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes. >>> "info cpus" prints: >>> >>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395 >>> CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396 >>> CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397 >>> CPU #3: pc=0x00000000fffffff0 thread_id=17398 >>> >>> and I've also seen a case where all the APs were stuck at the reset >>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They >>> don't spin, they're just stuck. The spinning comes from CPU#0, >>> apparently in MpInitChangeApLoopCallback. >>> >>> *** I flipped the AP sync mode to traditional (considering the >>> relaxed mode shouldn't be required with the broadcast SMIs). This >>> time the log ends with: >>> >>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0 >>>> MpInitChangeApLoopCallback() done! >>> >>> but then QEMU abort()s: >>> >>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists >>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed >>> >>> I see some ioeventfd stuff in the recent QEMU history; do you think >>> it's related? >> >> Yes, just try 2.7 for now or disable vhost. > > (1) I think I have some new results. > > I used the gdbserver built into QEMU, and I (sort of) single-stepped the > MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called. > For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI patches were applied to both QEMU and OVMF. > > I set a breakpoint on RelocateApLoop(), so that when an AP would start up, gdb would switch to it automatically (and it did in fact). > > I liberally used "info cpus" from a separate terminal, and also "info thread" from within gdb (which gives an incredibly cool insight into the VCPU states!) > > (2) Here's a few interesting results (strictly empirically): > > * when I was stepping through the SendInitSipiSipiAllExcludingSelf() > function *real slow*, manually, suddenly things worked > > * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in > the above-mentioned BSP function, the APs switched from "halted" to > "running" after the *first* SIPI. Not the second SIPI, the first one. > > Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My explanation is terribly inexact, but: > > * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and > APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI) > > * this function also calls kvm_vcpu_kick() > > * the kvm_apic_accept_events() function processes the pending events. > The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be > called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state. > > And, in that state, a pending KVM_APIC_SIPI event moves the AP to > KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result > that only one SIPI is awaited after the INIT before launching the APs. > > * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a > smp_send_reschedule() call, if the kicker and the "kickee" are > different processors (for example, when the BSP kicks the AP) > > This seemed important because it suggested that host kernel > scheduling jitter could delay the delivery (reception) of the INIT on > the AP after the BSP sent it. If the BSP sent the first (and second) > SIPI really fast after the INIT, then those SIPIs could be missed, > and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state. > > And this state (remember kvm_vcpu_reset() from above!) is consistent > with the RIP being stuck at the reset vector address, with the AP > neither running nor being halted. > > (3) So, I looked at the Intel SDM. It says in the sysprog book (yes, yes, I should be using the humongous fused edition), > > 8.4.4.1 Typical BSP Initialization Sequence > > 15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them > up and initialize them: > > MOV ESI, ICR_LOW; Load address of ICR low dword into ESI. > MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI > ; to all APs into EAX. > MOV [ESI], EAX; Broadcast INIT IPI to all APs > > ; 10-millisecond delay loop. > > MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP > ; to all APs into EAX, where xx is the vector > ; computed in step 10. > MOV [ESI], EAX; Broadcast SIPI IPI to all APs > > ; 200-microsecond delay loop > > MOV [ESI], EAX; Broadcast second SIPI IPI to all APs > > ; 200-microsecond delay loop > > In the SendInitSipiSipiAllExcludingSelf(), we have a > > MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds)); > > between the INIT and the first SIPI, and a > > MicroSecondDelay (200); > > between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in > UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation. > > I think this could be too low for KVM. (It's telling that the value is a PCD in the first place.) > > (4) The circumstances where the "AP is lost" -- due to the missed > SIPI(s) I believe -- vary, interestingly. These results are from my testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff's v2 (== this series) is applied invariably as a basis (because we agree that it fixes bugs). > > * The "AP lost" issue persists at S3 resume even if I raise > PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced > at the 43th resume.) > > If I set the kernel's "cpu_init_udelay" parameter similarly in > addition, then that seems to make the symptom more frequent, not > less, which is completely counter-intuitive :( > > * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2), > then I can't even boot; the AP (or some APs) regularly lose the SIPI > and remain stuck in "INIT received" (I think) when started from > MpInitChangeApLoopCallback(). Hence the boot never completes. > > Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the > boot. (And, the original goal of the broadcast SMI patches is > achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.) > However, an AP still gets stuck during S3 occasionally :( > > And, cpu_init_udelay=100000 for the guest kernel makes it only worse. > > Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SMM build. This v2 series doesn't help, broadcast SMIs don't help, raising the INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad effects of the broadcast SMIs :/), and raising the same delay in the guest kernel only makes things worse. > > Jeff, I think you should go ahead and commit this series. Paolo reviewed patch #3 and I hope Mike or Jiewen can review the first two patches. For OVMF they improve things (no more emulation failures), and I guess we can figure out the lost AP issue later. > > Thanks > Laszlo >