From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 7F36C81E30 for ; Mon, 14 Nov 2016 17:04:53 -0800 (PST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP; 14 Nov 2016 17:04:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.31,640,1473145200"; d="scan'208";a="1059495380" Received: from fmsmsx106.amr.corp.intel.com ([10.18.124.204]) by orsmga001.jf.intel.com with ESMTP; 14 Nov 2016 17:04:57 -0800 Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by FMSMSX106.amr.corp.intel.com (10.18.124.204) with Microsoft SMTP Server (TLS) id 14.3.248.2; Mon, 14 Nov 2016 17:04:56 -0800 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.239]) by SHSMSX152.ccr.corp.intel.com ([169.254.6.138]) with mapi id 14.03.0248.002; Tue, 15 Nov 2016 09:04:55 +0800 From: "Fan, Jeff" To: Laszlo Ersek , Paolo Bonzini CC: "edk2-devel@ml01.01.org" , "Yao, Jiewen" Thread-Topic: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Thread-Index: AQHSO97k043lSFBUy0mbX04flay71KDTq3eAgAMz0ICAAMHjAIAACSgAgAAeoYCAAAhqAIAABPsAgAAJRwCAAGZ0AIAAAaeAgABf0gCAAJPNIP//fu+AgACGVwA= Date: Tue, 15 Nov 2016 01:04:55 +0000 Message-ID: <542CF652F8836A4AB8DBFAAD40ED192A4A2DCE84@shsmsx102.ccr.corp.intel.com> References: <20161111054545.19616-1-jeff.fan@intel.com> <542CF652F8836A4AB8DBFAAD40ED192A4A2DB4F5@shsmsx102.ccr.corp.intel.com> <00b6828b-78c5-af4f-ab98-de4460b1b8ec@redhat.com> <4dc14e5c-9b43-4338-c7a5-9750e8a9547a@redhat.com> <3e61ffc4-9eaf-0015-11a7-e2d698624acb@redhat.com> <648314a4-6c17-7c88-7e47-98c4de95fe2d@redhat.com> <7bca5070-be51-aa99-bdb6-9fcc03086430@redhat.com> <542CF652F8836A4AB8DBFAAD40ED192A4A2DCDC0@shsmsx102.ccr.corp.intel.com> In-Reply-To: Accept-Language: zh-CN, en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMWZjYzFiZGUtMTk0NS00ODk5LWE1ZGQtZDg4ZTM2YTUyZWE3IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6IlA1YzNIUjc1UlRUakZIYzBcL0NwelVBTHRlODRKRWhUZndMWitsNmlWODdFPSJ9 x-ctpclassification: CTP_IC x-originating-ip: [10.239.127.40] MIME-Version: 1.0 Subject: Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Nov 2016 01:04:53 -0000 Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Sure. Please add it. :-) -----Original Message----- From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Lasz= lo Ersek Sent: Tuesday, November 15, 2016 9:03 AM To: Fan, Jeff; Paolo Bonzini Cc: edk2-devel@ml01.01.org; Yao, Jiewen Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 pat= h On 11/15/16 01:47, Fan, Jeff wrote: > Laszlo, >=20 > I file https://bugzilla.tianocore.org/show_bug.cgi?id=3D230 to handle AP = lost issue. >=20 > I will push the v2 serial of patches, Paolo has already added r-b signatu= re on the first 2 patches in v1 serials. Please add my Tested-by: Laszlo Ersek to the first and the third patch. If you can wait a bit, I might be able to extend my T-b to the second patch= as well. (More details to follow soon.) Thanks! Laszlo > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com] > Sent: Tuesday, November 15, 2016 7:56 AM > To: Paolo Bonzini; Fan, Jeff > Cc: edk2-devel@ml01.01.org; Yao, Jiewen > Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on=20 > S3 path >=20 > On 11/14/16 19:13, Paolo Bonzini wrote: >> >> >> On 14/11/2016 19:07, Laszlo Ersek wrote: >>> On 11/14/16 13:00, Paolo Bonzini wrote: >>>> >>>> >>>> On 14/11/2016 12:27, Laszlo Ersek wrote: >>>>> Well... >>>>> >>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.h >>>>> t >>>>> ml >>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.h >>>>> t >>>>> ml >>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.h >>>>> t >>>>> ml >>>>> >>>>> Are you suggesting that I resurrect this patch? That would be my=20 >>>>> pleasure. Please say yes. >>>> >>>> It's hard to say no when someone has written the code already. :) >>> >>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes=20 >>> just more precise commit messages). Unfortunately, quite a few=20 >>> things seem broken, although these patches worked a year ago. >>> >>> My QEMU base commit is current master 83c83f9a5266. My host kernel=20 >>> is 3.10.0-514.el7.x86_64. >>> >>> *** So, when I test these two patches, based on edk2 master (no=20 >>> on-list patches), Ia32 target, my boot hangs (spins) with the log endin= g in: >>> >>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0 >>> >>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes= . >>> "info cpus" prints: >>> >>> * CPU #0: pc=3D0x000000007f1f7763 thread_id=3D17395 >>> CPU #1: pc=3D0x000000007f2ce01e (halted) thread_id=3D17396 >>> CPU #2: pc=3D0x000000007f2ce01e (halted) thread_id=3D17397 >>> CPU #3: pc=3D0x00000000fffffff0 thread_id=3D17398 >>> >>> and I've also seen a case where all the APs were stuck at the reset=20 >>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They=20 >>> don't spin, they're just stuck. The spinning comes from CPU#0,=20 >>> apparently in MpInitChangeApLoopCallback. >>> >>> *** I flipped the AP sync mode to traditional (considering the=20 >>> relaxed mode shouldn't be required with the broadcast SMIs). This=20 >>> time the log ends with: >>> >>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0 >>>> MpInitChangeApLoopCallback() done! >>> >>> but then QEMU abort()s: >>> >>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists >>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=3Dcrashed >>> >>> I see some ioeventfd stuff in the recent QEMU history; do you think=20 >>> it's related? >> >> Yes, just try 2.7 for now or disable vhost. >=20 > (1) I think I have some new results. >=20 > I used the gdbserver built into QEMU, and I (sort of) single-stepped=20 > the > MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/Dx= eMpLib.c", and whatever else it called. > For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI pat= ches were applied to both QEMU and OVMF. >=20 > I set a breakpoint on RelocateApLoop(), so that when an AP would start up= , gdb would switch to it automatically (and it did in fact). >=20 > I liberally used "info cpus" from a separate terminal, and also "info=20 > thread" from within gdb (which gives an incredibly cool insight into=20 > the VCPU states!) >=20 > (2) Here's a few interesting results (strictly empirically): >=20 > * when I was stepping through the SendInitSipiSipiAllExcludingSelf() > function *real slow*, manually, suddenly things worked >=20 > * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in > the above-mentioned BSP function, the APs switched from "halted" to > "running" after the *first* SIPI. Not the second SIPI, the first one. >=20 > Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My expla= nation is terribly inexact, but: >=20 > * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and > APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and=20 > KVM_APIC_SIPI) >=20 > * this function also calls kvm_vcpu_kick() >=20 > * the kvm_apic_accept_events() function processes the pending events. > The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be > called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state. >=20 > And, in that state, a pending KVM_APIC_SIPI event moves the AP to > KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result > that only one SIPI is awaited after the INIT before launching the APs. >=20 > * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a > smp_send_reschedule() call, if the kicker and the "kickee" are > different processors (for example, when the BSP kicks the AP) >=20 > This seemed important because it suggested that host kernel > scheduling jitter could delay the delivery (reception) of the INIT on > the AP after the BSP sent it. If the BSP sent the first (and second) > SIPI really fast after the INIT, then those SIPIs could be missed, > and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state. >=20 > And this state (remember kvm_vcpu_reset() from above!) is consistent > with the RIP being stuck at the reset vector address, with the AP > neither running nor being halted. >=20 > (3) So, I looked at the Intel SDM. It says in the sysprog book (yes,=20 > yes, I should be using the humongous fused edition), >=20 > 8.4.4.1 Typical BSP Initialization Sequence >=20 > 15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them > up and initialize them: >=20 > MOV ESI, ICR_LOW; Load address of ICR low dword into ESI. > MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI > ; to all APs into EAX. > MOV [ESI], EAX; Broadcast INIT IPI to all APs >=20 > ; 10-millisecond delay loop. >=20 > MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP > ; to all APs into EAX, where xx is the vector > ; computed in step 10. > MOV [ESI], EAX; Broadcast SIPI IPI to all APs >=20 > ; 200-microsecond delay loop >=20 > MOV [ESI], EAX; Broadcast second SIPI IPI to all APs >=20 > ; 200-microsecond delay loop >=20 > In the SendInitSipiSipiAllExcludingSelf(), we have a >=20 > MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds)); >=20 > between the INIT and the first SIPI, and a >=20 > MicroSecondDelay (200); >=20 > between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in > UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation. >=20 > I think this could be too low for KVM. (It's telling that the value is=20 > a PCD in the first place.) >=20 > (4) The circumstances where the "AP is lost" -- due to the missed > SIPI(s) I believe -- vary, interestingly. These results are from my testi= ng with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff= 's v2 (=3D=3D this series) is applied invariably as a basis (because we agr= ee that it fixes bugs). >=20 > * The "AP lost" issue persists at S3 resume even if I raise > PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced > at the 43th resume.) >=20 > If I set the kernel's "cpu_init_udelay" parameter similarly in > addition, then that seems to make the symptom more frequent, not > less, which is completely counter-intuitive :( >=20 > * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2), > then I can't even boot; the AP (or some APs) regularly lose the SIPI > and remain stuck in "INIT received" (I think) when started from > MpInitChangeApLoopCallback(). Hence the boot never completes. >=20 > Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the > boot. (And, the original goal of the broadcast SMI patches is > achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.) > However, an AP still gets stuck during S3 occasionally :( >=20 > And, cpu_init_udelay=3D100000 for the guest kernel makes it only worse. >=20 > Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SM= M build. This v2 series doesn't help, broadcast SMIs don't help, raising th= e INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad= effects of the broadcast SMIs :/), and raising the same delay in the guest= kernel only makes things worse. >=20 > Jeff, I think you should go ahead and commit this series. Paolo reviewed = patch #3 and I hope Mike or Jiewen can review the first two patches. For OV= MF they improve things (no more emulation failures), and I guess we can fig= ure out the lost AP issue later. >=20 > Thanks > Laszlo >=20 _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel