From: Laszlo Ersek
To: Paolo Bonzini, "Fan, Jeff"
Cc: "edk2-devel@ml01.01.org", "Yao, Jiewen"
Date: Mon, 14 Nov 2016 11:39:39 +0100
Subject: Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
In-Reply-To: <4dc14e5c-9b43-4338-c7a5-9750e8a9547a@redhat.com>
List-Id: EDK II Development

On 11/14/16 09:50, Paolo Bonzini wrote:
>
>
> On 14/11/2016 09:17, Laszlo Ersek wrote:
>> On 11/13/16 13:51, Fan, Jeff wrote:
>>> Laszlo,
>>>
>>> Thanks your testing. It seems that there is still some unknown issue existing.
>>>
>>> I suggest to push this serial of patches firstly, because they have
>>> big progress to solve the AP crashed issue in
>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216.
>>
>> Sounds good to me.
>>
>>> I could submit another bug to handle "AP lost" issue.
>>
>> I hope that Paolo can continue to help us with the KVM trace analysis.
>
> I will, but it will take a few days. In the meanwhile it would be nice
> if you could take a look at using SendSmiIpiAllExcludingSelf() to bridge
> the difference between 0xb2 on QEMU and on real hardware.

You've tried that:

https://www.mail-archive.com/edk2-devel@lists.01.org/msg02840.html
https://www.mail-archive.com/edk2-devel@lists.01.org/msg02923.html

Do you suggest to make the LocalApicLib instances usable at runtime? For
that I think we'll need to cover the LAPIC address range with a
runtime-marked EfiMemoryMappedIO area. This can be done in
"OvmfPkg/SmmControl2Dxe". Also, we'll need a LocalApicLib instance that
registers a callback for SetVirtualAddressMap() and converts the LAPIC
base address pointer.

Currently BaseXApicX2ApicLib.c's GetLocalApicBaseAddress() function uses
the MSR_IA32_APIC_BASE register if it's available -- based on CPUID --,
and falls back to PcdCpuLocalApicBaseAddress otherwise. And only
PcdCpuLocalApicBaseAddress is what we could replace with the virtual
pointer. We can't accommodate a guest OS that reprograms the LAPIC base
address.

Jeff, what do you think?

Anyway, I believe KVM doesn't support moving the LAPIC window; is that
right?

(Independently, I seem to recall an attack that stole SMRAM accesses by
hiding SMRAM with the LAPIC window.)

Thanks
Laszlo

>>> Thus, JIewen's
>>> or others' patches could be push as long as they have no additional
>>> issue except for "AP Lost:".
>>
>> I haven't gotten around testing Jiewen's v3 series yet. I think it would
>> be best if I could test Jiewen's v3 after this v2 series of yours is
>> committed. I'll report back with results.
>>
>> Thanks
>> Laszlo
>>
>>>
>>> I could follow up to fix "AP Lost" issue.
>>>
>>> Thanks!
>>> Jeff
>>>
>>>
>>> -----Original Message-----
>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>> Sent: Saturday, November 12, 2016 3:49 AM
>>> To: Fan, Jeff
>>> Cc: edk2-devel@ml01.01.org; Yao, Jiewen; Paolo Bonzini
>>> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
>>>
>>> On 11/11/16 06:45, Jeff Fan wrote:
>>>> On S3 path, we will wake up APs to restore CPU context in
>>>> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit
>>>> from hlt state and execute the instruction after HLT instruction.
>>>>
>>>> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>>>>
>>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>>>
>>>> I tested real platform with 64bit DXE.
>>>>
>>>> v2:
>>>> 1. Make stack alignment per Laszlo's comment.
>>>> 2. Trim whitespace at end of end per Laszlo's comment.
>>>> 3. Update year mark in file header.
>>>> 4. Enhancement on InterlockedDecrement() per Paolo's comment.
>>>>
>>>> Jeff Fan (3):
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
>>>>
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 ++++++++++++++++++++++++++-
>>>>  4 files changed, 136 insertions(+), 4 deletions(-)
>>>>
>>>
>>> Applied this locally to master (ffd6b0b1b65e) for testing.
>>> I tested the series with a suspend-resume loop -- not a busy loop,
>>> just manually. (So there was always one second or so between adjacent
>>> steps.)
>>>
>>> No crashes or emulation failures, but the "AP going lost" issue
>>> remains present -- sometimes Linux cannot bring up one of the four
>>> VCPUs after resume.
>>>
>>> In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.
>>>
>>> In the Ia32X64 case, I experienced the symptom after the 89th resume.
>>>
>>> Thanks
>>> Laszlo
>>>
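
For reference, the GetLocalApicBaseAddress() behaviour described in the reply (use MSR_IA32_APIC_BASE when the CPUID-based check says it is available, otherwise fall back to PcdCpuLocalApicBaseAddress) can be modelled roughly in plain C as below. This is only a sketch and not the BaseXApicX2ApicLib.c code: the lapic_env struct, the lapic_base() name, and the mask constant are hypothetical stand-ins, and the MSR and PCD reads are replaced by plain struct fields so that the snippet compiles outside of edk2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplification: the low 12 bits of the APIC base MSR hold flag and
   reserved bits, so the sketch clears them to get the base address. */
#define LAPIC_BASE_FLAG_BITS 0xFFFULL

/* Hypothetical stand-in for the platform state the library would query. */
typedef struct {
  bool     msr_available;  /* result of the CPUID-based availability check */
  uint64_t msr_apic_base;  /* raw value read from MSR_IA32_APIC_BASE */
  uint64_t pcd_apic_base;  /* stand-in for PcdCpuLocalApicBaseAddress */
} lapic_env;

/* Model of the selection logic: prefer the MSR, fall back to the PCD. */
uint64_t lapic_base(const lapic_env *env)
{
  assert(env != NULL);
  if (env->msr_available) {
    /* Always a physical address; a virtual pointer cannot be injected here. */
    return env->msr_apic_base & ~LAPIC_BASE_FLAG_BITS;
  }
  /* Only this value could be replaced with a converted (virtual) pointer. */
  return env->pcd_apic_base;
}
```

In this model, overwriting pcd_apic_base with a converted pointer changes the result only while msr_available is false, which is the limitation raised in the reply: the PCD fallback is the only path where a virtual pointer could be substituted.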