From: Laszlo Ersek <lersek@redhat.com>
To: Jeff Fan <jeff.fan@intel.com>
Cc: edk2-devel@ml01.01.org, Jiewen Yao <jiewen.yao@intel.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
Date: Thu, 10 Nov 2016 11:41:27 +0100
Message-ID: <0528a12e-3755-99cb-861a-ac927d484ec1@redhat.com>
In-Reply-To: <20161110060708.13932-1-jeff.fan@intel.com>
On 11/10/16 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
>
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>
> I tested real platform with 64bit DXE.
>
> Jeff Fan (2):
> UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
> UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>
> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++
> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++
> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++
> 4 files changed, 128 insertions(+)
>
I applied this on top of Jiewen's v2, for testing.
This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot.
The series (with my addition for patch #1) changed the behavior of S3 resume in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a busy loop lasting several seconds, and after that, although the guest OS does come back up, I cannot access *some* of the APs from within the OS:
# this works, quickly
taskset -c 0 efibootmgr
# this fails
taskset -c 1 efibootmgr
taskset: failed to set pid 0's affinity: Invalid argument
# these work again, albeit more slowly (as expected)
taskset -c 2 efibootmgr
taskset -c 3 efibootmgr
I've seen this symptom ("AP gets lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied).
If I run the "info cpus" QEMU command, I get:
* CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745
CPU #1: pc=0x00000000fffffff0 thread_id=22746
CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747
CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748
The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)?
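The same check can be done mechanically when there are many VCPUs. What follows is only a minimal sketch in Python, assuming the "info cpus" line format shown above (the format differs between QEMU versions and target architectures); the names `parse_info_cpus` and `not_halted_at_reset_vector` are made up for illustration and are not part of QEMU.

```python
import re

# Matches lines such as:
#   "* CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745"
# The format is assumed from the listing above, not from a QEMU specification.
CPU_LINE = re.compile(
    r"CPU #(?P<index>\d+): pc=(?P<pc>0x[0-9a-fA-F]+)"
    r"(?P<halted> \(halted\))? thread_id=(?P<tid>\d+)"
)

RESET_VECTOR = 0xFFFFFFF0  # x86 reset vector, as seen for CPU #1 above


def parse_info_cpus(text):
    """Return a list of (index, pc, halted) tuples, one per CPU line."""
    cpus = []
    for line in text.splitlines():
        m = CPU_LINE.search(line)
        if m:
            cpus.append((int(m["index"]), int(m["pc"], 16), m["halted"] is not None))
    return cpus


def not_halted_at_reset_vector(cpus):
    """Return indexes of CPUs that are not halted and whose pc is the reset vector."""
    return [i for i, pc, halted in cpus if not halted and pc == RESET_VECTOR]


SAMPLE = """\
* CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745
CPU #1: pc=0x00000000fffffff0 thread_id=22746
CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747
CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748
"""

print(not_halted_at_reset_vector(parse_info_cpus(SAMPLE)))  # prints [1]
```

For the listing above, only CPU #1 is reported.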
The guest kernel dmesg contains the following messages:
> [ 55.805153] PM: Restoring platform NVS memory
> [ 55.805153] Enabling non-boot CPUs ...
> [ 55.805153] x86: Booting SMP configuration:
> [ 55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [ 65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE
> [ 65.816738] Error taking CPU1 up: -5
> [ 65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2
> [ 65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock
> [ 65.817029] kvm: enabling virtualization on CPU2
> [ 65.832296] KVM setup async PF for cpu 2
> [ 65.832607] kvm-stealtime: cpu 2, msr 17fd0e100
> [ 65.833031] CPU2 is up
> [ 65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock
> [ 65.833229] kvm: enabling virtualization on CPU3
> [ 65.848594] KVM setup async PF for cpu 3
> [ 65.848940] kvm-stealtime: cpu 3, msr 17fd8e100
> [ 65.849393] CPU3 is up
> [ 65.849722] ACPI: Waking up from system sleep state S3
Note the 10-second gap at the marker, as well as the error message itself.
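Such a gap can also be located without reading the log by eye. The following is a rough sketch in Python, assuming the default "[seconds.microseconds]" dmesg timestamp prefix; `find_gaps` and its one-second `threshold` default are hypothetical choices for illustration, not an existing tool.

```python
import re

# Matches a dmesg line with a "[seconds.microseconds]" timestamp, with or
# without a leading "> " quote marker.
DMESG_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, message) for each line that follows a long pause."""
    gaps = []
    previous = None
    for line in lines:
        m = DMESG_LINE.search(line)
        if not m:
            continue
        stamp = float(m.group(1))
        # Timestamps from different CPUs may be slightly out of order; a
        # negative difference simply never exceeds the threshold.
        if previous is not None and stamp - previous > threshold:
            gaps.append((stamp - previous, m.group(2)))
        previous = stamp
    return gaps


SAMPLE = [
    "[ 55.805153] Enabling non-boot CPUs ...",
    "[ 55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1",
    "[ 65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1",
    "[ 65.816738] Error taking CPU1 up: -5",
    "[ 65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2",
    "[ 65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock",
]

for gap, message in find_gaps(SAMPLE):
    print(f"{gap:.3f}s before: {message}")
```

On the excerpt above this reports a single pause of roughly 10 seconds, right before the do_boot_cpu failure for CPU#1.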
Here's an excerpt from the KVM trace:
> CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000
> CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000
> CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000
> CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000
> CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000
> CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000
> CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000
> CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
> CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000
> CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000
> CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000
> CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000
> CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
> CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000
> CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000
> CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000
> CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000
> CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.
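To make that reading of the trace easier to verify, the enter/leave events can be paired up per VCPU. Below is a minimal sketch in Python, assuming the kvm_enter_smm line format quoted above; `smm_intervals` is a hypothetical helper name, and the sample is the second half of the excerpt (from 8406.920215 onward).

```python
import re
from collections import defaultdict

# Matches the kvm_enter_smm trace lines quoted above, e.g.
#   "CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase ..."
TRACE_LINE = re.compile(
    r"(\d+\.\d+): kvm_enter_smm: vcpu (\d+): (entering|leaving) SMM"
)


def smm_intervals(lines):
    """Return {vcpu: [(enter_time, leave_time), ...]} from trace lines.

    An interval whose leave_time is None was still open at the end of the trace.
    """
    intervals = defaultdict(list)
    entered = {}  # vcpu -> timestamp of the pending "entering" event
    for line in lines:
        m = TRACE_LINE.search(line)
        if not m:
            continue
        stamp, vcpu, action = float(m.group(1)), int(m.group(2)), m.group(3)
        if action == "entering":
            entered[vcpu] = stamp
        elif vcpu in entered:
            intervals[vcpu].append((entered.pop(vcpu), stamp))
    for vcpu, stamp in entered.items():
        intervals[vcpu].append((stamp, None))
    return dict(intervals)


SAMPLE = [
    "CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000",
    "CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000",
    "CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000",
    "CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000",
    "CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000",
    "CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000",
    "CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000",
    "CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000",
    "CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000",
    "CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000",
]

for vcpu, spans in sorted(smm_intervals(SAMPLE).items()):
    print(vcpu, spans)
```

With this excerpt, VCPU#1 and VCPU#2 each have one interval spanning about 100 ms (8406.920225 to 8407.020338), while VCPU#0 has two short intervals inside that same window.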
So this series is a clear improvement, but something else remains amiss.
If I remove Jiewen's v2 series and apply only this one, the symptom shows up much less frequently, but it still occurs:
- With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume.
- With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.
Thanks
Laszlo