From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx5-phx2.redhat.com (mx5-phx2.redhat.com [209.132.183.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 6C3D681C8A for ; Wed, 9 Nov 2016 03:23:32 -0800 (PST) Received: from zmail13.collab.prod.int.phx2.redhat.com (zmail13.collab.prod.int.phx2.redhat.com [10.5.83.15]) by mx5-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uA9BNXFv048390; Wed, 9 Nov 2016 06:23:33 -0500 Date: Wed, 9 Nov 2016 06:23:31 -0500 (EST) From: Paolo Bonzini To: Laszlo Ersek Cc: Jiewen Yao , edk2-devel@ml01.01.org, Michael D Kinney , Feng Tian , Jeff Fan , Star Zeng Message-ID: <867102584.11653909.1478690611541.JavaMail.zimbra@redhat.com> In-Reply-To: <08406bf5-4377-63a1-8dd9-34479c015d4b@redhat.com> References: <1478251854-14660-1-git-send-email-jiewen.yao@intel.com> <08406bf5-4377-63a1-8dd9-34479c015d4b@redhat.com> MIME-Version: 1.0 X-Originating-IP: [10.4.164.1, 10.5.101.130] X-Mailer: Zimbra 8.0.6_GA_5922 (ZimbraWebClient - FF49 (Linux)/8.0.6_GA_5922) Thread-Topic: Enable SMM page level protection. Thread-Index: KQWMDEodUYoSJIG8Qu8AtnU/w9DBUQ== Subject: Re: [PATCH V2 0/6] Enable SMM page level protection. X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Nov 2016 11:23:32 -0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit > * Second, the instruction that causes things to blow up is <0f aa>, > i.e., RSM. I have absolutely no clue why RSM is executed: It's probably not RSM. RSM is probably the last instruction executed before, and it's still in the buffer because, as you said, there's no way that you can fetch an instruction while CR3 points into SMM. My first thought was that the MMU is for some reason out of contact with reality, but actually the CR3 write is correct: CPU-24446 [002] 39841.871040: kvm_exit: reason CR_ACCESS rip 0x9f05e info 103 0 CPU-24446 [002] 39841.871040: kvm_cr: cr_write 3 = 0x7ff7f000 and it's coming from the stub as well. So the second thought was that the wakeup buffer has the wrong CR3 put into the wakeup buffer's Cr3 location. I'm glad I kept looking because it was much more entertaining. Especially knowing that I (probably) will not have to fix it. :) The basic idea for debugging was to look for interesting events and use 0x402 writes to correlate them to the debug log. For example, most accesses to 0x9f??? are obviously not traced by KVM, but the first ones are: 31519- CPU-24444 [006] 39841.783344: kvm_exit: reason EPT_VIOLATION rip 0x855d82 info 181 0 31520: CPU-24444 [006] 39841.783344: kvm_page_fault: address 9f000 error_code 181 280224- CPU-24444 [006] 39841.860940: kvm_exit: reason EPT_VIOLATION rip 0x7ffd0d15 info 182 0 280225: CPU-24444 [006] 39841.860940: kvm_page_fault: address 9f000 error_code 182 (The number is just the line number in the trace). Luckily your machine didn't have EPT accessed/dirty bits, so KVM trapped both the first read and the first write. The read is at WakeupBufferStart = 9F000, WakeupBufferSize = 1000 but it's not too interesting. The second is a good one to start debugging because it's from SMRAM (though not from SMM, since the first kvm_enter_smm happens later at 305930). So it makes sense that it writes an SMRAM CR3. There is a write to the debug log just before, at 279993, and it writes "SmmRestoreCpu()". As expected, the write is followed by a flurry of MSR writes, the APIC programming at 280131, so I am pretty sure that the write to mExchangeInfo->Cr3 comes from PrepareApStartupVector. FWIW, I first looked at the call chain up from BackupAndPrepareWakeupBuffer, but that led me nowhere for an hour. So I was a bit lucky indeed. :) Anyhow, SmmRestoreCpu is the SmmS3ResumeEntryPoint for S3Resume2Pei, and indeed, earlier in the log you have this debugging output from S3Resume2Pei: SMM S3 CR3 = 7FF7F000 Doh, maybe I should have looked at the log before the trace. Who knows. Anyway, the SMM_S3_RESUME_STATE is initialized by InitSmmS3ResumeState, so the CR3 is the one that is initialized by InitSmmS3Cr3 in UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c. At this point I was still thinking that this CR3 was wrong, but by looking at the places where SMM is entered, and correlating that with debug log writes, the puzzle was relatively easy to solve: 1) SMBASE relocation, done by SmmRestoreCpu: 305930: CPU-24445 [005] 39841.871264: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 306000: CPU-24445 [005] 39841.871318: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 306051: CPU-24446 [002] 39841.871349: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 306108: CPU-24446 [002] 39841.871390: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 306161: CPU-24447 [004] 39841.871421: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 306218: CPU-24447 [004] 39841.871463: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 306254: CPU-24444 [006] 39841.871473: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 306311: CPU-24444 [006] 39841.871512: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 2) S3ResumeExecuteBootScript (again, the previous 0x402 write ends at 334597 and promptly gives us a clue): 334698: CPU-24445 [005] 39841.882706: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 334699: CPU-24447 [004] 39841.882706: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 334741: CPU-24444 [006] 39841.882723: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 334742: CPU-24446 [002] 39841.882724: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 334875: CPU-24444 [006] 39841.882755: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 Here I think that it's where things go awry. The lines after S3ResumeExecuteBootScript() are Close all SMRAM regions before executing boot script Lock all SMRAM regions before executing boot script and indeed the first is at 334898, immediately after VCPU0 leaves SMM. But, closing and locking of SMRAM happens while the APs are still in SMM! The BSP instead goes on merrily and, after the debug log has "PeiMpInitLib: CpuMpEndOfPeiCallback () invoked" (0x402 write ends at 364869) we have another access to 0x9f000, this time a write. It's RestoreWakeupBuffer: 364908- CPU-24444 [006] 39841.890320: kvm_exit: reason EPT_VIOLATION rip 0x855d82 info 182 0 364909: CPU-24444 [006] 39841.890320: kvm_page_fault: address 9f000 error_code 182 Again VCPUs 1..3 are still in SMM, but the BSP couldn't care less. :) We're only 35% through the trace but we're actually close to the end. At 365704 OVMF says it's transferring control to the Linux's wakeup vector, and Linux takes control real soon: 365805: CPU-24444 [006] 39841.890477: kvm_exit: reason CR_ACCESS rip 0x9aec5 info 4 0 365807: CPU-24444 [006] 39841.890477: kvm_cr: cr_write 4 = 0xb0 365817: CPU-24444 [006] 39841.890479: kvm_entry: vcpu 0 We don't even need to look closer at what happens after this point, as we can imagine that the APs are just waiting for something to happen. But if you do look, all you see is reads to the PMTimer, which makes sense. And a while after, once they are fed up, they bring VCPU 0 back to SMM: 994855 CPU-24446 [000] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200 994856 CPU-24447 [002] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200 994857 CPU-24445 [005] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200 994858 CPU-24446 [000] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst) 994859 CPU-24445 [005] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst) 994860 CPU-24447 [002] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst) 994861 CPU-24446 [000] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge) 994862 CPU-24445 [005] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge) 994863 CPU-24447 [002] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge) The rendezvous completes, the APs can finally leave SMM but all they can do is meet their fate and crash horribly: 994869 CPU-24444 [006] 39841.982776: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a9548 info 0 800000fd ... 994880 CPU-24444 [006] 39841.982777: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 995135: CPU-24444 [006] 39841.982821: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 995136: CPU-24445 [005] 39841.982821: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 995137: CPU-24446 [000] 39841.982821: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 995138: CPU-24447 [002] 39841.982821: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 995148: CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL 995152: CPU-24446 [000] 39841.982828: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL I hope you enjoyed it more than the poor APs. :) Paolo > (a) in the area that used to host the AP startup routine for the MP > services PPI -- note that we also have "Transfer to 16bit OS waking > vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed > area completeley! --, > > (b) and why *after* all four VCPUs have just left SMM, together. > > * The RSM instruction is handled successfully elsewhere, for example > when all four VCPUs leave SMM, at the bottom of the diagram above: > > > CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa > > CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa > > CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa > > CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa > > * The guest-phys address 7ff7f000 that we see just before the error: > > > CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 > > error_code 83 > > CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 > > error_code 83 > > CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa > > CPU-24444 [006] 39841.982827: kvm_exit: reason > > EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc > > CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL > > CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason > > KVM_EXIT_INTERNAL_ERROR (17) > > can be found higher up in the trace; namely, it is written to CR3 > several times. It's the root of the page tables. > > * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM. > > * I also tried the "info tlb" monitor command, via "virsh > qemu-monitor-command --hmp", while the guest was auto-paused after the > crash. > > I cannot provide results: QEMU appeared to return a message that would > be longer than 16MB after encoding by libvirt, and libvirt rejected > that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX). > > Anyway, the KVM trace, and the QEMU register dump, look consistent > with what Paolo said about "Code=?? ?? ??...": > > The question marks usually mean that the page tables do not map a > page at that address. > > CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM > (SMM=0). We can't translate *any* guest-virtual address, as we can't > even begin walking the page tables. > > Thanks > Laszlo >