From: Laszlo Ersek
To: Paolo Bonzini, "Yao, Jiewen"
Cc: "Kinney, Michael D", "Tian, Feng", "edk2-devel@ml01.01.org",
 "Zeng, Star", "Fan, Jeff"
Date: Wed, 9 Nov 2016 23:28:30 +0100
Subject: Re: [PATCH V2 0/6] Enable SMM page level protection.
Message-ID: <0e1380dc-85e6-324b-4614-10785d24f499@redhat.com>
In-Reply-To: <3be2f1bf-8c0a-e470-a5c0-a6130b159da5@redhat.com>
References: <1478251854-14660-1-git-send-email-jiewen.yao@intel.com>
 <08406bf5-4377-63a1-8dd9-34479c015d4b@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C50386C0CB8@shsmsx102.ccr.corp.intel.com>
 <74D8A39837DF1E4DA445A8C0B3885C50386C10BD@shsmsx102.ccr.corp.intel.com>
 <3be2f1bf-8c0a-e470-a5c0-a6130b159da5@redhat.com>

On 11/09/16 16:54, Paolo Bonzini wrote:
>
>
> On 09/11/2016 16:01, Yao, Jiewen wrote:
>> 1) CpuS3.c – EarlyInitializeCpu()
>> 2) CpuS3.c – SmmRelocateBases()
>> 3) CpuS3.c – InitializeCpu()
>> 4) S3Resume.c – SendSmiIpiAllExcludingSelf()
>>
>> I believe we can guarantee 1/2/3 is good, because I found we check BSP
>> check mNumberToFinish.
>>
>> 4 is a risk, because there is no AP finish check. If the AP is in below
>> 1M with CR3 in SMRAM, it will be a trouble.
>>
>> Once the AP executes RSM and return to non-SMM, the CR3 is no longer
>> valid and AP must be crashed immediately. WoW!
>>
>> The fix, I believe, is same.
>>
>> We should make 1) AP is in above 1M reserved memory,
>
> Is this because of the NMI case?
>
>> and 2) AP is in protected mode with paging disabled.
>
> It is not clear to me what the (4) SIPI done is there for,

After reading through your great analysis with a keen focus :), I wanted
to ask the exact same thing. I managed to follow / recall the control
flow mostly, but when I saw that SMI, I didn't (and don't) understand
what it was (is) good for. After all, we're not setting up any request
parameters etc. for the processors to handle in SMM. What's happening
there?

Another question I have -- and I feel I should really know it, but I
don't... -- is *why* the APs are executing code from the page at
0x9f000. When the BSP exits SMM, replays the S3 boot script, and finally
finishes off the PEI phase and restores the page at 0x9f000, the APs
seem to be affected -- but why do they care about that page at all? That
page never belonged to PiSmmCpuDxeSmm, it belongs to CpuMpPei.

I do understand that the CR3 registers for the APs point into SMRAM,
while they wait for the BSP in SMM. Thus, the BSP closing/locking down
SMRAM, in S3ResumeExecuteBootScript(), breaks the APs -- that's
understandable.

What I don't get is, again:

(1) why S3ResumeExecuteBootScript() raises SMIs at all, before locking
    down SMRAM,

(2) what the AP SMM routine (from PiSmmCpuDxeSmm) has to do with the
    wakeup buffer that is allocated and used *solely* by CpuMpPei.

I could be utterly and inexcusably wrong, but I think that the
RIP=0x9f0fd symptom is a red herring.

I wrote,

> vcpu#0        vcpu#1        vcpu#2        vcpu#3
> ------        ------        ------        ------
> enter
>   |
> leave
>
>               enter
>                 |
>               leave
>
>                             enter
>                               |
>                             leave
>
>                                           enter
>                                             |
>                                           leave
>
>               enter                       enter
> enter           |           enter           |
>   |             |             |             |
> leave           |             |             |
>        <--------------------------- BAD
> enter           |             |             |
>   |             |             |             |
> leave         leave         leave         leave

Thanks to Paolo's analysis, we now know where that gap comes from and
what it does (so I marked it with BAD now) -- in the gap, the BSP leaves
SMM on its own, closes/locks SMRAM, finishes off the PEI phase, restores
the contents of the borrowed wakeup buffer of CpuMpPei, and even
transfers control to Linux's S3 resume vector.

I don't understand why we don't get horrible faults on the APs
*immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
page tables, executable code, everything, will read as 0xff on QEMU. How
can the APs continue in SMM long enough to (a) time out and pull the BSP
back into SMM, (b) complete the rendezvous and exit SMM?

... Anyway, I think I do have an idea for question (2).

Namely, when the BSP starts executing S3ResumeExecuteBootScript(), in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" -- for which the cue
is ultimately given by the DXE IPL PEIM, as the last action in PEI --,
CpuMpPei has been dispatched already! And, CpuMpPei has placed all the
APs into their comfy HLT loops, so that the MP services PPI could serve
multiprocessing requests.

Thus, the APs are executing code (the HLT loop) from CpuMpPei's wakeup
buffer on page 0x9f000 as *normal business*. That is where the SMI,
raised by the BSP in S3ResumeExecuteBootScript(), rips them out of. And
that's also where KVM tries to return them to, once they finish in SMM
and execute RSM. Too bad that, by the time KVM returns them there, the
wakeup page has been restored by the BSP.

In other words, the address RIP=0x9f0fd *is* a red herring: that's
simply where the APs happened to be when the SMI was raised, and where
KVM remembers to return the APs to, once the APs execute RSM.

I think I sort of answered question (2). (Apologies if Paolo and Jiewen
explained the exact same thing before; I had to spell it out for
myself.)

That leaves question (1) open. Why enter SMM in
S3ResumeExecuteBootScript() at all?

Anyway, I think if the BSP and the APs are properly synchronized around
the SMI injections in S3ResumeExecuteBootScript(), then this bug is
fixed. In that case, the APs' RSMs will restore the full context for the
APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
buffer -- but the APs will sleep on), and then Linux will bring up the
APs, after taking control.

Thanks
Laszlo

> and why it is
> triggered in S3Resume.c rather than CpuS3.c. And why does it take so
> much for APs to complete it?
>
> That said, by the time you close and lock SMRAM, you aren't even sure
> that you have reached the cli;hlt loop in the rendezvous funnel. In
> practice you will be there, but there is still a theoretical race.
>
> InterlockedDecrement (&mNumberToFinish) should be moved from
> EarlyMPRendezvousProcedure/MPRendezvousProcedure to GoToSleep, and
> GoToSleep should leave 64-bit mode before doing it. This will fix the
> S3 bug as well. It's only needed for 64-bit mode, but it is doable for
> the Ia32 version as well.
>
> Perhaps EarlyMPRendezvousProcedure and MPRendezvousProcedure can return
> &mNumberToFinish; what do you think?
>
> Paolo
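
For what it's worth, the ordering problem discussed above (the BSP
closing/locking SMRAM with no "AP finish check") can be written down as
a toy model. This is only a sketch in plain C, not edk2 code: the
function name stranded_aps(), the tick-based timing, and the local
number_to_finish counter (standing in for mNumberToFinish) are all made
up for illustration. It assumes each AP decrements the counter as its
very last action in SMM, and it reports how many APs are still inside
SMM at the moment the BSP locks SMRAM.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not EDK II code. AP number i needs ap_ticks[i] time units in
 * SMM; the BSP needs bsp_ticks. number_to_finish is a hypothetical stand-in
 * for mNumberToFinish. The return value is the number of APs that are still
 * in SMM when the BSP closes/locks SMRAM.
 */
size_t stranded_aps(const unsigned *ap_ticks, size_t ap_count,
                    unsigned bsp_ticks, bool bsp_waits)
{
    size_t number_to_finish = ap_count;

    for (unsigned now = 0; ; now++) {
        /* An AP that completes at this tick signals as its last action. */
        for (size_t i = 0; i < ap_count; i++) {
            if (ap_ticks[i] == now) {
                number_to_finish--;
            }
        }

        bool bsp_done = now >= bsp_ticks;

        /*
         * Without the wait, the BSP locks SMRAM as soon as its own work is
         * done. With the wait, it also requires the counter to reach zero.
         */
        if (bsp_done && (!bsp_waits || number_to_finish == 0)) {
            return number_to_finish;
        }
    }
}
```

With AP durations of 2, 10 and 50 ticks and a BSP that is done after 5
ticks, the model leaves two APs stranded when the BSP does not wait, and
none when it does. It says nothing about *why* the SMI is raised in the
first place; that question stays open.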