From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx6-phx2.redhat.com (mx6-phx2.redhat.com [209.132.183.39])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by ml01.01.org (Postfix) with ESMTPS id 1A63081CC5
    for ; Wed, 9 Nov 2016 14:59:35 -0800 (PST)
Received: from zmail13.collab.prod.int.phx2.redhat.com (zmail13.collab.prod.int.phx2.redhat.com [10.5.83.15])
    by mx6-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uA9Mxajf062018;
    Wed, 9 Nov 2016 17:59:36 -0500
Date: Wed, 9 Nov 2016 17:59:36 -0500 (EST)
From: Paolo Bonzini
To: Laszlo Ersek
Cc: Jiewen Yao , Michael D Kinney , Feng Tian , edk2-devel@ml01.01.org, Star Zeng , Jeff Fan
Message-ID: <805535301.11770261.1478732376111.JavaMail.zimbra@redhat.com>
In-Reply-To: <0e1380dc-85e6-324b-4614-10785d24f499@redhat.com>
References: <1478251854-14660-1-git-send-email-jiewen.yao@intel.com>
    <08406bf5-4377-63a1-8dd9-34479c015d4b@redhat.com>
    <74D8A39837DF1E4DA445A8C0B3885C50386C0CB8@shsmsx102.ccr.corp.intel.com>
    <74D8A39837DF1E4DA445A8C0B3885C50386C10BD@shsmsx102.ccr.corp.intel.com>
    <3be2f1bf-8c0a-e470-a5c0-a6130b159da5@redhat.com>
    <0e1380dc-85e6-324b-4614-10785d24f499@redhat.com>
MIME-Version: 1.0
X-Originating-IP: [10.4.164.1, 10.5.101.130]
X-Mailer: Zimbra 8.0.6_GA_5922 (ZimbraWebClient - FF49 (Linux)/8.0.6_GA_5922)
Thread-Topic: Enable SMM page level protection.
Thread-Index: ImMBHFN9hpS/rsXSuXFY3yCZE/5mFQ==
Subject: Re: [PATCH V2 0/6] Enable SMM page level protection.
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: EDK II Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 09 Nov 2016 22:59:35 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.

This I can answer.
:) The APs have done their INIT-SIPI-SIPI, and then gone into the
CLI;HLT;JMP loop.  When an AP exits SMM, it is in the JMP instruction.

As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?).

After thinking a bit more about it, it seems simplest to me if CpuS3.c
just uses SwitchStack or AsmDisablePaging64 at the end of
MPRendezvousProcedure, to jump to a small stub like

        POP EAX      ; pop return address
        POP EAX      ; pop Context1 which is &mNumberToFinish
        DEC [EAX]
    1:  CLI
        HLT
        JMP 1

> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.

I wouldn't call it a red herring.  After all, CR3 points to SMM exactly
because the CR3 that was set up for the 0x9f000 stub is CpuS3.c's SMRAM
page table root.

What _is_ a red herring is KVM's trace showing an RSM instruction at
RIP=0x9f0fd.  That is clearly bogus; RSM was rather the last instruction
executed _before_ getting to that RIP.

> >        vcpu#0        vcpu#1        vcpu#2        vcpu#3
> >        ------        ------        ------        ------
> >                      enter                       enter
> >        enter         |             enter         |
> >        |             |             |             |
> >        leave         |             |             |
> >        <--------------------------- BAD
> >        enter         |             |             |
> >        |             |             |             |
> >        leave         leave         leave         leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?

Because the "0xff" only applies when you're out of SMM.  The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the APs are in SMM they are executing in a separate address space
where SMRAM is "not closed".  (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
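
To make the visibility rule above concrete, here is a minimal sketch in
C.  It is only an illustrative model, not QEMU or edk2 code: the names
smram_model, smram_state and smram_read are hypothetical, and the 0xff
value for a closed SMRAM is taken from the description of QEMU in this
thread.  The point it shows is that the open/closed/locked state is
consulted only for accesses made outside SMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in QEMU or edk2. */
enum smram_state { SMRAM_OPEN, SMRAM_CLOSED, SMRAM_CLOSED_LOCKED };

#define SMRAM_SIZE 16u

struct smram_model {
    enum smram_state state;       /* chipset-level state set by firmware */
    uint8_t backing[SMRAM_SIZE];  /* the real SMRAM contents */
};

/* Return the byte a CPU observes when reading SMRAM offset 'off'. */
uint8_t smram_read(const struct smram_model *m, bool cpu_in_smm, unsigned off)
{
    assert(off < SMRAM_SIZE);

    /* A CPU in SMM uses its own address space: SMRAM is always visible. */
    if (cpu_in_smm) {
        return m->backing[off];
    }

    /* Outside SMM, only an open SMRAM exposes the real contents. */
    if (m->state == SMRAM_OPEN) {
        return m->backing[off];
    }

    /* Closed or closed/locked: reads as 0xff (assumed, per this thread). */
    return 0xff;
}
```

In this model, an AP that is still in SMM keeps reading its page tables
and code unchanged after the BSP has closed and locked SMRAM, which is
why it does not fault immediately.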

> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.

Agreed.

Paolo