From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <lersek@redhat.com>
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 69A6A81C93
 for <edk2-devel@ml01.01.org>; Wed,  9 Nov 2016 15:27:10 -0800 (PST)
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
 (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 48A7864C3;
 Wed,  9 Nov 2016 23:27:13 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-54.phx2.redhat.com
 [10.3.116.54])
 by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id
 uA9NRAB0010847; Wed, 9 Nov 2016 18:27:11 -0500
To: Paolo Bonzini <pbonzini@redhat.com>
References: <1478251854-14660-1-git-send-email-jiewen.yao@intel.com>
 <08406bf5-4377-63a1-8dd9-34479c015d4b@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C50386C0CB8@shsmsx102.ccr.corp.intel.com>
 <b49610a8-07c0-a2c2-9ee6-73b644983529@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C50386C10BD@shsmsx102.ccr.corp.intel.com>
 <3be2f1bf-8c0a-e470-a5c0-a6130b159da5@redhat.com>
 <0e1380dc-85e6-324b-4614-10785d24f499@redhat.com>
 <805535301.11770261.1478732376111.JavaMail.zimbra@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>,
 Michael D Kinney <michael.d.kinney@intel.com>,
 Feng Tian <feng.tian@intel.com>, edk2-devel@ml01.01.org,
 Star Zeng <star.zeng@intel.com>, Jeff Fan <jeff.fan@intel.com>
From: Laszlo Ersek <lersek@redhat.com>
Message-ID: <b8c19b4d-9d1a-c358-47b9-73e957d8798d@redhat.com>
Date: Thu, 10 Nov 2016 00:27:10 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <805535301.11770261.1478732376111.JavaMail.zimbra@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.38]); Wed, 09 Nov 2016 23:27:13 +0000 (UTC)
Subject: Re: [PATCH V2 0/6] Enable SMM page level protection.
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: EDK II Development  <edk2-devel.lists.01.org>
List-Unsubscribe: <https://lists.01.org/mailman/options/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/edk2-devel/>
List-Post: <mailto:edk2-devel@lists.01.org>
List-Help: <mailto:edk2-devel-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Nov 2016 23:27:10 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 11/09/16 23:59, Paolo Bonzini wrote:
> 
>> Another question I have -- and I feel I should really know it, but I
>> don't... -- is *why* the APs are executing code from the page at
>> 0x9f000.
> 
> This I can answer. :)
> 
> The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
> loop.  When the AP exits SMM, it is in the JMP instruction.
> 
> As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
> in the 0-640K area (perhaps it could be in what your doc calls the
> "permanent PEI memory for the S3 resume path"?).  After thinking a
> bit more about it, it seems simplest to me if CpuS3.c just uses
> SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
> to jump to a small stub like
> 
>     POP EAX   ; pop return address
>     POP EAX   ; pop Context1 which is &mNumberToFinish
>     DEC [EAX]
>  1: CLI
>     HLT
>     JMP 1
> 
>> I could be utterly and inexcusably wrong, but I think that the
>> RIP=0x9f0fd symptom is a red herring.
> 
> I wouldn't call it a red herring.  After all, CR3 points to SMM
> exactly because the CR3 that was set up for the 0x9f000 stub is
> CpuS3.c's SMRAM page table root.

Hrmpf. The stub at 0x9f000 does not belong to PiSmmCpuDxeSmm. Regardless
of the boot path (normal boot or S3 resume), it belongs to CpuMpPei, and
it partakes in the implementation of the MP services PPI. It is
practically the "parking lot" for the APs when they are not executing
an MP job submitted by an MP services PPI client.

So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).

(*) For example, OVMF's PlatformPei uses this service to program
MSR_IA32_FEATURE_CONTROL from fw_cfg. On the resume path too, that
occurs before we do the SMBASE relocation.

(I.e., before S3RestoreConfig2() in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" calls
SmmRestoreCpu() in "UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c", via
SmmS3ResumeState->SmmS3ResumeEntryPoint.)

When an AP executes RSM, its CR3 should automatically be restored to the
original (non-SMM) value, should it not? I mean I do remember the CR3
value from the QEMU register dump, but now I don't understand how that's
possible with SMM=0.
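
To spell out what I expect, here is a toy model of the CR3 round trip
across SMI delivery and RSM. It is only a sketch: ToyCpu and the toy_*
helpers are made-up names, not QEMU, KVM or edk2 code, and folding the
handler's own CR3 load into SMI entry is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names and layout are hypothetical. */
typedef struct {
    uint64_t cr3;        /* active page table root */
    bool     in_smm;     /* true between SMI delivery and RSM */
    uint64_t saved_cr3;  /* stand-in for the CR3 slot of the SMM save state */
} ToyCpu;

/* SMI delivery: stash the interrupted context's CR3, then run the
 * handler on its own (SMRAM-resident) page tables. */
void toy_smi_enter(ToyCpu *cpu, uint64_t smm_cr3)
{
    assert(!cpu->in_smm);
    cpu->saved_cr3 = cpu->cr3;
    cpu->cr3 = smm_cr3;
    cpu->in_smm = true;
}

/* RSM: reload CR3 from the save state and leave SMM. */
void toy_rsm(ToyCpu *cpu)
{
    assert(cpu->in_smm);
    cpu->cr3 = cpu->saved_cr3;
    cpu->in_smm = false;
}

/* True if the CPU is outside SMM yet still runs on the SMM page
 * tables, i.e. the combination seen in the register dump. */
bool toy_stale_smm_cr3(const ToyCpu *cpu, uint64_t smm_cr3)
{
    return !cpu->in_smm && cpu->cr3 == smm_cr3;
}
```

In this model, toy_stale_smm_cr3() can only return true after
toy_rsm() if the saved slot already held the SMRAM value, that is, if
the AP was already running on SMRAM page tables when the SMI arrived.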

Sorry if I'm being dense :)

> What _is_ a red herring is KVM's trace showing a RSM instruction
> at RIP=0x9f0fd.  That is clearly bogus, RSM was rather the last
> instruction executed _before_ getting to that RIP.
> 
>>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>>       ------  ------  ------  ------
>>>               enter           enter
>>>        enter    |     enter     |
>>>          |      |       |       |
>>>        leave    |       |       |
>>>             <--------------------------- BAD
>>>        enter    |       |       |
>>>          |      |       |       |
>>>        leave  leave   leave   leave
>>
>> I don't understand why we don't get horrible faults on the APs
>> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
>> page tables, executable code, everything, will read as 0xff on QEMU. How
>> can the APs continue in SMM long enough to
>>
>> (a) time out and pull the BSP back into SMM,
>> (b) complete the rendezvous and exit SMM?
> 
> Because the "0xff" only applies when you're out of SMM.  The three
> states (open, closed, closed/locked) only apply when you're not in SMM.
> While the AP is in SMM they are executing in a separate address space
> where SMRAM is "not closed".  (In QEMU that's a separate AddressSpace
> struct, smram_address_space in target-i386/kvm.c).

Sigh, in retrospect, this should have been obvious. :) Thanks for
pointing it out!
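
For my own notes, the rule as I now understand it, written as a toy
model. This is a sketch under assumptions, not the QEMU implementation:
ToySmram and toy_smram_read() are made-up names, and the "locked" state
is left out because it only decides whether "open" can still change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not QEMU code. */
typedef struct {
    uint32_t       smram_base;  /* first guest-physical address of SMRAM */
    uint32_t       smram_size;  /* size of the SMRAM window in bytes */
    const uint8_t *smram;       /* backing contents of SMRAM */
    bool           open;        /* SMRAM visible to non-SMM accesses */
} ToySmram;

/* Read one byte as seen by a CPU. A CPU in SMM always sees the real
 * SMRAM contents; outside SMM, closed SMRAM reads as 0xff. Addresses
 * outside the window also read as 0xff, since the toy maps nothing else. */
uint8_t toy_smram_read(const ToySmram *s, bool cpu_in_smm, uint32_t addr)
{
    if (addr < s->smram_base || addr - s->smram_base >= s->smram_size) {
        return 0xff;
    }
    size_t off = (size_t)(addr - s->smram_base);
    if (cpu_in_smm || s->open) {
        return s->smram[off];
    }
    return 0xff;
}
```

So an AP that is still inside SMM passes cpu_in_smm=true and keeps
seeing its page tables and code, regardless of what the BSP did to the
"open" bit in the meantime.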

Laszlo

>> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
>> explained the exact same thing before; I had to spell it out for
>> myself.) That leaves question (1) open. Why enter SMM in
>> S3ResumeExecuteBootScript() at all?
>>
>> Anyway, I think if the BSP and the APs are properly synchronized around
>> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
>> fixed. In that case, the APs' RSMs will restore the full context for the
>> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
>> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
>> buffer -- but the APs will sleep on), and then Linux will bring up the
>> APs, after taking control.
> 
> Agreed.
> 
> Paolo
>