From mboxrd@z Thu Jan 1 00:00:00 1970
Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Thu, 03 Oct 2019 02:06:06 -0700
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BE51C3090FDE; Thu, 3 Oct 2019 09:06:05 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-154.rdu2.redhat.com [10.10.120.154]) by smtp.corp.redhat.com (Postfix) with ESMTP id 66B21608A5; Thu, 3 Oct 2019 09:06:03 +0000 (UTC)
Subject: Re: [edk2-devel] [RFC PATCH v2 10/44] OvmfPkg: A per-CPU variable area for #VC usage
To: "Lendacky, Thomas", "devel@edk2.groups.io", "Singh, Brijesh"
Cc: Jordan Justen, Ard Biesheuvel, Michael D Kinney, Liming Gao, Eric Dong, Ray Ni
References: <280a8459-6258-5b04-8ecc-125d7d991d21@redhat.com> <83eb0051-9cc1-bd53-933b-2bce2e7fd826@amd.com>
From: "Laszlo Ersek"
Message-ID: <400325a4-f4ba-255f-c4fa-b84bcaa65584@redhat.com>
Date: Thu, 3 Oct 2019 11:06:02 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <83eb0051-9cc1-bd53-933b-2bce2e7fd826@amd.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Thu, 03 Oct 2019 09:06:05 +0000 (UTC)
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 10/02/19 18:06, Lendacky, Thomas wrote:
> On 10/2/19 6:51 AM, Laszlo Ersek wrote:
>> ... Side question: actually, do we support S3 with SEV enabled, at
>> the moment? Last week or so I tried to test it, and it didn't work.
>> I don't remember if we *intended* to support S3 in SEV guests at all.
>> If we never cared, then we should document that, plus I shouldn't
>> make the SEV-ES work needlessly difficult with S3 remarks... Brijesh,
>> what's your recollection?
>>
>> If the intent has always been to ignore S3 in SEV guests, then we
>> should modify the S3Verification() function to catch QEMU configs
>> where both features are enabled, and force the user to disable at
>> least one of them. Otherwise, the user might suspend the OS to S3,
>> and then lose data when resume fails. In such cases, the user should
>> be forced -- during early boot -- to explicitly disable S3 on the
>> QEMU cmdline, and to re-launch the guest. And then the OS won't ever
>> attempt S3.
>>
>> Hm.... I've now found some internal correspondence at Red Hat, from
>> Aug 2017. I wrote,
>>
>>> With SEV enabled, the S3 boot script would have to manipulate page
>>> tables (which might require more memory pre-allocation), in order
>>> to continue using the currently pre-reserved memory areas for
>>> guest-host communication during S3 resume.
>
> I guess I need to understand more about this. Does the page table
> manipulation occur in the guest or hypervisor? If in the guest, then
> that is ok. But the page tables can't be successfully manipulated by
> the hypervisor.

It's all on the guest side. That's not the issue; the issue is (again)
complexity, and possibly also the limited expressiveness of the S3
boot script "language" (the set of opcodes).

Roughly, this is the story: during normal boot, DXE drivers locate the
S3 Save State protocol, and call it to append a number of opcodes to
the S3 boot script. These opcodes allow platform device drivers to
"stash" various chipset programming actions for S3 resume time. At a
certain point in BDS (when the DXE SMM Ready To Lock protocol is
installed by Platform BDS), the boot script is saved into secure
storage (a "lock box" in SMRAM). Furthermore, BootScriptExecutorDxe
saves itself (the executable) into another lock box.
At S3 resume time, at the end of the PEI phase, S3Resume2Pei restores
BootScriptExecutorDxe from SMRAM, restores the boot script, and
invokes BootScriptExecutorDxe to execute the boot script; thereby
re-programming various chipset registers (as queued by platform DXE
drivers during normal boot). Finally, control is transferred to the OS
waking vector (per ACPI FACS).

A number of platform drivers in OVMF queue boot script opcodes such
that those opcodes implement fw_cfg actions (fw_cfg DMA transfers)
during S3. Queueing these opcodes is very messy, therefore OVMF has a
helper library for that, QemuFwCfgS3Lib.

For the fw_cfg DMA transfers, the underlying pages need to be
decrypted & re-encrypted, as always. During normal boot, QemuFwCfgLib
handles this:

- In the SEC and PEI phases, QemuFwCfgLib uses the IO port access
  method, which is slower, and does not support fw_cfg writes. But for
  SEC/PEI, that's enough.

- In DXE, QemuFwCfgLib uses the DMA access method, which is faster,
  and supports fw_cfg writes. It is SEV-aware, and uses the IOMMU
  protocol for decrypting / encrypting the relevant pages.

The S3 boot script opcodes saved by QemuFwCfgS3Lib must use fw_cfg DMA
(because they need fw_cfg writes too, which are only supported by the
DMA method), and so they'd need extra page table actions in SEV
guests. But those page table actions appear difficult to express
through S3 boot script opcodes.

Anyway, this is just some background info; I'm certainly not
suggesting that we spend *any* resources on enabling S3 for SEV. SEV
(not SEV-ES) has been available in OVMF for a good while now, and
we've seen no reports related to S3. For another data point, S3 has
been en-bloc unsupported in RHEL downstream, regardless of SEV (it is
disabled in downstream QEMU by default, and if you force-enable it,
that will "taint" the domain). This is mainly due to S3 depending very
much on guest driver cooperation (primarily video drivers), which has
been brittle in our experience.
So in the end I think we should update S3Verification() to catch
S3+SEV configs.

>
>>>
>>> This kind of page table manipulation is very difficult to do with
>>> the currently specified / standardized boot script opcodes.
>>> EFI_BOOT_SCRIPT_DISPATCH_2_OPCODE *might* prove usable to call
>>> custom code during S3 resume, from the boot script, but the callee
>>> seems to need a custom assembly trampoline, and likely some magic
>>> for code relocation too (or the code must be position independent).
>>> One example seems to exist in the edk2 tree, but for OVMF this is
>>> uncharted territory.
>>
>> And then the participants in that discussion seemed to set S3+SEV
>> aside, indefinitely.
>>
>> ... I've also found some S3 references in the following blurb:
>>
>> http://mid.mail-archive.com/1499351394-1175-1-git-send-email-brijesh.singh@amd.com
>>
>> We ended up not adding any SEV-related code to
>> "OvmfPkg/Library/QemuFwCfgS3Lib", so I think S3 must have remained
>> out of scope.
>
> Brijesh commented in the referenced link that he was able to do
> suspend/resume successfully. It's possible that some later changes
> caused that to fail?

It's possible that the basic S3 resume machinery, described above,
works, as long as you don't try to set up fw_cfg DMA through boot
script opcodes. However, those fw_cfg actions are important; for
example, "broadcast SMI" is configured through them.

> Maybe we need to understand how you did your S3 test vs. how Brijesh
> did his.
>
>>
>> If we agree now that S3 is out of scope (for both SEV and SEV-ES),
>> then:
>>
>> - I think we should ignore all S3-related code paths in this series,
>>
>> - we should drop patches already written for S3 (sorry about that!),
>>
>> - we should extend S3Verification() like described above.
>
> It's probably worth doing this as the only S3-related patch in this
> series until we understand the complete SEV-ES / S3 requirements.

I agree.
> I'm a bit hesitant to include base SEV in this until we discuss some
> more.

If I understand correctly, you suggest checking SEV-ES enablement in
S3Verification(), but not SEV enablement. Is that right?

I would disagree about that. Broadcast SMI negotiation through fw_cfg
is important. It has been enabled since QEMU 2.9. I strongly recommend
using OVMF only like that, when built with -D SMM_REQUIRE.

If OVMF is built without -D SMM_REQUIRE, then the most important
fw_cfg DMA transfer is out of the picture (see above), and I'm
slightly inclined to agree with you. However, at least one other use
case remains for fw_cfg DMA at S3 resume. Namely, the ACPI
linker/loader script has a command type called
QEMU_LOADER_WRITE_POINTER. (See the documentation in
"AcpiPlatformDxe/QemuLoader.h".) Minimally, the "vmgenid" platform
device of QEMU depends on the firmware executing this ACPI
linker/loader command, and the command has to be re-run at S3 resume
too.

OvmfPkg/AcpiPlatformDxe runs these commands first at normal boot, like
all the other ACPI linker/loader commands. But, specifically for
QEMU_LOADER_WRITE_POINTER, the driver creates a "condensed"
representation too, which is then replayed at S3 resume time, through
boot script opcodes that use fw_cfg DMA.

>> I apologize if my reviews are a bit incoherent; I can track only so
>> many things in parallel :(
>
> No worries, they're not!

Thank you :)
Laszlo