From: "Laszlo Ersek" <lersek@redhat.com>
To: tobin@linux.ibm.com
Cc: devel@edk2.groups.io, dovmurik@linux.vnet.ibm.com,
Dov.Murik1@il.ibm.com, ashish.kalra@amd.com,
brijesh.singh@amd.com, tobin@ibm.com, david.kaplan@amd.com,
jon.grimm@amd.com, thomas.lendacky@amd.com, jejb@linux.ibm.com,
frankeh@us.ibm.com,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [edk2-devel] RFC: Fast Migration for SEV and SEV-ES - blueprint and proof of concept
Date: Tue, 3 Nov 2020 15:59:38 +0100
Message-ID: <933a5d2b-a495-37b9-fe8b-243f9bae24d5@redhat.com>
In-Reply-To: <c5d3f84e-2c11-3e49-4ab2-f4d2c2b095d4@linux.ibm.com>

Hi Tobin,

(keeping full context -- I'm adding Dave)

On 10/28/20 20:31, Tobin Feldman-Fitzthum wrote:
> Hello,
>
> Dov Murik, James Bottomley, Hubertus Franke, and I have been working on
> a plan for fast live migration of SEV and SEV-ES (and SEV-SNP when it's
> out and even hopefully Intel TDX) VMs. We have developed an approach
> that we believe is feasible and a demonstration that shows our solution
> to the most difficult part of the problem. In short, we have implemented
> a UEFI Application that can resume from a VM snapshot. We think this is
> the crux of SEV-ES live migration. After describing the context of our
> demo and how it works, we explain how it can be extended to a full
> SEV-ES migration. Our goal is to show that fast SEV and SEV-ES live
> migration can be implemented in OVMF with minimal kernel changes. We
> provide a blueprint for doing so.
>
> Typically the hypervisor facilitates live migration. AMD SEV excludes
> the hypervisor from the trust domain of the guest. When a hypervisor
> (HV) examines the memory of an SEV guest, it will find only a
> ciphertext. If the HV moves the memory of an SEV guest, the ciphertext
> will be invalidated. Furthermore, with SEV-ES the hypervisor is largely
> unable to access guest CPU state. Thus, fast migration of SEV VMs
> requires support from inside the trust domain, i.e. the guest.
>
> One approach is to add support for SEV Migration to the Linux kernel.
> This would allow the guest to encrypt/decrypt its own memory with a
> transport key. This approach has met some resistance. We propose a
> similar approach implemented not in Linux, but in firmware, specifically
> OVMF. Since OVMF runs inside the guest, it has access to the guest
> memory and CPU state. OVMF should be able to perform the manipulations
> required for live migration of SEV and SEV-ES guests.
>
> The biggest challenge of this approach involves migrating the CPU state
> of an SEV-ES guest. In a normal (non-SEV) migration, the HV sets the CPU
> state of the target before the target begins executing. In our approach,
> the HV starts the target and OVMF must resume to whatever state the
> source was in. We believe this to be the crux (or at least the most
> difficult part) of live migration for SEV and we hope that by
> demonstrating resume from EFI, we can show that our approach is
> generally feasible.
>
> Our demo can be found at <https://github.com/secure-migration>. The
> tooling repository is the best starting point. It contains documentation
> about the project and the scripts needed to run the demo. There are two
> more repos associated with the project. One is a modified edk2 tree that
> contains our modified OVMF. The other is a modified qemu, that has a
> couple of temporary changes needed for the demo. Our demonstration is
> aimed only at resuming from a VM snapshot in OVMF. We provide the source
> CPU state and source memory to the destination using temporary plumbing
> that violates the SEV trust model. We explain the setup in more depth in
> README.md. We are showing only that OVMF can resume from a VM snapshot.
> At the end we will describe our plan for transferring CPU state and
> memory from source to guest. To be clear, the temporary tooling used for
> this demo isn't built for encrypted VMs, but below we explain how this
> demo applies to and can be extended to encrypted VMs.
>
> We implemented our resume code in a very similar fashion to the
> recommended S3 resume code. When the HV sets the CPU state of a guest,
> it can do so when the guest is not executing. Setting the state from
> inside the guest is a delicate operation. There is no way to atomically
> set all of the CPU state from inside the guest. Instead, we must set
> most registers individually and account for changes in control flow that
> doing so might cause. We do this with a three-phase trampoline. OVMF
> calls phase 1, which runs on the OVMF map. Phase 1 sets up phase 2 and
> jumps to it. Phase 2 switches to an intermediate map that reconciles the
> OVMF map and the source map. Phase 3 switches to the source map,
> restores the registers, and returns into execution of the source. We
> will go backwards through these phases in more depth.
>
> The last thing that resume to EFI does is return. Specifically, we use
> IRETQ, which reads the values of RIP, CS, RFLAGS, RSP, and SS from a
> temporary stack and restores them atomically, thus returning to source
> execution. Prior to returning, we must manually restore most other
> registers to the values they had on the source. One particularly
> significant register is CR3. When we return to Linux, CR3 must be set to
> the source CR3 or the first instruction executed in Linux will cause a
> page fault. The code that we use to restore the registers and return
> must be mapped in the source page table or we would get a page fault
> executing the instructions prior to returning into Linux. The value of
> CR3 is so significant that it defines the three phases of the
> trampoline. Phase 3 begins when CR3 is set to the source CR3. After
> setting CR3, we set all the other registers and return.
>
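As an illustration only (not taken from the demo repositories): in 64-bit mode IRETQ pops five 8-byte slots in the order RIP, CS, RFLAGS, RSP, SS, so the temporary stack described above has to hold the source values in exactly that layout. A minimal Python sketch of building such a frame follows; the register values and the helper name build_iretq_frame are invented for the example.

```python
import struct

# Order in which IRETQ pops its frame in 64-bit mode, lowest address first.
IRETQ_FRAME_FIELDS = ("rip", "cs", "rflags", "rsp", "ss")


def build_iretq_frame(state):
    """Pack the five 64-bit slots that IRETQ consumes, in pop order."""
    return struct.pack("<5Q", *(state[name] for name in IRETQ_FRAME_FIELDS))


# Hypothetical saved source state (values invented for illustration).
source_state = {
    "rip": 0xFFFFFFFF81000000,
    "cs": 0x10,
    "rflags": 0x246,
    "rsp": 0xFFFFC90000003F00,
    "ss": 0x18,
}

frame = build_iretq_frame(source_state)
```

Everything that is not part of this frame (general-purpose registers, CR3, and so on) would still have to be restored one register at a time before the final IRETQ, as the paragraph above says.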
> Phase 2 mainly exists to set up phase 3. OVMF uses a 1-1 mapping, meaning
> that virtual addresses are the same as physical addresses. The kernel
> page table uses an offset mapping, meaning that virtual addresses differ
> from physical addresses by a constant (for the most part). Crucially,
> this means that the virtual address of the page that is executed by
> phase 3 differs between the OVMF map and the source map. If we are
> executing code mapped in OVMF and we change CR3 to point to the source
> map, although the page may be mapped in the source map, the virtual
> address will be different, and we will face undefined behavior. To fix
> this, we construct intermediate page tables that map the pages for phase
> 2 and 3 to the virtual address expected in OVMF and to the virtual
> address expected in the source map. Thus, we can switch CR3 from OVMF's
> map to the intermediate map and then from the intermediate map to the
> source map. Phase 2 is much shorter than phase 3. Phase 2 is mainly
> responsible for switching to the intermediate map, flushing the TLB, and
> jumping to phase 3.
>
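To make the role of the intermediate map concrete, here is a toy Python model. It is a sketch under stated assumptions, not the demo's code: page tables are reduced to dicts from virtual page to physical page, and PHASE3_PHYS and SOURCE_OFFSET are hypothetical values. It shows why a direct CR3 switch from the OVMF map to the source map loses the code being executed, while going through a map that holds both aliases does not.

```python
PAGE = 0x1000

# Hypothetical values, chosen only for illustration.
PHASE3_PHYS = 0x7F000000            # physical page holding the phase 2/3 code
SOURCE_OFFSET = 0xFFFF888000000000  # constant VA-PA offset of the source map


def identity_va(pa):
    """Virtual address of pa under OVMF's 1-1 mapping."""
    return pa


def offset_va(pa):
    """Virtual address of pa under the source's offset mapping."""
    return pa + SOURCE_OFFSET


# Each "map" is a dict from virtual page address to physical page address.
ovmf_map = {identity_va(PHASE3_PHYS): PHASE3_PHYS}
source_map = {offset_va(PHASE3_PHYS): PHASE3_PHYS}
intermediate_map = {**ovmf_map, **source_map}  # both aliases of the same page


def can_fetch(page_map, va):
    """True if an instruction fetch at va resolves to the trampoline page."""
    return page_map.get(va & ~(PAGE - 1)) == PHASE3_PHYS


rip_in_ovmf = identity_va(PHASE3_PHYS) + 0x40   # executing at the 1-1 address
rip_in_source = offset_va(PHASE3_PHYS) + 0x40   # same code, source alias

direct_switch_ok = can_fetch(source_map, rip_in_ovmf)
via_intermediate_ok = (
    can_fetch(intermediate_map, rip_in_ovmf)        # after CR3 -> intermediate
    and can_fetch(intermediate_map, rip_in_source)  # after jumping to the alias
    and can_fetch(source_map, rip_in_source)        # after CR3 -> source
)
```

In this model direct_switch_ok is False and via_intermediate_ok is True. The jump from the 1-1 alias to the source alias while the intermediate map is active is an assumption about how phase 2 hands over to phase 3; the text above only says that phase 2 jumps to phase 3.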
> Fortunately phase 1 is even simpler than phase 2. Phase 1 has two
> duties. First, since phase 2 and 3 operate without a stack and can't
> access values defined in OVMF (such as the addresses of the pages
> containing phase 2 and 3), phase 1 must pass these values to phase 2 by
> putting them in registers. Second, phase 1 must start phase 2 by jumping
> to it.
>
> Given that we can resume to a snapshot in OVMF, we should be able to
> migrate an SEV guest as long as we can securely communicate the VM
> snapshot from source to destination. For our demo, we do this with a
> handful of QMP commands. More sophisticated methods are required for a
> production implementation.
>
> When we refer to a snapshot, what we really mean is the device state,
> memory, and CPU state of a guest. In live migration this is transmitted
> dynamically as opposed to being saved and restored. Device state is not
> protected by SEV and can be handled entirely by the HV. Memory, on the
> other hand, cannot be handled only by the HV. As mentioned previously,
> memory needs to be encrypted with a transport key. A Migration Handler
> on the source will coordinate with the HV to encrypt pages and transmit
> them to the destination. The destination HV will receive the pages over
> the network and pass them to the Migration Handler in the target VM so
> they can be decrypted. This transmission will occur continuously until
> the memory of the source and target converges.
>
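The "transmit until the memory converges" loop can be sketched as follows. This is a toy Python model under loud assumptions: toy_seal is a single-byte XOR placeholder standing in for transport-key encryption and is NOT real cryptography, dirty_log is a hypothetical callback that simulates guest writes, and the network and the two hypervisors are collapsed into plain function calls.

```python
def toy_seal(page, key):
    """Placeholder for transport-key encryption. NOT real cryptography."""
    return bytes(b ^ key for b in page)


toy_unseal = toy_seal  # XOR with the same byte is its own inverse


def migrate(source_mem, dirty_log, key, max_rounds=10):
    """Copy pages until no page is dirty; returns (target_mem, rounds)."""
    target_mem = {}
    dirty = set(source_mem)  # first round: every page still has to be sent
    rounds = 0
    while dirty and rounds < max_rounds:
        for pfn in sorted(dirty):
            wire = toy_seal(source_mem[pfn], key)    # source Migration Handler
            target_mem[pfn] = toy_unseal(wire, key)  # target Migration Handler
        # Pages the guest wrote during this round must be resent.
        dirty = dirty_log(rounds, source_mem)
        rounds += 1
    return target_mem, rounds


def dirty_log(round_no, mem):
    """Simulated guest activity: page 1 is rewritten during the first round."""
    if round_no == 0:
        mem[1] = b"\x22" * 4
        return {1}
    return set()


source_mem = {0: b"\x00" * 4, 1: b"\x11" * 4, 2: b"\xAA" * 4}
target_mem, rounds = migrate(source_mem, dirty_log, key=0x5A)
```

Here the copy converges after two rounds, with the target holding the rewritten page 1. A real implementation would also need a stopping rule for guests that dirty memory faster than it can be sent, which this sketch only approximates with max_rounds.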
> Plain SEV does not protect the CPU state of the guest and therefore does
> not require any special mechanism for transmission of the CPU state. We
> plan to implement an end-to-end migration with plain SEV first. In
> SEV-ES, the PSP (platform security processor) encrypts CPU state on each
> VMExit. The encrypted state is stored in memory. Normally this memory
> (known as the VMSA) is not mapped into the guest, but we can add an
> entry to the nested page tables that will expose the VMSA to the guest.
> This means that when the guest VMExits, the CPU state will be saved to
> guest memory. With the CPU state in guest memory, it can be transmitted
> to the target using the method described above.
>
> In addition to the changes needed in OVMF to resume the VM, the
> transmission of the VM from source to target will require a new code
> path in the hypervisor. There will also need to be a few minor changes
> to Linux (adding a mapping for our Phase 3 pages). Despite all the
> moving pieces, we believe that this is a feasible approach for
> supporting live migration for SEV and SEV-ES.
>
> For the sake of brevity, we have left out a few issues, including SMP
> support, generation of the intermediate mappings, and more. We have
> included some notes about these issues in the COMPLICATIONS.md file. We
> also have an outline of an end-to-end implementation of live migration
> for SEV-ES in END-TO-END.md. See README.md for info on how to run the
> demo. While this is not a full migration, we hope to show that fast live
> migration with SEV and SEV-ES is possible without major kernel changes.
>
> -Tobin

the one word that comes to my mind upon reading the above is,
"overwhelming".

(I have not been addressed directly, but:

- the subject says "RFC",

- and the documentation at
https://github.com/secure-migration/resume-from-edk2-tooling#what-changes-did-we-make
states that AmdSevPkg was created for convenience, and that the feature
could be integrated into OVMF. (Paraphrased.)

So I guess it's tolerable if I make a comment: )

I've checked out the "mh-state-dev" branch of
<https://github.com/secure-migration/resume-from-efi-edk2.git>. It has
80 commits on top of edk2 master (base commit: d5339c04d7cd,
"UefiCpuPkg/MpInitLib: Add missing explicit PcdLib dependency",
2020-04-23).

These commits were authored over the 6-7 months since April. It's
obviously huge work. To me, most of these commits clearly aim at getting
the demo / proof-of-concept functional, rather than guiding (more
precisely: hand-holding) reviewers through the construction of the feature.

In my opinion, the series is not upstreamable in its current format
(which is presently not much more readable than a single-commit code
drop). Upstreaming is probably not your intent, either, at this time.

I agree that getting feedback ("buy-in") at this level of maturity is
justified from your POV, before you invest more work into cleaning up /
restructuring the series.

My problem is that "hand-holding" is exactly what I'd need -- I cannot
dedicate one or two weeks, as an indivisible block, to understanding
your design. Nor can I approach the series patch-wise in its current
format. Personally I would need the patch series to lead me through the
whole design with baby steps ("ELI5"), meaning small code changes and
detailed commit messages. I'd *also* need the more comprehensive
guide-like documentation, as background material.

Furthermore, I don't have an environment where I can test this
proof-of-concept (and provide you with further incentive for cleaning up
the series, by reporting success).

So I hope others can spend the time discussing the design with you, and
testing / repeating the demo. For me to review the patches, the patches
should condense and replay your thinking process from the last 7 months,
in as small as possible logical steps. (On the list.)

I really don't want to be the bottleneck here, which is why I would
support introducing this feature as a separate top-level package
(AmdSevPkg).

Thanks
Laszlo