From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from ma1-aaemail-dr-lapp01.apple.com (ma1-aaemail-dr-lapp01.apple.com [17.171.2.60])
 by mx.groups.io with SMTP id smtpd.web10.9441.1582862646001007139
 for <devel@edk2.groups.io>;
 Thu, 27 Feb 2020 20:04:06 -0800
Authentication-Results: mx.groups.io;
 dkim=pass header.i=@apple.com header.s=20180706 header.b=oCnFE4tN;
 spf=pass (domain: apple.com, ip: 17.171.2.60, mailfrom: afish@apple.com)
Received: from pps.filterd (ma1-aaemail-dr-lapp01.apple.com [127.0.0.1])
	by ma1-aaemail-dr-lapp01.apple.com (8.16.0.27/8.16.0.27) with SMTP id 01S42FVf035279;
	Thu, 27 Feb 2020 20:04:04 -0800
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from :
 message-id : content-type : mime-version : subject : date : in-reply-to :
 cc : to : references; s=20180706;
 bh=sogiF3MOgiVYTzCyPQlgB56fCHd1LMGlZUMDFcQdbi4=;
 b=oCnFE4tNKlrUsoveFRL6Os8+3UmJ8+JszSyxAC/NTiV0AFly0pOwhbbzD5wXo6gmCD50
 93XqVxFrCYpW3egcLoL69IqXgeUJxcKbv6cahZZRDRf6t70Ydqk80MQSJQU8/imTmHMb
 ikkOdZwXIlEGcQII2urDfj4R3lZKpCPMgqr4eRDvYXbqpg1Vlq/nMg4Nx8lypRdRQJ3b
 amsFYIE6LobuyUTID86rFLAH670sixl9B55wna12sQfjAqHXabI7SzyFzOUmNGULkZ1N
 8kfDE5Hmt9x683i/o/rIygD+/nOCFWAQxdD+L92+96/yiyl9Nod9ApAfUgdEMT9xAQ94 yg== 
Received: from rn-mailsvcp-mta-lapp02.rno.apple.com (rn-mailsvcp-mta-lapp02.rno.apple.com [10.225.203.150])
	by ma1-aaemail-dr-lapp01.apple.com with ESMTP id 2yepth55hn-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO);
	Thu, 27 Feb 2020 20:04:04 -0800
Received: from rn-mailsvcp-mmp-lapp02.rno.apple.com
 (rn-mailsvcp-mmp-lapp02.rno.apple.com [17.179.253.15])
 by rn-mailsvcp-mta-lapp02.rno.apple.com
 (Oracle Communications Messaging Server 8.1.0.1.20190704 64bit (built Jul  4
 2019)) with ESMTPS id <0Q6E00BUU8MR5V30@rn-mailsvcp-mta-lapp02.rno.apple.com>;
 Thu, 27 Feb 2020 20:04:03 -0800 (PST)
Received: from process_milters-daemon.rn-mailsvcp-mmp-lapp02.rno.apple.com by
 rn-mailsvcp-mmp-lapp02.rno.apple.com
 (Oracle Communications Messaging Server 8.1.0.1.20190704 64bit (built Jul  4
 2019)) id <0Q6E000008KRDP00@rn-mailsvcp-mmp-lapp02.rno.apple.com>; Thu,
 27 Feb 2020 20:04:03 -0800 (PST)
X-Va-A: 
X-Va-T-CD: 08777febe38bb384cc57fda39d0586b7
X-Va-E-CD: 74fbc9fcbd3d4b0e941105e5641a1eeb
X-Va-R-CD: 7f28ace2b24f1e656a1dc26e6a401e24
X-Va-CD: 0
X-Va-ID: a3b58b51-851a-456c-9a0b-8d01320425de
X-V-A: 
X-V-T-CD: 08777febe38bb384cc57fda39d0586b7
X-V-E-CD: 74fbc9fcbd3d4b0e941105e5641a1eeb
X-V-R-CD: 7f28ace2b24f1e656a1dc26e6a401e24
X-V-CD: 0
X-V-ID: 130bf62f-9d7e-48f3-bcc6-5289b1940415
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.572
 definitions=2020-02-27_08:2020-02-26,2020-02-27 signatures=0
Received: from [17.235.11.246] by rn-mailsvcp-mmp-lapp02.rno.apple.com
 (Oracle Communications Messaging Server 8.1.0.1.20190704 64bit (built Jul  4
 2019))
 with ESMTPSA id <0Q6E00NYI8MPIM10@rn-mailsvcp-mmp-lapp02.rno.apple.com>; Thu,
 27 Feb 2020 20:04:02 -0800 (PST)
Sender: afish@apple.com
From: "Andrew Fish" <afish@apple.com>
Message-id: <284BFC25-8534-4147-8616-DE7C410DB681@apple.com>
MIME-version: 1.0 (Mac OS X Mail 13.0 \(3594.4.17\))
Subject: Re: [edk2-devel] A problem with live migration of UEFI virtual machines
Date: Thu, 27 Feb 2020 20:04:00 -0800
In-reply-to: <6666a886-720d-1ead-8f7e-13e65dcaaeb4@redhat.com>
Cc: wuchenye1995 <wuchenye1995@gmail.com>, zhoujianjay <zhoujianjay@gmail.com>,
        =?utf-8?Q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org>,
        berrange@redhat.com, "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
        qemu-devel@nongnu.org, discuss <discuss@edk2.groups.io>
To: devel@edk2.groups.io, lersek@redhat.com
References: <tencent_BC7FD00363690990994E90F8@qq.com>
 <87sgjhxbtc.fsf@zen.linaroharston> <20200224152810.GX635661@redhat.com>
 <8b0ec286-9322-ee00-3729-6ec7ee8260a6@redhat.com>
 <3E8BB07B-8730-4AB8-BCB6-EA183FB589C5@apple.com>
 <465a5a84-cac4-de39-8956-e38771807450@redhat.com>
 <8F42F6F1-A65D-490D-9F2F-E12746870B29@apple.com>
 <6666a886-720d-1ead-8f7e-13e65dcaaeb4@redhat.com>
X-Mailer: Apple Mail (2.3594.4.17)
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.572
 definitions=2020-02-27_08:2020-02-26,2020-02-27 signatures=0
Content-type: multipart/alternative;
 boundary="Apple-Mail=_656EDCE8-0EB3-47A4-A1F9-C2C49993C4D8"

--Apple-Mail=_656EDCE8-0EB3-47A4-A1F9-C2C49993C4D8
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii



> On Feb 26, 2020, at 1:42 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>=20
> Hi Andrew,
>=20
> On 02/25/20 22:35, Andrew Fish wrote:
>=20
>> Laszlo,
>>=20
>> The FLASH offsets changing breaking things makes sense.
>>=20
>> I now realize this is like updating the EFI ROM without rebooting the
>> system.  Thus changes in how the new EFI code works is not the issue.
>>=20
>> Is this migration event visible to the firmware? Traditionally the
>> NVRAM is a region in the FD so if you update the FD you have to skip
>> NVRAM region or save and restore it. Is that activity happening in
>> this case? Even if the ROM layout does not change how do you not lose
>> the contents of the NVRAM store when the live migration happens? Sorry
>> if this is a remedial question but I'm trying to learn how this
>> migration works.
>=20
> With live migration, the running guest doesn't notice anything. This is
> a general requirement for live migration (regardless of UEFI or flash).
>=20
> You are very correct to ask about "skipping" the NVRAM region. With the
> approach that OvmfPkg originally supported, live migration would simply
> be unfeasible. The "build" utility would produce a single (unified)
> OVMF.fd file, which would contain both NVRAM and executable regions, and
> the guest's variable updates would modify the one file that would exist.
> This is inappropriate even without considering live migration, because
> OVMF binary upgrades (package updates) on the virtualization host would
> force guests to lose their private variable stores (NVRAMs).
>=20
> Therefore, the "build" utility produces "split" files too, in addition
> to the unified OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
> OVMF.fd is simply the concatenation of the latter two.
>=20
> $ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd
> [prints nothing]


Laszlo,

Thanks for the detailed explanation.=20

Maybe I was overcomplicating this. Given your explanation I think the part=
 I'm missing is OVMF is implying FLASH layout, in this split model, based o=
n the size of the OVMF_CODE.fd and OVMF_VARS.fd.  Given that if OVMF_CODE.f=
d gets bigger the variable address changes from a QEMU point of view. So ba=
sically it is the QEMU  API that is making assumptions about the relative l=
ayout of the FD in the split model that makes a migration to larger ROM not=
 work. Basically the -pflash API does not support changing the size of the =
ROM without moving NVRAM given the way it is currently defined.=20

Given the above it seems like the 2 options are:
1) Pad OVMF_CODE.fd to be very large so there is room to grow.
2) Add some feature to QUEM that allows the variable store address to not =
be based on OVMF_CODE.fd size.=20

I did see this [1] and combined with your email I either understand, or I'=
m still confused? :)

I'm not saying we need to change anything, I'm just trying to make sure I =
understand how OVMF and QEMU are tied to together.=20

[1] https://www.redhat.com/archives/libvir-list/2019-January/msg01031.html

Thanks,

Andrew Fish




>=20
> When you define a new domain (VM) on a virtualization host, the domain
> definition saves a reference (pathname) to the OVMF_CODE.fd file.
> However, the OVMF_VARS.fd file (the variable store *template*) is not
> directly referenced; instead, it is *copied* into a separate (private)
> file for the domain.
>=20
> Furthermore, once booted, guest has two flash chips, one that maps the
> firmware executable OVMF_CODE.fd read-only, and another pflash chip that
> maps its private varstore file read-write.
>=20
> This makes it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via
> package upgrades on the virt host) without messing with varstores that
> were earlier instantiated from OVMF_VARS.fd. What's important here is
> that the various constants in the new (upgraded) OVMF_CODE.fd file
> remain compatible with the *old* OVMF_VARS.fd structure, across package
> upgrades.
>=20
> If that's not possible for introducing e.g. a new feature, then the
> package upgrade must not overwrite the OVMF_CODE.fd file in place, but
> must provide an additional firmware binary. This firmware binary can
> then only be used by freshly defined domains (old domains cannot be
> switched over). Old domains can be switched over manually -- and only if
> the sysadmin decides it is OK to lose the current variable store
> contents. Then the old varstore file for the domain is deleted
> (manually), the domain definition is updated, and then a new (logically
> empty, pristine) varstore can be created from the *new* OVMF_2_VARS.fd
> that matches the *new* OVMF_2_CODE.fd.
>=20
>=20
> During live migration, the "RAM-like" contents of both pflash chips are
> migrated (the guest-side view of both chips remains the same, including
> the case when the writeable chip happens to be in "programming mode",
> i.e., during a UEFI variable write through the Fault Tolerant Write and
> Firmware Volume Block(2) protocols).
>=20
> Once live migration completes, QEMU dumps the full contents of the
> writeable chip to the backing file (on the destination host). Going
> forward, flash writes from within the guest are reflected to said
> host-side file on-line, just like it happened on the source host before
> live migration. If the file backing the r/w pflash chip is on NFS
> (shared by both src and dst hosts), then this one-time dumping when the
> migration completes is superfluous, but it's also harmless.
>=20
> The interesting question is, what happens when you power down the VM on
> the destination host (=3D post migration), and launch it again there, fr=
om
> zero. In that case, the firmware executable file comes from the
> *destination host* (it was never persistently migrated from the source
> host, i.e. never written out on the dst). It simply comes from the OVMF
> package that had been installed on the destination host, by the
> sysadmin. However, the varstore pflash does reflect the permanent result
> of the previous migration. So this is where things can fall apart, if
> both firmware binaries (on the src host and on the dst host) don't agree
> about the internal structure of the varstore pflash.
>=20
> Thanks
> Laszlo
>=20
>=20
>=20
>=20


--Apple-Mail=_656EDCE8-0EB3-47A4-A1F9-C2C49993C4D8
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=us-ascii

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; charset=
=
=3Dus-ascii"></head><body style=3D"word-wrap: break-word; -webkit-nbsp-mod=
e: space; line-break: after-white-space;" class=3D""><br class=3D""><div><b=
r class=3D""><blockquote type=3D"cite" class=3D""><div class=3D"">On Feb 26=
, 2020, at 1:42 AM, Laszlo Ersek &lt;<a href=3D"mailto:lersek@redhat.com" c=
lass=3D"">lersek@redhat.com</a>&gt; wrote:</div><br class=3D"Apple-intercha=
nge-newline"><div class=3D""><div class=3D"">Hi Andrew,<br class=3D""><br c=
lass=3D"">On 02/25/20 22:35, Andrew Fish wrote:<br class=3D""><br class=3D"=
"><blockquote type=3D"cite" class=3D"">Laszlo,<br class=3D""><br class=3D""=
>The FLASH offsets changing breaking things makes sense.<br class=3D""><br =
class=3D"">I now realize this is like updating the EFI ROM without rebootin=
g the<br class=3D"">system. &nbsp;Thus changes in how the new EFI code work=
s is not the issue.<br class=3D""><br class=3D"">Is this migration event vi=
sible to the firmware? Traditionally the<br class=3D"">NVRAM is a region in=
 the FD so if you update the FD you have to skip<br class=3D"">NVRAM region=
 or save and restore it. Is that activity happening in<br class=3D"">this c=
ase? Even if the ROM layout does not change how do you not lose<br class=3D=
"">the contents of the NVRAM store when the live migration happens? Sorry<b=
r class=3D"">if this is a remedial question but I'm trying to learn how thi=
s<br class=3D"">migration works.<br class=3D""></blockquote><br class=3D"">=
With live migration, the running guest doesn't notice anything. This is<br =
class=3D"">a general requirement for live migration (regardless of UEFI or =
flash).<br class=3D""><br class=3D"">You are very correct to ask about "ski=
pping" the NVRAM region. With the<br class=3D"">approach that OvmfPkg origi=
nally supported, live migration would simply<br class=3D"">be unfeasible. T=
he "build" utility would produce a single (unified)<br class=3D"">OVMF.fd f=
ile, which would contain both NVRAM and executable regions, and<br class=3D=
"">the guest's variable updates would modify the one file that would exist.=
<br class=3D"">This is inappropriate even without considering live migratio=
n, because<br class=3D"">OVMF binary upgrades (package updates) on the virt=
ualization host would<br class=3D"">force guests to lose their private vari=
able stores (NVRAMs).<br class=3D""><br class=3D""></div></div></blockquote=
><blockquote type=3D"cite" class=3D""><div class=3D""><div class=3D"">There=
fore, the "build" utility produces "split" files too, in addition<br class=
=3D"">to the unified OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.<=
br class=3D"">OVMF.fd is simply the concatenation of the latter two.<br cla=
ss=3D""><br class=3D"">$ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd<br c=
lass=3D"">[prints nothing]<br class=3D""></div></div></blockquote><div><br =
class=3D""></div><div><br class=3D""></div><div><div>Laszlo,</div><div><br =
class=3D""></div><div>Thanks for the detailed explanation.&nbsp;</div><div>=
<br class=3D""></div><div>Maybe I was overcomplicating this. Given your exp=
lanation I think the part I'm missing is OVMF is implying FLASH layout, in =
this split model, based on the size of the&nbsp;OVMF_CODE.fd and&nbsp;<span=
 style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); background-color:=
 rgb(255, 255, 255);" class=3D"">OVMF_VARS.fd. &nbsp;Given that if&nbsp;</s=
pan>OVMF_CODE.fd gets bigger the variable address changes from a QEMU point=
 of view. So basically it is the QEMU &nbsp;API that is making assumptions =
about the relative layout of the FD in the split model that makes a migrati=
on to larger ROM not work. Basically the -pflash API does not support chang=
ing the size of the ROM without moving NVRAM given the way it is currently =
defined.&nbsp;</div><div><br class=3D""></div><div>Given the above it seems=
 like the 2 options are:</div><div>1) Pad&nbsp;<span style=3D"caret-color: =
rgb(0, 0, 0); color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" c=
lass=3D"">OVMF_CODE.fd to be very large so there is room to grow.</span></d=
iv><div><span style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); back=
ground-color: rgb(255, 255, 255);" class=3D"">2) Add some feature to QUEM t=
hat allows the variable store address to not be based on&nbsp;</span><span =
style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); background-color: =
rgb(255, 255, 255);" class=3D"">OVMF_CODE.fd size.&nbsp;</span></div><div><=
span style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); background-co=
lor: rgb(255, 255, 255);" class=3D""><br class=3D""></span></div><div><span=
 style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); background-color:=
 rgb(255, 255, 255);" class=3D"">I did see this [1] and combined with your =
email I either understand, or I'm still confused? :)</span></div><div><br c=
lass=3D""></div><div>I'm not saying we need to change anything, I'm just tr=
ying to make sure I understand how OVMF and QEMU are tied to together.&nbsp=
;</div><div><span style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); =
background-color: rgb(255, 255, 255);" class=3D""><br class=3D""></span></d=
iv><div><span style=3D"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); back=
ground-color: rgb(255, 255, 255);" class=3D"">[1]&nbsp;</span><font color=
=3D"#000000" class=3D""><a href=3D"https://www.redhat.com/archives/libvir-=
list/2019-January/msg01031.html" class=3D"">https://www.redhat.com/archives=
/libvir-list/2019-January/msg01031.html</a></font></div><div><span style=3D=
"caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); background-color: rgb(255,=
 255, 255);" class=3D""><br class=3D""></span></div><div><span style=3D"car=
et-color: rgb(0, 0, 0); color: rgb(0, 0, 0); background-color: rgb(255, 255=
, 255);" class=3D"">Thanks,</span></div><div><span style=3D"caret-color: rg=
b(0, 0, 0); color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" cla=
ss=3D""><br class=3D""></span></div><div><span style=3D"caret-color: rgb(0,=
 0, 0); color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class=
=3D"">Andrew Fish</span></div><div><br class=3D""></div><blockquote type=
=3D"cite" class=3D""><div class=3D""></div></blockquote></div><div><br cla=
ss=3D""></div><br class=3D""><blockquote type=3D"cite" class=3D""><div clas=
s=3D""><div class=3D""><br class=3D"">When you define a new domain (VM) on =
a virtualization host, the domain<br class=3D"">definition saves a referenc=
e (pathname) to the OVMF_CODE.fd file.<br class=3D"">However, the OVMF_VARS=
.fd file (the variable store *template*) is not<br class=3D"">directly refe=
renced; instead, it is *copied* into a separate (private)<br class=3D"">fil=
e for the domain.<br class=3D""><br class=3D"">Furthermore, once booted, gu=
est has two flash chips, one that maps the<br class=3D"">firmware executabl=
e OVMF_CODE.fd read-only, and another pflash chip that<br class=3D"">maps i=
ts private varstore file read-write.<br class=3D""><br class=3D"">This make=
s it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via<br class=3D"">p=
ackage upgrades on the virt host) without messing with varstores that<br cl=
ass=3D"">were earlier instantiated from OVMF_VARS.fd. What's important here=
 is<br class=3D"">that the various constants in the new (upgraded) OVMF_COD=
E.fd file<br class=3D"">remain compatible with the *old* OVMF_VARS.fd struc=
ture, across package<br class=3D"">upgrades.<br class=3D""><br class=3D"">I=
f that's not possible for introducing e.g. a new feature, then the<br class=
=
=3D"">package upgrade must not overwrite the OVMF_CODE.fd file in place, b=
ut<br class=3D"">must provide an additional firmware binary. This firmware =
binary can<br class=3D"">then only be used by freshly defined domains (old =
domains cannot be<br class=3D"">switched over). Old domains can be switched=
 over manually -- and only if<br class=3D"">the sysadmin decides it is OK t=
o lose the current variable store<br class=3D"">contents. Then the old vars=
tore file for the domain is deleted<br class=3D"">(manually), the domain de=
finition is updated, and then a new (logically<br class=3D"">empty, pristin=
e) varstore can be created from the *new* OVMF_2_VARS.fd<br class=3D"">that=
 matches the *new* OVMF_2_CODE.fd.<br class=3D""><br class=3D""><br class=
=3D"">During live migration, the "RAM-like" contents of both pflash chips =
are<br class=3D"">migrated (the guest-side view of both chips remains the s=
ame, including<br class=3D"">the case when the writeable chip happens to be=
 in "programming mode",<br class=3D"">i.e., during a UEFI variable write th=
rough the Fault Tolerant Write and<br class=3D"">Firmware Volume Block(2) p=
rotocols).<br class=3D""><br class=3D"">Once live migration completes, QEMU=
 dumps the full contents of the<br class=3D"">writeable chip to the backing=
 file (on the destination host). Going<br class=3D"">forward, flash writes =
from within the guest are reflected to said<br class=3D"">host-side file on=
-line, just like it happened on the source host before<br class=3D"">live m=
igration. If the file backing the r/w pflash chip is on NFS<br class=3D"">(=
shared by both src and dst hosts), then this one-time dumping when the<br c=
lass=3D"">migration completes is superfluous, but it's also harmless.<br cl=
ass=3D""><br class=3D"">The interesting question is, what happens when you =
power down the VM on<br class=3D"">the destination host (=3D post migration=
), and launch it again there, from<br class=3D"">zero. In that case, the fi=
rmware executable file comes from the<br class=3D"">*destination host* (it =
was never persistently migrated from the source<br class=3D"">host, i.e. ne=
ver written out on the dst). It simply comes from the OVMF<br class=3D"">pa=
ckage that had been installed on the destination host, by the<br class=3D""=
>sysadmin. However, the varstore pflash does reflect the permanent result<b=
r class=3D"">of the previous migration. So this is where things can fall ap=
art, if<br class=3D"">both firmware binaries (on the src host and on the ds=
t host) don't agree<br class=3D"">about the internal structure of the varst=
ore pflash.<br class=3D""><br class=3D"">Thanks<br class=3D"">Laszlo<br cla=
ss=3D""><br class=3D""><br class=3D""><br class=3D""><br class=3D""></div><=
/div></blockquote></div><br class=3D""></body></html>

--Apple-Mail=_656EDCE8-0EB3-47A4-A1F9-C2C49993C4D8--