From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Yao, Jiewen" <jiewen.yao@intel.com>
To: Brijesh Singh <brijesh.singh@amd.com>, Laszlo Ersek <lersek@redhat.com>,
 "Zeng, Star" <star.zeng@intel.com>, edk2-devel-01 <edk2-devel@lists.01.org>
CC: "Dong, Eric" <eric.dong@intel.com>
Thread-Topic: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap
 common buffers at ExitBootServices()
Thread-Index: AQHTJO6HlEgpQf73uk+WGvJRCtH+06KkA0GAgACz0wCAAMTM8IAAA0SAgADDw8D//836AIAA9pEQgAA8GYCAADkqgIABX6yQ///xnwCAADB3AIAAhodA
Date: Thu, 7 Sep 2017 14:48:03 +0000
Message-ID: <74D8A39837DF1E4DA445A8C0B3885C503A9A92BA@shsmsx102.ccr.corp.intel.com>
References: <20170903195449.30261-1-lersek@redhat.com>
 <0C09AFA07DD0434D9E2A0C6AEB0483103B93A125@shsmsx102.ccr.corp.intel.com>
 <4b24b1eb-362f-3b46-97e2-bdfda53f40c9@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C503A9A79BD@shsmsx102.ccr.corp.intel.com>
 <5f1fdc84-5824-bee2-5a1a-fbd67adf5443@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C503A9A7F10@shsmsx102.ccr.corp.intel.com>
 <12d71f32-9dcf-4f6e-b033-d5f82104caca@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C503A9A852C@shsmsx102.ccr.corp.intel.com>
 <24d66e81-336e-3924-8045-4749d98e2fbb@redhat.com>
 <94154cfe-92bf-ba6f-f25f-3963891f6932@amd.com>
 <74D8A39837DF1E4DA445A8C0B3885C503A9A8F0C@shsmsx102.ccr.corp.intel.com>
 <c7368a54-d448-562a-1a72-704a3087f35d@redhat.com>
 <976f687a-1add-f86a-0061-0137f2df0eb6@amd.com>
In-Reply-To: <976f687a-1add-f86a-0061-0137f2df0eb6@amd.com>
MIME-Version: 1.0
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
Subject: Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: EDK II Development  <edk2-devel.lists.01.org>
X-List-Received-Date: Thu, 07 Sep 2017 14:45:52 -0000
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Great. Thanks for confirming that. Now it is clear to me.

Then let's wait for Laszlo's new patch to make all DMA buffers private. :)

Thank you
Yao, Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Brijesh Singh
Sent: Thursday, September 7, 2017 10:40 PM
To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

Hi Jiewen,

On 09/07/2017 06:46 AM, Laszlo Ersek wrote:
> On 09/07/17 06:46, Yao, Jiewen wrote:
>> Thanks for the sharing, Brijesh.
>>
>> "If a page was marked as "shared"
>> then it is the guest's responsibility to make it "private" after it is done
>> communicating with the hypervisor."
>>
>> I believe I have the same understanding - a *guest* should guarantee that.
>>
>> My question is: during the *transition* from firmware to OS, *which guest*
>> should make the shared buffer private? Is it the "guest firmware" or the
>> "guest OS"?
>>
>> Maybe I can ask specific questions to make it clearer.
>>
>> 1) If the common DMA buffer is not unmapped at ExitBootServices(), is that
>>    treated as an issue?
>>
>> 2) If the read/write DMA buffer is not unmapped at ExitBootServices(), is
>>    that treated as an issue?
>
> Very good questions, totally to the point.
>
> On the authoritative answer, I defer to Brijesh.
>


Both of the above cases (#1 and #2) are problems. Since the buffers were owned
by the firmware, and the firmware made them "shared", the firmware is
responsible for marking them as private once it is done with them. In other
words, we must call Unmap() from ExitBootServices to ensure that buffers
mapped using BusMasterCommonBuffer/BusMasterRead/BusMasterWrite are marked as
"private" before we make them available to the guest OS. (We do a similar
thing in the Linux OS.)
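
To illustrate the rule above, here is a minimal sketch in Python. It is a toy
model, not EDK2 code: the class ToyIoMmu, its method names, and the bounce
page numbering are all hypothetical. The model only tracks which pages are
"shared", and shows that unmapping every outstanding mapping at
ExitBootServices leaves no shared page behind. It deliberately ignores the
question of whether Unmap() may free memory at that point, which is discussed
elsewhere in this thread.

```python
from enum import Enum


class Op(Enum):
    BUS_MASTER_READ = "read"
    BUS_MASTER_WRITE = "write"
    BUS_MASTER_COMMON_BUFFER = "common"


class ToyIoMmu:
    """Toy model: tracks which page numbers are 'shared' (C-bit clear)."""

    def __init__(self):
        self.shared = set()            # pages currently visible to the hypervisor
        self.mappings = {}             # token -> (operation, device page)
        self._next_token = 0
        self._next_bounce_page = 1000  # hypothetical pool of bounce pages

    def map(self, op, page):
        if op is Op.BUS_MASTER_COMMON_BUFFER:
            device_page = page         # the same page is made shared in place
        else:
            device_page = self._next_bounce_page  # separate shared bounce page
            self._next_bounce_page += 1
        self.shared.add(device_page)
        token = self._next_token
        self._next_token += 1
        self.mappings[token] = (op, device_page)
        return token, device_page

    def unmap(self, token):
        _op, device_page = self.mappings.pop(token)
        self.shared.discard(device_page)  # page is private again

    def exit_boot_services(self):
        # Unmap everything that is still outstanding, whatever the operation.
        for token in list(self.mappings):
            self.unmap(token)


iommu = ToyIoMmu()
_, common_page = iommu.map(Op.BUS_MASTER_COMMON_BUFFER, page=7)
_, bounce_page = iommu.map(Op.BUS_MASTER_READ, page=8)
print(sorted(iommu.shared))  # pages visible to the hypervisor before cleanup
iommu.exit_boot_services()
print(sorted(iommu.shared))  # shared set after cleanup
```

In this model the guest OS never inherits a shared page, so no side channel
describing the encryption status is needed.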

Having any kind of side channel to communicate the encryption status of a
page will not work -- we should be able to support a use case where we boot
OVMF in 64-bit mode but launch a 32-bit Linux guest. When Linux boots in
32-bit mode, it does not have access to the encryption bit (the C-bit is
bit 47 in the page table) and can't mark the page as private (even if we
provide some kind of side channel).

Thank you very much for all your help.

> (
>
> My personal opinion is that both situations (#1 and #2) are problems,
> because they break the *practical* security invariant for SEV guests:
>
> - most memory should be encrypted at all times, *and*
>
> - any memory that is decrypted must have an owner that is responsible
>    for re-encrypting the memory eventually.
>
> Therefore, *either* the firmware has to re-encrypt all remaining DMA
> buffers at ExitBootServices(), *or* a new information channel must be
> designed, from firmware to OS, to carry over the decryption status.
>
> I strongly prefer the first option, for the following reason: the same
> questions apply to all EDK2 IOMMU protocol interfaces, not just the one
> exported by the SEV driver.
>
> )
>
> Thanks,
> Laszlo
>
>> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
>> Sent: Wednesday, September 6, 2017 11:40 PM
>> To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
>> Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>>
>>
>> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>>> Thanks for the clarification. Comment in line.
>>>>
>>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>>> To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
>>>> Cc: Dong, Eric <eric.dong@intel.com>; Brijesh Singh <brijesh.singh@amd.com>
>>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>
>>>>> Then after ExitBootServices, the OS will take control of CR3 and set the
>>>>> correct encryption bit.
>>>>
>>>> This is true, the guest OS will set up new page tables, and in those
>>>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>>>
>>>> However, the guest OS will not *rewrite* all of the memory, with the
>>>> C-bit set. This means that any memory that the firmware didn't actually
>>>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>>>> hypervisor.
>>>> [Jiewen] That is an interesting question.
>>>> Is there any security concern associated?
>>>
>>> I can't tell for sure. Answering this question is up to the proponents
>>> of SEV, who have designed the threat model for SEV.
>>>
>>> My operating assumption is that we should keep memory as tightly
>>> encrypted as possible at the firmware --> OS control transfer. It could
>>> be an exaggerated expectation from my side; I'd just rather be safe than
>>> sorry :)
>>>
>>>
>>
>> Let me give a brief intro to SEV (Secure Encrypted Virtualization), and then
>> we can discuss a security model (if needed).
>>
>> SEV is an extension to the AMD-V architecture which supports running
>> encrypted virtual machines (VMs) under the control of a hypervisor.
>> Encrypted VMs have their pages (code and data) secured such that only the
>> guest itself has access to the unencrypted version. Each encrypted VM is
>> associated with a unique encryption key; if its data is accessed by a
>> different entity using a different key, the encrypted guest data will be
>> incorrectly decrypted, leading to unintelligible data. You can also find
>> some details on SEV in the white paper [1].
>>
>> SEV encrypted VMs have the concept of private and shared memory. Private
>> memory is encrypted with the guest-specific key, while shared memory may be
>> encrypted with the hypervisor key. SEV guest VMs can choose which pages they
>> would like to be private, but the instruction pages and guest page tables
>> are always treated as private by the hardware. DMA operations inside the
>> guest must be performed on shared pages -- this is mainly because in the
>> virtualization world, device DMA needs some assistance from the hypervisor.
>>
>> The security model provided by SEV ensures that the hypervisor is no longer
>> able to inspect or alter any guest code or data. The guest OS controls what
>> it wants to share with the outside world (in this case, with the hypervisor).
>>
>> In the software implementation, we took the approach of encrypting
>> everything possible, starting early in boot. In this approach, whenever OVMF
>> wants to perform certain DMA operations, it allocates a shared page, issues
>> the command, and frees the shared page after the command is completed (in
>> other words, we use a software bounce buffer to complete the DMA operation).
>>
>> We have implemented an IOMMU driver to provide the following functions:
>>
>> AllocateBuffer():
>> --------------------
>> It allocates private pages. As per the UEFI spec, the driver will map the
>> buffer allocated from this routine using BusMasterCommonBuffer, hence we
>> allocate extra stash pages in addition to the requested pages.
>>
>>
>> Map
>> ----
>> If the requested operation is BusMasterRead or BusMasterWrite, then we
>> allocate a shared buffer, and DeviceAddress points to the shared buffer.
>>
>> If the requested operation is BusMasterCommonBuffer, then we perform in-place
>> decryption of the contents and update the page table to clear the C-bit
>> (basically making it a shared page).
>>
>> Unmap
>> ------
>> If the operation was BusMasterRead or BusMasterWrite, then we complete the
>> unmapping and free the shared buffer allocated in Map().
>>
>> If the operation was BusMasterCommonBuffer, then we perform in-place
>> encryption and set the C-bit (basically making the page private).
>>
>> FreeBuffer:
>> -----------
>> Free the pages.
>>
>>
>> Let's run through the scenario below:
>>
>> 1) The guest marks a page as "shared" and gives its physical address to the
>>    HV (e.g. for a DMA read).
>> 2) The hypervisor completes the requested operation.
>> 3) The hypervisor caches the shared physical address and monitors it for
>>    information leaks.
>> 4) Sometime later, if the guest writes data to its "shared" physical address,
>>    then the hypervisor can easily read it without the guest's knowledge.
>>
>> SEV hardware can protect us against an attack where someone tries to inject
>> or alter the guest code. As I noted above, any instruction fetch is forced
>> to be private; hence, if an attacker writes some code into a shared buffer
>> and points the RIP to his/her code, then the instruction fetch will try to
>> decrypt it and get unintelligible opcodes. If a page was marked as "shared",
>> then it is the guest's responsibility to make it "private" after it is done
>> communicating with the hypervisor.
>>
>> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>>
>>
>>>> If the C-bit is cleared for a read/write buffer, is the content in the
>>>> read/write buffer also exposed to hypervisor?
>>>
>>> Not immediately. Only when the guest also rewrites the memory through
>>> the appropriate virtual address.
>>>
>>> Basically, in the virtual machine, the C-bit in a PTE that covers a
>>> particular guest virtual address (GVA) controls whether a guest write
>>> through that GVA will result in memory encryption, and whether a guest read
>>> through that GVA will result in memory decryption.
>>>
>>> The current state of the C-bit doesn't matter for the hypervisor, what
>>> matters is the last guest write through a GVA whose PTE had the C-bit
>>> set, or clear. If the C-bit was clear when the guest last wrote the
>>> area, then the hypervisor can read the data. If the C-bit was set, then
>>> the hypervisor can only read garbage.
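
The point that the *last guest write* decides what the hypervisor sees can be
sketched with a toy model in Python. Everything here is an assumption made for
illustration: ToyPage and its methods are hypothetical names, and a
single-byte XOR stands in for the real memory encryption, which it does not
resemble in any cryptographic sense.

```python
GUEST_KEY = 0x5A  # hypothetical stand-in for the guest-specific key


def _xor(data: bytes) -> bytes:
    # Placeholder "cipher": XOR with a fixed byte. NOT real encryption.
    return bytes(b ^ GUEST_KEY for b in data)


class ToyPage:
    def __init__(self):
        self.raw = b""     # bytes as stored in RAM (what the hypervisor sees)
        self.c_bit = True  # C-bit of the PTE the guest uses for this page

    def guest_write(self, data: bytes) -> None:
        # A write through a C-bit=1 mapping stores ciphertext.
        self.raw = _xor(data) if self.c_bit else data

    def guest_read(self) -> bytes:
        # A read through a C-bit=1 mapping decrypts what is stored.
        return _xor(self.raw) if self.c_bit else self.raw

    def hypervisor_read(self) -> bytes:
        return self.raw


page = ToyPage()
page.c_bit = False
page.guest_write(b"secret")          # written while the page was shared
leaked_while_shared = page.hypervisor_read()

page.c_bit = True                    # flipping the bit rewrites nothing
still_leaked = page.hypervisor_read()

page.guest_write(b"secret")          # rewrite through the C-bit=1 mapping
after_rewrite = page.hypervisor_read()
```

In this model, setting the C-bit alone leaves the stale clear-text in place;
only the rewrite through the encrypted mapping hides it.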
>>>
>>>
>>>> I mean, if there is a security concern, the concern should apply to
>>>> both the common buffer and the read/write buffer.
>>>> Then we have to figure out a way to resolve both buffers.
>>>
>>> Yes, this is correct. The PciIo.Unmap operation, as currently
>>> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
>>> correctly for all operations, but it can only guarantee *not* freeing
>>> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
>>> is safe, while Unmap()ing Read/Write is not. The encryption would be
>>> re-established just fine for Read/Write as well, but we would change the
>>> UEFI memmap.
>>>
>>> In OVMF, we currently manage this problem by making all asynchronous DMA
>>> mappings CommonBuffer, even if they could otherwise be Read or Write. We
>>> use Read and Write only if the DMA operation completes before the
>>> higher-level protocol function returns (so we can immediately Unmap the
>>> operation, and the ExitBootServices() handler doesn't have to care).
>>>
>>>
>>>> To be honest, that is exactly my biggest question on this patch:
>>>> why do we only handle the common buffer specially?
>>>
>>> For the following reason:
>>>
>>> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
>>> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
>>> implementors -- beyond what the UEFI spec requires -- to keep *all*
>>> dynamic memory management out of Map/Unmap. If they need dynamic memory
>>> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
>>> This way Unmap is safe in ExitBootServices handlers.
>>>
>>> - We cannot *reasonably ask* PciIo implementors to provide the same
>>> guarantee for Read and Write mappings, simply because there are no other
>>> APIs that could manage the dynamic memory for Map and Unmap --
>>> AllocateBuffer and FreeBuffer are not required for Read and Write
>>> mappings. PciIo implementors couldn't agree to our request even if they
>>> wanted to. Therefore Unmapping Read/Write operations is inherently
>>> unsafe in EBS handlers.
>>>
>>>
>>>>> NOTE: The device should still be in the halted state, because the device
>>>>> is also controlled by the OS.
>>>>
>>>> This is the key point -- the device is *not* controlled by the guest OS,
>>>> in the worst case.
>>>>
>>>> The virtual device is ultimately implemented by the hypervisor. We don't
>>>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>>>> However, if the hypervisor is compromised by an external attacker (for
>>>> example over the network, or via privilege escalation from another
>>>> guest), then all guest RAM that has not been encrypted up to that point
>>>> becomes visible to the attacker. In other words, the hypervisor ("the
>>>> device") becomes retro-actively malicious.
>>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>>> 1) If the concern is for existing data, that means all DMA data already
>>>> written. However, the DMA data has already been consumed or produced by
>>>> the virtual device or the hypervisor. If the hypervisor is attacked, it
>>>> already has all the data content.
>>>
>>> Maybe, maybe not. The data may have been sent to a different host over
>>> the network, and wiped from memory.
>>>
>>> Or, the data may have been written to a disk image that is separately
>>> encrypted by the host OS, and has been detached (unplugged) from the
>>> guest, and also unmounted from the host, with the disk key securely
>>> wiped from host memory.
>>>
>>> We can also imagine the reverse direction. Assume that the user of the
>>> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
>>> program, and types some secret information into a text file on the ESP.
>>> Further assume that the disk image is encrypted on the host OS. It is
>>> conceivable that fragments of the secret could remain stuck in the AHCI
>>> (disk) and USB (keyboard) DMA buffers.
>>>
>>> Maybe you think that these are "made up" examples, and I agree, I just
>>> made them up. The point is, it is not my place to discount possible
>>> attack vectors (I haven't designed SEV). Attackers will be limited by
>>> their *own* imaginations only, not by mine :)
>>>
>>>
>>>> 2) If the concern is for future data, that means all user data to be
>>>> written. However, the C-bit already prevents that.
>>>
>>> As far as I understand SEV, future data is out of scope. The goal is to
>>> prevent *retroactive* guest data leaks (= whatever is currently in host
>>> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>>>
>>> If an attacker compromises a virtualization host, then they can just sit
>>> silent and watch data flow into and out of guests from that point
>>> onward, because they can then monitor all DMA (which always happens in
>>> clear-text) real-time.
>>>
>>>
>>>> Maybe I do not fully understand the threat model defined.
>>>> If you can explain more, that would be great.
>>>
>>> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
>>> correct me if I'm wrong.
>>>
>>>
>>>> The point of SEV is to keep as much guest data encrypted at all times as
>>>> possible, so that *whenever* the hypervisor is compromised by an
>>>> attacker, the guest memory -- which the attacker can inevitably dump
>>>> from the host side -- remains unintelligible to the attacker.
>>>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>>>> threat model first.
>>>> Otherwise, what we are doing might be unnecessary or insufficient.
>>>
>>> I agree.
>>>
>>> As I said above, my operating principle has been to re-encrypt all DMA
>>> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
>>> them at ExitBootServices at the latest.
>>>
>>>
>>>> [Jiewen] For "require that Unmap() work for CommonBuffer
>>>> operations without releasing memory", I would hold my opinion until
>>>> it is documented clearly in UEFI spec.
>>>
>>> You are right, of course.
>>>
>>> It's just that we cannot avoid exploring ways, for this feature, that
>>> currently fall outside of the spec. Whatever we do here will be outside
>>> of the spec, one way or another. I suggested what I thought could be a
>>> reasonable extension to the spec, one that could be implemented by PciIo
>>> implementors even before the spec is modified.
>>>
>>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>>
>>> If there's a better way -- for example modifying CoreExitBootServices()
>>> in edk2, to signal IOMMU drivers separately, *after* notifying other
>>> ExitBootServices() handlers --, that might work as well.
>>>
>>>
>>>> My current concern is:
>>>> First, this sentence is NOT defined in UEFI specification.
>>>
>>> Correct.
>>>
>>>
>>>> Second, it might break an existing implementation or existing feature, such as tracing.
>>>
>>> Correct.
>>>
>>>> Till now, I have not seen any restriction in the UEFI spec that says: in
>>>> this function, you are not allowed to call memory services.
>>>> The only restrictions are:
>>>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>>>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
>>>
>>> I agree.
>>>
>>>
>>>> I'm just trying to operate with the APIs currently defined by the UEFI
>>>> spec, and these assumptions were the best I could come up with.
>>>> [Jiewen] The PCI device driver must follow and *only* follow the UEFI spec.
>>>> Especially the IHV Option ROM should not consume any private API.
>>>
>>> I disagree about "only follow". If there are additional requirements
>>> that can be satisfied without breaking the spec, driver implementors can
>>> choose to conform to both sets of requirements.
>>>
>>> For example (if I understand correctly), Microsoft / Windows present a
>>> bunch of requirements *beyond* the UEFI spec, for both platform and
>>> add-on card firmware. Most vendors conform :)
>>>
>>>
>>>>> [Jiewen] I am not sure who will control "When control is transferred to
>>>>> the OS, all guest memory should be encrypted again, even those areas that
>>>>> were once used as CommonBuffers."
>>>>> For the SEV case, I think it should be controlled by the OS, because it is
>>>>> just CR3.
>>>>
>>>> If it was indeed just CR3, then I would fully agree with you.
>>>>
>>>> But -- to my understanding --, ensuring that the memory is actually
>>>> encrypted requires that the guest *write* to the memory through a
>>>> virtual address whose PTE has the C-bit set.
>>>>
>>>> And I don't think the guest OS can be expected to rewrite all of its
>>>> memory at launch. :(
>>>>
>>>> [Jiewen] That is fine.
>>>> I suggest we get clear on the threat model as the first step.
>>>> The threat model for SEV usage in OVMF.
>>>>
>>>> I am sorry if that was already discussed before. I might have missed the
>>>> conversation.
>>>
>>> No problem; it's always good to re-investigate our assumptions and
>>> operating principles.
>>>
>>>
>>>> If you or any SEV owner can share the insight, that will be great.
>>>> See my question above "If that is the threat model, I have a question on the exposure:..."
>>>
>>> I shared *my* impressions of the threat model (which are somewhat
>>> unclear, admittedly, and thus could make me overly cautious).
>>>
>>> I hope Brijesh can extend and/or correct my description.
>>>
>>>
>>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>>>> SEV forbids all DMA unless configured otherwise.
>>>>> [Jiewen] I do not think so. The default behavior of the VT-d IOMMU is to
>>>>> disable all DMA access.
>>>>> I set up the translation table, but mark all entries as NOT-PRESENT.
>>>>> I grant DMA access only if there is a specific request by a device.
>>>>>
>>>>> I open DMA access in ExitBootServices just to make sure it is compatible
>>>>> with existing OSes, which might not have knowledge of the IOMMU.
>>>>> I will provide a patch to make it a policy very soon. As such, the VT-d
>>>>> IOMMU driver may turn off the IOMMU, or keep it enabled, at
>>>>> ExitBootServices.
>>>>> An IOMMU-aware OS may take over the IOMMU directly and reprogram it.
>>>>
>>>> Thanks for the clarification.
>>>>
>>>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>>>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>>>> a chance to run their own ExitBootServices() handlers, disabling the
>>>> individual devices?
>>>> [Jiewen] Sorry for the confusion. Let me explain:
>>>> No, VTd never disables *all* DMA buffers at ExitBootServices.
>>>>
>>>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>>>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>>>> "only allow the DMA from Map() request".
>>>
>>> Okay, but this interpretation was exactly what I thought of first (see
>>> above): "VT-d permits all DMA unless configured otherwise". You answered
>>> that it wasn't the case.
>>>
>>> So basically, if I understand it correctly now, at ExitBootServices the
>>> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>>
>>> That is equivalent to decrypting all memory under SEV, and is the exact
>>> opposite of what we want. (As I understand it.)
>>>
>>>
>>>> If that happens, then a series of IOMMU faults could be generated, which
>>>> I described above. (I.e., such IOMMU faults could result, at least in
>>>> the case of SEV, in garbage being written to disk, via queued commands.)
>>>> [Jiewen] You are right. That would not happen.
>>>> The IOMMU driver should not bring any compatibility problem for the PCI
>>>> driver, iff the PCI driver follows the UEFI specification.
>>>
>>> Agreed.
>>>
>>>
>>>> Now, to finish up, here's an idea I just had.
>>>>
>>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>>>> notification function?
>>>>
>>>> If we are, then we could do the following:
>>>>
>>>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>>>     dynamic memory allocation or freeing, for any kind of DMA operation).
>>>>
>>>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>>>     ExitBootServices() handlers.
>>>>
>>>> * The IOMMU driver would have an ExitBootServices() notification
>>>>     function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>>>     (doesn't matter which).
>>>>
>>>> * In this notification function, the IOMMU driver would signal *another*
>>>>     event (a private one). The notification function for this event would
>>>>     be enqueued strictly at the TPL_CALLBACK level.
>>>>
>>>> * The notification function for the second event (private to the IOMMU)
>>>>     would iterate over all existent mappings, and unmap them, without
>>>>     allocating or freeing memory.
>>>>
>>>> The key point is that by signaling the second event, the IOMMU driver
>>>> could order its own cleanup code after all other ExitBootServices()
>>>> callbacks. (Assuming, at least, that no other ExitBootServices()
>>>> callback employs the same trick to defer itself!) Therefore by the time
>>>> the second callback runs, all PCI devices have been halted, and it is
>>>> safe to tear down the translations.
>>>>
>>>> The ordering would be ensured by the following:
>>>>
>>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>>>     to be signaled before the first notification action is taken." This
>>>>     means that, by the time the IOMMU's ExitBootServices() handler is
>>>>     entered, all other ExitBootServices() handlers have been *queued* at
>>>>     least, at TPL_CALLBACK or TPL_NOTIFY.
>>>>
>>>> - Therefore, when we signal our second callback from there, for
>>>>     TPL_CALLBACK, the second callback will be queued at the end of the
>>>>     TPL_CALLBACK queue.
>>>>
>>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>>>     functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>>>     for the Type argument of CreateEvent." So it wouldn't matter if a
>>>>     driver used CreateEvent() or CreateEventEx() for setting up its own
>>>>     handler, the handler would be queued just the same.
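
The ordering argument above can be sketched as a toy dispatcher in Python.
This is not edk2 code. ToyEventQueue and the handler names are hypothetical,
and the model assumes what the reasoning above assumes: pending notifications
run highest TPL first, and in FIFO order within one TPL. The numeric values 8
and 16 are the UEFI values of TPL_CALLBACK and TPL_NOTIFY.

```python
from collections import deque

TPL_CALLBACK = 8
TPL_NOTIFY = 16


class ToyEventQueue:
    """Toy model of per-TPL notification queues (FIFO within each TPL)."""

    def __init__(self):
        self.queues = {TPL_NOTIFY: deque(), TPL_CALLBACK: deque()}
        self.log = []  # names of handlers, in the order they actually ran

    def signal(self, tpl, name, handler=None):
        # Signaling only *queues* the notification; nothing runs yet.
        self.queues[tpl].append((name, handler))

    def dispatch(self):
        # Always run the oldest pending notification of the highest TPL.
        while any(self.queues.values()):
            for tpl in sorted(self.queues, reverse=True):
                if self.queues[tpl]:
                    name, handler = self.queues[tpl].popleft()
                    self.log.append(name)
                    if handler is not None:
                        handler(self)
                    break


def iommu_ebs_handler(q):
    # First-stage handler: defer the real cleanup behind everything
    # that is already queued at TPL_CALLBACK.
    q.signal(TPL_CALLBACK, "iommu-unmap-all")


q = ToyEventQueue()
# All ExitBootServices handlers are queued before the first one runs.
q.signal(TPL_CALLBACK, "iommu-ebs", iommu_ebs_handler)
q.signal(TPL_CALLBACK, "ahci-ebs")
q.signal(TPL_NOTIFY, "usb-ebs")
q.dispatch()
print(q.log)
```

Under these assumptions, "iommu-unmap-all" runs last even though the IOMMU's
first-stage handler was queued ahead of the AHCI one. Whether a real
implementation provides the same FIFO guarantee is exactly the open question
raised in this thread.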
>>>>
>>>> I think this is ugly and brittle, but perhaps the only way to clean up
>>>> *all* translations safely, with regard to PciIo.Unmap() +
>>>> ExitBootServices(), without updating the UEFI spec.
>>>>
>>>> What do you think?
>>>>
>>>> [Jiewen] If the order is correct, and all PCI device drivers are halted at
>>>> ExitBootServices, that works.
>>>> We do not need to update the PCI drivers, and we do not need to update the
>>>> UEFI spec.
>>>> We only need to update the IOMMU driver concerned, based upon the threat
>>>> model.
>>>> That is probably the best solution. :-)
>>>
>>> I'm very glad to hear that you like the idea.
>>>
>>> However, it depends on whether we are permitted, by the UEFI spec, to
>>> signal another event in an ExitBootServices() notification function.
>>>
>>> Are we permitted to do that?
>>>
>>> Does the UEFI spec guarantee that the notification function for the
>>> *second* event will be queued just like it would be under "normal"
>>> circumstances?
>>>
>>> (I know we must not allocate or free memory in the notification function
>>> of the *second* event either; I just want to know if the second event's
>>> handler is *queued* like it would normally be.)
>>>
>>>
>>>> I assume you want to handle both common buffer and read/write buffer, right?
>>>
>>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>>
>>>
>>>> And if possible, I am still interested in getting clear on the threat
>>>> model for SEV in OVMF.
>>>> If you or any SEV owner can share ...
>>>
>>> Absolutely. Please see above.
>>>
>>> Thank you!
>>> Laszlo
>>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>