From: Laszlo Ersek
To: "Yao, Jiewen", Brijesh Singh, "Zeng, Star", edk2-devel-01
Cc: "Dong, Eric"
Date: Thu, 7 Sep 2017 13:46:48 +0200
Subject: Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

On 09/07/17 06:46, Yao, Jiewen wrote:
> Thanks for sharing, Brijesh.
>
> "If a page was marked as "shared", then it is the guest's responsibility to make it "private" after it is done communicating with the hypervisor."
>
> I believe I have the same understanding -- a *guest* should guarantee that.
>
> My question is: during the *transition* from firmware to OS, *which guest* should make the shared buffer private again? Is it the "guest firmware" or the "guest OS"?
>
> Maybe I can ask two specific questions to make it clearer.
>
> 1) If the common DMA buffer is not unmapped at ExitBootServices(), is that treated as an issue?
>
> 2) If the read/write DMA buffer is not unmapped at ExitBootServices(), is that treated as an issue?

Very good questions, totally to the point. On the authoritative answer, I defer to Brijesh.
( My personal opinion is that both situations (#1 and #2) are problems, because they break the *practical* security invariant for SEV guests:

- most memory should be encrypted at all times, *and*

- any memory that is decrypted must have an owner that is responsible for re-encrypting the memory eventually.

Therefore, *either* the firmware has to re-encrypt all remaining DMA buffers at ExitBootServices(), *or* a new information channel must be designed, from firmware to OS, to carry over the decryption status.

I strongly prefer the first option, for the following reason: the same questions apply to all EDK2 IOMMU protocol interfaces, not just the one exported by the SEV driver. )

Thanks,
Laszlo

> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
> Sent: Wednesday, September 6, 2017 11:40 PM
> To: Laszlo Ersek; Yao, Jiewen; Zeng, Star; edk2-devel-01
> Cc: brijesh.singh@amd.com; Dong, Eric
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>
> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>> Thanks for the clarification. Comments inline.
>>>
>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>> To: Yao, Jiewen; Zeng, Star; edk2-devel-01
>>> Cc: Dong, Eric; Brijesh Singh
>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>>>> Then after ExitBootServices, the OS will take control of CR3 and set the correct encryption bit.
>>>
>>> This is true, the guest OS will set up new page tables, and in those PTEs, the C-bit ("encrypt") will likely be set by default.
>>>
>>> However, the guest OS will not *rewrite* all of the memory, with the C-bit set. This means that any memory that the firmware didn't actually re-encrypt (in the IOMMU driver) will still expose stale data to the hypervisor.
>>>
>>> [Jiewen] That is an interesting question. Is there any security concern associated?
>>
>> I can't tell for sure. Answering this question is up to the proponents of SEV, who have designed the threat model for SEV.
>>
>> My operating assumption is that we should keep memory as tightly encrypted as possible at the firmware --> OS control transfer. It could be an exaggerated expectation from my side; I'd just rather be safe than sorry :)
>
> Let me give a brief intro to SEV (Secure Encrypted Virtualization) and then we can discuss a security model (if needed).
>
> SEV is an extension to the AMD-V architecture which supports running encrypted virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their pages (code and data) secured such that only the guest itself has access to the unencrypted version. Each encrypted VM is associated with a unique encryption key; if its data is accessed by a different entity using a different key, the encrypted guest data will be incorrectly decrypted, leading to unintelligible data. You can also find more detail on SEV in the white paper [1].
>
> SEV-encrypted VMs have the concept of private and shared memory. The private memory is encrypted with the guest-specific key, while shared memory may be encrypted with the hypervisor's key. SEV guest VMs can choose which pages they would like to be private. But the instruction pages and the guest page tables are always treated as private by the hardware.
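(For concreteness: the "C-bit" that the discussion above and below keeps returning to is a physical-address bit whose position is not fixed by the architecture; the guest discovers it at run time. Below is a minimal, self-contained sketch of that discovery. The CPUID leaf, the status MSR and the bit meanings are taken from AMD's public documentation and should be double-checked against the APM; the function itself is illustrative and is not OVMF's actual MemEncryptSevLib code.)

#include <Uefi.h>
#include <Library/BaseLib.h>

#define CPUID_MEM_ENCRYPT_INFO  0x8000001F   // AMD memory encryption leaf
#define MSR_SEV_STATUS          0xC0010131   // SEV status MSR

UINT64
GetSevEncryptionMask (
  VOID
  )
{
  UINT32  Eax;
  UINT32  Ebx;
  UINT64  SevStatus;
  UINT8   CBitPosition;

  //
  // EAX bit 0: SME supported, bit 1: SEV supported.
  // EBX bits 5:0: position of the C-bit in a page table entry.
  //
  AsmCpuid (CPUID_MEM_ENCRYPT_INFO, &Eax, &Ebx, NULL, NULL);
  if ((Eax & BIT1) == 0) {
    return 0;                            // SEV not supported at all
  }

  //
  // Bit 0 of the SEV status MSR tells the guest whether SEV is active
  // for this very VM.
  //
  SevStatus = AsmReadMsr64 (MSR_SEV_STATUS);
  if ((SevStatus & BIT0) == 0) {
    return 0;                            // SEV not enabled for this guest
  }

  CBitPosition = (UINT8)(Ebx & 0x3F);
  return LShiftU64 (1, CBitPosition);    // e.g. bit 47 on current parts
}

(The mask returned above is what the firmware ORs into, or strips from, its own page table entries to make a given range private or shared. Note that clearing it alone does not expose data, and setting it alone does not hide data -- the contents must also be rewritten through the new mapping, which is exactly the point made above about the guest OS not rewriting all of its memory.)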
> The DMA operation inside the guest must be performed on shared pages -- this is mainly because, in the virtualization world, the device DMA needs some assistance from the hypervisor.
>
> The security model provided by SEV ensures that the hypervisor is no longer able to inspect or alter any guest code or data. The guest OS controls what it wants to share with the outside world (in this case, with the hypervisor).
>
> In the software implementation we took the approach of encrypting everything possible, starting early in boot. In this approach, whenever OVMF wants to perform certain DMA operations it allocates a shared page, issues the command, and frees the shared page after the command is completed (in other words, we use a software bounce buffer to complete the DMA operation).
>
> We have implemented an IOMMU driver to provide the following functions:
>
> AllocateBuffer():
> -----------------
> It allocates private pages. Per the UEFI spec, the driver will map the buffer allocated by this routine using BusMasterCommonBuffer, hence we allocate extra stash pages in addition to the requested pages.
>
> Map():
> ------
> If the requested operation is BusMasterRead or BusMasterWrite, then we allocate a shared buffer, and DeviceAddress points to the shared buffer.
>
> If the requested operation is BusMasterCommonBuffer, then we perform in-place decryption of the contents and update the page table to clear the C-bit (basically, make it a shared page).
>
> Unmap():
> --------
> If the operation was BusMasterRead or BusMasterWrite, then we complete the unmapping and free the shared buffer allocated in Map().
>
> If the operation was BusMasterCommonBuffer, then we perform in-place encryption and set the C-bit (basically, make the page private again).
>
> FreeBuffer():
> -------------
> Frees the pages.
>
> Let's run with the below scenario:
>
> 1) the guest marks a page as "shared" and gives its physical address to the HV (e.g. DMA read)
> 2) the hypervisor completes the requested operation
> 3) the hypervisor caches the shared physical address and monitors it for information leaks
> 4) sometime later, if the guest writes data at its "shared" physical address, then the hypervisor can easily read it without the guest's knowledge.
>
> SEV hardware can protect us against attacks where someone tries to inject or alter guest code. As I noted above, any instruction fetch is forced to be private, hence if an attacker writes some code into a shared buffer and points RIP at his/her code, then the instruction fetch will try to decrypt it and get unintelligible opcodes. If a page was marked as "shared", then it is the guest's responsibility to make it "private" after it is done communicating with the hypervisor.
>
> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
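(To make the four routines above more concrete, here is a rough sketch of the Map()/Unmap() flow just described. This is not the actual OvmfPkg/IoMmuDxe code: MAP_INFO, the operation names and the overall structure are simplified stand-ins, and the page table edits that actually flip the C-bit are only indicated by comments.)

#include <Uefi.h>
#include <Library/BaseMemoryLib.h>
#include <Library/MemoryAllocationLib.h>

typedef enum {
  OpBusMasterRead,            // the device reads from guest memory
  OpBusMasterWrite,           // the device writes to guest memory
  OpBusMasterCommonBuffer     // bidirectional, long-lived mapping
} DMA_OPERATION;

typedef struct {
  DMA_OPERATION    Operation;
  VOID             *HostBuffer;     // the caller's private (encrypted) buffer
  VOID             *SharedBuffer;   // bounce buffer with the C-bit cleared
  UINTN            NumberOfBytes;
} MAP_INFO;

EFI_STATUS
SketchMap (
  IN OUT MAP_INFO  *Info,
  OUT    VOID      **DeviceAddress
  )
{
  if (Info->Operation == OpBusMasterCommonBuffer) {
    //
    // In-place conversion: preserve the contents while the pages are
    // switched to "shared" (clear the C-bit in the page tables, flush,
    // and rewrite the data through the decrypted mapping -- omitted).
    // No memory is allocated on this path.
    //
    *DeviceAddress = Info->HostBuffer;
    return EFI_SUCCESS;
  }

  //
  // Read/Write operations bounce through a temporary shared buffer.
  //
  Info->SharedBuffer = AllocatePages (EFI_SIZE_TO_PAGES (Info->NumberOfBytes));
  if (Info->SharedBuffer == NULL) {
    return EFI_OUT_OF_RESOURCES;
  }
  // ... clear the C-bit on Info->SharedBuffer's pages here ...

  if (Info->Operation == OpBusMasterRead) {
    // The device will read: it must see the caller's data in the clear.
    CopyMem (Info->SharedBuffer, Info->HostBuffer, Info->NumberOfBytes);
  }
  *DeviceAddress = Info->SharedBuffer;
  return EFI_SUCCESS;
}

EFI_STATUS
SketchUnmap (
  IN OUT MAP_INFO  *Info
  )
{
  if (Info->Operation == OpBusMasterCommonBuffer) {
    //
    // In-place conversion back to "private": set the C-bit and rewrite
    // the contents through the encrypted mapping (omitted). Again, no
    // memory is allocated or freed, which is what makes this path
    // callable from an ExitBootServices() handler.
    //
    return EFI_SUCCESS;
  }

  if (Info->Operation == OpBusMasterWrite) {
    // The device has written: copy the result back to the private buffer.
    CopyMem (Info->HostBuffer, Info->SharedBuffer, Info->NumberOfBytes);
  }
  // ... scrub the bounce buffer and set its C-bit again here ...
  FreePages (Info->SharedBuffer, EFI_SIZE_TO_PAGES (Info->NumberOfBytes));
  return EFI_SUCCESS;
}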
>>> If the C-bit is cleared for a read/write buffer, is the content in the read/write buffer also exposed to the hypervisor?
>>
>> Not immediately. Only when the guest also rewrites the memory through the appropriate virtual address.
>>
>> Basically, in the virtual machine, the C-bit in a PTE that covers a particular guest virtual address (GVA) controls whether a guest write through that GVA will result in memory encryption, and whether a guest read through that GVA will result in memory decryption.
>>
>> The current state of the C-bit doesn't matter for the hypervisor; what matters is the last guest write through a GVA whose PTE had the C-bit set, or clear. If the C-bit was clear when the guest last wrote the area, then the hypervisor can read the data. If the C-bit was set, then the hypervisor can only read garbage.
>>
>>> I mean, if there is a security concern, the concern should apply to both the common buffer and the read/write buffer. Then we have to figure out a way to resolve both buffers.
>>
>> Yes, this is correct. The PciIo.Unmap operation, as currently implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption correctly for all operations, but it can only guarantee *not* freeing memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices is safe, while Unmap()ing Read/Write is not. The encryption would be re-established just fine for Read/Write as well, but we would change the UEFI memory map.
>>
>> In OVMF, we currently manage this problem by making all asynchronous DMA mappings CommonBuffer, even if they could otherwise be Read or Write. We use Read and Write only if the DMA operation completes before the higher-level protocol function returns (so we can immediately Unmap the operation, and the ExitBootServices() handler doesn't have to care).
>>
>>> To be honest, that is exactly my biggest question on this patch: why do we only handle the common buffer specially?
>>
>> For the following reason:
>>
>> - Given that CommonBuffer mappings take a separate AllocateBuffer() / FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo implementors -- beyond what the UEFI spec requires -- to keep *all* dynamic memory management out of Map/Unmap. If they need dynamic memory management, we ask them to do it in AllocateBuffer/FreeBuffer instead. This way Unmap is safe in ExitBootServices handlers.
>>
>> - We cannot *reasonably ask* PciIo implementors to provide the same guarantee for Read and Write mappings, simply because there are no other APIs that could manage the dynamic memory for Map and Unmap -- AllocateBuffer and FreeBuffer are not required for Read and Write mappings. PciIo implementors couldn't agree to our request even if they wanted to. Therefore, Unmapping Read/Write operations is inherently unsafe in EBS handlers.
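(The subject of this thread -- a PCI host controller driver unmapping its long-lived common buffer when ExitBootServices() is signaled -- then looks roughly like the sketch below. This is an illustration of the pattern, not the actual MdeModulePkg patches; HC_PRIVATE_DATA is a hypothetical private data layout, and the controller-halting step is only indicated by a comment. The crucial property is that the notification function calls nothing but Unmap(), so it neither allocates nor frees memory.)

#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Protocol/PciIo.h>
#include <Guid/EventGroup.h>

typedef struct {
  EFI_PCI_IO_PROTOCOL  *PciIo;
  VOID                 *CommonBufferMapping;   // from PciIo->Map (...CommonBuffer...)
  EFI_EVENT            ExitBootEvent;
} HC_PRIVATE_DATA;

STATIC
VOID
EFIAPI
HcExitBootServices (
  IN EFI_EVENT  Event,
  IN VOID       *Context
  )
{
  HC_PRIVATE_DATA  *Private;

  Private = Context;

  //
  // 1. Halt the controller's DMA engines here (driver specific, omitted),
  //    so that no in-flight transfer can target the buffer once it has
  //    been re-encrypted.
  //
  // 2. Unmap the common buffer. The pages themselves are *not* freed;
  //    the call merely re-establishes encryption (and tears down any
  //    IOMMU translation). This relies on the extra guarantee argued
  //    for above: Unmap() of a CommonBuffer operation must not touch
  //    the UEFI memory map.
  //
  Private->PciIo->Unmap (Private->PciIo, Private->CommonBufferMapping);
}

EFI_STATUS
HcRegisterExitBootServicesHandler (
  IN HC_PRIVATE_DATA  *Private
  )
{
  return gBS->CreateEventEx (
                EVT_NOTIFY_SIGNAL,
                TPL_NOTIFY,
                HcExitBootServices,
                Private,
                &gEfiEventExitBootServicesGuid,
                &Private->ExitBootEvent
                );
}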
>>>> NOTE: The device should still be in the halt state, because the device is also controlled by the OS.
>>>
>>> This is the key point -- the device is *not* controlled by the guest OS, in the worst case.
>>>
>>> The virtual device is ultimately implemented by the hypervisor. We don't expect the virtual device (= the hypervisor) to mis-behave on purpose. However, if the hypervisor is compromised by an external attacker (for example over the network, or via privilege escalation from another guest), then all guest RAM that has not been encrypted up to that point becomes visible to the attacker. In other words, the hypervisor ("the device") becomes retro-actively malicious.
>>>
>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>>
>>> 1) If the concern is for existing data, it means all DMA data already written. However, the DMA data is already consumed or produced by the virtual device or the hypervisor. If the hypervisor is attacked, it already has all the data content.
>>
>> Maybe, maybe not. The data may have been sent to a different host over the network, and wiped from memory.
>>
>> Or, the data may have been written to a disk image that is separately encrypted by the host OS, and has been detached (unplugged) from the guest, and also unmounted from the host, with the disk key securely wiped from host memory.
>>
>> We can also imagine the reverse direction. Assume that the user of the virtual machine goes into the UEFI shell in OVMF, starts the EDIT program, and types some secret information into a text file on the ESP. Further assume that the disk image is encrypted on the host OS. It is conceivable that fragments of the secret could remain stuck in the AHCI (disk) and USB (keyboard) DMA buffers.
>>
>> Maybe you think that these are "made up" examples, and I agree, I just made them up. The point is, it is not my place to discount possible attack vectors (I haven't designed SEV). Attackers will be limited by their *own* imaginations only, not by mine :)
>>
>>> 2) If the concern is for future data, it means all user data to be written. However, the C-bit already prevents that.
>>
>> As far as I understand SEV, future data is out of scope. The goal is to prevent *retroactive* guest data leaks (= whatever is currently in host OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>>
>> If an attacker compromises a virtualization host, then they can just sit silent and watch data flow into and out of guests from that point onward, because they can then monitor all DMA (which always happens in clear-text) in real time.
>>
>>> Maybe I do not fully understand the threat model defined. If you can explain more, that would be great.
>>
>> Well, I tried to explain *my* understanding of SEV :) I hope Brijesh will correct me if I'm wrong.
>>
>>> The point of SEV is to keep as much guest data encrypted at all times as possible, so that *whenever* the hypervisor is compromised by an attacker, the guest memory -- which the attacker can inevitably dump from the host side -- remains unintelligible to the attacker.
>>>
>>> [Jiewen] OK. If this is a security question, I suggest we define a clear threat model first. Or what we are doing might be unnecessary or insufficient.
>>
>> I agree.
>>
>> As I said above, my operating principle has been to re-encrypt all DMA buffers as soon as possible. For long-standing DMA buffers, re-encrypt them at ExitBootServices at the latest.
>>
>>> [Jiewen] For "require that Unmap() work for CommonBuffer operations without releasing memory", I would hold my opinion until it is documented clearly in the UEFI spec.
>>
>> You are right, of course.
>>
>> It's just that we cannot avoid exploring ways, for this feature, that currently fall outside of the spec. Whatever we do here will be outside of the spec, one way or another. I suggested what I thought could be a reasonable extension to the spec, one that could be implemented by PciIo implementors even before the spec is modified.
>>
>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>
>> If there's a better way -- for example modifying CoreExitBootServices() in edk2, to signal IOMMU drivers separately, *after* notifying other ExitBootServices() handlers --, that might work as well.
>>
>>> My current concern is: First, this sentence is NOT defined in the UEFI specification.
>>
>> Correct.
>>
>>> Second, it might break an existing implementation or existing feature, such as tracing.
>>
>> Correct.
>>
>>> Till now, I did not see any restriction in the UEFI spec saying: in this function, you are not allowed to call memory services. The only restrictions are:
>>>
>>> 1) TPL restriction, where memory services must be <= TPL_NOTIFY.
>>> 2) AP restriction, where no UEFI service/protocol is allowed on APs.
>>
>> I agree.
>>
>>> I'm just trying to operate with the APIs currently defined by the UEFI spec, and these assumptions were the best I could come up with.
>>>
>>> [Jiewen] The PCI device driver must follow, and *only* follow, the UEFI spec. Especially, an IHV option ROM should not consume any private API.
>>
>> I disagree about "only follow". If there are additional requirements that can be satisfied without breaking the spec, driver implementors can choose to conform to both sets of requirements.
>>
>> For example (if I understand correctly), Microsoft / Windows present a bunch of requirements *beyond* the UEFI spec, for both platform and add-on card firmware. Most vendors conform :)
>>
>>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all guest memory should be encrypted again, even those areas that were once used as CommonBuffers." For the SEV case, I think it should be controlled by the OS, because it is just CR3.
>>>
>>> If it was indeed just CR3, then I would fully agree with you.
>>>
>>> But -- to my understanding -- ensuring that the memory is actually encrypted requires that the guest *write* to the memory through a virtual address whose PTE has the C-bit set.
>>>
>>> And I don't think the guest OS can be expected to rewrite all of its memory at launch. :(
>>>
>>> [Jiewen] That is fine. I suggest we get clear on the threat model as the first step. The threat model for SEV usage in OVMF.
>>>
>>> I am sorry if that has already been discussed before. I might have missed the conversation.
>>
>> No problem; it's always good to re-investigate our assumptions and operating principles.
>>
>>> If you or any SEV owner can share the insight, that would be great. See my question above, "If that is the threat model, I have a question on the exposure: ..."
>>
>> I shared *my* impressions of the threat model (which are somewhat unclear, admittedly, and thus could make me overly cautious).
>>
>> I hope Brijesh can extend and/or correct my description.
>>
>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU are opposites -- VT-d permits all DMA unless configured otherwise, while SEV forbids all DMA unless configured otherwise.
>>>>
>>>> [Jiewen] I do not think so. The default behavior of the VT-d IOMMU is to disable all DMA access. I set up the translation tables, but mark all entries as NOT-PRESENT. I grant DMA access only if there is a specific request by a device.
>>>>
>>>> I open up DMA access at ExitBootServices just to make sure it is compatible with existing OSes, which might not have knowledge of the IOMMU. I will provide a patch to make it a policy very soon. As such, the VT-d IOMMU driver may turn off the IOMMU, or keep it enabled, at ExitBootServices. An IOMMU-aware OS may take over the IOMMU directly and reprogram it.
>>>
>>> Thanks for the clarification.
>>>
>>> But, again, will this not lead to the possibility that the VT-d IOMMU's ExitBootServices() handler disables all DMA *before* the PCI drivers get a chance to run their own ExitBootServices() handlers, disabling the individual devices?
>>>
>>> [Jiewen] Sorry for the confusion. Let me explain: No, VT-d never disables *all* DMA buffers at ExitBootServices.
>>>
>>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and "only allow the DMA from Map() requests".
>>
>> Okay, but this interpretation was exactly what I thought of first (see above): "VT-d permits all DMA unless configured otherwise". You answered that it wasn't the case.
>>
>> So basically, if I understand it correctly now, at ExitBootServices the VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>
>> That is equivalent to decrypting all memory under SEV, and is the exact opposite of what we want. (As I understand it.)
>>
>>> If that happens, then a series of IOMMU faults could be generated, which I described above. (I.e., such IOMMU faults could result, at least in the case of SEV, in garbage being written to disk, via queued commands.)
>>>
>>> [Jiewen] You are right. That would not happen. The IOMMU driver should not bring any compatibility problem for the PCI driver, iff the PCI driver follows the UEFI specification.
>>
>> Agreed.
>>
>>> Now, to finish up, here's an idea I just had.
>>>
>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices() notification function?
>>>
>>> If we are, then we could do the following:
>>>
>>> * PciIo.Unmap() and friends would work as always (no restrictions on dynamic memory allocation or freeing, for any kind of DMA operation).
>>>
>>> * PCI drivers would not be expected to call PciIo.Unmap() in their ExitBootServices() handlers.
>>>
>>> * The IOMMU driver would have an ExitBootServices() notification function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level (doesn't matter which).
>>>
>>> * In this notification function, the IOMMU driver would signal *another* event (a private one). The notification function for this event would be enqueued strictly at the TPL_CALLBACK level.
>>>
>>> * The notification function for the second event (private to the IOMMU) would iterate over all existing mappings, and unmap them, without allocating or freeing memory.
>>>
>>> The key point is that by signaling the second event, the IOMMU driver could order its own cleanup code after all other ExitBootServices() callbacks. (Assuming, at least, that no other ExitBootServices() callback employs the same trick to defer itself!) Therefore, by the time the second callback runs, all PCI devices have been halted, and it is safe to tear down the translations.
>>>
>>> The ordering would be ensured by the following:
>>>
>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed to be signaled before the first notification action is taken." This means that, by the time the IOMMU's ExitBootServices() handler is entered, all other ExitBootServices() handlers have been *queued* at least, at TPL_CALLBACK or TPL_NOTIFY.
>>>
>>> - Therefore, when we signal our second callback from there, for TPL_CALLBACK, the second callback will be queued at the end of the TPL_CALLBACK queue.
>>>
>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag for the Type argument of CreateEvent." So it wouldn't matter if a driver used CreateEvent() or CreateEventEx() for setting up its own handler; the handler would be queued just the same.
>>>
>>> I think this is ugly and brittle, but perhaps the only way to clean up *all* translations safely, with regard to PciIo.Unmap() + ExitBootServices(), without updating the UEFI spec.
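(A rough sketch of the two-event trick just described, as it might look inside an IOMMU driver. This is illustrative only, not an actual edk2 driver; the walk over the driver's internal mapping list is elided, and whether signaling an event from an ExitBootServices() notification function is permitted at all is exactly the open question discussed below.)

#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Guid/EventGroup.h>

STATIC EFI_EVENT  mExitBootEvent;
STATIC EFI_EVENT  mDeferredCleanupEvent;

STATIC
VOID
EFIAPI
IoMmuDeferredCleanup (
  IN EFI_EVENT  Event,
  IN VOID       *Context
  )
{
  //
  // This runs only after the ExitBootServices() handlers that were
  // already queued when IoMmuExitBootServices() signaled this event --
  // by then the PCI drivers should have halted their controllers.
  // Walk every live mapping and unmap (re-encrypt) it here, without
  // allocating or freeing memory. (Mapping-list walk elided.)
  //
}

STATIC
VOID
EFIAPI
IoMmuExitBootServices (
  IN EFI_EVENT  Event,
  IN VOID       *Context
  )
{
  //
  // Per the UEFI spec, all events in the group are signaled before the
  // first notification function runs, so by the time we get here every
  // other ExitBootServices() handler is already queued. Signaling the
  // private TPL_CALLBACK event appends the real cleanup after them.
  //
  gBS->SignalEvent (mDeferredCleanupEvent);
}

EFI_STATUS
IoMmuRegisterCleanup (
  VOID
  )
{
  EFI_STATUS  Status;

  Status = gBS->CreateEvent (
                  EVT_NOTIFY_SIGNAL,
                  TPL_CALLBACK,
                  IoMmuDeferredCleanup,
                  NULL,
                  &mDeferredCleanupEvent
                  );
  if (EFI_ERROR (Status)) {
    return Status;
  }

  return gBS->CreateEventEx (
                EVT_NOTIFY_SIGNAL,
                TPL_CALLBACK,
                IoMmuExitBootServices,
                NULL,
                &gEfiEventExitBootServicesGuid,
                &mExitBootEvent
                );
}

(The only ordering guarantee this relies on is the CreateEventEx() sentence quoted above; no new UEFI interfaces are required.)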
>>> What do you think?
>>>
>>> [Jiewen] If the order is correct, and all PCI device drivers are halted at ExitBootServices, that works. We do not need to update the PCI drivers, and we do not need to update the UEFI spec. We only need to update the IOMMU driver that is concerned, based upon the threat model. That is probably the best solution. :-)
>>
>> I'm very glad to hear that you like the idea.
>>
>> However, it depends on whether we are permitted, by the UEFI spec, to signal another event in an ExitBootServices() notification function.
>>
>> Are we permitted to do that?
>>
>> Does the UEFI spec guarantee that the notification function for the *second* event will be queued just like it would be under "normal" circumstances?
>>
>> (I know we must not allocate or free memory in the notification function of the *second* event either; I just want to know if the second event's handler is *queued* like it would normally be.)
>>
>>> I assume you want to handle both the common buffer and the read/write buffer, right?
>>
>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>
>>> And if possible, I still have interest in getting clear on the threat model for SEV in OVMF. If you or any SEV owner can share ...
>>
>> Absolutely. Please see above.
>>
>> Thank you!
>> Laszlo
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel