From: Laszlo Ersek
To: "Yao, Jiewen", Brijesh Singh, "Zeng, Star", edk2-devel-01
Cc: "Dong, Eric"
Date: Thu, 7 Sep 2017 18:40:57 +0200
Message-ID: <5300a156-2244-ae24-7a66-0a90a3a33148@redhat.com>
In-Reply-To: <74D8A39837DF1E4DA445A8C0B3885C503A9A92BA@shsmsx102.ccr.corp.intel.com>
Subject: Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

On 09/07/17 16:48, Yao, Jiewen wrote:
> Great. Thanks for confirming that. Now it is clear to me.
>
> Then let's wait for Laszlo's new patch to make all DMA buffers private. :)

I've got the patches ready and smoke-tested. Let me look a bit more at
the logs, and re-review the patches. I hope I can post the series still
today (well, in my timezone anyway :) )

Thanks!
Laszlo

> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Brijesh Singh
> Sent: Thursday, September 7, 2017 10:40 PM
> To: Laszlo Ersek; Yao, Jiewen; Zeng, Star; edk2-devel-01
> Cc: brijesh.singh@amd.com; Dong, Eric
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>
> Hi Jiewen,
>
> On 09/07/2017 06:46 AM, Laszlo Ersek wrote:
>> On 09/07/17 06:46, Yao, Jiewen wrote:
>>> Thanks for sharing, Brijesh.
>>>
>>> "If a page was marked as "shared" then it is the guest's responsibility
>>> to make it "private" after it is done communicating with the hypervisor."
>>>
>>> I believe I have the same understanding - a *guest* should guarantee that.
>>>
>>> My question is: during the *transition* from firmware to OS, *which
>>> guest* should make the shared buffer private? Is it the "guest firmware"
>>> or the "guest OS"?
>>>
>>> Maybe I can ask specific questions to make it clearer.
>>>
>>> 1) If the common DMA buffer is not unmapped at ExitBootServices, is that
>>> treated as an issue?
>>>
>>> 2) If the read/write DMA buffer is not unmapped at ExitBootServices, is
>>> that treated as an issue?
>>
>> Very good questions, totally to the point.
>>
>> On the authoritative answer, I defer to Brijesh.
>>
>
> Both of the above cases (#1 and #2) are problems. Since the buffers were
> owned by the firmware, and the firmware made them "shared", the firmware
> is responsible for marking them private again after it is done with the
> buffers. In other words, we must call Unmap() from ExitBootServices to
> ensure that buffers mapped using
> BusMasterCommonBuffer/BusMasterRead/BusMasterWrite are marked as "private"
> before we make them available to the guest OS. (We do a similar thing in
> the Linux OS.)
>
> Having any kind of side channel to communicate the encryption status of a
> page will not work -- we should be able to support a use case where we
> boot OVMF in 64-bit mode but launch a 32-bit Linux guest. When Linux boots
> in 32-bit mode it does not have access to the encryption bit (the C-bit is
> bit 47 in the page table) and can't mark the page as private (even if we
> provide some kind of side channel).
>
> Thank you very much for all your help.
>
>> (
>>
>> My personal opinion is that both situations (#1 and #2) are problems,
>> because they break the *practical* security invariant for SEV guests:
>>
>> - most memory should be encrypted at all times, *and*
>>
>> - any memory that is decrypted must have an owner that is responsible
>>   for re-encrypting the memory eventually.
>>
>> Therefore, *either* the firmware has to re-encrypt all remaining DMA
>> buffers at ExitBootServices(), *or* a new information channel must be
>> designed, from firmware to OS, to carry over the decryption status.
>>
>> I strongly prefer the first option, for the following reason: the same
>> questions apply to all EDK2 IOMMU protocol interfaces, not just the one
>> exported by the SEV driver.
>>
>> )
>>
>> Thanks,
>> Laszlo
>>
>>> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
>>> Sent: Wednesday, September 6, 2017 11:40 PM
>>> To: Laszlo Ersek; Yao, Jiewen; Zeng, Star; edk2-devel-01
>>> Cc: brijesh.singh@amd.com; Dong, Eric
>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>
>>>
>>>
>>> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>>>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>>>> Thanks for the clarification. Comments inline.
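Brijesh's point above -- that every BusMaster mapping must be Unmap()ed
before the OS takes over, so that the pages become private again -- is what
the $SUBJECT series implements for the long-lived common buffers of the PCI
host controller drivers. Purely as an illustration (this is *not* the
actual patch; the driver-side names are simplified / made up), such a
driver-side ExitBootServices() handler could look roughly like this:

  #include <Uefi.h>
  #include <Library/UefiBootServicesTableLib.h>
  #include <Protocol/PciIo.h>

  STATIC EFI_PCI_IO_PROTOCOL  *mPciIo;            // saved at driver start
  STATIC VOID                 *mCommonBufMapping; // returned by PciIo->Map()
  STATIC EFI_EVENT            mExitBootEvent;

  STATIC
  VOID
  EFIAPI
  OnExitBootServices (
    IN EFI_EVENT  Event,
    IN VOID       *Context
    )
  {
    //
    // Unmap() re-encrypts the common buffer (via the IOMMU protocol).
    // The matching FreeBuffer() is intentionally *not* called here, so no
    // memory is allocated or freed and the UEFI memory map is unchanged.
    //
    if (mCommonBufMapping != NULL) {
      mPciIo->Unmap (mPciIo, mCommonBufMapping);
      mCommonBufMapping = NULL;
    }
  }

  EFI_STATUS
  RegisterExitBootServicesHandler (
    VOID
    )
  {
    return gBS->CreateEvent (
                  EVT_SIGNAL_EXIT_BOOT_SERVICES,
                  TPL_NOTIFY,
                  OnExitBootServices,
                  NULL,
                  &mExitBootEvent
                  );
  }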
>>>>>
>>>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>>>> To: Yao, Jiewen; Zeng, Star; edk2-devel-01
>>>>> Cc: Dong, Eric; Brijesh Singh
>>>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>>
>>>>>> Then after ExitBootServices, the OS will take control of CR3 and set
>>>>>> the correct encryption bit.
>>>>>
>>>>> This is true, the guest OS will set up new page tables, and in those
>>>>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>>>>
>>>>> However, the guest OS will not *rewrite* all of the memory, with the
>>>>> C-bit set. This means that any memory that the firmware didn't actually
>>>>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>>>>> hypervisor.
>>>>>
>>>>> [Jiewen] That is an interesting question.
>>>>> Is there any security concern associated?
>>>>
>>>> I can't tell for sure. Answering this question is up to the proponents
>>>> of SEV, who have designed the threat model for SEV.
>>>>
>>>> My operating assumption is that we should keep memory as tightly
>>>> encrypted as possible at the firmware --> OS control transfer. It could
>>>> be an exaggerated expectation on my side; I'd just rather be safe than
>>>> sorry :)
>>>>
>>>>
>>>
>>> Let me give a brief intro to SEV (Secure Encrypted Virtualization) and
>>> then we can discuss a security model (if needed).
>>>
>>> SEV is an extension to the AMD-V architecture which supports running
>>> encrypted virtual machines (VMs) under the control of a hypervisor.
>>> Encrypted VMs have their pages (code and data) secured such that only
>>> the guest itself has access to the unencrypted version. Each encrypted
>>> VM is associated with a unique encryption key; if its data is accessed
>>> by a different entity using a different key, the encrypted guest data
>>> will be incorrectly decrypted, leading to unintelligible data. You can
>>> also find more detail on SEV in the white paper [1].
>>>
>>> SEV-encrypted VMs have the concept of private and shared memory. Private
>>> memory is encrypted with the guest-specific key, while shared memory may
>>> be encrypted with the hypervisor key. SEV guest VMs can choose which
>>> pages they would like to be private. But instruction pages and guest
>>> page tables are always treated as private by the hardware. DMA
>>> operations inside the guest must be performed on shared pages -- this is
>>> mainly because, in the virtualization world, device DMA needs some
>>> assistance from the hypervisor.
>>>
>>> The security model provided by SEV ensures that the hypervisor is no
>>> longer able to inspect or alter any guest code or data. The guest OS
>>> controls what it wants to share with the outside world (in this case,
>>> with the hypervisor).
>>>
>>> In the software implementation we took the approach of encrypting
>>> everything possible, starting early in boot. In this approach, whenever
>>> OVMF wants to perform certain DMA operations, it allocates a shared
>>> page, issues the command, and frees the shared page after the command is
>>> completed (in other words, we use a software bounce buffer to complete
>>> the DMA operation).
>>>
>>> We have implemented an IOMMU driver to provide the following functions:
>>>
>>> AllocateBuffer():
>>> -----------------
>>> It allocates private pages. Per the UEFI spec, the driver will map the
>>> buffer allocated from this routine using BusMasterCommonBuffer, hence we
>>> allocate extra stash pages in addition to the requested pages.
>>>
>>>
>>> Map():
>>> ------
>>> If the requested operation is BusMasterRead or BusMasterWrite, then we
>>> allocate a shared buffer, and DeviceAddress points to the shared buffer.
>>>
>>> If the requested operation is BusMasterCommonBuffer, then we perform
>>> in-place decryption of the contents and update the page table to clear
>>> the C-bit (basically making it a shared page).
>>>
>>> Unmap():
>>> --------
>>> If the operation was BusMasterRead or BusMasterWrite, then we complete
>>> the unmapping and free the shared buffer allocated in Map().
>>>
>>> If the operation was BusMasterCommonBuffer, then we perform in-place
>>> encryption and set the C-bit (basically making the page private again).
>>>
>>> FreeBuffer():
>>> -------------
>>> Free the pages.
>>>
>>>
>>> Let's run with the below scenario:
>>>
>>> 1) The guest marks a page as "shared" and gives its physical address to
>>>    the HV (e.g. DMA read).
>>> 2) The hypervisor completes the requested operation.
>>> 3) The hypervisor caches the shared physical address and monitors it for
>>>    information leaks.
>>> 4) Some time later, if the guest writes data to its "shared" physical
>>>    address, then the hypervisor can easily read it without the guest's
>>>    knowledge.
>>>
>>> SEV hardware can protect us against attacks where someone tries to
>>> inject or alter the guest code. As I noted above, any instruction fetch
>>> is forced to be private, hence if an attacker writes some code into a
>>> shared buffer and points RIP at his/her code, then the instruction fetch
>>> will try to decrypt it and get unintelligible opcodes. If a page was
>>> marked as "shared", then it is the guest's responsibility to make it
>>> "private" after it is done communicating with the hypervisor.
>>>
>>> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>>>
>>>
>>>>> If the C-bit is cleared for a read/write buffer, is the content in the
>>>>> read/write buffer also exposed to the hypervisor?
>>>>
>>>> Not immediately. Only when the guest also rewrites the memory through
>>>> the appropriate virtual address.
>>>>
>>>> Basically, in the virtual machine, the C-bit in a PTE that covers a
>>>> particular guest virtual address (GVA) controls whether a guest write
>>>> through that GVA will result in memory encryption, and whether a guest
>>>> read through that GVA will result in memory decryption.
>>>>
>>>> The current state of the C-bit doesn't matter for the hypervisor; what
>>>> matters is the last guest write through a GVA whose PTE had the C-bit
>>>> set, or clear. If the C-bit was clear when the guest last wrote the
>>>> area, then the hypervisor can read the data. If the C-bit was set, then
>>>> the hypervisor can only read garbage.
>>>>
>>>>
>>>>> I mean, if there is a security concern, the concern should apply to
>>>>> both the common buffer and the read/write buffer.
>>>>> Then we have to figure out a way to resolve both buffers.
>>>>
>>>> Yes, this is correct. The PciIo.Unmap operation, as currently
>>>> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
>>>> correctly for all operations, but it can only guarantee *not* freeing
>>>> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
>>>> is safe, while Unmap()ing Read/Write is not. The encryption would be
>>>> re-established just fine for Read/Write as well, but we would change
>>>> the UEFI memmap.
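To make the Map()/Unmap() asymmetry above easier to see, here is a
condensed sketch of the Unmap() split Brijesh describes. This is *not* the
real OvmfPkg/IoMmuDxe code; the DMA_OPERATION type and the
SetPagesPrivate()/FreeBounceBuffer() helpers are made-up placeholders for
the driver's actual bookkeeping, C-bit manipulation, and bounce-buffer
management:

  #include <Uefi.h>
  #include <Library/BaseMemoryLib.h>

  typedef enum {                        // simplified stand-in for the real
    DmaOperationBusMasterRead,          // BusMaster* operation values
    DmaOperationBusMasterWrite,
    DmaOperationBusMasterCommonBuffer
  } DMA_OPERATION;

  typedef struct {
    DMA_OPERATION         Operation;     // how the buffer was mapped
    EFI_PHYSICAL_ADDRESS  HostAddress;   // caller's original (private) buffer
    EFI_PHYSICAL_ADDRESS  DeviceAddress; // shared address handed to the device
    UINTN                 NumberOfBytes;
  } MAP_INFO;

  // Hypothetical helpers, declared here only to keep the sketch complete:
  VOID SetPagesPrivate (EFI_PHYSICAL_ADDRESS Address, UINTN Pages);
  VOID FreeBounceBuffer (EFI_PHYSICAL_ADDRESS Address, UINTN NumberOfBytes);

  EFI_STATUS
  IoMmuUnmapSketch (
    IN MAP_INFO  *MapInfo
    )
  {
    switch (MapInfo->Operation) {
    case DmaOperationBusMasterRead:
    case DmaOperationBusMasterWrite:
      //
      // The data lives in a separate shared bounce buffer: copy it back to
      // the caller's buffer for Write operations, then release the bounce
      // buffer. This *frees* memory, so it is unsafe to do from an
      // ExitBootServices() handler.
      //
      if (MapInfo->Operation == DmaOperationBusMasterWrite) {
        CopyMem (
          (VOID *)(UINTN)MapInfo->HostAddress,
          (VOID *)(UINTN)MapInfo->DeviceAddress,
          MapInfo->NumberOfBytes
          );
      }
      FreeBounceBuffer (MapInfo->DeviceAddress, MapInfo->NumberOfBytes);
      break;

    case DmaOperationBusMasterCommonBuffer:
      //
      // The mapping was established in place: re-encrypt the contents and
      // set the C-bit again. Nothing is allocated or freed, which is why
      // unmapping CommonBuffer mappings at ExitBootServices() is safe.
      //
      SetPagesPrivate (
        MapInfo->HostAddress,
        EFI_SIZE_TO_PAGES (MapInfo->NumberOfBytes)
        );
      break;

    default:
      return EFI_INVALID_PARAMETER;
    }

    return EFI_SUCCESS;
  }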
>>>> In OVMF, we currently manage this problem by making all asynchronous
>>>> DMA mappings CommonBuffer, even if they could otherwise be Read or
>>>> Write. We use Read and Write only if the DMA operation completes before
>>>> the higher-level protocol function returns (so we can immediately Unmap
>>>> the operation, and the ExitBootServices() handler doesn't have to care).
>>>>
>>>>
>>>>> To be honest, that is exactly my biggest question on this patch:
>>>>> why do we only handle the common buffer specially?
>>>>
>>>> For the following reason:
>>>>
>>>> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
>>>>   FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask*
>>>>   PciIo implementors -- beyond what the UEFI spec requires -- to keep
>>>>   *all* dynamic memory management out of Map/Unmap. If they need
>>>>   dynamic memory management, we ask them to do it in
>>>>   AllocateBuffer/FreeBuffer instead. This way Unmap is safe in
>>>>   ExitBootServices handlers.
>>>>
>>>> - We cannot *reasonably ask* PciIo implementors to provide the same
>>>>   guarantee for Read and Write mappings, simply because there are no
>>>>   other APIs that could manage the dynamic memory for Map and Unmap --
>>>>   AllocateBuffer and FreeBuffer are not required for Read and Write
>>>>   mappings. PciIo implementors couldn't agree to our request even if
>>>>   they wanted to. Therefore Unmapping Read/Write operations is
>>>>   inherently unsafe in EBS handlers.
>>>>
>>>>
>>>>>> NOTE: The device should still be in the halted state, because the
>>>>>> device is also controlled by the OS.
>>>>>
>>>>> This is the key point -- the device is *not* controlled by the guest
>>>>> OS, in the worst case.
>>>>>
>>>>> The virtual device is ultimately implemented by the hypervisor. We
>>>>> don't expect the virtual device (= the hypervisor) to misbehave on
>>>>> purpose. However, if the hypervisor is compromised by an external
>>>>> attacker (for example over the network, or via privilege escalation
>>>>> from another guest), then all guest RAM that has not been encrypted up
>>>>> to that point becomes visible to the attacker. In other words, the
>>>>> hypervisor ("the device") becomes retroactively malicious.
>>>>>
>>>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>>>> 1) If the concern is for existing data, it means all DMA data already
>>>>> written. However, the DMA data is already consumed or produced by the
>>>>> virtual device or the hypervisor. If the hypervisor is attacked, it
>>>>> already has all the data content.
>>>>
>>>> Maybe, maybe not. The data may have been sent to a different host over
>>>> the network, and wiped from memory.
>>>>
>>>> Or, the data may have been written to a disk image that is separately
>>>> encrypted by the host OS, and has been detached (unplugged) from the
>>>> guest, and also unmounted from the host, with the disk key securely
>>>> wiped from host memory.
>>>>
>>>> We can also imagine the reverse direction. Assume that the user of the
>>>> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
>>>> program, and types some secret information into a text file on the ESP.
>>>> Further assume that the disk image is encrypted on the host OS. It is
>>>> conceivable that fragments of the secret could remain stuck in the AHCI
>>>> (disk) and USB (keyboard) DMA buffers.
>>>>
>>>> Maybe you think that these are "made up" examples, and I agree, I just
>>>> made them up. The point is, it is not my place to discount possible
>>>> attack vectors (I haven't designed SEV).
>>>> Attackers will be limited by their *own* imaginations only, not by
>>>> mine :)
>>>>
>>>>
>>>>> 2) If the concern is for future data, it means all user data to be
>>>>> written. However, the C-bit already prevents that.
>>>>
>>>> As far as I understand SEV, future data is out of scope. The goal is to
>>>> prevent *retroactive* guest data leaks (= whatever is currently in host
>>>> OS memory) if an attacker compromises an otherwise non-malicious
>>>> hypervisor.
>>>>
>>>> If an attacker compromises a virtualization host, then they can just
>>>> sit silent and watch data flow into and out of guests from that point
>>>> onward, because they can then monitor all DMA (which always happens in
>>>> clear-text) in real time.
>>>>
>>>>
>>>>> Maybe I do not fully understand the threat model defined.
>>>>> If you can explain more, that would be great.
>>>>
>>>> Well, I tried to explain *my* understanding of SEV :) I hope Brijesh
>>>> will correct me if I'm wrong.
>>>>
>>>>
>>>>> The point of SEV is to keep as much guest data encrypted at all times
>>>>> as possible, so that *whenever* the hypervisor is compromised by an
>>>>> attacker, the guest memory -- which the attacker can inevitably dump
>>>>> from the host side -- remains unintelligible to the attacker.
>>>>> [Jiewen] OK. If this is a security question, I suggest we define a
>>>>> clear threat model first.
>>>>> Or what we are doing might be unnecessary or insufficient.
>>>>
>>>> I agree.
>>>>
>>>> As I said above, my operating principle has been to re-encrypt all DMA
>>>> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
>>>> them at ExitBootServices at the latest.
>>>>
>>>>
>>>>> [Jiewen] For "require that Unmap() work for CommonBuffer operations
>>>>> without releasing memory", I would hold my opinion until it is
>>>>> documented clearly in the UEFI spec.
>>>>
>>>> You are right, of course.
>>>>
>>>> It's just that we cannot avoid exploring ways, for this feature, that
>>>> currently fall outside of the spec. Whatever we do here will be outside
>>>> of the spec, one way or another. I suggested what I thought could be a
>>>> reasonable extension to the spec, one that could be implemented by
>>>> PciIo implementors even before the spec is modified.
>>>>
>>>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>>>
>>>> If there's a better way -- for example modifying CoreExitBootServices()
>>>> in edk2, to signal IOMMU drivers separately, *after* notifying other
>>>> ExitBootServices() handlers --, that might work as well.
>>>>
>>>>
>>>>> My current concern is:
>>>>> First, this sentence is NOT defined in the UEFI specification.
>>>>
>>>> Correct.
>>>>
>>>>
>>>>> Second, it might break an existing implementation or existing feature,
>>>>> such as tracing.
>>>>
>>>> Correct.
>>>>
>>>>> Until now, I did not see any restriction in the UEFI spec saying: in
>>>>> this function, you are not allowed to call memory services.
>>>>> The only restrictions are
>>>>> 1) the TPL restriction, where memory services must be <= TPL_NOTIFY;
>>>>> 2) the AP restriction, where no UEFI service/protocol is allowed on an
>>>>> AP.
>>>>
>>>> I agree.
>>>>
>>>>
>>>>> I'm just trying to operate with the APIs currently defined by the UEFI
>>>>> spec, and these assumptions were the best I could come up with.
>>>>> [Jiewen] The PCI device driver must follow, and *only* follow, the
>>>>> UEFI spec. Especially, an IHV Option ROM should not consume any
>>>>> private API.
>>>>
>>>> I disagree about "only follow".
>>>> If there are additional requirements that can be satisfied without
>>>> breaking the spec, driver implementors can choose to conform to both
>>>> sets of requirements.
>>>>
>>>> For example (if I understand correctly), Microsoft / Windows present a
>>>> bunch of requirements *beyond* the UEFI spec, for both platform and
>>>> add-on card firmware. Most vendors conform :)
>>>>
>>>>
>>>>>> [Jiewen] I am not sure who will control "When control is transferred
>>>>>> to the OS, all guest memory should be encrypted again, even those
>>>>>> areas that were once used as CommonBuffers."
>>>>>> For the SEV case, I think it should be controlled by the OS, because
>>>>>> it is just CR3.
>>>>>
>>>>> If it was indeed just CR3, then I would fully agree with you.
>>>>>
>>>>> But -- to my understanding --, ensuring that the memory is actually
>>>>> encrypted requires that the guest *write* to the memory through a
>>>>> virtual address whose PTE has the C-bit set.
>>>>>
>>>>> And I don't think the guest OS can be expected to rewrite all of its
>>>>> memory at launch. :(
>>>>>
>>>>> [Jiewen] That is fine.
>>>>> I suggest we get clear on the threat model as the first step.
>>>>> The threat model for SEV usage in OVMF.
>>>>>
>>>>> I am sorry if that was already discussed before. I may have missed the
>>>>> conversation.
>>>>
>>>> No problem; it's always good to re-investigate our assumptions and
>>>> operating principles.
>>>>
>>>>
>>>>> If you or any SEV owner can share the insight, that will be great.
>>>>> See my question above: "If that is the threat model, I have a question
>>>>> on the exposure:..."
>>>>
>>>> I shared *my* impressions of the threat model (which are somewhat
>>>> unclear, admittedly, and thus could make me overly cautious).
>>>>
>>>> I hope Brijesh can extend and/or correct my description.
>>>>
>>>>
>>>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV
>>>>>> IOMMU are opposites -- VT-d permits all DMA unless configured
>>>>>> otherwise, while SEV forbids all DMA unless configured otherwise.
>>>>>> [Jiewen] I do not think so. The default behavior of the VT-d IOMMU is
>>>>>> to disable all DMA access.
>>>>>> I set up the translation table, but mark all entries as NOT-PRESENT.
>>>>>> I grant DMA access only if there is a specific request by a device.
>>>>>>
>>>>>> I open DMA access in ExitBootServices; I just want to make sure it is
>>>>>> compatible with existing OSes, which might not have knowledge of the
>>>>>> IOMMU.
>>>>>> I will provide a patch to make it a policy very soon. As such, the
>>>>>> VT-d IOMMU driver may turn off the IOMMU, or keep it enabled, at
>>>>>> ExitBootServices.
>>>>>> An IOMMU-aware OS may take over the IOMMU directly and reprogram it.
>>>>>
>>>>> Thanks for the clarification.
>>>>>
>>>>> But, again, will this not lead to the possibility that the VT-d
>>>>> IOMMU's ExitBootServices() handler disables all DMA *before* the PCI
>>>>> drivers get a chance to run their own ExitBootServices() handlers,
>>>>> disabling the individual devices?
>>>>> [Jiewen] Sorry for the confusion. Let me explain:
>>>>> No, VT-d never disables *all* DMA buffers at ExitBootServices.
>>>>>
>>>>> "Disable VT-d" means "disable IOMMU protection" and "allow all DMA".
>>>>> "Keep VT-d enabled" means "keep IOMMU protection enabled" and
>>>>> "only allow the DMA from Map() requests".
>>>>
>>>> Okay, but this interpretation was exactly what I thought of first (see
>>>> above): "VT-d permits all DMA unless configured otherwise". You
>>>> answered that it wasn't the case.
>>>>
>>>> So basically, if I understand it correctly now, at ExitBootServices the
>>>> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>>>
>>>> That is equivalent to decrypting all memory under SEV, and is the exact
>>>> opposite of what we want. (As I understand it.)
>>>>
>>>>
>>>>> If that happens, then a series of IOMMU faults could be generated,
>>>>> which I described above. (I.e., such IOMMU faults could result, at
>>>>> least in the case of SEV, in garbage being written to disk, via queued
>>>>> commands.)
>>>>> [Jiewen] You are right. That would not happen.
>>>>> The IOMMU driver should not bring any compatibility problems for the
>>>>> PCI driver, iff the PCI driver follows the UEFI specification.
>>>>
>>>> Agreed.
>>>>
>>>>
>>>>> Now, to finish up, here's an idea I just had.
>>>>>
>>>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>>>>> notification function?
>>>>>
>>>>> If we are, then we could do the following:
>>>>>
>>>>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>>>>   dynamic memory allocation or freeing, for any kind of DMA
>>>>>   operation).
>>>>>
>>>>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>>>>   ExitBootServices() handlers.
>>>>>
>>>>> * The IOMMU driver would have an ExitBootServices() notification
>>>>>   function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>>>>   (doesn't matter which).
>>>>>
>>>>> * In this notification function, the IOMMU driver would signal
>>>>>   *another* event (a private one). The notification function for this
>>>>>   event would be enqueued strictly at the TPL_CALLBACK level.
>>>>>
>>>>> * The notification function for the second event (private to the
>>>>>   IOMMU) would iterate over all existing mappings, and unmap them,
>>>>>   without allocating or freeing memory.
>>>>>
>>>>> The key point is that by signaling the second event, the IOMMU driver
>>>>> could order its own cleanup code after all other ExitBootServices()
>>>>> callbacks. (Assuming, at least, that no other ExitBootServices()
>>>>> callback employs the same trick to defer itself!) Therefore, by the
>>>>> time the second callback runs, all PCI devices have been halted, and
>>>>> it is safe to tear down the translations.
>>>>>
>>>>> The ordering would be ensured by the following:
>>>>>
>>>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>>>>   to be signaled before the first notification action is taken." This
>>>>>   means that, by the time the IOMMU's ExitBootServices() handler is
>>>>>   entered, all other ExitBootServices() handlers have been *queued* at
>>>>>   least, at TPL_CALLBACK or TPL_NOTIFY.
>>>>>
>>>>> - Therefore, when we signal our second callback from there, for
>>>>>   TPL_CALLBACK, the second callback will be queued at the end of the
>>>>>   TPL_CALLBACK queue.
>>>>>
>>>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES
>>>>>   "is functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES
>>>>>   flag for the Type argument of CreateEvent." So it wouldn't matter if
>>>>>   a driver used CreateEvent() or CreateEventEx() for setting up its
>>>>>   own handler, the handler would be queued just the same.
>>>>>
>>>>> I think this is ugly and brittle, but perhaps the only way to clean up
>>>>> *all* translations safely, with regard to PciIo.Unmap() +
>>>>> ExitBootServices(), without updating the UEFI spec.
>>>>>
>>>>> What do you think?
>>>>>
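To make the two-event trick above concrete, here is a minimal sketch. This
is an illustration only, not real OvmfPkg code; TearDownAllMappings() is a
hypothetical placeholder that walks the IOMMU driver's existing mappings
and re-encrypts them without allocating or freeing memory:

  #include <Uefi.h>
  #include <Library/UefiBootServicesTableLib.h>

  STATIC EFI_EVENT  mExitBootEvent;
  STATIC EFI_EVENT  mLateCleanupEvent;

  // Hypothetical: unmap all live mappings; must not allocate/free memory.
  VOID TearDownAllMappings (VOID);

  STATIC
  VOID
  EFIAPI
  IoMmuLateCleanup (
    IN EFI_EVENT  Event,
    IN VOID       *Context
    )
  {
    //
    // Queued behind the ExitBootServices() handlers that were already
    // pending when it was signaled; by now the PCI drivers should have
    // halted their devices, so tearing down the translations is safe.
    //
    TearDownAllMappings ();
  }

  STATIC
  VOID
  EFIAPI
  IoMmuExitBootServices (
    IN EFI_EVENT  Event,
    IN VOID       *Context
    )
  {
    //
    // Defer the real cleanup: signaling the private TPL_CALLBACK event
    // appends its notification to the end of the TPL_CALLBACK queue.
    //
    gBS->SignalEvent (mLateCleanupEvent);
  }

  EFI_STATUS
  IoMmuRegisterCleanup (
    VOID
    )
  {
    EFI_STATUS  Status;

    Status = gBS->CreateEvent (
                    EVT_NOTIFY_SIGNAL,
                    TPL_CALLBACK,
                    IoMmuLateCleanup,
                    NULL,
                    &mLateCleanupEvent
                    );
    if (EFI_ERROR (Status)) {
      return Status;
    }

    return gBS->CreateEvent (
                  EVT_SIGNAL_EXIT_BOOT_SERVICES,
                  TPL_CALLBACK,
                  IoMmuExitBootServices,
                  NULL,
                  &mExitBootEvent
                  );
  }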
>>>>> [Jiewen] If the order is correct, and all PCI device drivers are
>>>>> halted at ExitBootServices, that works.
>>>>> We do not need to update the PCI drivers, and we do not need to update
>>>>> the UEFI spec.
>>>>> We only need to update the IOMMU driver concerned, based upon the
>>>>> threat model.
>>>>> That is probably the best solution. :-)
>>>>
>>>> I'm very glad to hear that you like the idea.
>>>>
>>>> However, it depends on whether we are permitted, by the UEFI spec, to
>>>> signal another event in an ExitBootServices() notification function.
>>>>
>>>> Are we permitted to do that?
>>>>
>>>> Does the UEFI spec guarantee that the notification function for the
>>>> *second* event will be queued just like it would be under "normal"
>>>> circumstances?
>>>>
>>>> (I know we must not allocate or free memory in the notification
>>>> function of the *second* event either; I just want to know if the
>>>> second event's handler is *queued* like it would normally be.)
>>>>
>>>>
>>>>> I assume you want to handle both the common buffer and the read/write
>>>>> buffer, right?
>>>>
>>>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>>>
>>>>
>>>>> And if possible, I am still interested in getting clear on the threat
>>>>> model for SEV in OVMF.
>>>>> If you or any SEV owner can share ...
>>>>
>>>> Absolutely. Please see above.
>>>>
>>>> Thank you!
>>>> Laszlo
>>>>
>>> _______________________________________________
>>> edk2-devel mailing list
>>> edk2-devel@lists.01.org
>>> https://lists.01.org/mailman/listinfo/edk2-devel
>>>
>>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>