From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 0778D21EA35B9 for ; Wed, 6 Sep 2017 21:43:18 -0700 (PDT) Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP; 06 Sep 2017 21:46:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.42,356,1500966000"; d="scan'208,217";a="149084988" Received: from fmsmsx105.amr.corp.intel.com ([10.18.124.203]) by fmsmga006.fm.intel.com with ESMTP; 06 Sep 2017 21:46:08 -0700 Received: from fmsmsx112.amr.corp.intel.com (10.18.116.6) by FMSMSX105.amr.corp.intel.com (10.18.124.203) with Microsoft SMTP Server (TLS) id 14.3.319.2; Wed, 6 Sep 2017 21:46:08 -0700 Received: from shsmsx103.ccr.corp.intel.com (10.239.4.69) by FMSMSX112.amr.corp.intel.com (10.18.116.6) with Microsoft SMTP Server (TLS) id 14.3.319.2; Wed, 6 Sep 2017 21:46:07 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.39]) by SHSMSX103.ccr.corp.intel.com ([169.254.4.219]) with mapi id 14.03.0319.002; Thu, 7 Sep 2017 12:46:05 +0800 From: "Yao, Jiewen" To: Brijesh Singh , Laszlo Ersek , "Zeng, Star" , edk2-devel-01 CC: "Dong, Eric" Thread-Topic: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Thread-Index: AQHTJO6HlEgpQf73uk+WGvJRCtH+06KkA0GAgACz0wCAAMTM8IAAA0SAgADDw8D//836AIAA9pEQgAA8GYCAADkqgIABX6yQ Date: Thu, 7 Sep 2017 04:46:04 +0000 Message-ID: <74D8A39837DF1E4DA445A8C0B3885C503A9A8F0C@shsmsx102.ccr.corp.intel.com> References: <20170903195449.30261-1-lersek@redhat.com> <0C09AFA07DD0434D9E2A0C6AEB0483103B93A125@shsmsx102.ccr.corp.intel.com> <4b24b1eb-362f-3b46-97e2-bdfda53f40c9@redhat.com> <74D8A39837DF1E4DA445A8C0B3885C503A9A79BD@shsmsx102.ccr.corp.intel.com> <5f1fdc84-5824-bee2-5a1a-fbd67adf5443@redhat.com> <74D8A39837DF1E4DA445A8C0B3885C503A9A7F10@shsmsx102.ccr.corp.intel.com> <12d71f32-9dcf-4f6e-b033-d5f82104caca@redhat.com> <74D8A39837DF1E4DA445A8C0B3885C503A9A852C@shsmsx102.ccr.corp.intel.com> <24d66e81-336e-3924-8045-4749d98e2fbb@redhat.com> <94154cfe-92bf-ba6f-f25f-3963891f6932@amd.com> In-Reply-To: <94154cfe-92bf-ba6f-f25f-3963891f6932@amd.com> Accept-Language: zh-CN, en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [10.239.127.40] MIME-Version: 1.0 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 Subject: Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Sep 2017 04:43:18 -0000 Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Thanks for the sharing, Brijesh. "If a page was marked as "shared" then its guest responsibility to make it "private" after its done communica= ting with hypervisor." I believe I have same understanding - a *guest* should guarantee that. My question is: During the *transition* from firmware to OS, *which guest* = should make the shared buffer to be private? Is it "guest firmware" or "gue= st OS"? Maybe I can ask the specific question to get it more clear. 1) If the common DMA buffer is not unmapped at ExitBootService, is th= at treated as an issue? 2) If the read/write DMA buffer is not unmapped at ExitBootService, i= s that treated as an issue? Thank you Yao Jiewen From: Brijesh Singh [mailto:brijesh.singh@amd.com] Sent: Wednesday, September 6, 2017 11:40 PM To: Laszlo Ersek ; Yao, Jiewen ; Z= eng, Star ; edk2-devel-01 Cc: brijesh.singh@amd.com; Dong, Eric Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap co= mmon buffers at ExitBootServices() On 09/06/2017 07:14 AM, Laszlo Ersek wrote: > On 09/06/17 06:37, Yao, Jiewen wrote: >> Thanks for the clarification. Comment in line. >> >> From: Laszlo Ersek [mailto:lersek@redhat.com] >> Sent: Wednesday, September 6, 2017 1:57 AM >> To: Yao, Jiewen >; Zen= g, Star >; edk2-devel-01 > >> Cc: Dong, Eric >; Brijes= h Singh > >> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap= common buffers at ExitBootServices() > >>> Then after ExitBootService, the OS will take control of CR3 and set cor= rect >>> encryption bit. >> >> This is true, the guest OS will set up new page tables, and in those >> PTEs, the C-bit ("encrypt") will likely be set by default. >> >> However, the guest OS will not *rewrite* all of the memory, with the >> C-bit set. This means that any memory that the firmware didn't actually >> re-encrypt (in the IOMMU driver) will still expose stale data to the >> hypervisor. >> [Jiewen] That is an interesting question. >> Is there any security concern associated? > > I can't tell for sure. Answering this question is up to the proponents > of SEV, who have designed the threat model for SEV. > > My operating assumption is that we should keep memory as tightly > encrypted as possible at the firmware --> OS control transfer. It could > be an exaggerated expectation from my side; I'd just better be safe than > sorry :) > > Let me give some brief intro on SEV (Secure Encrypted Virtualization) and t= hen we can discuss a security model (if needed). SEV is an extension to the AMD-V architecture which supports running encryp= ted virtual machines (VMs) under the control of a hypervisor. Encrypted VMs hav= e their pages (code and data) secured such that only the guest itself has access to= the unencrypted version. Each encrypted VMs is associated with a unique encrypt= ion key; if its data is accessed by a different entity using a different key th= e encrypted guest data will be incorrectly decrypted, leading to unintelligib= le data. You can also find some detail for SEV in white paper [1]. SEV encrypted Vs have the concept of private and shared memory. The private= memory is encrypted with the guest-specific key, while shared memory may be encryp= ted with hypervisor key. SEV guest VMs can choose which pages they would like t= o be private. But the instruction pages and guest page tables are always trea= ted as private by the hardware. The DMA operation inside the guest must be perf= ormed on shared pages -- this is mainly because in virtualization world the devic= e DMA needs some assistance from the hypervisor. The security model provided by the SEV ensure that hypervisor will no longe= r able to inspect or alter any guest code or data. The guest OS controls what it w= ant to share with outside world (in this case with Hypervisor). In software implementation we took approach to encrypt everything possible = starting early in boot. In this approach whenever OVMF wants to perform certain DMA = operations it allocate a shared page, issues the command, free the shared page after t= he command is completed (in other word we use sw bounce buffer to complete the DMA ope= ration). We have implemented IOMMU driver to provide the following functions: AllocateBuffer(): -------------------- it allocate a private pages, as per UEFI spec the driver will map the buffe= r allocated from this routine using BusMasterCommonBuffer hence we allocate extra stash= pages in addition to requested pages. Map ---- If requested operation is BusMasterRead or BusMasterWrite then we allocate = a shared buffer and DeviceAddress point to shared buffer. If requested operation is BusMasterCommonBuffer then we perform in-place de= cryption of the contents and update the page-table to clear the C-bit (basically make it sh= ared page) Unmap ------ If operation was BusMasterRead or BusMasterWrite then we complete the unmap= ping and free the shared buffer allocated in Map(). If operation was BusMasterCommonBuffer then we perform in-place encryption = and set the C-bit (basically make the page private) FreeBuffer: ----------- Free the pages Lets run with the below scenario: 1) guest marks a page as "shared" and gives its physical address to HV (e.g= DMA read) 2) hypervisor completes the request operation 3) hypervisor caches the shared physical address and monitor it for informa= tion leak 4) sometime later if guest write data in its "shared" physical address then= hypervisor can easily read it without guest knowledge. SEV hardware can protect us against the attack where someone tries to injec= t or alter the guest code. As I noted above any instruction fetch is forced as private hen= ce if attacker write some code into a shared buffer and point the RIP to his/her code then= instruction fetch will try to decrypt it and get unintelligible opcode. If a page was m= arked as "shared" then its guest responsibility to make it "private" after its done communica= ting with hypervisor. [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memo= ry_Encryption_Whitepaper_v7-Public.pdf >> If the C-bit is cleared for a read/write buffer, is the content in the >> read/write buffer also exposed to hypervisor? > > Not immediately. Only when the guest also rewrites the memory through > the appropriate virtual address. > > Basically, in the virtual machine, the C-bit in a PTE that covers a > particular guest virtual address (GVA) controls whether a guest write > through that GVA will result in memory encryption, and if a gues read > through that GVA will result in memory decryption. > > The current state of the C-bit doesn't matter for the hypervisor, what > matters is the last guest write through a GVA whose PTE had the C-bit > set, or clear. If the C-bit was clear when the guest last wrote the > area, then the hypervisor can read the data. If the C-bit was set, then > the hypervisor can only read garbage. > > >> I means if there is security concern, the concern should be applied to >> both common buffer and read/write buffer. >> Then we have to figure a way to resolve both buffer. > > Yes, this is correct. The PciIo.Unmap operation, as currently > implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption > correctly for all operations, but it can only guarantee *not* freeing > memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices > is safe, while Unmap()ing Read/Write is not. The encryption would be > re-established just fine for Read/Write as well, but we would change the > UEFI memmap. > > In OVMF, we currently manage this problem by making all asynchronous DMA > mappings CommonBuffer, even if they could othewise be Read or Write. We > use Read and Write only if the DMA operation completes before the > higher-level protocol function returns (so we can immediately Unmap the > operation, and the ExitBootServices() handler doesn't have to care). > > >> To be honest, that is exactly my biggest question on this patch: >> why do we only handle the common buffer specially? > > For the following reason: > > - Given that CommonBuffer mappings take a separate AllocateBuffer() / > FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo > implementors -- beyond what the UEFI spec requries -- to keep *all* > dynamic memory management out of Map/Unmap. If they need dynamic memory > management, we ask them to do it in AllocateBuffer/FreeBuffer instead. > This way Unmap is safe in ExitBootServices handlers. > > - We cannot *reasonably ask* PciIo implementors to provide the same > guarantee for Read and Write mappings, simply because there are no other > APIs that could manage the dynamic memory for Map and Unmap -- > AllocateBuffer and FreeBuffer are not required for Read and Write > mappings. PciIo implementors couldn't agree to our request even if they > wanted to. Therefore Unmapping Read/Write operations is inherently > unsafe in EBS handlers. > > >>> NOTE: The device should still be halt state, because the device is >>> also controlled by OS. >> >> This is the key point -- the device is *not* controlled by the guest OS, >> in the worst case. >> >> The virtual device is ultimately implemented by the hypervisor. We don't >> expect the virtual device (=3D the hypervisor) to mis-behave on purpose. >> However, if the hypervisor is compromised by an external attacker (for >> example over the network, or via privilege escalation from another >> guest), then all guest RAM that has not been encrypted up to that point >> becomes visible to the attacker. In other words, the hypervisor ("the >> device") becomes retro-actively malicious. >> [Jiewen] If that is the threat model, I have a question on the exposure: >> 1) If the concern is for existing data, it means all DMA data already wr= itten. >> However, the DMA data is already consumed or produced by virtual device >> or a hypervisor. If the hypervisor is attacked, it already gets all the = data content. > > Maybe, maybe not. The data may have been sent to a different host over > the network, and wiped from memory. > > Or, the data may have been written to a disk image that is separately > encrypted by the host OS, and has been detached (unplugged) from the > guest, and also unmounted from the host, with the disk key securely > wiped from host memory. > > We can also imagine the reverse direction. Assume that the user of the > virtual machine goes into the UEFI shell in OVMF, starts the EDIT > program, and types some secret information into a text file on the ESP. > Further assume that the disk image is encrypted on the host OS. It is > conceivable that fragments of the secret could remain stuck in the AHCI > (disk) and USB (keyboard) DMA buffers. > > Maybe you think that these are "made up" examples, and I agree, I just > made them up. The point is, it is not my place to discount possible > attack vectors (I haven't designed SEV). Attackers will be limited by > their *own* imaginations only, not by mine :) > > >> 2) if the concern is for future data, it means all user data to be writt= en. >> However, the C-bit already prevent that. > > As far as I understand SEV, future data is out of scope. The goal is to > prevent *retroactive* guest data leaks (=3D whatever is currently in host > OS memory) if an attacker compromises an otherwise non-malicious hypervis= or. > > If an attacker compromises a virtualization host, then they can just sit > silent and watch data flow into and out of guests from that point > onward, because they can then monitor all DMA (which always happens in > clear-text) real-time. > > >> Maybe I do not fully understand the threat model defined. >> If you can explain more, that would be great. > > Well I tried to explain *my* understanding of SEV :) I hope Brijesh will > correct me if I'm wrong. > > >> The point of SEV is to keep as much guest data encrypted at all times as >> possible, so that *whenever* the hypervisor is compromised by an >> attacker, the guest memory -- which the attacker can inevitably dump >> from the host side -- remains un-intellegible to the attacker. >> [Jiewen] OK. If this is a security question, I suggest we define a clear >> threat model at first. >> Or what we are doing might be unnecessary or insufficient. > > I agree. > > As I said above, my operating principle has been to re-encrypt all DMA > buffers as soon as possible. For long-standing DMA buffers, re-encrypt > them at ExitBootServices at the latest. > > >> [Jiewen] For "require that Unmap() work for CommonBuffer >> operations without releasing memory", I would hold my opinion until >> it is documented clearly in UEFI spec. > > You are right, of course. > > It's just that we cannot avoid exploring ways, for this feature, that > currently fall outside of the spec. Whatever we do here will be outside > of the spec, one way or another. I suggested what I thought could be a > reasonable extension to the spec, one that could be implemented by PciIo > implementors even before the spec is modified. > > edk2's PciIo.Unmap already behaves like this, without breaking the spec. > > If there's a better way -- for example modifying CoreExitBootServices() > in edk2, to signal IOMMU drivers separately, *after* notifying other > ExitBootServices() handlers --, that might work as well. > > >> My current concern is: >> First, this sentence is NOT defined in UEFI specification. > > Correct. > > >> Second, it might break an existing implementation or existing feature, s= uch as tracing. > > Correct. > >> Till now, I did not see any restriction in UEFI spec say: In this functi= on, you are not allowed >> to call memory services. >> The only restriction is >> 1) TPL restriction, where memory service must be <=3D TPL_NOTIFY. >> 2) AP restriction, where no UEFI service/protocol is allowed for AP. > > I agree. > > >> I'm just trying to operate with the APIs currently defined by the UEFI >> spec, and these assumptions were the best I could come up with. >> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec. >> Especially the IHV Option ROM should not consume any private API. > > I disagree about "only follow". If there are additional requirements > that can be satisfied without breaking the spec, driver implementors can > choose to conform to both sets of requirements. > > For example (if I understand correctly), Microsoft / Windows present a > bunch of requirements *beyond* the UEFI spec, for both platform and > add-on card firmware. Most vendors conform :) > > >>> [Jiewen] I am not sure who will control "When control is transferred to= the OS, all >>> guest memory should be encrypted again, even those areas that were once >>> used as CommonBuffers." >>> For SEV case, I think it should be controlled by OS, because it is just= CR3. >> >> If it was indeed just CR3, then I would fully agree with you. >> >> But -- to my understanding --, ensuring that the memory is actually >> encrypted requires that the guest *write* to the memory through a >> virtual address whose PTE has the C-bit set. >> >> And I don't think the guest OS can be expected to rewrite all of its >> memory at launch. :( >> >> [Jiewen] That is fine. >> I suggest we get clear on the threat model as the first step. >> The threat model for SEV usage in OVMF. >> >> I am sorry if that is already discussed before. I might ignore the conve= rsation. > > No problem; it's always good to re-investigate our assumptions and > operating principles. > > >> If you or any SEV owner can share the insight, that will be great. >> See my question above "If that is the threat model, I have a question on= the exposure:..." > > I shared *my* impressions of the threat model (which are somewhat > unclear, admittedly, and thus could make me overly cautious). > > I hope Brijesh can extend and/or correct my description. > > >>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU >>> are opposites -- VT-d permits all DMA unless configured otherwise, whil= e >>> SEV forbids all DMA unless configured otherwise. >>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to d= isable all DMA access. >>> I setup translation table, but mark all entry to be NOT-PRESENT. >>> I grant DMA access only if there is a specific request by a device. >>> >>> I open DMA access in ExitBootServices, just want to make sure it is com= patible to >>> the existing OS, which might not have knowledge on IOMMU. >>> I will provide a patch to make it a policy very soon. As such VTd IOMMU= may >>> turn off IOMMU, or keep it enabled at ExitBootService. >>> An IOMMU aware OS may take over IOMMU directly and reprogram it. >> >> Thanks for the clarification. >> >> But, again, will this not lead to the possibility that the VT-d IOMMU's >> ExitBootServices() handler disables all DMA *before* the PCI drivers get >> a chance to run their own ExitBootServices() handlers, disabling the >> individual devices? >> [Jiewen] Sorry for the confusing. Let me explain: >> No, VTd never disables *all* DMA buffer at ExitBootService. >> >> "disable VTd" means to "disable IOMMU protection" and "allow all DMA". >> "Keep VTd enabled" means to "keep IOMMU protection enabled" and >> "only allow the DMA from Map() request". > > Okay, but this interpretation was exactly what I thought of first (see > above): "VT-d permits all DMA unless configured otherwise". You answered > that it wasn't the case. > > So basically, if I understand it correctly now, at ExitBootServices the > VT-d IOMMU driver opens up all RAM for DMA access. Is that correct? > > That is equivalent to decrypting all memory under SEV, and is the exact > opposite of what we want. (As I understand it.) > > >> If that happens, then a series of IOMMU faults could be generated, which >> I described above. (I.e., such IOMMU faults could result, at least in >> the case of SEV, in garbage being written to disk, via queued commands.) >> [Jiewen] You are right. That would not happen. >> IOMMU driver should not bring any compatibility problem for the PCI driv= er, >> iff the PCI driver follows the UEFI specification. > > Agreed. > > >> Now, to finish up, here's an idea I just had. >> >> Are we allowed to call gBS->SignalEvent() in an ExitBootServices() >> notification function? >> >> If we are, then we could do the following: >> >> * PciIo.Unmap() and friends would work as always (no restrictions on >> dynamic memory allocation or freeing, for any kind of DMA operation). >> >> * PCI drivers would not be expected to call PciIo.Unmap() in their >> ExitBootServices() handlers. >> >> * The IOMMU driver would have an ExitBootServices() notification >> function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level >> (doesn't matter which). >> >> * In this notification function, the IOMMU driver would signal *another* >> event (a private one). The notification function for this event would >> be enqueued strictly at the TPL_CALLBACK level. >> >> * The notification function for the second event (private to the IOMMU) >> would iterate over all existent mappings, and unmap them, without >> allocating or freeing memory. >> >> The key point is that by signaling the second event, the IOMMU driver >> could order its own cleanup code after all other ExitBootServices() >> callbacks. (Assuming, at least, that no other ExitBootServices() >> callback employs the same trick to defer itself!) Therefore by the time >> the second callback runs, all PCI devices have been halted, and it is >> safe to tear down the translations. >> >> The ordering would be ensured by the following: >> >> - The UEFI spec says under CreateEventEx(), "All events are guaranteed >> to be signaled before the first notification action is taken." This >> means that, by the time the IOMMU's ExitBootServices() handler is >> entered, all other ExitBootServices() handlers have been *queued* at >> least, at TPL_CALLBACK or TPL_NOTIFY. >> >> - Therefore, when we signal our second callback from there, for >> TPL_CALLBACK, the second callback will be queued at the end of the >> TPL_CALLBACK queue. >> >> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is >> functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag >> for the Type argument of CreateEvent." So it wouldn't matter if a >> driver used CreateEvent() or CreateEventEx() for setting up its own >> handler, the handler would be queued just the same. >> >> I think this is ugly and brittle, but perhaps the only way to clean up >> *all* translations safely, with regard to PciIo.Unmap() + >> ExitBootServices(), without updating the UEFI spec. >> >> What do you think? >> >> [Jiewen] If the order is correct, and all PCI device driver is halt at E= xitBootServices, that works. >> We do not need update PCI driver and we do not need update UEFI spec. >> We only need update IOMMU driver which is concerned, based upon the thre= at model. >> That probably is best solution. :-) > > I'm very glad to hear that you like the idea. > > However, it depends on whether we are permitted, by the UEFI spec, to > signal another event in an ExitBootServices() notification function. > > Are we permitted to do that? > > Does the UEFI spec guarantee that the notification function for the > *second* event will be queued just like it would be under "normal" > circumstances? > > (I know we must not allocate or free memory in the notification function > of the *second* event either; I just want to know if the second event's > handler is *queued* like it would normally be.) > > >> I assume you want to handle both common buffer and read/write buffer, ri= ght? > > Yes. Under this idea, all kinds of operations would be cleaned up. > > >> And if possible, I still have interest to get clear on the threat model = for SEV in OVMF. >> If you or any SEV owner can share ... > > Absolutely. Please see above. > > Thank you! > Laszlo >