From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id CD9A521E2570A for ; Fri, 28 Jul 2017 06:36:55 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 30B17287EDA; Fri, 28 Jul 2017 13:39:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 30B17287EDA Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lersek@redhat.com Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-27.phx2.redhat.com [10.3.116.27]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5AE7E17F50; Fri, 28 Jul 2017 13:38:53 +0000 (UTC) To: Brijesh Singh , Ard Biesheuvel Cc: "edk2-devel@lists.01.org" , Tom Lendacky , Jordan Justen , Jason Wang , "Michael S . Tsirkin" , Gerd Hoffmann References: <1500502151-13508-1-git-send-email-brijesh.singh@amd.com> <841bec5f-6f6e-8b1f-25ba-0fd37a915b72@redhat.com> <4e2fc623-3656-eea7-09a8-b5c6d2f694e1@amd.com> <4071596d-32c9-e6d9-8c93-0d43d28e9b5a@redhat.com> From: Laszlo Ersek Message-ID: Date: Fri, 28 Jul 2017 15:38:52 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Fri, 28 Jul 2017 13:39:00 +0000 (UTC) Subject: Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Jul 2017 13:36:56 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 07/27/17 21:00, Brijesh Singh wrote: > On 07/27/2017 12:56 PM, Ard Biesheuvel wrote: >> On 27 July 2017 at 18:16, Laszlo Ersek wrote: >>> Normally, Map allocates the bounce buffer internally, and Unmap >>> releases the bounce buffer internally (for BusMasterRead and >>> BusMasterWrite), which is not right here. If you use >>> AllocateBuffer() separately, then Map() -- with >>> BusMasterCommonBuffer -- will not allocate internally, and Unmap() >>> will also not deallocate internally. So, in the ExitBootServices() >>> callback, you will be able to call Unmap() only, and then *not* call >>> FreeBuffer() at all. >>> >>> This is why I suggest introducing all four functions (Allocate, >>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio >>> drivers should use all four functions explicitly, not just Map and >>> Unmap. >>> >>> ... Actually, I think there is a bug in >>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following >>> distribution of operations at the moment: >>> >>> - IoMmuAllocateBuffer() allocates pages and clears the memory >>> encryption mask >>> >>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and >>> deallocates pages >>> >>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is >>> requested (and IoMmuAllocateBuffer() was called previously). >>> Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and >>> clears the memory encryption mask. >>> >>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer >>> operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the >>> encryption mask, and frees both the pages and MAP_INFO. >>> > > I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And > introduce list to track if the buffer was allocated by us. If buffer > was allocated by us then we can safely say that it does not require a > bounce buffer. While implementing the initial code I was not able to > see BusMasterCommonBuffer usage. But with virtio things are getting a > bit more clear in my mind. The purpose of the internal list is a little bit different -- it is a "free list", not a tracker list. Under the proposed scheme, a MAP_INFO structure will have to be allocated for *all* mappings: both for CommonBuffer (where the actual pages come from the caller, who retrieved them earlier with an AllocateBuffer() call), and for Read/Write (where the actual pages will be allocated internally to Map). Allocating the MAP_INFO structure in Map() is necessary in *both* cases because the Unmap() function can *only* consult this MAP_INFO structure to learn the address and the size at which it has to re-set the memory encryption mask. This is because the Unmap() function gets no other input parameter. Then, regardless of the original operation (CommonBuffer vs. Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For CommonBuffer, the actual pages are owned by the caller, so we don't free them here; for Read/Write, the actual pages are owned by Map/Unmap, so we free them in Unmap() *in addition* to recycling MAP_INFO.) The main point is that MAP_INFO cannot be released with FreePool() because that could change the UEFI memmap during ExitBootServices() processing. So MAP_INFO has to be "released" to an internal *free* list. And since we have an internal free list, the original MAP_INFO allocation in Map() should consult the free list first, and only if it is empty should it fall back to AllocatePool(). Whether the actual pages tracked by MAP_INFO were allocated internally by Map(), or externally by the caller (via AllocateBuffer()) is an independent question. (And it can be seen from the MAP_INFO.Operation field.) Does this make it clearer? > >>> This distribution of operations seems wrong. The key point is that >>> AllocateBuffer() *need not* result in a buffer that is immediately >>> usable, and that client code is required to call Map() >>> *unconditionally*, even if BusMasterCommonBuffer is the desired >>> operation. Therefore, the right distribution of operations is: >>> >>> - IoMmuAllocateBuffer() allocates pages and does not touch the >>> encryption mask.. >>> >>> - IoMmuFreeBuffer() deallocates pages and does not touch the >>> encryption mask. >>> > > Actually one of main reason why we cleared and restored the memory > encryption mask during allocate/free is because we also consume the > IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA > buffer. I am certainly open to suggestions. Argh. That's again my fault then, I should have noticed it. :( I apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first" for me as well. As discussed earlier (and confirmed by Ard), calling *just* AllocateBuffer() is never enough, Map()/Unmap() are always necessary. So here's what I suggest (to keep the changes localized): - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer() function to output a "VOID *Mapping" parameter as well, in addition to the address. - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer() function to take a "VOID *Mapping" parameter in addition to the buffer address. - In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(), keep the current IOMMU AllocateBuffer() call, but also call IOMMU Map(), with CommonBuffer. Furthermore, propagate the Mapping output of Map() outwards. (Note that Map() may have to be called in a loop, dependent on "NumberOfBytes".) - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call the IOMMU Unmap() function first (using the new Mapping parameter). > > [1] > https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159 > > [2] > https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197 > > >>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is >>> requested, and it allocates pages (bounce buffer) otherwise. >>> > > I am trying to wrap my head around how we can support > BusMasterCommonBuffer when buffer was not allocated by us. Changing > the memory encryption mask in a page table will not update the > contents. Thank you for the clarification. I've now tried to review the current code for a better understanding. Are the below statements correct? - For the guest, guest memory is always readable transparently. - For the host, guest memory is always readable *as it was written last*. - If the guest memory was last written with the C bit clear, then the host can read it as plaintext (regardless of the current state of the C bit). - If the guest memory was last written with the C bit set, then the host can only read garbage (regardless of the current state of the C bit). *If* this is the case, then: (a) We already have a sort of guest->host information leak. Namely, in IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is restored, and the pages are released with gBS->FreePages(). However, the pages being released are not rewritten in place (they are not actually re-encrypted, only marked for encryption whenever they are next written to). This means that pages released like this remain readable to the hypervisor for an unpredictable time. This is not necessarily a critical problem (after all the contents of those pages were visible to the hypervisor at some point anyway!); but in the longer term, such pages could accumulate, and if the hypervisor is compromised later, it could potentially read an unbounded amount of "past guest data" as plain-text. (b) Plus, approaching the question from the Map() direction, we need to consider two scenarios: - Client code calls AllocateBuffer(), then Map(), and it writes to the buffer only then. This should be safe. - client code calls AllocateBuffer(), writes to it, and then calls Map(). This will result in memory contents that look like garbage to the hypervisor. Bad. I can imagine the following to handle these cases: in the Map() and Unmap() functions, we have to decrypt and encrypt the memory contents in-place, after changing the C bit (without allocating additional memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this will always remain in encrypted memory). Update the C bit with a single function call for the entire range (like now) -- this will not affect the guest-readability of the pages --, then bounce each page within the range to the static buffer and back to its original place. In effect this will in-place encrypt or decrypt the memory, and will be faster than a byte-wise rewrite. * BusMasterRead (host will want to read guest memory): - Client calls Map() without calling AllocateBuffer() first. Map() allocates an internal bounce buffer, clears the C bit, and does the copying. - Client fires off the PCI transaction. - Client calls Unmap(), without calling FreeBuffer() later. Unmap() restores the C-bit, *zeros* the pages, and releases the pages. * BusMasterWrite (host will want to write to guest memory): - Client calls Map() without calling AllocateBuffer() first. Map() allocates an internal bounce buffer and clears the C bit. - Client fires off the PCI transaction. - Client calls Unmap(), without calling FreeBuffer() later. Unmap() copies the bounce buffer to crpyted (client-owned) memory, restores the C-bit, *zeros* the pages, and releases the pages. * BusMasterCommonBuffer: - Client calls AllocateBuffer(), and places some data in the returned memory. - Client calls Map(). Map() clears the C bit in one fell swoop, and then decrypts the buffer in-place (by bouncing it page-wise to the static array and back). - Client communicates with the device. - Client calls Unmap(). Unmap() restores the C bit in one fell swoop, and encrypts the buffer in-place (by bouncing it page-wise to the static array and back). - Client reads some residual data from the buffer. - Client calls FreeBuffer(). FreeBuffer() relases the pages. This is suitable for the discussed ExitBootServices() callback update too, where the callback will reset the virtio device (like now) and then call Unmap() for all shared areas, without calling FreeBuffer() on them. The above logic will re-encrypt the contents without affecting the UEFI memmap. > Also since the memory encryption mask works on PAGE_SIZE hence > changing the encryption mask on not our allocated buffer could mess > things up (e.g if NumberOfBytes is not PAGE_SIZE aligned). This is not a problem for BusMasterRead and BusMasterWrite because for those operations, Map() allocates the internal bounce buffer. Also not a problem for BusMasterCommonBuffer, because then the client first calls AllocateBuffer (whole number of pages), and so Map() can round up the number of bytes and de-crypt full pages in-place (see above). > >>> *Regardless* of BusMaster operation, the following actions are >>> carried out unconditionally: >>> >>> . the memory encryption mask is cleared in this function (and in >>> this function only), >>> >>> . An attempt is made to grab a MAP_INFO structure from an >>> internal free list (to be introduced!). The head of the list is >>> a new static variable. If the free list is empty, then a >>> MAP_INFO structure is allocated with AllocatePool(). The >>> NO_MAPPING macro becomes unused and can be deleted from the >>> source code. >>> >>> - IoMmuUnmap() clears the encryption mask unconditionally. (For >>> this, it has to consult the MAP_INFO structure that is being >>> passed in from the caller.) In addition: >>> >>> . If MapInfo->Operation is BusMasterCommonBuffer, then we know >>> the allocation was done separately in AllocateBuffer, so we do >>> not release the pages. Otherwise, we do release the pages. >>> >>> . MapInfo is linked back on the internal free list (see above). >>> It is *never* released with FreePool(). >>> >>> This approach guarantees that IoMmuUnmap() can de-program the >>> IOMMU (= re-set the memory encryption mask) without changing the >>> UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will >>> not split page tables internally when it *re*sets the encryption >>> mask -- is that correct?) > > Yes, MemEncryptSevSetPageEncMask() will not split pages when it > restores mask. Great, thanks. Laszlo