From: Brijesh Singh <brijesh.singh@amd.com>
To: Laszlo Ersek <lersek@redhat.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: brijesh.singh@amd.com,
"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
Tom Lendacky <thomas.lendacky@amd.com>,
Jordan Justen <jordan.l.justen@intel.com>,
Jason Wang <jasowang@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Gerd Hoffmann <kraxel@redhat.com>
Subject: Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Date: Fri, 28 Jul 2017 11:00:30 -0500 [thread overview]
Message-ID: <217545ac-962d-089f-9c9a-d2bbfca6427e@amd.com> (raw)
In-Reply-To: <e1137140-f5d5-7eb8-0162-952694b24f96@redhat.com>
On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
> On 07/27/17 21:00, Brijesh Singh wrote:
>> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
>
>>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>>> releases the bounce buffer internally (for BusMasterRead and
>>>> BusMasterWrite), which is not right here. If you use
>>>> AllocateBuffer() separately, then Map() -- with
>>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>>> will also not deallocate internally. So, in the ExitBootServices()
>>>> callback, you will be able to call Unmap() only, and then *not* call
>>>> FreeBuffer() at all.
>>>>
>>>> This is why I suggest introducing all four functions (Allocate,
>>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>>> drivers should use all four functions explicitly, not just Map and
>>>> Unmap.
>>>>
>>>> ... Actually, I think there is a bug in
>>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>>> distribution of operations at the moment:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>> encryption mask
>>>>
>>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>> deallocates pages
>>>>
>>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>> requested (and IoMmuAllocateBuffer() was called previously).
>>>> Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>> clears the memory encryption mask.
>>>>
>>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>> operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>> encryption mask, and frees both the pages and MAP_INFO.
>>>>
>>
>> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
>> introduce list to track if the buffer was allocated by us. If buffer
>> was allocated by us then we can safely say that it does not require a
>> bounce buffer. While implementing the initial code I was not able to
>> see BusMasterCommonBuffer usage. But with virtio things are getting a
>> bit more clear in my mind.
>
> The purpose of the internal list is a little bit different -- it is a
> "free list", not a tracker list.
>
> Under the proposed scheme, a MAP_INFO structure will have to be
> allocated for *all* mappings: both for CommonBuffer (where the actual
> pages come from the caller, who retrieved them earlier with an
> AllocateBuffer() call), and for Read/Write (where the actual pages will
> be allocated internally to Map). Allocating the MAP_INFO structure in
> Map() is necessary in *both* cases because the Unmap() function can
> *only* consult this MAP_INFO structure to learn the address and the size
> at which it has to re-set the memory encryption mask. This is because
> the Unmap() function gets no other input parameter.
>
> Then, regardless of the original operation (CommonBuffer vs.
> Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
> CommonBuffer, the actual pages are owned by the caller, so we don't free
> them here; for Read/Write, the actual pages are owned by Map/Unmap, so
> we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
> point is that MAP_INFO cannot be released with FreePool() because that
> could change the UEFI memmap during ExitBootServices() processing. So
> MAP_INFO has to be "released" to an internal *free* list.
>
> And since we have an internal free list, the original MAP_INFO
> allocation in Map() should consult the free list first, and only if it
> is empty should it fall back to AllocatePool().
>
> Whether the actual pages tracked by MAP_INFO were allocated internally
> by Map(), or externally by the caller (via AllocateBuffer()) is an
> independent question. (And it can be seen from the MAP_INFO.Operation
> field.)
>
> Does this make it clearer?
>
It makes sense. thanks
One question, do we need to return EFI_NOT_SUPPORTED error when we get
request to map a buffer with CommonBuffer but the buffer was not
allocated using "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"?
At least as per spec, it seems if caller wants to map a buffer with
CommonBuffer then it must have called "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"
>>
>>>> This distribution of operations seems wrong. The key point is that
>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>> usable, and that client code is required to call Map()
>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>> operation. Therefore, the right distribution of operations is:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>> encryption mask..
>>>>
>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>> encryption mask.
>>>>
>>
>> Actually one of main reason why we cleared and restored the memory
>> encryption mask during allocate/free is because we also consume the
>> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
>> buffer. I am certainly open to suggestions.
>
> Argh. That's again my fault then, I should have noticed it. :( I
> apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
> for me as well.
>
> As discussed earlier (and confirmed by Ard), calling *just*
> AllocateBuffer() is never enough, Map()/Unmap() are always necessary.
>
> So here's what I suggest (to keep the changes localized):
>
> - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
> function to output a "VOID *Mapping" parameter as well, in addition to
> the address.
>
> - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
> function to take a "VOID *Mapping" parameter in addition to the buffer
> address.
>
> - In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
> keep the current IOMMU AllocateBuffer() call, but also call IOMMU
> Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
> Map() outwards. (Note that Map() may have to be called in a loop,
> dependent on "NumberOfBytes".)
>
> - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
> the IOMMU Unmap() function first (using the new Mapping parameter).
>
I will update the code and send patch for review. thanks
>>
>> [1]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>
>> [2]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>
>>
>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>> requested, and it allocates pages (bounce buffer) otherwise.
>>>>
>>
>> I am trying to wrap my head around how we can support
>> BusMasterCommonBuffer when buffer was not allocated by us. Changing
>> the memory encryption mask in a page table will not update the
>> contents.
>
> Thank you for the clarification. I've now tried to review the current
> code for a better understanding. Are the below statements correct?
>
> - For the guest, guest memory is always readable transparently.
> - For the host, guest memory is always readable *as it was written
> last*.
> - If the guest memory was last written with the C bit clear, then the
> host can read it as plaintext (regardless of the current state of the
> C bit).
> - If the guest memory was last written with the C bit set, then the host
> can only read garbage (regardless of the current state of the C bit).
>
Your understanding is correct.
> *If* this is the case, then:
>
> (a) We already have a sort of guest->host information leak. Namely, in
> IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
> restored, and the pages are released with gBS->FreePages(). However, the
> pages being released are not rewritten in place (they are not actually
> re-encrypted, only marked for encryption whenever they are next written
> to). This means that pages released like this remain readable to the
> hypervisor for an unpredictable time.
>
> This is not necessarily a critical problem (after all the contents of
> those pages were visible to the hypervisor at some point anyway!); but
> in the longer term, such pages could accumulate, and if the hypervisor
> is compromised later, it could potentially read an unbounded amount of
> "past guest data" as plain-text.
>
Very good catch, I can *zero* the pages. IIRC, I did not consider doing
so because it may introduce unnecessary perform hit (mainly when dealing
with larger pages). I think when we load kernel or initrd using QemuFwCfg
DMA interface we allocate large buffers.
I will still go ahead and experiment *zeroing* page and measure the performance
impact before we integrate the solution.
> (b) Plus, approaching the question from the Map() direction, we need to
> consider two scenarios:
>
> - Client code calls AllocateBuffer(), then Map(), and it writes to the
> buffer only then. This should be safe.
> - client code calls AllocateBuffer(), writes to it, and then calls
> Map(). This will result in memory contents that look like garbage to
> the hypervisor. Bad.
>
> I can imagine the following to handle these cases: in the Map() and
> Unmap() functions, we have to decrypt and encrypt the memory contents
> in-place, after changing the C bit (without allocating additional
> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
> will always remain in encrypted memory). Update the C bit with a single
> function call for the entire range (like now) -- this will not affect
> the guest-readability of the pages --, then bounce each page within the
> range to the static buffer and back to its original place. In effect
> this will in-place encrypt or decrypt the memory, and will be faster
> than a byte-wise rewrite.
>
> * BusMasterRead (host will want to read guest memory):
> - Client calls Map() without calling AllocateBuffer() first. Map()
> allocates an internal bounce buffer, clears the C bit, and does the
> copying.
> - Client fires off the PCI transaction.
> - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
> restores the C-bit, *zeros* the pages, and releases the pages.
>
Yes this will work just fine (see my previous comments on *zeroing* pages)
> * BusMasterWrite (host will want to write to guest memory):
> - Client calls Map() without calling AllocateBuffer() first. Map()
> allocates an internal bounce buffer and clears the C bit.
> - Client fires off the PCI transaction.
> - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
> copies the bounce buffer to crpyted (client-owned) memory, restores
> the C-bit, *zeros* the pages, and releases the pages.
>
Yes, this will work just fine (see my previous comments on *zeroing* pages)
> * BusMasterCommonBuffer:
> - Client calls AllocateBuffer(), and places some data in the returned
> memory.
> - Client calls Map(). Map() clears the C bit in one fell swoop, and
> then decrypts the buffer in-place (by bouncing it page-wise to the
> static array and back).
> - Client communicates with the device.
> - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
> and encrypts the buffer in-place (by bouncing it page-wise to the
> static array and back).
> - Client reads some residual data from the buffer.
> - Client calls FreeBuffer(). FreeBuffer() relases the pages.
>
Yes this works fine as long as the client uses EFI_PCI_IO_PROTOCOL.AllocateBuffer()
to allocate the buffer.
> This is suitable for the discussed ExitBootServices() callback update
> too, where the callback will reset the virtio device (like now) and then
> call Unmap() for all shared areas, without calling FreeBuffer() on them.
> The above logic will re-encrypt the contents without affecting the UEFI
> memmap.
>
>> Also since the memory encryption mask works on PAGE_SIZE hence
>> changing the encryption mask on not our allocated buffer could mess
>> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
>
> This is not a problem for BusMasterRead and BusMasterWrite because for
> those operations, Map() allocates the internal bounce buffer. Also not a
> problem for BusMasterCommonBuffer, because then the client first calls
> AllocateBuffer (whole number of pages), and so Map() can round up the
> number of bytes and de-crypt full pages in-place (see above).
>
>>
>>>> *Regardless* of BusMaster operation, the following actions are
>>>> carried out unconditionally:
>>>>
>>>> . the memory encryption mask is cleared in this function (and in
>>>> this function only),
>>>>
>>>> . An attempt is made to grab a MAP_INFO structure from an
>>>> internal free list (to be introduced!). The head of the list is
>>>> a new static variable. If the free list is empty, then a
>>>> MAP_INFO structure is allocated with AllocatePool(). The
>>>> NO_MAPPING macro becomes unused and can be deleted from the
>>>> source code.
>>>>
>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>> this, it has to consult the MAP_INFO structure that is being
>>>> passed in from the caller.) In addition:
>>>>
>>>> . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>> the allocation was done separately in AllocateBuffer, so we do
>>>> not release the pages. Otherwise, we do release the pages.
>>>>
>>>> . MapInfo is linked back on the internal free list (see above).
>>>> It is *never* released with FreePool().
>>>>
>>>> This approach guarantees that IoMmuUnmap() can de-program the
>>>> IOMMU (= re-set the memory encryption mask) without changing the
>>>> UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>> not split page tables internally when it *re*sets the encryption
>>>> mask -- is that correct?)
>>
>> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
>> restores mask.
>
> Great, thanks.
> Laszlo
>
next prev parent reply other threads:[~2017-07-28 15:58 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
2017-07-19 22:09 ` [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 3/3] OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support Brijesh Singh
[not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
2017-07-21 11:17 ` [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
[not found] ` <20170722024318-mutt-send-email-mst@kernel.org>
2017-07-24 8:25 ` Gerd Hoffmann
2017-07-25 18:17 ` Laszlo Ersek
2017-07-25 23:42 ` Brijesh Singh
[not found] ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
2017-07-26 15:30 ` Laszlo Ersek
2017-07-27 14:21 ` Brijesh Singh
2017-07-27 17:16 ` Laszlo Ersek
2017-07-27 17:56 ` Ard Biesheuvel
2017-07-27 19:00 ` Brijesh Singh
2017-07-27 20:55 ` Brijesh Singh
2017-07-27 21:31 ` Ard Biesheuvel
2017-07-27 21:38 ` Andrew Fish
2017-07-27 22:13 ` Brijesh Singh
2017-07-27 22:10 ` Brijesh Singh
2017-07-28 8:39 ` Ard Biesheuvel
2017-07-28 15:27 ` Laszlo Ersek
2017-07-28 13:38 ` Laszlo Ersek
2017-07-28 16:00 ` Brijesh Singh [this message]
2017-07-28 16:16 ` Laszlo Ersek
2017-07-28 19:21 ` Laszlo Ersek
2017-07-28 19:59 ` Laszlo Ersek
2017-07-29 0:52 ` Brijesh Singh
2017-07-29 1:37 ` Brijesh Singh
2017-07-31 18:20 ` Laszlo Ersek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=217545ac-962d-089f-9c9a-d2bbfca6427e@amd.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox