public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Brijesh Singh <brijesh.singh@amd.com>
To: Laszlo Ersek <lersek@redhat.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: brijesh.singh@amd.com,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Jordan Justen <jordan.l.justen@intel.com>,
	Jason Wang <jasowang@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>
Subject: Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Date: Fri, 28 Jul 2017 11:00:30 -0500	[thread overview]
Message-ID: <217545ac-962d-089f-9c9a-d2bbfca6427e@amd.com> (raw)
In-Reply-To: <e1137140-f5d5-7eb8-0162-952694b24f96@redhat.com>



On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
> On 07/27/17 21:00, Brijesh Singh wrote:
>> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
> 
>>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>>> releases the bounce buffer internally (for BusMasterRead and
>>>> BusMasterWrite), which is not right here. If you use
>>>> AllocateBuffer() separately, then Map() -- with
>>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>>> will also not deallocate internally. So, in the ExitBootServices()
>>>> callback, you will be able to call Unmap() only, and then *not* call
>>>> FreeBuffer() at all.
>>>>
>>>> This is why I suggest introducing all four functions (Allocate,
>>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>>> drivers should use all four functions explicitly, not just Map and
>>>> Unmap.
>>>>
>>>> ... Actually, I think there is a bug in
>>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>>> distribution of operations at the moment:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>>    encryption mask
>>>>
>>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>>    deallocates pages
>>>>
>>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>>    requested (and IoMmuAllocateBuffer() was called previously).
>>>>    Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>>    clears the memory encryption mask.
>>>>
>>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>>    operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>>    encryption mask, and frees both the pages and MAP_INFO.
>>>>
>>
>> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
>> introduce list to track if the buffer was allocated by us. If buffer
>> was allocated by us then we can safely say that it does not require a
>> bounce buffer. While implementing the initial code I was not able to
>> see BusMasterCommonBuffer usage. But with virtio things are getting a
>> bit more clear in my mind.
> 
> The purpose of the internal list is a little bit different -- it is a
> "free list", not a tracker list.
> 
> Under the proposed scheme, a MAP_INFO structure will have to be
> allocated for *all* mappings: both for CommonBuffer (where the actual
> pages come from the caller, who retrieved them earlier with an
> AllocateBuffer() call), and for Read/Write (where the actual pages will
> be allocated internally to Map). Allocating the MAP_INFO structure in
> Map() is necessary in *both* cases because the Unmap() function can
> *only* consult this MAP_INFO structure to learn the address and the size
> at which it has to re-set the memory encryption mask. This is because
> the Unmap() function gets no other input parameter.
> 
> Then, regardless of the original operation (CommonBuffer vs.
> Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
> CommonBuffer, the actual pages are owned by the caller, so we don't free
> them here; for Read/Write, the actual pages are owned by Map/Unmap, so
> we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
> point is that MAP_INFO cannot be released with FreePool() because that
> could change the UEFI memmap during ExitBootServices() processing. So
> MAP_INFO has to be "released" to an internal *free* list.
> 
> And since we have an internal free list, the original MAP_INFO
> allocation in Map() should consult the free list first, and only if it
> is empty should it fall back to AllocatePool().
> 
> Whether the actual pages tracked by MAP_INFO were allocated internally
> by Map(), or externally by the caller (via AllocateBuffer()) is an
> independent question. (And it can be seen from the MAP_INFO.Operation
> field.)
> 
> Does this make it clearer?
> 

It makes sense. thanks

One question, do we need to return EFI_NOT_SUPPORTED error when we get
request to map a buffer with CommonBuffer but the buffer was not
allocated using "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"?

At least as per spec, it seems if caller wants to map a buffer with
CommonBuffer then it must have called "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"


>>
>>>> This distribution of operations seems wrong. The key point is that
>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>> usable, and that client code is required to call Map()
>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>> operation. Therefore, the right distribution of operations is:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>    encryption mask..
>>>>
>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>    encryption mask.
>>>>
>>
>> Actually one of main reason why we cleared and restored the memory
>> encryption mask during allocate/free is because we also consume the
>> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
>> buffer. I am certainly open to suggestions.
> 
> Argh. That's again my fault then, I should have noticed it. :( I
> apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
> for me as well.
> 
> As discussed earlier (and confirmed by Ard), calling *just*
> AllocateBuffer() is never enough, Map()/Unmap() are always necessary.
> 
> So here's what I suggest (to keep the changes localized):
> 
> - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
>    function to output a "VOID *Mapping" parameter as well, in addition to
>    the address.
> 
> - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
>    function to take a "VOID *Mapping" parameter in addition to the buffer
>    address.
> 
> - In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
>    keep the current IOMMU AllocateBuffer() call, but also call IOMMU
>    Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
>    Map() outwards. (Note that Map() may have to be called in a loop,
>    dependent on "NumberOfBytes".)
> 
> - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
>    the IOMMU Unmap() function first (using the new Mapping parameter).
> 

I will update the code and send patch for review. thanks

>>
>> [1]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>
>> [2]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>
>>
>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>
>>
>> I am trying to wrap my head around how we can support
>> BusMasterCommonBuffer when buffer was not allocated by us. Changing
>> the memory encryption mask in a page table will not update the
>> contents.
> 
> Thank you for the clarification. I've now tried to review the current
> code for a better understanding. Are the below statements correct?
> 
> - For the guest, guest memory is always readable transparently.
> - For the host, guest memory is always readable *as it was written
>    last*.
> - If the guest memory was last written with the C bit clear, then the
>    host can read it as plaintext (regardless of the current state of the
>    C bit).
> - If the guest memory was last written with the C bit set, then the host
>    can only read garbage (regardless of the current state of the C bit).
> 

Your understanding is correct.

> *If* this is the case, then:
> 
> (a) We already have a sort of guest->host information leak. Namely, in
> IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
> restored, and the pages are released with gBS->FreePages(). However, the
> pages being released are not rewritten in place (they are not actually
> re-encrypted, only marked for encryption whenever they are next written
> to). This means that pages released like this remain readable to the
> hypervisor for an unpredictable time.
> 
> This is not necessarily a critical problem (after all the contents of
> those pages were visible to the hypervisor at some point anyway!); but
> in the longer term, such pages could accumulate, and if the hypervisor
> is compromised later, it could potentially read an unbounded amount of
> "past guest data" as plain-text.
> 

Very good catch, I can *zero* the pages. IIRC, I did not consider doing
so because it may introduce unnecessary perform hit (mainly when dealing
with larger pages). I think when we load kernel or initrd using QemuFwCfg
DMA interface we allocate large buffers.

I will still go ahead and experiment *zeroing* page and measure the performance
impact before we integrate the solution.

  
> (b) Plus, approaching the question from the Map() direction, we need to
> consider two scenarios:
> 
> - Client code calls AllocateBuffer(), then Map(), and it writes to the
>    buffer only then. This should be safe.
> - client code calls AllocateBuffer(), writes to it, and then calls
>    Map(). This will result in memory contents that look like garbage to
>    the hypervisor. Bad.
> 
> I can imagine the following to handle these cases: in the Map() and
> Unmap() functions, we have to decrypt and encrypt the memory contents
> in-place, after changing the C bit (without allocating additional
> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
> will always remain in encrypted memory). Update the C bit with a single
> function call for the entire range (like now) -- this will not affect
> the guest-readability of the pages --, then bounce each page within the
> range to the static buffer and back to its original place. In effect
> this will in-place encrypt or decrypt the memory, and will be faster
> than a byte-wise rewrite.
> 
> * BusMasterRead (host will want to read guest memory):
>    - Client calls Map() without calling AllocateBuffer() first. Map()
>      allocates an internal bounce buffer, clears the C bit, and does the
>      copying.
>    - Client fires off the PCI transaction.
>    - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
>      restores the C-bit, *zeros* the pages, and releases the pages.
> 

Yes this will work just fine (see my previous comments on *zeroing* pages)

> * BusMasterWrite (host will want to write to guest memory):
>    - Client calls Map() without calling AllocateBuffer() first. Map()
>      allocates an internal bounce buffer and clears the C bit.
>    - Client fires off the PCI transaction.
>    - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
>      copies the bounce buffer to crpyted (client-owned) memory, restores
>      the C-bit, *zeros* the pages, and releases the pages.
> 

Yes, this will work just fine (see my previous comments on *zeroing* pages)

> * BusMasterCommonBuffer:
>    - Client calls AllocateBuffer(), and places some data in the returned
>      memory.
>    - Client calls Map(). Map() clears the C bit in one fell swoop, and
>      then decrypts the buffer in-place (by bouncing it page-wise to the
>      static array and back).
>    - Client communicates with the device.
>    - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
>      and encrypts the buffer in-place (by bouncing it page-wise to the
>      static array and back).
>    - Client reads some residual data from the buffer.
>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
> 

Yes this works fine as long as the client uses EFI_PCI_IO_PROTOCOL.AllocateBuffer()
to allocate the buffer.


> This is suitable for the discussed ExitBootServices() callback update
> too, where the callback will reset the virtio device (like now) and then
> call Unmap() for all shared areas, without calling FreeBuffer() on them.
> The above logic will re-encrypt the contents without affecting the UEFI
> memmap.
> 
>> Also since the memory encryption mask works on PAGE_SIZE hence
>> changing the encryption mask on not our allocated buffer could mess
>> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 
> This is not a problem for BusMasterRead and BusMasterWrite because for
> those operations, Map() allocates the internal bounce buffer. Also not a
> problem for BusMasterCommonBuffer, because then the client first calls
> AllocateBuffer (whole number of pages), and so Map() can round up the
> number of bytes and de-crypt full pages in-place (see above).
> 
>>
>>>>     *Regardless* of BusMaster operation, the following actions are
>>>>     carried out unconditionally:
>>>>
>>>>     . the memory encryption mask is cleared in this function (and in
>>>>       this function only),
>>>>
>>>>     . An attempt is made to grab a MAP_INFO structure from an
>>>>       internal free list (to be introduced!). The head of the list is
>>>>       a new static variable. If the free list is empty, then a
>>>>       MAP_INFO structure is allocated with AllocatePool(). The
>>>>       NO_MAPPING macro becomes unused and can be deleted from the
>>>>       source code.
>>>>
>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>>    this, it has to consult the MAP_INFO structure that is being
>>>>    passed in from the caller.) In addition:
>>>>
>>>>     . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>>       the allocation was done separately in AllocateBuffer, so we do
>>>>       not release the pages. Otherwise, we do release the pages.
>>>>
>>>>     . MapInfo is linked back on the internal free list (see above).
>>>>       It is *never* released with FreePool().
>>>>
>>>>     This approach guarantees that IoMmuUnmap() can de-program the
>>>>     IOMMU (= re-set the memory encryption mask) without changing the
>>>>     UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>>     not split page tables internally when it *re*sets the encryption
>>>>     mask -- is that correct?)
>>
>> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
>> restores mask.
> 
> Great, thanks.
> Laszlo
> 


  reply	other threads:[~2017-07-28 15:58 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
2017-07-19 22:09 ` [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 3/3] OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support Brijesh Singh
     [not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
2017-07-21 11:17   ` [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
     [not found]     ` <20170722024318-mutt-send-email-mst@kernel.org>
2017-07-24  8:25       ` Gerd Hoffmann
2017-07-25 18:17 ` Laszlo Ersek
2017-07-25 23:42   ` Brijesh Singh
     [not found]   ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
2017-07-26 15:30     ` Laszlo Ersek
2017-07-27 14:21   ` Brijesh Singh
2017-07-27 17:16     ` Laszlo Ersek
2017-07-27 17:56       ` Ard Biesheuvel
2017-07-27 19:00         ` Brijesh Singh
2017-07-27 20:55           ` Brijesh Singh
2017-07-27 21:31             ` Ard Biesheuvel
2017-07-27 21:38               ` Andrew Fish
2017-07-27 22:13                 ` Brijesh Singh
2017-07-27 22:10               ` Brijesh Singh
2017-07-28  8:39                 ` Ard Biesheuvel
2017-07-28 15:27                   ` Laszlo Ersek
2017-07-28 13:38           ` Laszlo Ersek
2017-07-28 16:00             ` Brijesh Singh [this message]
2017-07-28 16:16               ` Laszlo Ersek
2017-07-28 19:21               ` Laszlo Ersek
2017-07-28 19:59               ` Laszlo Ersek
2017-07-29  0:52                 ` Brijesh Singh
2017-07-29  1:37                   ` Brijesh Singh
2017-07-31 18:20                     ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=217545ac-962d-089f-9c9a-d2bbfca6427e@amd.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox