public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Brijesh Singh <brijesh.singh@amd.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Jordan Justen <jordan.l.justen@intel.com>,
	Jason Wang <jasowang@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>
Subject: Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Date: Fri, 28 Jul 2017 15:38:52 +0200	[thread overview]
Message-ID: <e1137140-f5d5-7eb8-0162-952694b24f96@redhat.com> (raw)
In-Reply-To: <c5c3c3cc-d0c1-4227-306c-da19f2e43203@amd.com>

On 07/27/17 21:00, Brijesh Singh wrote:
> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:

>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>> releases the bounce buffer internally (for BusMasterRead and
>>> BusMasterWrite), which is not right here. If you use
>>> AllocateBuffer() separately, then Map() -- with
>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>> will also not deallocate internally. So, in the ExitBootServices()
>>> callback, you will be able to call Unmap() only, and then *not* call
>>> FreeBuffer() at all.
>>>
>>> This is why I suggest introducing all four functions (Allocate,
>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>> drivers should use all four functions explicitly, not just Map and
>>> Unmap.
>>>
>>> ... Actually, I think there is a bug in
>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>> distribution of operations at the moment:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>   encryption mask
>>>
>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>   deallocates pages
>>>
>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>   requested (and IoMmuAllocateBuffer() was called previously).
>>>   Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>   clears the memory encryption mask.
>>>
>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>   operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>   encryption mask, and frees both the pages and MAP_INFO.
>>>
>
> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
> introduce list to track if the buffer was allocated by us. If buffer
> was allocated by us then we can safely say that it does not require a
> bounce buffer. While implementing the initial code I was not able to
> see BusMasterCommonBuffer usage. But with virtio things are getting a
> bit more clear in my mind.

The purpose of the internal list is a little bit different -- it is a
"free list", not a tracker list.

Under the proposed scheme, a MAP_INFO structure will have to be
allocated for *all* mappings: both for CommonBuffer (where the actual
pages come from the caller, who retrieved them earlier with an
AllocateBuffer() call), and for Read/Write (where the actual pages will
be allocated internally to Map). Allocating the MAP_INFO structure in
Map() is necessary in *both* cases because the Unmap() function can
*only* consult this MAP_INFO structure to learn the address and the size
at which it has to re-set the memory encryption mask. This is because
the Unmap() function gets no other input parameter.

Then, regardless of the original operation (CommonBuffer vs.
Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
CommonBuffer, the actual pages are owned by the caller, so we don't free
them here; for Read/Write, the actual pages are owned by Map/Unmap, so
we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
point is that MAP_INFO cannot be released with FreePool() because that
could change the UEFI memmap during ExitBootServices() processing. So
MAP_INFO has to be "released" to an internal *free* list.

And since we have an internal free list, the original MAP_INFO
allocation in Map() should consult the free list first, and only if it
is empty should it fall back to AllocatePool().

Whether the actual pages tracked by MAP_INFO were allocated internally
by Map(), or externally by the caller (via AllocateBuffer()) is an
independent question. (And it can be seen from the MAP_INFO.Operation
field.)

Does this make it clearer?

>
>>> This distribution of operations seems wrong. The key point is that
>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>> usable, and that client code is required to call Map()
>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>> operation. Therefore, the right distribution of operations is:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>   encryption mask..
>>>
>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>   encryption mask.
>>>
>
> Actually one of main reason why we cleared and restored the memory
> encryption mask during allocate/free is because we also consume the
> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
> buffer. I am certainly open to suggestions.

Argh. That's again my fault then, I should have noticed it. :( I
apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
for me as well.

As discussed earlier (and confirmed by Ard), calling *just*
AllocateBuffer() is never enough, Map()/Unmap() are always necessary.

So here's what I suggest (to keep the changes localized):

- Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
  function to output a "VOID *Mapping" parameter as well, in addition to
  the address.

- Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
  function to take a "VOID *Mapping" parameter in addition to the buffer
  address.

- In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
  keep the current IOMMU AllocateBuffer() call, but also call IOMMU
  Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
  Map() outwards. (Note that Map() may have to be called in a loop,
  dependent on "NumberOfBytes".)

- In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
  the IOMMU Unmap() function first (using the new Mapping parameter).

>
> [1]
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>
> [2]
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>
>
>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>   requested, and it allocates pages (bounce buffer) otherwise.
>>>
>
> I am trying to wrap my head around how we can support
> BusMasterCommonBuffer when buffer was not allocated by us. Changing
> the memory encryption mask in a page table will not update the
> contents.

Thank you for the clarification. I've now tried to review the current
code for a better understanding. Are the below statements correct?

- For the guest, guest memory is always readable transparently.
- For the host, guest memory is always readable *as it was written
  last*.
- If the guest memory was last written with the C bit clear, then the
  host can read it as plaintext (regardless of the current state of the
  C bit).
- If the guest memory was last written with the C bit set, then the host
  can only read garbage (regardless of the current state of the C bit).

*If* this is the case, then:

(a) We already have a sort of guest->host information leak. Namely, in
IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
restored, and the pages are released with gBS->FreePages(). However, the
pages being released are not rewritten in place (they are not actually
re-encrypted, only marked for encryption whenever they are next written
to). This means that pages released like this remain readable to the
hypervisor for an unpredictable time.

This is not necessarily a critical problem (after all the contents of
those pages were visible to the hypervisor at some point anyway!); but
in the longer term, such pages could accumulate, and if the hypervisor
is compromised later, it could potentially read an unbounded amount of
"past guest data" as plain-text.

(b) Plus, approaching the question from the Map() direction, we need to
consider two scenarios:

- Client code calls AllocateBuffer(), then Map(), and it writes to the
  buffer only then. This should be safe.
- client code calls AllocateBuffer(), writes to it, and then calls
  Map(). This will result in memory contents that look like garbage to
  the hypervisor. Bad.

I can imagine the following to handle these cases: in the Map() and
Unmap() functions, we have to decrypt and encrypt the memory contents
in-place, after changing the C bit (without allocating additional
memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
will always remain in encrypted memory). Update the C bit with a single
function call for the entire range (like now) -- this will not affect
the guest-readability of the pages --, then bounce each page within the
range to the static buffer and back to its original place. In effect
this will in-place encrypt or decrypt the memory, and will be faster
than a byte-wise rewrite.

* BusMasterRead (host will want to read guest memory):
  - Client calls Map() without calling AllocateBuffer() first. Map()
    allocates an internal bounce buffer, clears the C bit, and does the
    copying.
  - Client fires off the PCI transaction.
  - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
    restores the C-bit, *zeros* the pages, and releases the pages.

* BusMasterWrite (host will want to write to guest memory):
  - Client calls Map() without calling AllocateBuffer() first. Map()
    allocates an internal bounce buffer and clears the C bit.
  - Client fires off the PCI transaction.
  - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
    copies the bounce buffer to crpyted (client-owned) memory, restores
    the C-bit, *zeros* the pages, and releases the pages.

* BusMasterCommonBuffer:
  - Client calls AllocateBuffer(), and places some data in the returned
    memory.
  - Client calls Map(). Map() clears the C bit in one fell swoop, and
    then decrypts the buffer in-place (by bouncing it page-wise to the
    static array and back).
  - Client communicates with the device.
  - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
    and encrypts the buffer in-place (by bouncing it page-wise to the
    static array and back).
  - Client reads some residual data from the buffer.
  - Client calls FreeBuffer(). FreeBuffer() relases the pages.

This is suitable for the discussed ExitBootServices() callback update
too, where the callback will reset the virtio device (like now) and then
call Unmap() for all shared areas, without calling FreeBuffer() on them.
The above logic will re-encrypt the contents without affecting the UEFI
memmap.

> Also since the memory encryption mask works on PAGE_SIZE hence
> changing the encryption mask on not our allocated buffer could mess
> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).

This is not a problem for BusMasterRead and BusMasterWrite because for
those operations, Map() allocates the internal bounce buffer. Also not a
problem for BusMasterCommonBuffer, because then the client first calls
AllocateBuffer (whole number of pages), and so Map() can round up the
number of bytes and de-crypt full pages in-place (see above).

>
>>>    *Regardless* of BusMaster operation, the following actions are
>>>    carried out unconditionally:
>>>
>>>    . the memory encryption mask is cleared in this function (and in
>>>      this function only),
>>>
>>>    . An attempt is made to grab a MAP_INFO structure from an
>>>      internal free list (to be introduced!). The head of the list is
>>>      a new static variable. If the free list is empty, then a
>>>      MAP_INFO structure is allocated with AllocatePool(). The
>>>      NO_MAPPING macro becomes unused and can be deleted from the
>>>      source code.
>>>
>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>   this, it has to consult the MAP_INFO structure that is being
>>>   passed in from the caller.) In addition:
>>>
>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>      the allocation was done separately in AllocateBuffer, so we do
>>>      not release the pages. Otherwise, we do release the pages.
>>>
>>>    . MapInfo is linked back on the internal free list (see above).
>>>      It is *never* released with FreePool().
>>>
>>>    This approach guarantees that IoMmuUnmap() can de-program the
>>>    IOMMU (= re-set the memory encryption mask) without changing the
>>>    UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>    not split page tables internally when it *re*sets the encryption
>>>    mask -- is that correct?)
>
> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
> restores mask.

Great, thanks.
Laszlo


  parent reply	other threads:[~2017-07-28 13:36 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
2017-07-19 22:09 ` [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 3/3] OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support Brijesh Singh
     [not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
2017-07-21 11:17   ` [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
     [not found]     ` <20170722024318-mutt-send-email-mst@kernel.org>
2017-07-24  8:25       ` Gerd Hoffmann
2017-07-25 18:17 ` Laszlo Ersek
2017-07-25 23:42   ` Brijesh Singh
     [not found]   ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
2017-07-26 15:30     ` Laszlo Ersek
2017-07-27 14:21   ` Brijesh Singh
2017-07-27 17:16     ` Laszlo Ersek
2017-07-27 17:56       ` Ard Biesheuvel
2017-07-27 19:00         ` Brijesh Singh
2017-07-27 20:55           ` Brijesh Singh
2017-07-27 21:31             ` Ard Biesheuvel
2017-07-27 21:38               ` Andrew Fish
2017-07-27 22:13                 ` Brijesh Singh
2017-07-27 22:10               ` Brijesh Singh
2017-07-28  8:39                 ` Ard Biesheuvel
2017-07-28 15:27                   ` Laszlo Ersek
2017-07-28 13:38           ` Laszlo Ersek [this message]
2017-07-28 16:00             ` Brijesh Singh
2017-07-28 16:16               ` Laszlo Ersek
2017-07-28 19:21               ` Laszlo Ersek
2017-07-28 19:59               ` Laszlo Ersek
2017-07-29  0:52                 ` Brijesh Singh
2017-07-29  1:37                   ` Brijesh Singh
2017-07-31 18:20                     ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e1137140-f5d5-7eb8-0162-952694b24f96@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox