public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
@ 2017-07-19 22:09 Brijesh Singh
  2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-19 22:09 UTC (permalink / raw)
  To: edk2-devel
  Cc: Tom Lendacky, Brijesh Singh, Jordan Justen, Laszlo Ersek,
	Jason Wang, Michael S . Tsirkin

I have found that OVMF fails to detect the disk when iommu_platform is set from
qemu cli. The failure occurs during the feature bit negotiation.

Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced
a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to
allocate, free, map and unmap the DMA memory for the master bus devices

In this patch series, I have tried to enable the IOMMU_PLATFORM feature for
VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support
for other Virtio devices. The patch has been tested in SEV guest - mainly because
IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can
extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.

qemu cli used for testing:

# $QEMU \
  ...
  -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
  -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
  ...

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

Brijesh Singh (3):
  OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
  OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
  OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support

 OvmfPkg/Library/VirtioLib/VirtioLib.inf      |   1 +
 OvmfPkg/VirtioBlkDxe/VirtioBlk.inf           |   5 +
 OvmfPkg/VirtioGpuDxe/VirtioGpu.inf           |   1 +
 OvmfPkg/VirtioNetDxe/VirtioNet.inf           |   1 +
 OvmfPkg/VirtioRngDxe/VirtioRng.inf           |   1 +
 OvmfPkg/VirtioScsiDxe/VirtioScsi.inf         |   1 +
 OvmfPkg/Include/IndustryStandard/Virtio095.h |   1 +
 OvmfPkg/Include/IndustryStandard/Virtio10.h  |   5 +
 OvmfPkg/Include/Library/VirtioLib.h          |  20 ++++
 OvmfPkg/Library/VirtioLib/VirtioLib.c        |  96 ++++++++++++++-
 OvmfPkg/VirtioBlkDxe/VirtioBlk.c             | 125 ++++++++++++++++++--
 11 files changed, 244 insertions(+), 13 deletions(-)

-- 
Brijesh Singh
2.7.4



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
  2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
@ 2017-07-19 22:09 ` Brijesh Singh
  2017-07-19 22:09 ` [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support Brijesh Singh
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-19 22:09 UTC (permalink / raw)
  To: edk2-devel
  Cc: Tom Lendacky, Brijesh Singh, Jordan Justen, Laszlo Ersek,
	Jason Wang, Michael S . Tsirkin

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 OvmfPkg/Include/IndustryStandard/Virtio10.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/OvmfPkg/Include/IndustryStandard/Virtio10.h b/OvmfPkg/Include/IndustryStandard/Virtio10.h
index 4c9b62a3cf59..5948b9ef1d9f 100644
--- a/OvmfPkg/Include/IndustryStandard/Virtio10.h
+++ b/OvmfPkg/Include/IndustryStandard/Virtio10.h
@@ -83,4 +83,9 @@ typedef struct {
 //
 #define VIRTIO_F_VERSION_1 BIT32
 
+//
+// VirtIo 1.0 IOMMU feature bits
+//
+#define VIRTIO_F_IOMMU_PLATFORM BIT33
+
 #endif // _VIRTIO_1_0_H_
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
  2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
  2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
@ 2017-07-19 22:09 ` Brijesh Singh
  2017-07-19 22:09 ` [RFC v1 3/3] OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support Brijesh Singh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-19 22:09 UTC (permalink / raw)
  To: edk2-devel
  Cc: Tom Lendacky, Brijesh Singh, Jordan Justen, Laszlo Ersek,
	Jason Wang, Michael S . Tsirkin

When iommu_platform is set, use IOMMU protocols buffer allocation
and free routines to allocate and free vring buffer (guest-host
communication buffer).

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 OvmfPkg/Library/VirtioLib/VirtioLib.inf      |  1 +
 OvmfPkg/VirtioBlkDxe/VirtioBlk.inf           |  1 +
 OvmfPkg/VirtioGpuDxe/VirtioGpu.inf           |  1 +
 OvmfPkg/VirtioNetDxe/VirtioNet.inf           |  1 +
 OvmfPkg/VirtioRngDxe/VirtioRng.inf           |  1 +
 OvmfPkg/VirtioScsiDxe/VirtioScsi.inf         |  1 +
 OvmfPkg/Include/IndustryStandard/Virtio095.h |  1 +
 OvmfPkg/Include/Library/VirtioLib.h          | 20 ++++
 OvmfPkg/Library/VirtioLib/VirtioLib.c        | 96 +++++++++++++++++++-
 9 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.inf b/OvmfPkg/Library/VirtioLib/VirtioLib.inf
index fb5897a88ecf..6629d0d52b04 100644
--- a/OvmfPkg/Library/VirtioLib/VirtioLib.inf
+++ b/OvmfPkg/Library/VirtioLib/VirtioLib.inf
@@ -26,6 +26,7 @@ [Sources]
 
 [Packages]
   MdePkg/MdePkg.dec
+  MdeModulePkg/MdeModulePkg.dec
   OvmfPkg/OvmfPkg.dec
 
 [LibraryClasses]
diff --git a/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf b/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf
index d5975b74eb05..53d5a164a0b8 100644
--- a/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf
+++ b/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf
@@ -26,6 +26,7 @@ [Sources]
 
 [Packages]
   MdePkg/MdePkg.dec
+  MdeModulePkg/MdeModulePkg.dec
   OvmfPkg/OvmfPkg.dec
 
 [LibraryClasses]
diff --git a/OvmfPkg/VirtioGpuDxe/VirtioGpu.inf b/OvmfPkg/VirtioGpuDxe/VirtioGpu.inf
index 04bc2964c223..a6d4cbbd191a 100644
--- a/OvmfPkg/VirtioGpuDxe/VirtioGpu.inf
+++ b/OvmfPkg/VirtioGpuDxe/VirtioGpu.inf
@@ -31,6 +31,7 @@ [Sources]
 
 [Packages]
   MdePkg/MdePkg.dec
+  MdeModulePkg/MdeModulePkg.dec
   OvmfPkg/OvmfPkg.dec
 
 [LibraryClasses]
diff --git a/OvmfPkg/VirtioNetDxe/VirtioNet.inf b/OvmfPkg/VirtioNetDxe/VirtioNet.inf
index a855ad4ac154..2b8996ed366e 100644
--- a/OvmfPkg/VirtioNetDxe/VirtioNet.inf
+++ b/OvmfPkg/VirtioNetDxe/VirtioNet.inf
@@ -42,6 +42,7 @@ [Sources]
 
 [Packages]
   MdePkg/MdePkg.dec
+  MdeModulePkg/MdeModulePkg.dec
   OvmfPkg/OvmfPkg.dec
 
 [LibraryClasses]
diff --git a/OvmfPkg/VirtioRngDxe/VirtioRng.inf b/OvmfPkg/VirtioRngDxe/VirtioRng.inf
index 471beb37bc7f..898bfb7158c2 100644
--- a/OvmfPkg/VirtioRngDxe/VirtioRng.inf
+++ b/OvmfPkg/VirtioRngDxe/VirtioRng.inf
@@ -26,6 +26,7 @@ [Sources]
 
 [Packages]
   MdePkg/MdePkg.dec
+  MdeModulePkg/MdeModulePkg.dec
   OvmfPkg/OvmfPkg.dec
 
 [LibraryClasses]
diff --git a/OvmfPkg/VirtioScsiDxe/VirtioScsi.inf b/OvmfPkg/VirtioScsiDxe/VirtioScsi.inf
index 75581123930b..082f9a12bb09 100644
--- a/OvmfPkg/VirtioScsiDxe/VirtioScsi.inf
+++ b/OvmfPkg/VirtioScsiDxe/VirtioScsi.inf
@@ -27,6 +27,7 @@ [Sources]
 
 [Packages]
   MdePkg/MdePkg.dec
+  MdeModulePkg/MdeModulePkg.dec
   OvmfPkg/OvmfPkg.dec
 
 [LibraryClasses]
diff --git a/OvmfPkg/Include/IndustryStandard/Virtio095.h b/OvmfPkg/Include/IndustryStandard/Virtio095.h
index 6bf77cb32075..c81b29c6d9d0 100644
--- a/OvmfPkg/Include/IndustryStandard/Virtio095.h
+++ b/OvmfPkg/Include/IndustryStandard/Virtio095.h
@@ -154,6 +154,7 @@ typedef struct {
   VRING_AVAIL         Avail;
   VRING_USED          Used;
   UINT16              QueueSize;
+  VOID                *Iommu;
 } VRING;
 
 //
diff --git a/OvmfPkg/Include/Library/VirtioLib.h b/OvmfPkg/Include/Library/VirtioLib.h
index 5badfb32917f..92a3d5e10f7b 100644
--- a/OvmfPkg/Include/Library/VirtioLib.h
+++ b/OvmfPkg/Include/Library/VirtioLib.h
@@ -17,10 +17,30 @@
 #ifndef _VIRTIO_LIB_H_
 #define _VIRTIO_LIB_H_
 
+#include <Protocol/IoMmu.h>
 #include <Protocol/VirtioDevice.h>
 
 #include <IndustryStandard/Virtio.h>
 
+/**
+
+  Configure a virtio ring to use IOMMU protocol
+
+  This function tells vring to use the given IOMMU protocol interface
+  for allocating and freeing the vring buffer (this guest-host communication
+  area)
+
+  @param[in]                   The virtio ring to set up.
+
+  @param[in]                   IOMMU protocol to use.
+
+**/
+VOID
+EFIAPI
+VirtioRingUseIommu (
+  IN VRING                  *Ring,
+  IN EDKII_IOMMU_PROTOCOL   *Iommu
+  );
 
 /**
 
diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.c b/OvmfPkg/Library/VirtioLib/VirtioLib.c
index 845f206369a3..d8fdb6310d52 100644
--- a/OvmfPkg/Library/VirtioLib/VirtioLib.c
+++ b/OvmfPkg/Library/VirtioLib/VirtioLib.c
@@ -26,6 +26,69 @@
 
 /**
 
+  Configure a virtio ring to use IOMMU protocol
+
+  This function tells vring to use the given IOMMU protocol interface
+  for allocating and freeing the vring buffer (this guest-host communication
+  area)
+
+  @param[in]                   The virtio ring to set up.
+
+  @param[in]                   IOMMU protocol to use.
+
+**/
+VOID
+EFIAPI
+VirtioRingUseIommu (
+  IN VRING                  *Ring,
+  IN EDKII_IOMMU_PROTOCOL   *Iommu
+  )
+{
+  Ring->Iommu = Iommu;
+}
+
+/**
+
+  Allocate vring (the guest-host communication area)
+
+  This function checks if host has requested the Iommu support then it uses
+  the IOMMU protocol allocator for allocating vring otherwise uses AllocatePages
+
+**/
+STATIC
+VOID *
+VirtioRingAllocatePages (
+  IN  VRING   *Ring
+  )
+{
+  UINT32  NumPages;
+  EDKII_IOMMU_PROTOCOL  *Iommu;
+
+  NumPages = Ring->NumPages;
+  Iommu = (EDKII_IOMMU_PROTOCOL *)Ring->Iommu;
+
+  //
+  // If IOMMU protocol is set then use IOMMU allocator otherwise
+  // fallback to standard allocator
+  //
+  if (Iommu) {
+    EFI_STATUS  Status;
+    VOID        *Buffer;
+
+    Status = Iommu->AllocateBuffer (Iommu, 0, EfiBootServicesData,
+                        NumPages, &Buffer, EDKII_IOMMU_ATTRIBUTE_MEMORY_CACHED);
+    if (EFI_ERROR (Status)) {
+      return NULL;
+    }
+
+    return Buffer;
+  }
+
+  return AllocatePages (NumPages);
+}
+
+/**
+
   Configure a virtio ring.
 
   This function sets up internal storage (the guest-host communication area)
@@ -76,7 +139,7 @@ VirtioRingInit (
                 EFI_PAGE_SIZE);
 
   Ring->NumPages = EFI_SIZE_TO_PAGES (RingSize);
-  Ring->Base = AllocatePages (Ring->NumPages);
+  Ring->Base = VirtioRingAllocatePages (Ring);
   if (Ring->Base == NULL) {
     return EFI_OUT_OF_RESOURCES;
   }
@@ -118,6 +181,35 @@ VirtioRingInit (
   return EFI_SUCCESS;
 }
 
+/**
+
+  Free vring (the guest-host communication area)
+
+  This function checks if host has requested the Iommu support then it uses
+  the IOMMU protocol free callback otherwise uses FreePages
+
+**/
+STATIC
+VOID
+VirtioRingFreePages (
+  IN  VRING   *Ring
+  )
+{
+  EDKII_IOMMU_PROTOCOL  *Iommu;
+
+  Iommu = (EDKII_IOMMU_PROTOCOL *)Ring->Iommu;
+
+  //
+  // If IOMMU protocol is set then use IOMMU FreeBuffer otherwise
+  // fallback to FreePages
+  //
+  if (Iommu) {
+    Iommu->FreeBuffer (Iommu, Ring->NumPages, Ring->Base);
+  } else {
+    FreePages (Ring->Base, Ring->NumPages);
+  }
+}
+
 
 /**
 
@@ -136,7 +228,7 @@ VirtioRingUninit (
   IN OUT VRING *Ring
   )
 {
-  FreePages (Ring->Base, Ring->NumPages);
+  VirtioRingFreePages (Ring);
   SetMem (Ring, sizeof *Ring, 0x00);
 }
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC v1 3/3] OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support
  2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
  2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
  2017-07-19 22:09 ` [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support Brijesh Singh
@ 2017-07-19 22:09 ` Brijesh Singh
       [not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
  2017-07-25 18:17 ` Laszlo Ersek
  4 siblings, 0 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-19 22:09 UTC (permalink / raw)
  To: edk2-devel
  Cc: Tom Lendacky, Brijesh Singh, Jordan Justen, Laszlo Ersek,
	Jason Wang, Michael S . Tsirkin

When VIRTIO_F_IOMMU_PLATFORM feature bit in set then try locating
IOMMU protocol. If IOMMU protocol is available, then use the IOMMU
protocols functions to allocate, free, map and unmap the host buffer
to device buffer.

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 OvmfPkg/VirtioBlkDxe/VirtioBlk.inf |   4 +
 OvmfPkg/VirtioBlkDxe/VirtioBlk.c   | 125 ++++++++++++++++++--
 2 files changed, 118 insertions(+), 11 deletions(-)

diff --git a/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf b/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf
index 53d5a164a0b8..12577c5f700f 100644
--- a/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf
+++ b/OvmfPkg/VirtioBlkDxe/VirtioBlk.inf
@@ -41,3 +41,7 @@ [LibraryClasses]
 [Protocols]
   gEfiBlockIoProtocolGuid   ## BY_START
   gVirtioDeviceProtocolGuid ## TO_START
+  gEdkiiIoMmuProtocolGuid   ## SOMETIMES_CONSUMES
+
+[Depex]
+  gEdkiiIoMmuProtocolGuid OR gIoMmuAbsentProtocolGuid
diff --git a/OvmfPkg/VirtioBlkDxe/VirtioBlk.c b/OvmfPkg/VirtioBlkDxe/VirtioBlk.c
index 3ce72281c204..f78052c4ecc0 100644
--- a/OvmfPkg/VirtioBlkDxe/VirtioBlk.c
+++ b/OvmfPkg/VirtioBlkDxe/VirtioBlk.c
@@ -23,6 +23,8 @@
 
 **/
 
+#include <Protocol/IoMmu.h>
+
 #include <IndustryStandard/VirtioBlk.h>
 #include <Library/BaseMemoryLib.h>
 #include <Library/DebugLib.h>
@@ -33,6 +35,8 @@
 
 #include "VirtioBlk.h"
 
+STATIC EDKII_IOMMU_PROTOCOL        *mIoMmuProtocol;
+
 /**
 
   Convenience macros to read and write region 0 IO space elements of the
@@ -247,13 +251,49 @@ SynchronousRequest (
   )
 {
   UINT32                  BlockSize;
-  volatile VIRTIO_BLK_REQ Request;
-  volatile UINT8          HostStatus;
+  volatile VIRTIO_BLK_REQ LocalRequest;
+  volatile UINT8          LocalHostStatus;
+  volatile VIRTIO_BLK_REQ *Request;
+  volatile UINT8          *HostStatus;
   DESC_INDICES            Indices;
+  VOID                    *IommuBuffer;
+  UINT32                  NumPages;
+  VOID                    *Mapping;
+  EFI_STATUS              Status;
+  EFI_PHYSICAL_ADDRESS    DeviceBuffer;
 
   BlockSize = Dev->BlockIoMedia.BlockSize;
 
   //
+  // set NumPages to suppress incorrect compiler/analyzer warnings
+  //
+  NumPages = 0;
+
+  //
+  // If IOMMU platform is enabled for this device then allocate Request and
+  // HostStatus variable dynamically using IOMMU allocator
+  //
+  if (mIoMmuProtocol) {
+    EFI_STATUS  Status;
+
+    NumPages = (UINT32)EFI_SIZE_TO_PAGES (sizeof (*Request) + sizeof (*HostStatus));
+    Status = mIoMmuProtocol->AllocateBuffer (mIoMmuProtocol, 0,
+                                EfiBootServicesData, NumPages, &IommuBuffer,
+                                EDKII_IOMMU_ATTRIBUTE_MEMORY_CACHED);
+    if (EFI_ERROR (Status)) {
+      return Status;
+    }
+
+    Request = IommuBuffer;
+    HostStatus = IommuBuffer + sizeof(*Request);
+  } else {
+    Request = &LocalRequest;
+    HostStatus = &LocalHostStatus;
+    IommuBuffer = NULL;
+    Mapping = NULL;
+  }
+
+  //
   // ensured by VirtioBlkInit()
   //
   ASSERT (BlockSize > 0);
@@ -268,18 +308,18 @@ SynchronousRequest (
   // Prepare virtio-blk request header, setting zero size for flush.
   // IO Priority is homogeneously 0.
   //
-  Request.Type   = RequestIsWrite ?
+  Request->Type   = RequestIsWrite ?
                    (BufferSize == 0 ? VIRTIO_BLK_T_FLUSH : VIRTIO_BLK_T_OUT) :
                    VIRTIO_BLK_T_IN;
-  Request.IoPrio = 0;
-  Request.Sector = MultU64x32(Lba, BlockSize / 512);
+  Request->IoPrio = 0;
+  Request->Sector = MultU64x32(Lba, BlockSize / 512);
 
   VirtioPrepare (&Dev->Ring, &Indices);
 
   //
   // preset a host status for ourselves that we do not accept as success
   //
-  HostStatus = VIRTIO_BLK_S_IOERR;
+  *HostStatus = VIRTIO_BLK_S_IOERR;
 
   //
   // ensured by VirtioBlkInit() -- this predicate, in combination with the
@@ -290,7 +330,7 @@ SynchronousRequest (
   //
   // virtio-blk header in first desc
   //
-  VirtioAppendDesc (&Dev->Ring, (UINTN) &Request, sizeof Request,
+  VirtioAppendDesc (&Dev->Ring, (UINTN) Request, sizeof *Request,
     VRING_DESC_F_NEXT, &Indices);
 
   //
@@ -308,6 +348,42 @@ SynchronousRequest (
     ASSERT (BufferSize <= SIZE_1GB);
 
     //
+    // When IOMMU platform is enabled, map the host buffer to device
+    //
+    if (mIoMmuProtocol) {
+      EDKII_IOMMU_OPERATION   Operation;
+      UINTN                   NumBytes;
+
+      NumBytes = BufferSize;
+
+      if (RequestIsWrite) {
+        Operation = EdkiiIoMmuOperationBusMasterRead;
+      } else {
+        Operation = EdkiiIoMmuOperationBusMasterWrite;
+      }
+
+      Status = mIoMmuProtocol->Map (mIoMmuProtocol, Operation, (VOID *)Buffer,
+                        &NumBytes, &DeviceBuffer, &Mapping);
+      if (EFI_ERROR (Status)) {
+        Status = EFI_DEVICE_ERROR;
+        goto HandleExit;
+      }
+
+      //
+      // Verify that we are able to map the required size
+      //
+      if (NumBytes < BufferSize) {
+        DEBUG ((DEBUG_ERROR, "%a: request %d got %d\n", __FUNCTION__,
+              BufferSize, NumBytes));
+
+        Status = EFI_DEVICE_ERROR;
+        mIoMmuProtocol->Unmap (mIoMmuProtocol, Mapping);
+        goto HandleExit;
+      }
+      Buffer = (VOID *)DeviceBuffer;
+    }
+
+    //
     // VRING_DESC_F_WRITE is interpreted from the host's point of view.
     //
     VirtioAppendDesc (&Dev->Ring, (UINTN) Buffer, (UINT32) BufferSize,
@@ -318,7 +394,7 @@ SynchronousRequest (
   //
   // host status in last (second or third) desc
   //
-  VirtioAppendDesc (&Dev->Ring, (UINTN) &HostStatus, sizeof HostStatus,
+  VirtioAppendDesc (&Dev->Ring, (UINTN) HostStatus, sizeof *HostStatus,
     VRING_DESC_F_WRITE, &Indices);
 
   //
@@ -326,11 +402,23 @@ SynchronousRequest (
   //
   if (VirtioFlush (Dev->VirtIo, 0, &Dev->Ring, &Indices,
         NULL) == EFI_SUCCESS &&
-      HostStatus == VIRTIO_BLK_S_OK) {
-    return EFI_SUCCESS;
+      *HostStatus == VIRTIO_BLK_S_OK) {
+    Status = EFI_SUCCESS;
+    goto HandleExit;
   }
 
-  return EFI_DEVICE_ERROR;
+  Status = EFI_DEVICE_ERROR;
+
+HandleExit:
+  if (Mapping) {
+    mIoMmuProtocol->Unmap (mIoMmuProtocol, Mapping);
+  }
+
+  if (IommuBuffer) {
+    mIoMmuProtocol->FreeBuffer (mIoMmuProtocol, NumPages, IommuBuffer);
+  }
+
+  return Status;
 }
 
 
@@ -692,6 +780,21 @@ VirtioBlkInit (
     }
   }
 
+  //
+  // If IOMMU_PLATFORM feature is requested then try to locate IOMMU
+  // protocol before acking that we support IOMMU_PLATFORM feature
+  //
+  if (Features & VIRTIO_F_IOMMU_PLATFORM) {
+    Status = gBS->LocateProtocol (&gEdkiiIoMmuProtocolGuid, NULL, (VOID **)&mIoMmuProtocol);
+    if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_ERROR, "%a: Failed to locate IOMMU protocol\n", __FUNCTION__));
+      goto Failed;
+    }
+
+    Features &= VIRTIO_F_IOMMU_PLATFORM;
+    VirtioRingUseIommu (&Dev->Ring, mIoMmuProtocol);
+  }
+
   Features &= VIRTIO_BLK_F_BLK_SIZE | VIRTIO_BLK_F_TOPOLOGY | VIRTIO_BLK_F_RO |
               VIRTIO_BLK_F_FLUSH | VIRTIO_F_VERSION_1;
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
       [not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
@ 2017-07-21 11:17   ` Brijesh Singh
       [not found]     ` <20170722024318-mutt-send-email-mst@kernel.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Brijesh Singh @ 2017-07-21 11:17 UTC (permalink / raw)
  To: Jason Wang, edk2-devel
  Cc: brijesh.singh, Tom Lendacky, Jordan Justen, Laszlo Ersek,
	Michael S . Tsirkin


On 7/20/17 10:24 PM, Jason Wang wrote:
>
>
> On 2017年07月20日 06:09, Brijesh Singh wrote:
>> I have found that OVMF fails to detect the disk when iommu_platform
>> is set from
>> qemu cli. The failure occurs during the feature bit negotiation.
>>
>> Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch
>> series introduced
>> a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute
>> and methods to
>> allocate, free, map and unmap the DMA memory for the master bus devices
>>
>> In this patch series, I have tried to enable the IOMMU_PLATFORM
>> feature for
>> VirtioBlkDevice. I am sending this as RFC to seek feedback before I
>> extend the support
>> for other Virtio devices. The patch has been tested in SEV guest -
>> mainly because
>> IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If
>> needed, I can
>> extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.
>>
>> qemu cli used for testing:
>>
>> # $QEMU \
>>    ...
>>    -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
>>    -device
>> virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
>>    ...
>>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Brijesh Singh (3):
>>    OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
>>    OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
>>    OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support
>
> Hi, do we need change virtio-scsi driver as well?
>
I see that OVMF has the following virtio drivers, we need to update them
all:

VirtioBlkDxe
VirtioGpuDxe
VirtioNetDxe
VirtioRngDxe
VirtioScsiDxe

I will wait for Laszlo and Jordan's initial feedback before changing
other drivers.
> Thanks
>
>>
>>   OvmfPkg/Library/VirtioLib/VirtioLib.inf      |   1 +
>>   OvmfPkg/VirtioBlkDxe/VirtioBlk.inf           |   5 +
>>   OvmfPkg/VirtioGpuDxe/VirtioGpu.inf           |   1 +
>>   OvmfPkg/VirtioNetDxe/VirtioNet.inf           |   1 +
>>   OvmfPkg/VirtioRngDxe/VirtioRng.inf           |   1 +
>>   OvmfPkg/VirtioScsiDxe/VirtioScsi.inf         |   1 +
>>   OvmfPkg/Include/IndustryStandard/Virtio095.h |   1 +
>>   OvmfPkg/Include/IndustryStandard/Virtio10.h  |   5 +
>>   OvmfPkg/Include/Library/VirtioLib.h          |  20 ++++
>>   OvmfPkg/Library/VirtioLib/VirtioLib.c        |  96 ++++++++++++++-
>>   OvmfPkg/VirtioBlkDxe/VirtioBlk.c             | 125
>> ++++++++++++++++++--
>>   11 files changed, 244 insertions(+), 13 deletions(-)
>>
>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
       [not found]     ` <20170722024318-mutt-send-email-mst@kernel.org>
@ 2017-07-24  8:25       ` Gerd Hoffmann
  0 siblings, 0 replies; 28+ messages in thread
From: Gerd Hoffmann @ 2017-07-24  8:25 UTC (permalink / raw)
  To: Michael S. Tsirkin, Brijesh Singh
  Cc: Jason Wang, edk2-devel, Tom Lendacky, Jordan Justen, Laszlo Ersek,
	David Airlie, dri-devel, virtualization

  Hi,

> > I see that OVMF has the following virtio drivers, we need to update
> > them
> > all:
> > 
> > VirtioBlkDxe
> > VirtioGpuDxe
> > VirtioNetDxe
> > VirtioRngDxe
> > VirtioScsiDxe
> > 
> > I will wait for Laszlo and Jordan's initial feedback before
> > changing
> > other drivers.
> 
> I'm not sure about the GPU. Cc relevant maintainers -
> can virtio GPU work from behind an IOMMU?

GPU uses main memory as backing storage for framebuffers, and this is
passed as guest physical address (scatterlist of addresses to be exact)
to the host.

So, I think no, this isn't going to work with the current code.

Should be possible to fix though.  We need to define what "guest
physical address" should be with VIRTIO_F_IOMMU_PLATFORM enabled
(probably guest pci bus address) and add support for proper iommu
lookups.

cheers,
  Gerd



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
                   ` (3 preceding siblings ...)
       [not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
@ 2017-07-25 18:17 ` Laszlo Ersek
  2017-07-25 23:42   ` Brijesh Singh
                     ` (2 more replies)
  4 siblings, 3 replies; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-25 18:17 UTC (permalink / raw)
  To: Brijesh Singh, edk2-devel
  Cc: Tom Lendacky, Jordan Justen, Jason Wang, Michael S . Tsirkin,
	Gerd Hoffmann, Ard Biesheuvel

Adding Ard

On 07/20/17 00:09, Brijesh Singh wrote:
> I have found that OVMF fails to detect the disk when iommu_platform is set from
> qemu cli. The failure occurs during the feature bit negotiation.
> 
> Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced
> a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to
> allocate, free, map and unmap the DMA memory for the master bus devices
> 
> In this patch series, I have tried to enable the IOMMU_PLATFORM feature for
> VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support
> for other Virtio devices. The patch has been tested in SEV guest - mainly because
> IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can
> extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.
> 
> qemu cli used for testing:
> 
> # $QEMU \
>   ...
>   -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
>   -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
>   ...
> 
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> 
> Brijesh Singh (3):
>   OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
>   OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
>   OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support

In the course of processing my post-PTO email backlog, yesterday I
skimmed this topic very quickly, and thought about it for an hour or so
(while away for the computer). Here's my take on it.


(1) The closest to any formal language that I found for
VIRTIO_F_IOMMU_PLATFORM are:

  https://issues.oasis-open.org/browse/VIRTIO-154
  https://lists.oasis-open.org/archives/virtio-dev/201610/msg00121.html

Are these references up-to-date? The ticket also references SVN r587 as
the resolution, but that link is dead.


(2) A few weeks back I saw Jason's SeaBIOS patch (also pointed out to me
more recently by Brijesh, in private):

  [SeaBIOS] [PATCH] virtio: IOMMU support

I don't understand this patch. (I also don't understand the
"iommu_platform" device property on the QEMU command line.) According to
the spec language quoted from the mailing list above, we have four cases:

(2.1) device does not offer VIRITO_F_IOMMU_PLATFORM --> everything works
like before

(2.2) device offers VIRITO_F_IOMMU_PLATFORM, but the driver does not
negotiate it --> device is allowed to reject functioning

* device offers VIRITO_F_IOMMU_PLATFORM and the driver negotiates it:
  there are two possibilities:
  (2.3) the driver *disables* the IOMMU, and works like before
  (2.4) the driver *configures* the IOMMU and works accordingly

The SeaBIOS patch negotiates VIRITO_F_IOMMU_PLATFORM, but it *neither*
disables *nor* configures the IOMMU. It simply *ignores* the IOMMU. I
don't see how that is any different *in effect* from simply not
negotiating VIRITO_F_IOMMU_PLATFORM -- case (2.2) --, and I don't
understand why QEMU tolerates "ignoring the IOMMU" differently from "not
negotiating the flag".


(3) *If* we indeed just wanted to follow the SeaBIOS patch, then
VIRTIO_F_IOMMU_PLATFORM should be treated in OVMF *precisely* in
parallel with VIRTIO_F_VERSION_1.


(4) I'm confused by the intersection of the SEV and
VIRTIO_F_IOMMU_PLATFORM use cases. The IoMmu DXE driver added in edk2
commit f9d129e68a45 ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06) is
specific to SEV.

In SEV-less guests, the IoMmu DXE driver will not install the IOMMU
protocol, but the QEMU command line may still contain the
"iommu_platform=true" property. For example, as far as I recall, DPDK
guest payloads use an emulated "viommu" device. For this OVMF, would
have to provide a separate IOMMU DXE driver, one that could actually
interact with the "viommu" device. And there should be some coordination
that exactly one instance of gEdkiiIoMmuProtocolGuid OR
gIoMmuAbsentProtocolGuid be installed.

I see that the SEV references are only in the blurb of the patch set; no
actual commit message refers to SEV. That's OK; I just think the blurb
is confusing.


(5) Now I'm coming to my main point. The virtio device drivers in
OvmfPkg are UEFI_DRIVER modules that conform to the UEFI driver model.
They consume our home-grown the VIRTIO_DEVICE_PROTOCOL, and produce
whatever UEFI protocol is appropriate on top (for example,
EFI_BLOCK_IO_PROTOCOL in VirtioBlkDxe).

Such a driver has no business talking to the platform's IOMMU protocol
directly (even if there is one). Instead:


(5.1) The VIRTIO_DEVICE_PROTOCOL has to be extended with the necessary

  AllocateSharedPages, FreeSharedPages, MapSharedPages, UnmapSharedPages

functions.


(5.2) All top-level virtio driver code, including VirtioLib, has to be
rebased to the new VIRTIO_DEVICE_PROTOCOL member functions. This covers
the ring memory itself, the memory pointed-to by the descriptors placed
on the ring(s) -- i.e., the device specific requests --, and all further
memory that is referenced by those device specific requests.

This will result in a larger memory footprint, as all current pool
allocations will be turned into page allocations, but I guess that is
tolerable.


(5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply
in parallel with VIRTIO_F_VERSION_1, and don't act upon
VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this
is just my point (3) from above.


(5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:

- "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
  virtio-pci devices, and offers virtio 0.9.5 semantics.

- "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib")
  binds virtio-mmio devices, and offers virtio 0.9.5 semantics.

- "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers
  virtio 1.0.0 semantics.

The first two drivers should implement the AllocateSharedPages() and
FreeSharedPages() member functions simply with the corresponding
MemoryAllocationLib functions (using BootServicesData type memory), and
implement the MapSharedPages() and UnmapSharedPages() member functions
as no-ops (return the input addresses transparently).

The third driver should implement all four new member functions by
respectively delegating the job to:
- EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
- EFI_PCI_IO_PROTOCOL.FreeBuffer()
- EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
- EFI_PCI_IO_PROTOCOL.Unmap()

The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the
platform-specific PCI host bridge / root bridge driver, and *that*
driver in turn is allowed to talk to an IOMMU protocol (if any).

(This last step is already covered by the following edk2 commits:
- generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU
support.", 2017-04-29),
- specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update
PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).)


(5.5) There's a delicate question that has to be considered with care;
namely the ExitBootServices() callbacks in the virtio drivers.

Currently these functions perform virtio resets. They cause the devices
to forget all their configuration (including guest RAM references).
Aborting in-flight DMA (e.g. in VirtioNetDxe) is a practice that is
specifically recommended in the UEFI Driver Writers' Guide; plus at some
point the guest kernel will reclaim and overwrite BootServicesData type
memory. If at that point QEMU is still looking at some (originally
firmware-allocated) areas as virtio rings, the results won't be amusing.
(Speaking from experience.) Hence the resetting of virtio devices upon
ExitBootServices().

Now, remember that ExitBootServices() callbacks *must not* change the
UEFI memory map, so *no* memory allocations *must* be released in the
callbacks. The virtio resets performed in the callbacks are surgical for
this reason; nothing else is being done. The memory will be reclaimed by
the OS, later.

With an IOMMU in the picture, further actions become necessary: *after*
the virtio reset, any buffers previously in use (including rings, device
specific requests pointed-to by descriptors, and any further memory
referenced by those requests) must be *unmapped*, but *not freed*.

(Speaking in SEV terms, this will result in those memory areas seeing
their C bits restored, without changing the UEFI memmap.)

This means that:
- the new VIRTIO_DEVICE_PROTOCOL.UnmapSharedPages() function has to be
called judiciously from these callbacks, after the virtio reset,
- *and* that the *entire* call chain originating from
UnmapSharedPages(), through PciIo, through PciRootBridgeIo, to
EdkiiIoMmu, *must* not call gBS->FreePool() or gBS->FreePages() (or the
equivalent MemoryAllocationLib functions).

Note that if the last requirement is currently violated (outside of
OvmfPkg), then that is a general problem for physical platforms as well
-- IMO, a physical NIC driver too should be able to abort DMA in its
exit-boot-services callback and then unmap any relevant IOMMU mappings
(via PciIo->Unmap().)

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-25 18:17 ` Laszlo Ersek
@ 2017-07-25 23:42   ` Brijesh Singh
       [not found]   ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
  2017-07-27 14:21   ` Brijesh Singh
  2 siblings, 0 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-25 23:42 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel
  Cc: brijesh.singh, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann, Ard Biesheuvel

Hi Laszlo,

On 07/25/2017 01:17 PM, Laszlo Ersek wrote:
> Adding Ard
> 
> On 07/20/17 00:09, Brijesh Singh wrote:
>> I have found that OVMF fails to detect the disk when iommu_platform is set from
>> qemu cli. The failure occurs during the feature bit negotiation.
>>
>> Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced
>> a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to
>> allocate, free, map and unmap the DMA memory for the master bus devices
>>
>> In this patch series, I have tried to enable the IOMMU_PLATFORM feature for
>> VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support
>> for other Virtio devices. The patch has been tested in SEV guest - mainly because
>> IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can
>> extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.
>>
>> qemu cli used for testing:
>>
>> # $QEMU \
>>    ...
>>    -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
>>    -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
>>    ...
>>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Brijesh Singh (3):
>>    OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
>>    OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
>>    OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support
> 
> In the course of processing my post-PTO email backlog, yesterday I
> skimmed this topic very quickly, and thought about it for an hour or so
> (while away for the computer). Here's my take on it.
> 
> 
> (1) The closest to any formal language that I found for
> VIRTIO_F_IOMMU_PLATFORM are:
> 
>    https://issues.oasis-open.org/browse/VIRTIO-154
>    https://lists.oasis-open.org/archives/virtio-dev/201610/msg00121.html
> 
> Are these references up-to-date? The ticket also references SVN r587 as
> the resolution, but that link is dead.
> 
> 
> (2) A few weeks back I saw Jason's SeaBIOS patch (also pointed out to me
> more recently by Brijesh, in private):
> 
>    [SeaBIOS] [PATCH] virtio: IOMMU support
> 
> I don't understand this patch. (I also don't understand the
> "iommu_platform" device property on the QEMU command line.) According to
> the spec language quoted from the mailing list above, we have four cases:
> 
> (2.1) device does not offer VIRITO_F_IOMMU_PLATFORM --> everything works
> like before
> 
> (2.2) device offers VIRITO_F_IOMMU_PLATFORM, but the driver does not
> negotiate it --> device is allowed to reject functioning
> 
> * device offers VIRITO_F_IOMMU_PLATFORM and the driver negotiates it:
>    there are two possibilities:
>    (2.3) the driver *disables* the IOMMU, and works like before
>    (2.4) the driver *configures* the IOMMU and works accordingly
> 
> The SeaBIOS patch negotiates VIRITO_F_IOMMU_PLATFORM, but it *neither*
> disables *nor* configures the IOMMU. It simply *ignores* the IOMMU. I
> don't see how that is any different *in effect* from simply not
> negotiating VIRITO_F_IOMMU_PLATFORM -- case (2.2) --, and I don't
> understand why QEMU tolerates "ignoring the IOMMU" differently from "not
> negotiating the flag".
> 
> 
> (3) *If* we indeed just wanted to follow the SeaBIOS patch, then
> VIRTIO_F_IOMMU_PLATFORM should be treated in OVMF *precisely* in
> parallel with VIRTIO_F_VERSION_1.
> 
> 
> (4) I'm confused by the intersection of the SEV and
> VIRTIO_F_IOMMU_PLATFORM use cases. The IoMmu DXE driver added in edk2
> commit f9d129e68a45 ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06) is
> specific to SEV.
> 
> In SEV-less guests, the IoMmu DXE driver will not install the IOMMU
> protocol, but the QEMU command line may still contain the
> "iommu_platform=true" property. For example, as far as I recall, DPDK
> guest payloads use an emulated "viommu" device. For this OVMF, would
> have to provide a separate IOMMU DXE driver, one that could actually
> interact with the "viommu" device. And there should be some coordination
> that exactly one instance of gEdkiiIoMmuProtocolGuid OR
> gIoMmuAbsentProtocolGuid be installed.
> 
> I see that the SEV references are only in the blurb of the patch set; no
> actual commit message refers to SEV. That's OK; I just think the blurb
> is confusing.
> 

I was trying to figure out why setting iommu_platform fails to detect the HDD
in OVMF. Since SEV IOMMU work was still fresh in my mind hence I thought we
simply need to update the Virtio drivers to consume IOMMU protocol directly.
But your explanation on how things work makes sense; thanks for explaining
it in great detail.

I will follow your recommendation, and look into extending VIRTIO_DEVICE_PROTOCOL
with necessary functions and delegate the work to EFI_PCI_IO_PROTOCOL (which will
use IOMMU driver if available).

Thanks
Brijesh

> 
> (5) Now I'm coming to my main point. The virtio device drivers in
> OvmfPkg are UEFI_DRIVER modules that conform to the UEFI driver model.
> They consume our home-grown the VIRTIO_DEVICE_PROTOCOL, and produce
> whatever UEFI protocol is appropriate on top (for example,
> EFI_BLOCK_IO_PROTOCOL in VirtioBlkDxe).
> 
> Such a driver has no business talking to the platform's IOMMU protocol
> directly (even if there is one). Instead:
> 
> 
> (5.1) The VIRTIO_DEVICE_PROTOCOL has to be extended with the necessary
> 
>    AllocateSharedPages, FreeSharedPages, MapSharedPages, UnmapSharedPages
> 
> functions.
> 
> 
> (5.2) All top-level virtio driver code, including VirtioLib, has to be
> rebased to the new VIRTIO_DEVICE_PROTOCOL member functions. This covers
> the ring memory itself, the memory pointed-to by the descriptors placed
> on the ring(s) -- i.e., the device specific requests --, and all further
> memory that is referenced by those device specific requests.
> 
> This will result in a larger memory footprint, as all current pool
> allocations will be turned into page allocations, but I guess that is
> tolerable.
> 
> 
> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply
> in parallel with VIRTIO_F_VERSION_1, and don't act upon
> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this
> is just my point (3) from above.
> 
> 
> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
> 
> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>    virtio-pci devices, and offers virtio 0.9.5 semantics.
> 
> - "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib")
>    binds virtio-mmio devices, and offers virtio 0.9.5 semantics.
> 
> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers
>    virtio 1.0.0 semantics.
> 
> The first two drivers should implement the AllocateSharedPages() and
> FreeSharedPages() member functions simply with the corresponding
> MemoryAllocationLib functions (using BootServicesData type memory), and
> implement the MapSharedPages() and UnmapSharedPages() member functions
> as no-ops (return the input addresses transparently).
> 
> The third driver should implement all four new member functions by
> respectively delegating the job to:
> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
> - EFI_PCI_IO_PROTOCOL.Unmap()
> 
> The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the
> platform-specific PCI host bridge / root bridge driver, and *that*
> driver in turn is allowed to talk to an IOMMU protocol (if any).
> 
> (This last step is already covered by the following edk2 commits:
> - generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU
> support.", 2017-04-29),
> - specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update
> PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).)
> 
> 
> (5.5) There's a delicate question that has to be considered with care;
> namely the ExitBootServices() callbacks in the virtio drivers.
> 
> Currently these functions perform virtio resets. They cause the devices
> to forget all their configuration (including guest RAM references).
> Aborting in-flight DMA (e.g. in VirtioNetDxe) is a practice that is
> specifically recommended in the UEFI Driver Writers' Guide; plus at some
> point the guest kernel will reclaim and overwrite BootServicesData type
> memory. If at that point QEMU is still looking at some (originally
> firmware-allocated) areas as virtio rings, the results won't be amusing.
> (Speaking from experience.) Hence the resetting of virtio devices upon
> ExitBootServices().
> 
> Now, remember that ExitBootServices() callbacks *must not* change the
> UEFI memory map, so *no* memory allocations *must* be released in the
> callbacks. The virtio resets performed in the callbacks are surgical for
> this reason; nothing else is being done. The memory will be reclaimed by
> the OS, later.
> 
> With an IOMMU in the picture, further actions become necessary: *after*
> the virtio reset, any buffers previously in use (including rings, device
> specific requests pointed-to by descriptors, and any further memory
> referenced by those requests) must be *unmapped*, but *not freed*.
> 
> (Speaking in SEV terms, this will result in those memory areas seeing
> their C bits restored, without changing the UEFI memmap.)
> 
> This means that:
> - the new VIRTIO_DEVICE_PROTOCOL.UnmapSharedPages() function has to be
> called judiciously from these callbacks, after the virtio reset,
> - *and* that the *entire* call chain originating from
> UnmapSharedPages(), through PciIo, through PciRootBridgeIo, to
> EdkiiIoMmu, *must* not call gBS->FreePool() or gBS->FreePages() (or the
> equivalent MemoryAllocationLib functions).
> 
> Note that if the last requirement is currently violated (outside of
> OvmfPkg), then that is a general problem for physical platforms as well
> -- IMO, a physical NIC driver too should be able to abort DMA in its
> exit-boot-services callback and then unmap any relevant IOMMU mappings
> (via PciIo->Unmap().)
> 
> Thanks
> Laszlo
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
       [not found]   ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
@ 2017-07-26 15:30     ` Laszlo Ersek
  0 siblings, 0 replies; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-26 15:30 UTC (permalink / raw)
  To: Jason Wang, Brijesh Singh, edk2-devel
  Cc: Tom Lendacky, Jordan Justen, Michael S . Tsirkin, Gerd Hoffmann,
	Ard Biesheuvel

On 07/26/17 05:32, Jason Wang wrote:
> 
> 
> On 2017年07月26日 02:17, Laszlo Ersek wrote:
>> Adding Ard
>>
>> On 07/20/17 00:09, Brijesh Singh wrote:
>>> I have found that OVMF fails to detect the disk when iommu_platform
>>> is set from
>>> qemu cli. The failure occurs during the feature bit negotiation.
>>>
>>> Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch
>>> series introduced
>>> a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute
>>> and methods to
>>> allocate, free, map and unmap the DMA memory for the master bus devices
>>>
>>> In this patch series, I have tried to enable the IOMMU_PLATFORM
>>> feature for
>>> VirtioBlkDevice. I am sending this as RFC to seek feedback before I
>>> extend the support
>>> for other Virtio devices. The patch has been tested in SEV guest -
>>> mainly because
>>> IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If
>>> needed, I can
>>> extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.
>>>
>>> qemu cli used for testing:
>>>
>>> # $QEMU \
>>>    ...
>>>    -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
>>>    -device
>>> virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
>>>
>>>    ...
>>>
>>> Cc: Jordan Justen<jordan.l.justen@intel.com>
>>> Cc: Laszlo Ersek<lersek@redhat.com>
>>> Cc: Jason Wang<jasowang@redhat.com>
>>> Cc: Michael S. Tsirkin<mst@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Brijesh Singh<brijesh.singh@amd.com>
>>>
>>> Brijesh Singh (3):
>>>    OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
>>>    OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
>>>    OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support
>> In the course of processing my post-PTO email backlog, yesterday I
>> skimmed this topic very quickly, and thought about it for an hour or so
>> (while away for the computer). Here's my take on it.
>>
>>
>> (1) The closest to any formal language that I found for
>> VIRTIO_F_IOMMU_PLATFORM are:
>>
>>    https://issues.oasis-open.org/browse/VIRTIO-154
>>    https://lists.oasis-open.org/archives/virtio-dev/201610/msg00121.html
>>
>> Are these references up-to-date? The ticket also references SVN r587 as
>> the resolution, but that link is dead.
>>
>>
>> (2) A few weeks back I saw Jason's SeaBIOS patch (also pointed out to me
>> more recently by Brijesh, in private):
>>
>>    [SeaBIOS] [PATCH] virtio: IOMMU support
>>
>> I don't understand this patch. (I also don't understand the
>> "iommu_platform" device property on the QEMU command line.) According to
>> the spec language quoted from the mailing list above, we have four cases:
>>
>> (2.1) device does not offer VIRITO_F_IOMMU_PLATFORM --> everything works
>> like before
>>
>> (2.2) device offers VIRITO_F_IOMMU_PLATFORM, but the driver does not
>> negotiate it --> device is allowed to reject functioning
>>
>> * device offers VIRITO_F_IOMMU_PLATFORM and the driver negotiates it:
>>    there are two possibilities:
>>    (2.3) the driver*disables*  the IOMMU, and works like before
>>    (2.4) the driver*configures*  the IOMMU and works accordingly
>>
>> The SeaBIOS patch negotiates VIRITO_F_IOMMU_PLATFORM, but it*neither*
>> disables*nor*  configures the IOMMU. It simply*ignores*  the IOMMU. I
>> don't see how that is any different*in effect*  from simply not
>> negotiating VIRITO_F_IOMMU_PLATFORM -- case (2.2) --, and I don't
>> understand why QEMU tolerates "ignoring the IOMMU" differently from "not
>> negotiating the flag".
> 
> I think the difference is we don't want give the ability of bypassing
> IOMMU at driver level. That's why we forbid it in the case of 2.2.
> 
> For 2.3 IOMMU was disabled by e.g guest os not driver.

Ah! That makes a lot of sense. I guess this is again motivated by the
DPDK use case -- the guest kernel would decide about IOMMU setup, but
the virtio driver (which is in guest userspace) is required to comply
with the IOMMU requirement, even if the guest kernel does not actually
program the IOMMU.

I hope my above interpretation is correct, because the recommendations
in my other (long) email match it 100%. Namely, in UEFI, IOMMU setup (or
non-setup) is separated to various platform drivers, and the virtio
device drivers should be abstracted from the actual IOMMU presence (this
is the gist of my long email). Hence simply confirming
VIRITO_F_IOMMU_PLATFORM is right for the device drivers.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-25 18:17 ` Laszlo Ersek
  2017-07-25 23:42   ` Brijesh Singh
       [not found]   ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
@ 2017-07-27 14:21   ` Brijesh Singh
  2017-07-27 17:16     ` Laszlo Ersek
  2 siblings, 1 reply; 28+ messages in thread
From: Brijesh Singh @ 2017-07-27 14:21 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel
  Cc: brijesh.singh, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann, Ard Biesheuvel

Hi Laszlo,


> 
> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply
> in parallel with VIRTIO_F_VERSION_1, and don't act upon
> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this
> is just my point (3) from above.
> 
> 
> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
> 
> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>    virtio-pci devices, and offers virtio 0.9.5 semantics.
> 
> - "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib")
>    binds virtio-mmio devices, and offers virtio 0.9.5 semantics.
> 
> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers
>    virtio 1.0.0 semantics.
> 
> The first two drivers should implement the AllocateSharedPages() and
> FreeSharedPages() member functions simply with the corresponding
> MemoryAllocationLib functions (using BootServicesData type memory), and
> implement the MapSharedPages() and UnmapSharedPages() member functions
> as no-ops (return the input addresses transparently).
> 
> The third driver should implement all four new member functions by
> respectively delegating the job to:
> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
> - EFI_PCI_IO_PROTOCOL.Unmap()
> 

I have working to implement patch per your recommendation. I assume you mean
map the buffers with EfiPciIoOperationBusMasterCommonBuffer [1]. If so, I see
one issue with SEV guest. When SEV is active, IOMMU driver uses a bounce buffer
to map host address to a device address. While creating bounce buffer we can map
it either for EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
Operation. If caller wants to map EfiPciIoOperationBusMasterCommonBuffer then it
must allocate the buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise
we will fail to map. I see that PciRootBridgeIo.c has similar check when using
a bounce buffer for < 4GB use cases [3].

Do you see any issue if we use EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
instead of EfiPciIoOperationBusMasterCommonBuffer ?

[1] https://github.com/tianocore/edk2/blob/master/EdkCompatibilityPkg/Foundation/Efi/Protocol/PciIo/PciIo.h#L169
[2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciBusDxe/PciIo.c#L1082
[3] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1109

> The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the
> platform-specific PCI host bridge / root bridge driver, and *that*
> driver in turn is allowed to talk to an IOMMU protocol (if any).
> 
> (This last step is already covered by the following edk2 commits:
> - generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU
> support.", 2017-04-29),
> - specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update
> PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).)
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 14:21   ` Brijesh Singh
@ 2017-07-27 17:16     ` Laszlo Ersek
  2017-07-27 17:56       ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-27 17:16 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: edk2-devel, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann, Ard Biesheuvel

On 07/27/17 16:21, Brijesh Singh wrote:
> Hi Laszlo,
>
>
>>
>> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM
>> simply in parallel with VIRTIO_F_VERSION_1, and don't act upon
>> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically
>> this is just my point (3) from above.
>>
>>
>> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
>>
>> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>>    virtio-pci devices, and offers virtio 0.9.5 semantics.
>>
>> - "ArmVirtPkg/VirtioFdtDxe" (via
>>    "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices,
>>    and offers virtio 0.9.5 semantics.
>>
>> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and
>>    offers virtio 1.0.0 semantics.
>>
>> The first two drivers should implement the AllocateSharedPages() and
>> FreeSharedPages() member functions simply with the corresponding
>> MemoryAllocationLib functions (using BootServicesData type memory),
>> and implement the MapSharedPages() and UnmapSharedPages() member
>> functions as no-ops (return the input addresses transparently).
>>
>> The third driver should implement all four new member functions by
>> respectively delegating the job to:
>> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
>> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
>> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
>> - EFI_PCI_IO_PROTOCOL.Unmap()
>>
>
> I have working to implement patch per your recommendation. I assume
> you mean map the buffers with EfiPciIoOperationBusMasterCommonBuffer
> [1].

Yes.

> If so, I see one issue with SEV guest. When SEV is active, IOMMU
> driver uses a bounce buffer to map host address to a device address.
> While creating bounce buffer we can map it either for
> EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
> Operation. If caller wants to map
> EfiPciIoOperationBusMasterCommonBuffer then it must allocate the
> buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise we
> will fail to map.

Correct.

> I see that PciRootBridgeIo.c has similar check when using a bounce
> buffer for < 4GB use cases [3].

Correct.

> Do you see any issue if we use EfiPciIoOperationBusMasterRead or
> EfiPciIoOperationBusMasterWrite instead of
> EfiPciIoOperationBusMasterCommonBuffer ?

Yes, I do. The BusMasterRead and BusMasterWrite operations are suitable
for one-shot, uni-directional transfers, where the bounce buffer
contents and the client code contents are exchanged on Map/Unmap.

However virtio transfers, generally speaking, are not like this. While
the individual memory areas pointed-to by separate virtio descriptors
can me associated with one specific transfer direction (see the
VRING_DESC_F_WRITE flag), the virtio ring area itself is long-lived, and
it is simultaneously read and written by both host and guest,
asynchronously and repeatedly. This calls for BusMasterCommonBuffer.
Once we implement BusMasterCommonBuffer, we can use it for everything
else.


Another reason why separate allocation and mapping (and conversely,
separate unmapping and deallocation) are required is the
ExitBootServices() callbacks. In that callback, unmapping must happen
*without* memory allocation or deallocation (because the IOMMU must be
de-programmed, but the UEFI memmap must not be changed), covering all
memory areas that are at that point shared between host and guest
(regardless of transfer direction).

Normally, Map allocates the bounce buffer internally, and Unmap releases
the bounce buffer internally (for BusMasterRead and BusMasterWrite),
which is not right here. If you use AllocateBuffer() separately, then
Map() -- with BusMasterCommonBuffer -- will not allocate internally, and
Unmap() will also not deallocate internally. So, in the
ExitBootServices() callback, you will be able to call Unmap() only, and
then *not* call FreeBuffer() at all.

This is why I suggest introducing all four functions (Allocate, Free,
Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio drivers
should use all four functions explicitly, not just Map and Unmap.

... Actually, I think there is a bug in
"OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following distribution of
operations at the moment:

- IoMmuAllocateBuffer() allocates pages and clears the memory encryption
  mask

- IoMmuFreeBuffer() re-sets the memory encryption mask, and deallocates
  pages

- IoMmuMap() does nothing at all when BusMasterCommonBuffer is requested
  (and IoMmuAllocateBuffer() was called previously). Otherwise,
  IoMmuMap() allocates pages, allocates MAP_INFO, and clears the memory
  encryption mask.

- IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
  operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
  encryption mask, and frees both the pages and MAP_INFO.

This distribution of operations seems wrong. The key point is that
AllocateBuffer() *need not* result in a buffer that is immediately
usable, and that client code is required to call Map()
*unconditionally*, even if BusMasterCommonBuffer is the desired
operation. Therefore, the right distribution of operations is:

- IoMmuAllocateBuffer() allocates pages and does not touch the
  encryption mask..

- IoMmuFreeBuffer() deallocates pages and does not touch the encryption
  mask.

- IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
  requested, and it allocates pages (bounce buffer) otherwise.

  *Regardless* of BusMaster operation, the following actions are carried
  out unconditionally:

  . the memory encryption mask is cleared in this function (and in this
    function only),

  . An attempt is made to grab a MAP_INFO structure from an internal
    free list (to be introduced!). The head of the list is a new static
    variable. If the free list is empty, then a MAP_INFO structure is
    allocated with AllocatePool(). The NO_MAPPING macro becomes unused
    and can be deleted from the source code.

- IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
  has to consult the MAP_INFO structure that is being passed in from the
  caller.) In addition:

  . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
    allocation was done separately in AllocateBuffer, so we do not
    release the pages. Otherwise, we do release the pages.

  . MapInfo is linked back on the internal free list (see above). It is
    *never* released with FreePool().

  This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
  re-set the memory encryption mask) without changing the UEFI memory
  map. (I trust that MemEncryptSevSetPageEncMask() will not split page
  tables internally when it *re*sets the encryption mask -- is that
  correct?)

I'm sorry that I didn't catch this earlier in your commit f9d129e68a45
("OvmfPkg: Add IoMmuDxe driver", 2017-07-06), but as you see, I gave you
only an Acked-by on that, not a Reviewed-by. I've really had to think it
through from the virtio perspective; I didn't foresee this use case back
then (I only felt that I wasn't getting the full picture about the IOMMU
protocol details).

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 17:16     ` Laszlo Ersek
@ 2017-07-27 17:56       ` Ard Biesheuvel
  2017-07-27 19:00         ` Brijesh Singh
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2017-07-27 17:56 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Brijesh Singh, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann

On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
> On 07/27/17 16:21, Brijesh Singh wrote:
>> Hi Laszlo,
>>
>>
>>>
>>> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM
>>> simply in parallel with VIRTIO_F_VERSION_1, and don't act upon
>>> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically
>>> this is just my point (3) from above.
>>>
>>>
>>> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
>>>
>>> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>>>    virtio-pci devices, and offers virtio 0.9.5 semantics.
>>>
>>> - "ArmVirtPkg/VirtioFdtDxe" (via
>>>    "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices,
>>>    and offers virtio 0.9.5 semantics.
>>>
>>> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and
>>>    offers virtio 1.0.0 semantics.
>>>
>>> The first two drivers should implement the AllocateSharedPages() and
>>> FreeSharedPages() member functions simply with the corresponding
>>> MemoryAllocationLib functions (using BootServicesData type memory),
>>> and implement the MapSharedPages() and UnmapSharedPages() member
>>> functions as no-ops (return the input addresses transparently).
>>>
>>> The third driver should implement all four new member functions by
>>> respectively delegating the job to:
>>> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
>>> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
>>> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
>>> - EFI_PCI_IO_PROTOCOL.Unmap()
>>>
>>
>> I have working to implement patch per your recommendation. I assume
>> you mean map the buffers with EfiPciIoOperationBusMasterCommonBuffer
>> [1].
>
> Yes.
>
>> If so, I see one issue with SEV guest. When SEV is active, IOMMU
>> driver uses a bounce buffer to map host address to a device address.
>> While creating bounce buffer we can map it either for
>> EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
>> Operation. If caller wants to map
>> EfiPciIoOperationBusMasterCommonBuffer then it must allocate the
>> buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise we
>> will fail to map.
>
> Correct.
>
>> I see that PciRootBridgeIo.c has similar check when using a bounce
>> buffer for < 4GB use cases [3].
>
> Correct.
>
>> Do you see any issue if we use EfiPciIoOperationBusMasterRead or
>> EfiPciIoOperationBusMasterWrite instead of
>> EfiPciIoOperationBusMasterCommonBuffer ?
>
> Yes, I do. The BusMasterRead and BusMasterWrite operations are suitable
> for one-shot, uni-directional transfers, where the bounce buffer
> contents and the client code contents are exchanged on Map/Unmap.
>
> However virtio transfers, generally speaking, are not like this. While
> the individual memory areas pointed-to by separate virtio descriptors
> can me associated with one specific transfer direction (see the
> VRING_DESC_F_WRITE flag), the virtio ring area itself is long-lived, and
> it is simultaneously read and written by both host and guest,
> asynchronously and repeatedly. This calls for BusMasterCommonBuffer.
> Once we implement BusMasterCommonBuffer, we can use it for everything
> else.
>
>
> Another reason why separate allocation and mapping (and conversely,
> separate unmapping and deallocation) are required is the
> ExitBootServices() callbacks. In that callback, unmapping must happen
> *without* memory allocation or deallocation (because the IOMMU must be
> de-programmed, but the UEFI memmap must not be changed), covering all
> memory areas that are at that point shared between host and guest
> (regardless of transfer direction).
>
> Normally, Map allocates the bounce buffer internally, and Unmap releases
> the bounce buffer internally (for BusMasterRead and BusMasterWrite),
> which is not right here. If you use AllocateBuffer() separately, then
> Map() -- with BusMasterCommonBuffer -- will not allocate internally, and
> Unmap() will also not deallocate internally. So, in the
> ExitBootServices() callback, you will be able to call Unmap() only, and
> then *not* call FreeBuffer() at all.
>
> This is why I suggest introducing all four functions (Allocate, Free,
> Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio drivers
> should use all four functions explicitly, not just Map and Unmap.
>
> ... Actually, I think there is a bug in
> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following distribution of
> operations at the moment:
>
> - IoMmuAllocateBuffer() allocates pages and clears the memory encryption
>   mask
>
> - IoMmuFreeBuffer() re-sets the memory encryption mask, and deallocates
>   pages
>
> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is requested
>   (and IoMmuAllocateBuffer() was called previously). Otherwise,
>   IoMmuMap() allocates pages, allocates MAP_INFO, and clears the memory
>   encryption mask.
>
> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>   operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>   encryption mask, and frees both the pages and MAP_INFO.
>
> This distribution of operations seems wrong. The key point is that
> AllocateBuffer() *need not* result in a buffer that is immediately
> usable, and that client code is required to call Map()
> *unconditionally*, even if BusMasterCommonBuffer is the desired
> operation. Therefore, the right distribution of operations is:
>
> - IoMmuAllocateBuffer() allocates pages and does not touch the
>   encryption mask..
>
> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>   mask.
>
> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>   requested, and it allocates pages (bounce buffer) otherwise.
>
>   *Regardless* of BusMaster operation, the following actions are carried
>   out unconditionally:
>
>   . the memory encryption mask is cleared in this function (and in this
>     function only),
>
>   . An attempt is made to grab a MAP_INFO structure from an internal
>     free list (to be introduced!). The head of the list is a new static
>     variable. If the free list is empty, then a MAP_INFO structure is
>     allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>     and can be deleted from the source code.
>
> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>   has to consult the MAP_INFO structure that is being passed in from the
>   caller.) In addition:
>
>   . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>     allocation was done separately in AllocateBuffer, so we do not
>     release the pages. Otherwise, we do release the pages.
>
>   . MapInfo is linked back on the internal free list (see above). It is
>     *never* released with FreePool().
>
>   This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>   re-set the memory encryption mask) without changing the UEFI memory
>   map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>   tables internally when it *re*sets the encryption mask -- is that
>   correct?)
>
> I'm sorry that I didn't catch this earlier in your commit f9d129e68a45
> ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06), but as you see, I gave you
> only an Acked-by on that, not a Reviewed-by. I've really had to think it
> through from the virtio perspective; I didn't foresee this use case back
> then (I only felt that I wasn't getting the full picture about the IOMMU
> protocol details).
>

I have to concur with Laszlo here: the respective semantics of the
allocate/map/unmap/free operations should be identical to their
counterparts in the PCI I/O protocol, and in the DmaLib in
EmbeddedPkg.

Note that this is likely to cause problems with proprietary x86
drivers in option ROMs, which are used to getting away with using host
addresses for DMA, and fail to invoke Map() for common buffers (or
fail to invoke AllocateBuffer() altogether). The API is clear, and
drivers that abide by the rules should operate correctly even when
using encrypted memory or non-1:1 mapping between the host and PCI
address space (which is how I ran into these issues)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 17:56       ` Ard Biesheuvel
@ 2017-07-27 19:00         ` Brijesh Singh
  2017-07-27 20:55           ` Brijesh Singh
  2017-07-28 13:38           ` Laszlo Ersek
  0 siblings, 2 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-27 19:00 UTC (permalink / raw)
  To: Ard Biesheuvel, Laszlo Ersek
  Cc: brijesh.singh, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann



On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 07/27/17 16:21, Brijesh Singh wrote:
>>> Hi Laszlo,
>>>
>>>
>>>>
>>>> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM
>>>> simply in parallel with VIRTIO_F_VERSION_1, and don't act upon
>>>> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically
>>>> this is just my point (3) from above.
>>>>
>>>>
>>>> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
>>>>
>>>> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>>>>     virtio-pci devices, and offers virtio 0.9.5 semantics.
>>>>
>>>> - "ArmVirtPkg/VirtioFdtDxe" (via
>>>>     "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices,
>>>>     and offers virtio 0.9.5 semantics.
>>>>
>>>> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and
>>>>     offers virtio 1.0.0 semantics.
>>>>
>>>> The first two drivers should implement the AllocateSharedPages() and
>>>> FreeSharedPages() member functions simply with the corresponding
>>>> MemoryAllocationLib functions (using BootServicesData type memory),
>>>> and implement the MapSharedPages() and UnmapSharedPages() member
>>>> functions as no-ops (return the input addresses transparently).
>>>>
>>>> The third driver should implement all four new member functions by
>>>> respectively delegating the job to:
>>>> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
>>>> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
>>>> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
>>>> - EFI_PCI_IO_PROTOCOL.Unmap()
>>>>
>>>
>>> I have working to implement patch per your recommendation. I assume
>>> you mean map the buffers with EfiPciIoOperationBusMasterCommonBuffer
>>> [1].
>>
>> Yes.
>>
>>> If so, I see one issue with SEV guest. When SEV is active, IOMMU
>>> driver uses a bounce buffer to map host address to a device address.
>>> While creating bounce buffer we can map it either for
>>> EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
>>> Operation. If caller wants to map
>>> EfiPciIoOperationBusMasterCommonBuffer then it must allocate the
>>> buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise we
>>> will fail to map.
>>
>> Correct.
>>
>>> I see that PciRootBridgeIo.c has similar check when using a bounce
>>> buffer for < 4GB use cases [3].
>>
>> Correct.
>>
>>> Do you see any issue if we use EfiPciIoOperationBusMasterRead or
>>> EfiPciIoOperationBusMasterWrite instead of
>>> EfiPciIoOperationBusMasterCommonBuffer ?
>>
>> Yes, I do. The BusMasterRead and BusMasterWrite operations are suitable
>> for one-shot, uni-directional transfers, where the bounce buffer
>> contents and the client code contents are exchanged on Map/Unmap.
>>
>> However virtio transfers, generally speaking, are not like this. While
>> the individual memory areas pointed-to by separate virtio descriptors
>> can me associated with one specific transfer direction (see the
>> VRING_DESC_F_WRITE flag), the virtio ring area itself is long-lived, and
>> it is simultaneously read and written by both host and guest,
>> asynchronously and repeatedly. This calls for BusMasterCommonBuffer.
>> Once we implement BusMasterCommonBuffer, we can use it for everything
>> else.
>>
>>
>> Another reason why separate allocation and mapping (and conversely,
>> separate unmapping and deallocation) are required is the
>> ExitBootServices() callbacks. In that callback, unmapping must happen
>> *without* memory allocation or deallocation (because the IOMMU must be
>> de-programmed, but the UEFI memmap must not be changed), covering all
>> memory areas that are at that point shared between host and guest
>> (regardless of transfer direction).
>>
>> Normally, Map allocates the bounce buffer internally, and Unmap releases
>> the bounce buffer internally (for BusMasterRead and BusMasterWrite),
>> which is not right here. If you use AllocateBuffer() separately, then
>> Map() -- with BusMasterCommonBuffer -- will not allocate internally, and
>> Unmap() will also not deallocate internally. So, in the
>> ExitBootServices() callback, you will be able to call Unmap() only, and
>> then *not* call FreeBuffer() at all.
>>
>> This is why I suggest introducing all four functions (Allocate, Free,
>> Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio drivers
>> should use all four functions explicitly, not just Map and Unmap.
>>
>> ... Actually, I think there is a bug in
>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following distribution of
>> operations at the moment:
>>
>> - IoMmuAllocateBuffer() allocates pages and clears the memory encryption
>>    mask
>>
>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and deallocates
>>    pages
>>
>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is requested
>>    (and IoMmuAllocateBuffer() was called previously). Otherwise,
>>    IoMmuMap() allocates pages, allocates MAP_INFO, and clears the memory
>>    encryption mask.
>>
>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>    operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>    encryption mask, and frees both the pages and MAP_INFO.
>>

I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And introduce
list to track if the buffer was allocated by us. If buffer was allocated by
us then we can safely say that it does not require a bounce buffer. While
implementing the initial code I was not able to see BusMasterCommonBuffer
usage. But with virtio things are getting a bit more clear in my mind.

>> This distribution of operations seems wrong. The key point is that
>> AllocateBuffer() *need not* result in a buffer that is immediately
>> usable, and that client code is required to call Map()
>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>> operation. Therefore, the right distribution of operations is:
>>
>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>    encryption mask..
>>
>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>    mask.
>>

Actually one of main reason why we cleared and restored the memory encryption mask
during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
as a method to allocate and free a DMA buffer. I am certainly open to suggestions.

[1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
[2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197

>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>    requested, and it allocates pages (bounce buffer) otherwise.
>>

I am trying to wrap my head around how we can support BusMasterCommonBuffer
when buffer was not allocated by us. Changing the memory encryption mask in
a page table will not update the contents. Also since the memory encryption
mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).

>>    *Regardless* of BusMaster operation, the following actions are carried
>>    out unconditionally:
>>
>>    . the memory encryption mask is cleared in this function (and in this
>>      function only),
>>
>>    . An attempt is made to grab a MAP_INFO structure from an internal
>>      free list (to be introduced!). The head of the list is a new static
>>      variable. If the free list is empty, then a MAP_INFO structure is
>>      allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>      and can be deleted from the source code.
>>
>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>    has to consult the MAP_INFO structure that is being passed in from the
>>    caller.) In addition:
>>
>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>      allocation was done separately in AllocateBuffer, so we do not
>>      release the pages. Otherwise, we do release the pages.
>>
>>    . MapInfo is linked back on the internal free list (see above). It is
>>      *never* released with FreePool().
>>
>>    This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>    re-set the memory encryption mask) without changing the UEFI memory
>>    map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>    tables internally when it *re*sets the encryption mask -- is that
>>    correct?)

Yes, MemEncryptSevSetPageEncMask() will not split pages when it restores mask.

>>>> I'm sorry that I didn't catch this earlier in your commit f9d129e68a45
>> ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06), but as you see, I gave you
>> only an Acked-by on that, not a Reviewed-by. I've really had to think it
>> through from the virtio perspective; I didn't foresee this use case back
>> then (I only felt that I wasn't getting the full picture about the IOMMU
>> protocol details).
>>

> 
> I have to concur with Laszlo here: the respective semantics of the
> allocate/map/unmap/free operations should be identical to their
> counterparts in the PCI I/O protocol, and in the DmaLib in
> EmbeddedPkg.
> 
> Note that this is likely to cause problems with proprietary x86
> drivers in option ROMs, which are used to getting away with using host
> addresses for DMA, and fail to invoke Map() for common buffers (or
> fail to invoke AllocateBuffer() altogether). The API is clear, and
> drivers that abide by the rules should operate correctly even when
> using encrypted memory or non-1:1 mapping between the host and PCI
> address space (which is how I ran into these issues)
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 19:00         ` Brijesh Singh
@ 2017-07-27 20:55           ` Brijesh Singh
  2017-07-27 21:31             ` Ard Biesheuvel
  2017-07-28 13:38           ` Laszlo Ersek
  1 sibling, 1 reply; 28+ messages in thread
From: Brijesh Singh @ 2017-07-27 20:55 UTC (permalink / raw)
  To: Ard Biesheuvel, Laszlo Ersek
  Cc: brijesh.singh, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann



On 07/27/2017 02:00 PM, Brijesh Singh wrote:

>>> This distribution of operations seems wrong. The key point is that
>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>> usable, and that client code is required to call Map()
>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>> operation. Therefore, the right distribution of operations is:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>    encryption mask..
>>>
>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>    mask.
>>>
> 
> Actually one of main reason why we cleared and restored the memory encryption mask
> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
> 
> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
> 
>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>
> 
> I am trying to wrap my head around how we can support BusMasterCommonBuffer
> when buffer was not allocated by us. Changing the memory encryption mask in
> a page table will not update the contents. Also since the memory encryption
> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 

I may be missing something in my understanding. Here is a flow I have in my
mind, please correct me.

OvmfPkg/VirtIoBlk.c:

VirtioBlkInit()
   ....
   ....
   VirtioRingInit
     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
       PciIo->AllocatePages(RingSize, &RingAddress)
     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
     .....
     .....

This case is straight forward and we can easily maps. No need for bounce buffering.

VirtioBlkReadBlocks(..., BufferSize, Buffer,)
   ......
   ......
   SynchronousRequest(..., BufferSize, Buffer)
     ....
     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
     VirtioFlush (...)
     

In the above case, "Buffer" was not allocated by us hence we will not able to change the
memory encryption attributes. Am I missing something in the flow ?


>>>    *Regardless* of BusMaster operation, the following actions are carried
>>>    out unconditionally:
>>>
>>>    . the memory encryption mask is cleared in this function (and in this
>>>      function only),
>>>
>>>    . An attempt is made to grab a MAP_INFO structure from an internal
>>>      free list (to be introduced!). The head of the list is a new static
>>>      variable. If the free list is empty, then a MAP_INFO structure is
>>>      allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>      and can be deleted from the source code.
>>>
>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>    has to consult the MAP_INFO structure that is being passed in from the
>>>    caller.) In addition:
>>>
>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>      allocation was done separately in AllocateBuffer, so we do not
>>>      release the pages. Otherwise, we do release the pages.
>>>
>>>    . MapInfo is linked back on the internal free list (see above). It is
>>>      *never* released with FreePool().
>>>
>>>    This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>    re-set the memory encryption mask) without changing the UEFI memory
>>>    map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>    tables internally when it *re*sets the encryption mask -- is that
>>>    correct?)
> 





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 20:55           ` Brijesh Singh
@ 2017-07-27 21:31             ` Ard Biesheuvel
  2017-07-27 21:38               ` Andrew Fish
  2017-07-27 22:10               ` Brijesh Singh
  0 siblings, 2 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2017-07-27 21:31 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Laszlo Ersek, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann


> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
> 
> 
> 
> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
> 
>>>> This distribution of operations seems wrong. The key point is that
>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>> usable, and that client code is required to call Map()
>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>> operation. Therefore, the right distribution of operations is:
>>>> 
>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>   encryption mask..
>>>> 
>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>>   mask.
>>>> 
>> Actually one of main reason why we cleared and restored the memory encryption mask
>> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
>> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
>> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>   requested, and it allocates pages (bounce buffer) otherwise.
>>>> 
>> I am trying to wrap my head around how we can support BusMasterCommonBuffer
>> when buffer was not allocated by us. Changing the memory encryption mask in
>> a page table will not update the contents. Also since the memory encryption
>> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 
> I may be missing something in my understanding. Here is a flow I have in my
> mind, please correct me.
> 
> OvmfPkg/VirtIoBlk.c:
> 
> VirtioBlkInit()
>  ....
>  ....
>  VirtioRingInit
>    Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>      PciIo->AllocatePages(RingSize, &RingAddress)
>    Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
>    .....
>    .....
> 
> This case is straight forward and we can easily maps. No need for bounce buffering.
> 
> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>  ......
>  ......
>  SynchronousRequest(..., BufferSize, Buffer)
>    ....
>    Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
>    VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>    VirtioFlush (...)
>    
> In the above case, "Buffer" was not allocated by us hence we will not able to change the
> memory encryption attributes. Am I missing something in the flow ?
> 


Common buffer mappings may only be created from buffers that were allocated by AllocateBuffer(). In fact, that is its main purpose
> 
>>>>   *Regardless* of BusMaster operation, the following actions are carried
>>>>   out unconditionally:
>>>> 
>>>>   . the memory encryption mask is cleared in this function (and in this
>>>>     function only),
>>>> 
>>>>   . An attempt is made to grab a MAP_INFO structure from an internal
>>>>     free list (to be introduced!). The head of the list is a new static
>>>>     variable. If the free list is empty, then a MAP_INFO structure is
>>>>     allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>>     and can be deleted from the source code.
>>>> 
>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>>   has to consult the MAP_INFO structure that is being passed in from the
>>>>   caller.) In addition:
>>>> 
>>>>   . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>>     allocation was done separately in AllocateBuffer, so we do not
>>>>     release the pages. Otherwise, we do release the pages.
>>>> 
>>>>   . MapInfo is linked back on the internal free list (see above). It is
>>>>     *never* released with FreePool().
>>>> 
>>>>   This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>>   re-set the memory encryption mask) without changing the UEFI memory
>>>>   map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>>   tables internally when it *re*sets the encryption mask -- is that
>>>>   correct?)
> 
> 
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 21:31             ` Ard Biesheuvel
@ 2017-07-27 21:38               ` Andrew Fish
  2017-07-27 22:13                 ` Brijesh Singh
  2017-07-27 22:10               ` Brijesh Singh
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Fish @ 2017-07-27 21:38 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Tom Lendacky, Michael S . Tsirkin, Jordan Justen,
	edk2-devel@lists.01.org, Laszlo Ersek, Jason Wang, Ard Biesheuvel


> On Jul 27, 2017, at 2:31 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> 
>> 
>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>> 
>> 
>> 
>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>> 
>>>>> This distribution of operations seems wrong. The key point is that
>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>> usable, and that client code is required to call Map()
>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>> operation. Therefore, the right distribution of operations is:
>>>>> 
>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>  encryption mask..
>>>>> 
>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>>>  mask.
>>>>> 
>>> Actually one of main reason why we cleared and restored the memory encryption mask
>>> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
>>> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
>>> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>  requested, and it allocates pages (bounce buffer) otherwise.
>>>>> 
>>> I am trying to wrap my head around how we can support BusMasterCommonBuffer
>>> when buffer was not allocated by us. Changing the memory encryption mask in
>>> a page table will not update the contents. Also since the memory encryption
>>> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
>> 
>> I may be missing something in my understanding. Here is a flow I have in my
>> mind, please correct me.
>> 
>> OvmfPkg/VirtIoBlk.c:
>> 
>> VirtioBlkInit()
>> ....
>> ....
>> VirtioRingInit
>>   Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>     PciIo->AllocatePages(RingSize, &RingAddress)
>>   Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
>>   .....
>>   .....
>> 
>> This case is straight forward and we can easily maps. No need for bounce buffering.
>> 
>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>> ......
>> ......
>> SynchronousRequest(..., BufferSize, Buffer)
>>   ....
>>   Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
>>   VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>   VirtioFlush (...)
>> 
>> In the above case, "Buffer" was not allocated by us hence we will not able to change the
>> memory encryption attributes. Am I missing something in the flow ?
>> 
> 
> 
> Common buffer mappings may only be created from buffers that were allocated by AllocateBuffer(). In fact, that is its main purpose

Brijesh,

If you look in the UEFI Spec 13.4 EFI PCI I/O Protocol there is a good write on DMA. 

DMA Bus Master Read Operation
=========================
Call Map() for EfiPciIoOperationBusMasterRead.
Program the DMA Bus Master with the DeviceAddress returned by Map(). Start the DMA Bus Master.
Wait for DMA Bus Master to complete the read operation.
Call Unmap().


DMA Bus Master Write Operation
==========================
Call Map() for EfiPciOperationBusMasterWrite.
Program the DMA Bus Master with the DeviceAddress returned by Map().
Start the DMA Bus Master.
Wait for DMA Bus Master to complete the write operation.
Perform a PCI controller specific read transaction to flush all PCI write buffers (See PCI Specification Section 3.2.5.2) .
Call Flush().
Call Unmap().

DMA Bus Master Common Buffer Operation
==================================
Call AllocateBuffer() to allocate a common buffer.
Call Map() for EfiPciIoOperationBusMasterCommonBuffer.
Program the DMA Bus Master with the DeviceAddress returned by Map().
The common buffer can now be accessed equally by the processor and the DMA bus master. Call Unmap().
Call FreeBuffer().

Thanks,

Andrew Fish


>> 
>>>>>  *Regardless* of BusMaster operation, the following actions are carried
>>>>>  out unconditionally:
>>>>> 
>>>>>  . the memory encryption mask is cleared in this function (and in this
>>>>>    function only),
>>>>> 
>>>>>  . An attempt is made to grab a MAP_INFO structure from an internal
>>>>>    free list (to be introduced!). The head of the list is a new static
>>>>>    variable. If the free list is empty, then a MAP_INFO structure is
>>>>>    allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>>>    and can be deleted from the source code.
>>>>> 
>>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>>>  has to consult the MAP_INFO structure that is being passed in from the
>>>>>  caller.) In addition:
>>>>> 
>>>>>  . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>>>    allocation was done separately in AllocateBuffer, so we do not
>>>>>    release the pages. Otherwise, we do release the pages.
>>>>> 
>>>>>  . MapInfo is linked back on the internal free list (see above). It is
>>>>>    *never* released with FreePool().
>>>>> 
>>>>>  This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>>>  re-set the memory encryption mask) without changing the UEFI memory
>>>>>  map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>>>  tables internally when it *re*sets the encryption mask -- is that
>>>>>  correct?)
>> 
>> 
>> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 21:31             ` Ard Biesheuvel
  2017-07-27 21:38               ` Andrew Fish
@ 2017-07-27 22:10               ` Brijesh Singh
  2017-07-28  8:39                 ` Ard Biesheuvel
  1 sibling, 1 reply; 28+ messages in thread
From: Brijesh Singh @ 2017-07-27 22:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: brijesh.singh, Laszlo Ersek, edk2-devel@lists.01.org,
	Tom Lendacky, Jordan Justen, Jason Wang, Michael S . Tsirkin,
	Gerd Hoffmann



On 07/27/2017 04:31 PM, Ard Biesheuvel wrote:
> 
>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>>
>>
>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>>
>>>>> This distribution of operations seems wrong. The key point is that
>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>> usable, and that client code is required to call Map()
>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>> operation. Therefore, the right distribution of operations is:
>>>>>
>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>    encryption mask..
>>>>>
>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>>>    mask.
>>>>>
>>> Actually one of main reason why we cleared and restored the memory encryption mask
>>> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
>>> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
>>> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>>
>>> I am trying to wrap my head around how we can support BusMasterCommonBuffer
>>> when buffer was not allocated by us. Changing the memory encryption mask in
>>> a page table will not update the contents. Also since the memory encryption
>>> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
>>
>> I may be missing something in my understanding. Here is a flow I have in my
>> mind, please correct me.
>>
>> OvmfPkg/VirtIoBlk.c:
>>
>> VirtioBlkInit()
>>   ....
>>   ....
>>   VirtioRingInit
>>     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>       PciIo->AllocatePages(RingSize, &RingAddress)
>>     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
>>     .....
>>     .....
>>
>> This case is straight forward and we can easily maps. No need for bounce buffering.
>>
>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>>   ......
>>   ......
>>   SynchronousRequest(..., BufferSize, Buffer)
>>     ....
>>     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
>>     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>     VirtioFlush (...)
>>     
>> In the above case, "Buffer" was not allocated by us hence we will not able to change the
>> memory encryption attributes. Am I missing something in the flow ?
>>
> 
> 
> Common buffer mappings may only be created from buffers that were allocated by AllocateBuffer(). In fact, that is its main purpose

Yes, that part is well understood. If the buffer was allocated by us (e.g vring, request/status
structure etc) then those should be mapped as "BusMasterCommonBuffer".

But I am trying to figure out, how to map a data buffers before issuing a virtio request. e.g when
VirtioBlkReadBlocks() is called, "Buffer" pointer is not a DMA address hence we need to map it.
I think it should be mapped using "BusMasterWrite" not "BusMasterCommonBuffer" before adding into vring.


>>
>>>>>    *Regardless* of BusMaster operation, the following actions are carried
>>>>>    out unconditionally:
>>>>>
>>>>>    . the memory encryption mask is cleared in this function (and in this
>>>>>      function only),
>>>>>
>>>>>    . An attempt is made to grab a MAP_INFO structure from an internal
>>>>>      free list (to be introduced!). The head of the list is a new static
>>>>>      variable. If the free list is empty, then a MAP_INFO structure is
>>>>>      allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>>>      and can be deleted from the source code.
>>>>>
>>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>>>    has to consult the MAP_INFO structure that is being passed in from the
>>>>>    caller.) In addition:
>>>>>
>>>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>>>      allocation was done separately in AllocateBuffer, so we do not
>>>>>      release the pages. Otherwise, we do release the pages.
>>>>>
>>>>>    . MapInfo is linked back on the internal free list (see above). It is
>>>>>      *never* released with FreePool().
>>>>>
>>>>>    This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>>>    re-set the memory encryption mask) without changing the UEFI memory
>>>>>    map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>>>    tables internally when it *re*sets the encryption mask -- is that
>>>>>    correct?)
>>
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 21:38               ` Andrew Fish
@ 2017-07-27 22:13                 ` Brijesh Singh
  0 siblings, 0 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-27 22:13 UTC (permalink / raw)
  To: Andrew Fish
  Cc: brijesh.singh, Tom Lendacky, Michael S . Tsirkin, Jordan Justen,
	edk2-devel@lists.01.org, Laszlo Ersek, Jason Wang, Ard Biesheuvel

Hi Andrew,

On 07/27/2017 04:38 PM, Andrew Fish wrote:
> 
> Brijesh,
> 
> If you look in the UEFI Spec 13.4 EFI PCI I/O Protocol there is a good write on DMA.
> 
> DMA Bus Master Read Operation
> =========================
> Call Map() for EfiPciIoOperationBusMasterRead.
> Program the DMA Bus Master with the DeviceAddress returned by Map(). Start the DMA Bus Master.
> Wait for DMA Bus Master to complete the read operation.
> Call Unmap().
> 
> 
> DMA Bus Master Write Operation
> ==========================
> Call Map() for EfiPciOperationBusMasterWrite.
> Program the DMA Bus Master with the DeviceAddress returned by Map().
> Start the DMA Bus Master.
> Wait for DMA Bus Master to complete the write operation.
> Perform a PCI controller specific read transaction to flush all PCI write buffers (See PCI Specification Section 3.2.5.2) .
> Call Flush().
> Call Unmap().
> 
> DMA Bus Master Common Buffer Operation
> ==================================
> Call AllocateBuffer() to allocate a common buffer.
> Call Map() for EfiPciIoOperationBusMasterCommonBuffer.
> Program the DMA Bus Master with the DeviceAddress returned by Map().
> The common buffer can now be accessed equally by the processor and the DMA bus master. Call Unmap().
> Call FreeBuffer().
> 

Thanks for the link.

-Brijesh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 22:10               ` Brijesh Singh
@ 2017-07-28  8:39                 ` Ard Biesheuvel
  2017-07-28 15:27                   ` Laszlo Ersek
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2017-07-28  8:39 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Laszlo Ersek, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann

On 27 July 2017 at 23:10, Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
> On 07/27/2017 04:31 PM, Ard Biesheuvel wrote:
>>
>>
>>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>>
>>>
>>>
>>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>>>
>>>>>> This distribution of operations seems wrong. The key point is that
>>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>>> usable, and that client code is required to call Map()
>>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>>> operation. Therefore, the right distribution of operations is:
>>>>>>
>>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>>    encryption mask..
>>>>>>
>>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>>> encryption
>>>>>>    mask.
>>>>>>
>>>> Actually one of main reason why we cleared and restored the memory
>>>> encryption mask
>>>> during allocate/free is because we also consume the IOMMU protocol in
>>>> QemuFwCfgLib
>>>> as a method to allocate and free a DMA buffer. I am certainly open to
>>>> suggestions.
>>>> [1]
>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>>> [2]
>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>>>
>>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>>>
>>>> I am trying to wrap my head around how we can support
>>>> BusMasterCommonBuffer
>>>> when buffer was not allocated by us. Changing the memory encryption mask
>>>> in
>>>> a page table will not update the contents. Also since the memory
>>>> encryption
>>>> mask works on PAGE_SIZE hence changing the encryption mask on not our
>>>> allocated
>>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE
>>>> aligned).
>>>
>>>
>>> I may be missing something in my understanding. Here is a flow I have in
>>> my
>>> mind, please correct me.
>>>
>>> OvmfPkg/VirtIoBlk.c:
>>>
>>> VirtioBlkInit()
>>>   ....
>>>   ....
>>>   VirtioRingInit
>>>     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>>       PciIo->AllocatePages(RingSize, &RingAddress)
>>>     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base,
>>> RingSize, &RingDeviceAddress)
>>>     .....
>>>     .....
>>>
>>> This case is straight forward and we can easily maps. No need for bounce
>>> buffering.
>>>
>>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>>>   ......
>>>   ......
>>>   SynchronousRequest(..., BufferSize, Buffer)
>>>     ....
>>>     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer,
>>> BufferSize, &DeviceAddress)
>>>     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>>     VirtioFlush (...)
>>>     In the above case, "Buffer" was not allocated by us hence we will not
>>> able to change the
>>> memory encryption attributes. Am I missing something in the flow ?
>>>
>>
>>
>> Common buffer mappings may only be created from buffers that were
>> allocated by AllocateBuffer(). In fact, that is its main purpose
>
>
> Yes, that part is well understood. If the buffer was allocated by us (e.g
> vring, request/status
> structure etc) then those should be mapped as "BusMasterCommonBuffer".
>
> But I am trying to figure out, how to map a data buffers before issuing a
> virtio request. e.g when
> VirtioBlkReadBlocks() is called, "Buffer" pointer is not a DMA address hence
> we need to map it.
> I think it should be mapped using "BusMasterWrite" not
> "BusMasterCommonBuffer" before adding into vring.
>

If the transfer is strictly unidirectional, then that should work. If
the transfer goes both ways, you may need to map/unmap for read and
then map/unmap for write


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-27 19:00         ` Brijesh Singh
  2017-07-27 20:55           ` Brijesh Singh
@ 2017-07-28 13:38           ` Laszlo Ersek
  2017-07-28 16:00             ` Brijesh Singh
  1 sibling, 1 reply; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-28 13:38 UTC (permalink / raw)
  To: Brijesh Singh, Ard Biesheuvel
  Cc: edk2-devel@lists.01.org, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann

On 07/27/17 21:00, Brijesh Singh wrote:
> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:

>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>> releases the bounce buffer internally (for BusMasterRead and
>>> BusMasterWrite), which is not right here. If you use
>>> AllocateBuffer() separately, then Map() -- with
>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>> will also not deallocate internally. So, in the ExitBootServices()
>>> callback, you will be able to call Unmap() only, and then *not* call
>>> FreeBuffer() at all.
>>>
>>> This is why I suggest introducing all four functions (Allocate,
>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>> drivers should use all four functions explicitly, not just Map and
>>> Unmap.
>>>
>>> ... Actually, I think there is a bug in
>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>> distribution of operations at the moment:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>   encryption mask
>>>
>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>   deallocates pages
>>>
>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>   requested (and IoMmuAllocateBuffer() was called previously).
>>>   Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>   clears the memory encryption mask.
>>>
>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>   operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>   encryption mask, and frees both the pages and MAP_INFO.
>>>
>
> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
> introduce list to track if the buffer was allocated by us. If buffer
> was allocated by us then we can safely say that it does not require a
> bounce buffer. While implementing the initial code I was not able to
> see BusMasterCommonBuffer usage. But with virtio things are getting a
> bit more clear in my mind.

The purpose of the internal list is a little bit different -- it is a
"free list", not a tracker list.

Under the proposed scheme, a MAP_INFO structure will have to be
allocated for *all* mappings: both for CommonBuffer (where the actual
pages come from the caller, who retrieved them earlier with an
AllocateBuffer() call), and for Read/Write (where the actual pages will
be allocated internally to Map). Allocating the MAP_INFO structure in
Map() is necessary in *both* cases because the Unmap() function can
*only* consult this MAP_INFO structure to learn the address and the size
at which it has to re-set the memory encryption mask. This is because
the Unmap() function gets no other input parameter.

Then, regardless of the original operation (CommonBuffer vs.
Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
CommonBuffer, the actual pages are owned by the caller, so we don't free
them here; for Read/Write, the actual pages are owned by Map/Unmap, so
we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
point is that MAP_INFO cannot be released with FreePool() because that
could change the UEFI memmap during ExitBootServices() processing. So
MAP_INFO has to be "released" to an internal *free* list.

And since we have an internal free list, the original MAP_INFO
allocation in Map() should consult the free list first, and only if it
is empty should it fall back to AllocatePool().

Whether the actual pages tracked by MAP_INFO were allocated internally
by Map(), or externally by the caller (via AllocateBuffer()) is an
independent question. (And it can be seen from the MAP_INFO.Operation
field.)

Does this make it clearer?

>
>>> This distribution of operations seems wrong. The key point is that
>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>> usable, and that client code is required to call Map()
>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>> operation. Therefore, the right distribution of operations is:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>   encryption mask..
>>>
>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>   encryption mask.
>>>
>
> Actually one of main reason why we cleared and restored the memory
> encryption mask during allocate/free is because we also consume the
> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
> buffer. I am certainly open to suggestions.

Argh. That's again my fault then, I should have noticed it. :( I
apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
for me as well.

As discussed earlier (and confirmed by Ard), calling *just*
AllocateBuffer() is never enough, Map()/Unmap() are always necessary.

So here's what I suggest (to keep the changes localized):

- Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
  function to output a "VOID *Mapping" parameter as well, in addition to
  the address.

- Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
  function to take a "VOID *Mapping" parameter in addition to the buffer
  address.

- In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
  keep the current IOMMU AllocateBuffer() call, but also call IOMMU
  Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
  Map() outwards. (Note that Map() may have to be called in a loop,
  dependent on "NumberOfBytes".)

- In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
  the IOMMU Unmap() function first (using the new Mapping parameter).

>
> [1]
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>
> [2]
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>
>
>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>   requested, and it allocates pages (bounce buffer) otherwise.
>>>
>
> I am trying to wrap my head around how we can support
> BusMasterCommonBuffer when buffer was not allocated by us. Changing
> the memory encryption mask in a page table will not update the
> contents.

Thank you for the clarification. I've now tried to review the current
code for a better understanding. Are the below statements correct?

- For the guest, guest memory is always readable transparently.
- For the host, guest memory is always readable *as it was written
  last*.
- If the guest memory was last written with the C bit clear, then the
  host can read it as plaintext (regardless of the current state of the
  C bit).
- If the guest memory was last written with the C bit set, then the host
  can only read garbage (regardless of the current state of the C bit).

*If* this is the case, then:

(a) We already have a sort of guest->host information leak. Namely, in
IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
restored, and the pages are released with gBS->FreePages(). However, the
pages being released are not rewritten in place (they are not actually
re-encrypted, only marked for encryption whenever they are next written
to). This means that pages released like this remain readable to the
hypervisor for an unpredictable time.

This is not necessarily a critical problem (after all the contents of
those pages were visible to the hypervisor at some point anyway!); but
in the longer term, such pages could accumulate, and if the hypervisor
is compromised later, it could potentially read an unbounded amount of
"past guest data" as plain-text.

(b) Plus, approaching the question from the Map() direction, we need to
consider two scenarios:

- Client code calls AllocateBuffer(), then Map(), and it writes to the
  buffer only then. This should be safe.
- client code calls AllocateBuffer(), writes to it, and then calls
  Map(). This will result in memory contents that look like garbage to
  the hypervisor. Bad.

I can imagine the following to handle these cases: in the Map() and
Unmap() functions, we have to decrypt and encrypt the memory contents
in-place, after changing the C bit (without allocating additional
memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
will always remain in encrypted memory). Update the C bit with a single
function call for the entire range (like now) -- this will not affect
the guest-readability of the pages --, then bounce each page within the
range to the static buffer and back to its original place. In effect
this will in-place encrypt or decrypt the memory, and will be faster
than a byte-wise rewrite.

* BusMasterRead (host will want to read guest memory):
  - Client calls Map() without calling AllocateBuffer() first. Map()
    allocates an internal bounce buffer, clears the C bit, and does the
    copying.
  - Client fires off the PCI transaction.
  - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
    restores the C-bit, *zeros* the pages, and releases the pages.

* BusMasterWrite (host will want to write to guest memory):
  - Client calls Map() without calling AllocateBuffer() first. Map()
    allocates an internal bounce buffer and clears the C bit.
  - Client fires off the PCI transaction.
  - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
    copies the bounce buffer to crpyted (client-owned) memory, restores
    the C-bit, *zeros* the pages, and releases the pages.

* BusMasterCommonBuffer:
  - Client calls AllocateBuffer(), and places some data in the returned
    memory.
  - Client calls Map(). Map() clears the C bit in one fell swoop, and
    then decrypts the buffer in-place (by bouncing it page-wise to the
    static array and back).
  - Client communicates with the device.
  - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
    and encrypts the buffer in-place (by bouncing it page-wise to the
    static array and back).
  - Client reads some residual data from the buffer.
  - Client calls FreeBuffer(). FreeBuffer() relases the pages.

This is suitable for the discussed ExitBootServices() callback update
too, where the callback will reset the virtio device (like now) and then
call Unmap() for all shared areas, without calling FreeBuffer() on them.
The above logic will re-encrypt the contents without affecting the UEFI
memmap.

> Also since the memory encryption mask works on PAGE_SIZE hence
> changing the encryption mask on not our allocated buffer could mess
> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).

This is not a problem for BusMasterRead and BusMasterWrite because for
those operations, Map() allocates the internal bounce buffer. Also not a
problem for BusMasterCommonBuffer, because then the client first calls
AllocateBuffer (whole number of pages), and so Map() can round up the
number of bytes and de-crypt full pages in-place (see above).

>
>>>    *Regardless* of BusMaster operation, the following actions are
>>>    carried out unconditionally:
>>>
>>>    . the memory encryption mask is cleared in this function (and in
>>>      this function only),
>>>
>>>    . An attempt is made to grab a MAP_INFO structure from an
>>>      internal free list (to be introduced!). The head of the list is
>>>      a new static variable. If the free list is empty, then a
>>>      MAP_INFO structure is allocated with AllocatePool(). The
>>>      NO_MAPPING macro becomes unused and can be deleted from the
>>>      source code.
>>>
>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>   this, it has to consult the MAP_INFO structure that is being
>>>   passed in from the caller.) In addition:
>>>
>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>      the allocation was done separately in AllocateBuffer, so we do
>>>      not release the pages. Otherwise, we do release the pages.
>>>
>>>    . MapInfo is linked back on the internal free list (see above).
>>>      It is *never* released with FreePool().
>>>
>>>    This approach guarantees that IoMmuUnmap() can de-program the
>>>    IOMMU (= re-set the memory encryption mask) without changing the
>>>    UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>    not split page tables internally when it *re*sets the encryption
>>>    mask -- is that correct?)
>
> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
> restores mask.

Great, thanks.
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-28  8:39                 ` Ard Biesheuvel
@ 2017-07-28 15:27                   ` Laszlo Ersek
  0 siblings, 0 replies; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-28 15:27 UTC (permalink / raw)
  To: Ard Biesheuvel, Brijesh Singh
  Cc: edk2-devel@lists.01.org, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann

On 07/28/17 10:39, Ard Biesheuvel wrote:
> On 27 July 2017 at 23:10, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>>
>> On 07/27/2017 04:31 PM, Ard Biesheuvel wrote:
>>>
>>>
>>>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>>>>
>>>>>>> This distribution of operations seems wrong. The key point is that
>>>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>>>> usable, and that client code is required to call Map()
>>>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>>>> operation. Therefore, the right distribution of operations is:
>>>>>>>
>>>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>>>    encryption mask..
>>>>>>>
>>>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>>>> encryption
>>>>>>>    mask.
>>>>>>>
>>>>> Actually one of main reason why we cleared and restored the memory
>>>>> encryption mask
>>>>> during allocate/free is because we also consume the IOMMU protocol in
>>>>> QemuFwCfgLib
>>>>> as a method to allocate and free a DMA buffer. I am certainly open to
>>>>> suggestions.
>>>>> [1]
>>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>>>> [2]
>>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>>>>
>>>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>>>>
>>>>> I am trying to wrap my head around how we can support
>>>>> BusMasterCommonBuffer
>>>>> when buffer was not allocated by us. Changing the memory encryption mask
>>>>> in
>>>>> a page table will not update the contents. Also since the memory
>>>>> encryption
>>>>> mask works on PAGE_SIZE hence changing the encryption mask on not our
>>>>> allocated
>>>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE
>>>>> aligned).
>>>>
>>>>
>>>> I may be missing something in my understanding. Here is a flow I have in
>>>> my
>>>> mind, please correct me.
>>>>
>>>> OvmfPkg/VirtIoBlk.c:
>>>>
>>>> VirtioBlkInit()
>>>>   ....
>>>>   ....
>>>>   VirtioRingInit
>>>>     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>>>       PciIo->AllocatePages(RingSize, &RingAddress)
>>>>     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base,
>>>> RingSize, &RingDeviceAddress)
>>>>     .....
>>>>     .....
>>>>
>>>> This case is straight forward and we can easily maps. No need for bounce
>>>> buffering.
>>>>
>>>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>>>>   ......
>>>>   ......
>>>>   SynchronousRequest(..., BufferSize, Buffer)
>>>>     ....
>>>>     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer,
>>>> BufferSize, &DeviceAddress)
>>>>     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>>>     VirtioFlush (...)
>>>>     In the above case, "Buffer" was not allocated by us hence we will not
>>>> able to change the
>>>> memory encryption attributes. Am I missing something in the flow ?
>>>>
>>>
>>>
>>> Common buffer mappings may only be created from buffers that were
>>> allocated by AllocateBuffer(). In fact, that is its main purpose
>>
>>
>> Yes, that part is well understood. If the buffer was allocated by us (e.g
>> vring, request/status
>> structure etc) then those should be mapped as "BusMasterCommonBuffer".

Brijesh, thanks for the clarification. Previously I replied (at length)
to your paragraph that said "trying to wrap my head around...", and it
wasn't clear what you meant by "allocated by us". In my previous
response, I assumed that you meant a distinction between "allocated in
Map()" vs. "allocated in AllocateBuffer()". I stand by my earlier answer
for that (assumed) distinction, but now I see that you meant something else.

>>
>> But I am trying to figure out, how to map a data buffers before issuing a
>> virtio request. e.g when
>> VirtioBlkReadBlocks() is called, "Buffer" pointer is not a DMA address hence
>> we need to map it.
>> I think it should be mapped using "BusMasterWrite" not
>> "BusMasterCommonBuffer" before adding into vring.
>>
> 
> If the transfer is strictly unidirectional, then that should work. If
> the transfer goes both ways, you may need to map/unmap for read and
> then map/unmap for write
> 

You (Brijesh and Ard) are both right. This question depends on the
outermost interface that the specific virtio driver provides. In this
case, VirtioBlkReadBlocks() implements
EFI_BLOCK_IO_PROTOCOL.ReadBlocks(). The buffer is owned by an
independent agent, and it is guaranteed by the ReadBlocks() interface
that the transfer is unidirectional. So a standalone Map() call with
BusMasterWrite is appropriate, followed by a standalone Unmap() call
*before* VirtioBlkReadBlocks() returns. In this sense, my earlier
request to "*always* use AllocateBuffer + Map" was too strict.

However, in the *general* case, the recommendation remains the same. For
the virtio-net driver for example, the interfaces are not synchronous.
E.g., while EFI_BLOCK_IO_PROTOCOL.WriteBlocks() is synchronous,
EFI_SIMPLE_NETWORK_PROTOCOL.Transmit() is not. So, although in
VirtioNetTransmit() we might be tempted to use BusMasterRead, that's not
right, because ExitBootServices() could be called before the SNP client
collects the buffer with VirtioNetGetStatus().

The ExitBootServices() callback will have to Unmap() the area without
freeing it, and that's only possible if BusMasterCommonBuffer is used in
VirtioNetTransmit() to begin with. This means that we'll have to save
the client's data -- after updating it according to HeaderSize -- with
AllocateBuffer() in VirtioNetTransmit(), Map() *that* as
BusMasterCommonBuffer, and undo both steps in VirtioNetGetStatus(). And,
in the exit-boot-services callback, it has to be Unmap()ped only, but
not freed.

Basically it depends upon whether you can complete the entire operation
synchronously, before the outermost protocol interface returns.

I recommend that we work on the IoMmuDxe and QemuFwCfgLib updates first.
(And, my apologies again for not catching these issues immediately; as I
said, this is my first time doing non-1:1 DMA.)

Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-28 13:38           ` Laszlo Ersek
@ 2017-07-28 16:00             ` Brijesh Singh
  2017-07-28 16:16               ` Laszlo Ersek
                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Brijesh Singh @ 2017-07-28 16:00 UTC (permalink / raw)
  To: Laszlo Ersek, Ard Biesheuvel
  Cc: brijesh.singh, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann



On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
> On 07/27/17 21:00, Brijesh Singh wrote:
>> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
> 
>>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>>> releases the bounce buffer internally (for BusMasterRead and
>>>> BusMasterWrite), which is not right here. If you use
>>>> AllocateBuffer() separately, then Map() -- with
>>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>>> will also not deallocate internally. So, in the ExitBootServices()
>>>> callback, you will be able to call Unmap() only, and then *not* call
>>>> FreeBuffer() at all.
>>>>
>>>> This is why I suggest introducing all four functions (Allocate,
>>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>>> drivers should use all four functions explicitly, not just Map and
>>>> Unmap.
>>>>
>>>> ... Actually, I think there is a bug in
>>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>>> distribution of operations at the moment:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>>    encryption mask
>>>>
>>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>>    deallocates pages
>>>>
>>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>>    requested (and IoMmuAllocateBuffer() was called previously).
>>>>    Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>>    clears the memory encryption mask.
>>>>
>>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>>    operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>>    encryption mask, and frees both the pages and MAP_INFO.
>>>>
>>
>> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
>> introduce list to track if the buffer was allocated by us. If buffer
>> was allocated by us then we can safely say that it does not require a
>> bounce buffer. While implementing the initial code I was not able to
>> see BusMasterCommonBuffer usage. But with virtio things are getting a
>> bit more clear in my mind.
> 
> The purpose of the internal list is a little bit different -- it is a
> "free list", not a tracker list.
> 
> Under the proposed scheme, a MAP_INFO structure will have to be
> allocated for *all* mappings: both for CommonBuffer (where the actual
> pages come from the caller, who retrieved them earlier with an
> AllocateBuffer() call), and for Read/Write (where the actual pages will
> be allocated internally to Map). Allocating the MAP_INFO structure in
> Map() is necessary in *both* cases because the Unmap() function can
> *only* consult this MAP_INFO structure to learn the address and the size
> at which it has to re-set the memory encryption mask. This is because
> the Unmap() function gets no other input parameter.
> 
> Then, regardless of the original operation (CommonBuffer vs.
> Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
> CommonBuffer, the actual pages are owned by the caller, so we don't free
> them here; for Read/Write, the actual pages are owned by Map/Unmap, so
> we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
> point is that MAP_INFO cannot be released with FreePool() because that
> could change the UEFI memmap during ExitBootServices() processing. So
> MAP_INFO has to be "released" to an internal *free* list.
> 
> And since we have an internal free list, the original MAP_INFO
> allocation in Map() should consult the free list first, and only if it
> is empty should it fall back to AllocatePool().
> 
> Whether the actual pages tracked by MAP_INFO were allocated internally
> by Map(), or externally by the caller (via AllocateBuffer()) is an
> independent question. (And it can be seen from the MAP_INFO.Operation
> field.)
> 
> Does this make it clearer?
> 

It makes sense. thanks

One question, do we need to return EFI_NOT_SUPPORTED error when we get
request to map a buffer with CommonBuffer but the buffer was not
allocated using "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"?

At least as per spec, it seems if caller wants to map a buffer with
CommonBuffer then it must have called "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"


>>
>>>> This distribution of operations seems wrong. The key point is that
>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>> usable, and that client code is required to call Map()
>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>> operation. Therefore, the right distribution of operations is:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>    encryption mask..
>>>>
>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>    encryption mask.
>>>>
>>
>> Actually one of main reason why we cleared and restored the memory
>> encryption mask during allocate/free is because we also consume the
>> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
>> buffer. I am certainly open to suggestions.
> 
> Argh. That's again my fault then, I should have noticed it. :( I
> apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
> for me as well.
> 
> As discussed earlier (and confirmed by Ard), calling *just*
> AllocateBuffer() is never enough, Map()/Unmap() are always necessary.
> 
> So here's what I suggest (to keep the changes localized):
> 
> - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
>    function to output a "VOID *Mapping" parameter as well, in addition to
>    the address.
> 
> - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
>    function to take a "VOID *Mapping" parameter in addition to the buffer
>    address.
> 
> - In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
>    keep the current IOMMU AllocateBuffer() call, but also call IOMMU
>    Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
>    Map() outwards. (Note that Map() may have to be called in a loop,
>    dependent on "NumberOfBytes".)
> 
> - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
>    the IOMMU Unmap() function first (using the new Mapping parameter).
> 

I will update the code and send patch for review. thanks

>>
>> [1]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>
>> [2]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>
>>
>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>
>>
>> I am trying to wrap my head around how we can support
>> BusMasterCommonBuffer when buffer was not allocated by us. Changing
>> the memory encryption mask in a page table will not update the
>> contents.
> 
> Thank you for the clarification. I've now tried to review the current
> code for a better understanding. Are the below statements correct?
> 
> - For the guest, guest memory is always readable transparently.
> - For the host, guest memory is always readable *as it was written
>    last*.
> - If the guest memory was last written with the C bit clear, then the
>    host can read it as plaintext (regardless of the current state of the
>    C bit).
> - If the guest memory was last written with the C bit set, then the host
>    can only read garbage (regardless of the current state of the C bit).
> 

Your understanding is correct.

> *If* this is the case, then:
> 
> (a) We already have a sort of guest->host information leak. Namely, in
> IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
> restored, and the pages are released with gBS->FreePages(). However, the
> pages being released are not rewritten in place (they are not actually
> re-encrypted, only marked for encryption whenever they are next written
> to). This means that pages released like this remain readable to the
> hypervisor for an unpredictable time.
> 
> This is not necessarily a critical problem (after all the contents of
> those pages were visible to the hypervisor at some point anyway!); but
> in the longer term, such pages could accumulate, and if the hypervisor
> is compromised later, it could potentially read an unbounded amount of
> "past guest data" as plain-text.
> 

Very good catch, I can *zero* the pages. IIRC, I did not consider doing
so because it may introduce unnecessary perform hit (mainly when dealing
with larger pages). I think when we load kernel or initrd using QemuFwCfg
DMA interface we allocate large buffers.

I will still go ahead and experiment *zeroing* page and measure the performance
impact before we integrate the solution.

  
> (b) Plus, approaching the question from the Map() direction, we need to
> consider two scenarios:
> 
> - Client code calls AllocateBuffer(), then Map(), and it writes to the
>    buffer only then. This should be safe.
> - client code calls AllocateBuffer(), writes to it, and then calls
>    Map(). This will result in memory contents that look like garbage to
>    the hypervisor. Bad.
> 
> I can imagine the following to handle these cases: in the Map() and
> Unmap() functions, we have to decrypt and encrypt the memory contents
> in-place, after changing the C bit (without allocating additional
> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
> will always remain in encrypted memory). Update the C bit with a single
> function call for the entire range (like now) -- this will not affect
> the guest-readability of the pages --, then bounce each page within the
> range to the static buffer and back to its original place. In effect
> this will in-place encrypt or decrypt the memory, and will be faster
> than a byte-wise rewrite.
> 
> * BusMasterRead (host will want to read guest memory):
>    - Client calls Map() without calling AllocateBuffer() first. Map()
>      allocates an internal bounce buffer, clears the C bit, and does the
>      copying.
>    - Client fires off the PCI transaction.
>    - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
>      restores the C-bit, *zeros* the pages, and releases the pages.
> 

Yes this will work just fine (see my previous comments on *zeroing* pages)

> * BusMasterWrite (host will want to write to guest memory):
>    - Client calls Map() without calling AllocateBuffer() first. Map()
>      allocates an internal bounce buffer and clears the C bit.
>    - Client fires off the PCI transaction.
>    - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
>      copies the bounce buffer to crpyted (client-owned) memory, restores
>      the C-bit, *zeros* the pages, and releases the pages.
> 

Yes, this will work just fine (see my previous comments on *zeroing* pages)

> * BusMasterCommonBuffer:
>    - Client calls AllocateBuffer(), and places some data in the returned
>      memory.
>    - Client calls Map(). Map() clears the C bit in one fell swoop, and
>      then decrypts the buffer in-place (by bouncing it page-wise to the
>      static array and back).
>    - Client communicates with the device.
>    - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
>      and encrypts the buffer in-place (by bouncing it page-wise to the
>      static array and back).
>    - Client reads some residual data from the buffer.
>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
> 

Yes this works fine as long as the client uses EFI_PCI_IO_PROTOCOL.AllocateBuffer()
to allocate the buffer.


> This is suitable for the discussed ExitBootServices() callback update
> too, where the callback will reset the virtio device (like now) and then
> call Unmap() for all shared areas, without calling FreeBuffer() on them.
> The above logic will re-encrypt the contents without affecting the UEFI
> memmap.
> 
>> Also since the memory encryption mask works on PAGE_SIZE hence
>> changing the encryption mask on not our allocated buffer could mess
>> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 
> This is not a problem for BusMasterRead and BusMasterWrite because for
> those operations, Map() allocates the internal bounce buffer. Also not a
> problem for BusMasterCommonBuffer, because then the client first calls
> AllocateBuffer (whole number of pages), and so Map() can round up the
> number of bytes and de-crypt full pages in-place (see above).
> 
>>
>>>>     *Regardless* of BusMaster operation, the following actions are
>>>>     carried out unconditionally:
>>>>
>>>>     . the memory encryption mask is cleared in this function (and in
>>>>       this function only),
>>>>
>>>>     . An attempt is made to grab a MAP_INFO structure from an
>>>>       internal free list (to be introduced!). The head of the list is
>>>>       a new static variable. If the free list is empty, then a
>>>>       MAP_INFO structure is allocated with AllocatePool(). The
>>>>       NO_MAPPING macro becomes unused and can be deleted from the
>>>>       source code.
>>>>
>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>>    this, it has to consult the MAP_INFO structure that is being
>>>>    passed in from the caller.) In addition:
>>>>
>>>>     . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>>       the allocation was done separately in AllocateBuffer, so we do
>>>>       not release the pages. Otherwise, we do release the pages.
>>>>
>>>>     . MapInfo is linked back on the internal free list (see above).
>>>>       It is *never* released with FreePool().
>>>>
>>>>     This approach guarantees that IoMmuUnmap() can de-program the
>>>>     IOMMU (= re-set the memory encryption mask) without changing the
>>>>     UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>>     not split page tables internally when it *re*sets the encryption
>>>>     mask -- is that correct?)
>>
>> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
>> restores mask.
> 
> Great, thanks.
> Laszlo
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-28 16:00             ` Brijesh Singh
@ 2017-07-28 16:16               ` Laszlo Ersek
  2017-07-28 19:21               ` Laszlo Ersek
  2017-07-28 19:59               ` Laszlo Ersek
  2 siblings, 0 replies; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-28 16:16 UTC (permalink / raw)
  To: Brijesh Singh, Ard Biesheuvel
  Cc: edk2-devel@lists.01.org, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann

On 07/28/17 18:00, Brijesh Singh wrote:
> 
> 
> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
>> On 07/27/17 21:00, Brijesh Singh wrote:

>>> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
>>> introduce list to track if the buffer was allocated by us. If buffer
>>> was allocated by us then we can safely say that it does not require a
>>> bounce buffer. While implementing the initial code I was not able to
>>> see BusMasterCommonBuffer usage. But with virtio things are getting a
>>> bit more clear in my mind.
>>
>> The purpose of the internal list is a little bit different -- it is a
>> "free list", not a tracker list.
>>
>> Under the proposed scheme, a MAP_INFO structure will have to be
>> allocated for *all* mappings: both for CommonBuffer (where the actual
>> pages come from the caller, who retrieved them earlier with an
>> AllocateBuffer() call), and for Read/Write (where the actual pages will
>> be allocated internally to Map). Allocating the MAP_INFO structure in
>> Map() is necessary in *both* cases because the Unmap() function can
>> *only* consult this MAP_INFO structure to learn the address and the size
>> at which it has to re-set the memory encryption mask. This is because
>> the Unmap() function gets no other input parameter.
>>
>> Then, regardless of the original operation (CommonBuffer vs.
>> Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
>> CommonBuffer, the actual pages are owned by the caller, so we don't free
>> them here; for Read/Write, the actual pages are owned by Map/Unmap, so
>> we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
>> point is that MAP_INFO cannot be released with FreePool() because that
>> could change the UEFI memmap during ExitBootServices() processing. So
>> MAP_INFO has to be "released" to an internal *free* list.
>>
>> And since we have an internal free list, the original MAP_INFO
>> allocation in Map() should consult the free list first, and only if it
>> is empty should it fall back to AllocatePool().
>>
>> Whether the actual pages tracked by MAP_INFO were allocated internally
>> by Map(), or externally by the caller (via AllocateBuffer()) is an
>> independent question. (And it can be seen from the MAP_INFO.Operation
>> field.)
>>
>> Does this make it clearer?
>>
> 
> It makes sense. thanks
> 
> One question, do we need to return EFI_NOT_SUPPORTED error when we get
> request to map a buffer with CommonBuffer but the buffer was not
> allocated using "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"?
> 
> At least as per spec, it seems if caller wants to map a buffer with
> CommonBuffer then it must have called
> "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"

Yes, that is it, exactly: if the caller violates a requirement in the
UEFI spec, covering the use of the APIs, then the firmware *may* make an
attempt to detect that (and to reject it), but the firmware is not
*required* to do so.

A much simpler example: on a double-free programming error, the second
gBS->FreePool() call is not *required* to detect the problem. ("The
Buffer that is freed must have been allocated by AllocatePool().")

So, I don't think we need to implement measures for checking that a
CommonBuffer Map() actually refers to a previously returned, active,
AllocateBuffer() address.

(Snipping the rest, I've read it all, thanks for the answers. Nothing to
add this time :) )

Cheers!
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-28 16:00             ` Brijesh Singh
  2017-07-28 16:16               ` Laszlo Ersek
@ 2017-07-28 19:21               ` Laszlo Ersek
  2017-07-28 19:59               ` Laszlo Ersek
  2 siblings, 0 replies; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-28 19:21 UTC (permalink / raw)
  To: Brijesh Singh, Ard Biesheuvel
  Cc: edk2-devel@lists.01.org, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann

On 07/28/17 18:00, Brijesh Singh wrote:
>
>
> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
>> On 07/27/17 21:00, Brijesh Singh wrote:

>>> Actually one of main reason why we cleared and restored the memory
>>> encryption mask during allocate/free is because we also consume the
>>> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a
>>> DMA buffer. I am certainly open to suggestions.
>>
>> Argh. That's again my fault then, I should have noticed it. :( I
>> apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
>> for me as well.
>>
>> As discussed earlier (and confirmed by Ard), calling *just*
>> AllocateBuffer() is never enough, Map()/Unmap() are always necessary.
>>
>> So here's what I suggest (to keep the changes localized):
>>
>> - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
>>   function to output a "VOID *Mapping" parameter as well, in addition
>>   to the address.
>>
>> - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
>>   function to take a "VOID *Mapping" parameter in addition to the
>>   buffer address.
>>
>> - In the DXE implementation of
>>   InternalQemuFwCfgSevDmaAllocateBuffer(), keep the current IOMMU
>>   AllocateBuffer() call, but also call IOMMU Map(), with
>>   CommonBuffer. Furthermore, propagate the Mapping output of Map()
>>   outwards. (Note that Map() may have to be called in a loop,
>>   dependent on "NumberOfBytes".)
>>
>> - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(),
>>   call the IOMMU Unmap() function first (using the new Mapping
>>   parameter).
>>
>
> I will update the code and send patch for review. thanks

Here's an alternative, given that you mentioned being
performance-conscious. I'm not "suggesting" this as preferred, just
offering it for your consideration. Pick whichever you like more.

In this approach, I'm going to go back on my original request, namely to
keep the InternalQemuFwCfgDmaBytes() implementation centralized, in the
"QemuFwCfgLib.c" file. I now realize that the differences are large
enough that this may not have been a good idea. So here goes:

* Internal APIs ("QemuFwCfgLibInternal.h"):

  - Remove the InternalQemuFwCfgSev*() function prototypes.

  - Introduce the InternalQemuFwCfgDmaBytes() function prototype.

* SEC instance ("QemuFwCfgSec.c"):

  - Remove the InternalQemuFwCfgSev*() function definitions.

  - Add the InternalQemuFwCfgDmaBytes() function definition with
    ASSERT(FALSE) + CpuDeadLoop().

* PEI instance ("QemuFwCfgPei.c):

  - Remove the InternalQemuFwCfgSev*() function definitions.

  - Copy the InternalQemuFwCfgDmaBytes() function definition from the
    currently shared code ("QemuFwCfgLib.c"), and remove all branches
    that are related to the SEV enabled case. IOW, specialize the
    implementation to the plaintext case.

  - In QemuFwCfgInitialize(), replace the
    InternalQemuFwCfgSevIsEnabled() function call with a direct call to
    MemEncryptSevIsEnabled().

* DXE instance ("QemuFwCfgDxe.c"):

  - Remove the InternalQemuFwCfgSev*() function definitions.

  - Copy the InternalQemuFwCfgDmaBytes() function definition from the
    currently shared code ("QemuFwCfgLib.c"), as a starting point.

  - Replace the InternalQemuFwCfgSevIsEnabled() call with a direct call
    to MemEncryptSevIsEnabled().

  - When SEV is enabled, split the control area ("FW_CFG_DMA_ACCESS") to
    a separate buffer. This control area should be allocated with IOMMU
    AllocateBuffer(), and Map()ped separately, as
    BusMasterCommonBuffer64.

  - For the data transfer buffer, use *only* Map() and Unmap(), without
    AllocateBuffer() and FreeBuffer(). Use BusMasterRead64 and
    BusMasterWrite64, dependent on FW_CFG_DMA_CTL_WRITE /
    FW_CFG_DMA_CTL_READ. The point is that the potentially large data
    area will be bounced only once, because Map()/Unmap() will own the
    bounce buffer, and the in-place (de)crypting can be avoided. (The
    fw_cfg DMA transfer is completed in one synchronous operation, as
    witnessed by the library client.) The explicit CopyMem() calls can
    be removed.

  - Remove the InternalQemuFwCfgSevDmaAllocateBuffer() and
    InternalQemuFwCfgSevDmaFreeBuffer() calls.

* shared code ("QemuFwCfgLib.c"):

  - remove the InternalQemuFwCfgDmaBytes() function definition.

Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-28 16:00             ` Brijesh Singh
  2017-07-28 16:16               ` Laszlo Ersek
  2017-07-28 19:21               ` Laszlo Ersek
@ 2017-07-28 19:59               ` Laszlo Ersek
  2017-07-29  0:52                 ` Brijesh Singh
  2 siblings, 1 reply; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-28 19:59 UTC (permalink / raw)
  To: Brijesh Singh, Ard Biesheuvel
  Cc: edk2-devel@lists.01.org, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann

On 07/28/17 18:00, Brijesh Singh wrote:
> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:

snip

>> (b) Plus, approaching the question from the Map() direction, we need
>> to consider two scenarios:
>>
>> - Client code calls AllocateBuffer(), then Map(), and it writes to
>>   the buffer only then. This should be safe.
>> - client code calls AllocateBuffer(), writes to it, and then calls
>>   Map(). This will result in memory contents that look like garbage
>>   to the hypervisor. Bad.
>>
>> I can imagine the following to handle these cases: in the Map() and
>> Unmap() functions, we have to decrypt and encrypt the memory contents
>> in-place, after changing the C bit (without allocating additional
>> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes
>> (this will always remain in encrypted memory). Update the C bit with
>> a single function call for the entire range (like now) -- this will
>> not affect the guest-readability of the pages --, then bounce each
>> page within the range to the static buffer and back to its original
>> place. In effect this will in-place encrypt or decrypt the memory,
>> and will be faster than a byte-wise rewrite.

snip

>> * BusMasterCommonBuffer:
>>    - Client calls AllocateBuffer(), and places some data in the
>>      returned memory.
>>    - Client calls Map(). Map() clears the C bit in one fell swoop,
>>      and then decrypts the buffer in-place (by bouncing it page-wise
>>      to the static array and back).
>>    - Client communicates with the device.
>>    - Client calls Unmap(). Unmap() restores the C bit in one fell
>>      swoop, and encrypts the buffer in-place (by bouncing it
>>      page-wise to the static array and back).
>>    - Client reads some residual data from the buffer.
>>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
>>
>
> Yes this works fine as long as the client uses
> EFI_PCI_IO_PROTOCOL.AllocateBuffer() to allocate the buffer.

Again, a performance-oriented thought:

Above I suggested using a statically allocated page-sized buffer, for
the in-place encryption/decryption. Ultimately this means *two*
CopyMem()s for the entire buffer (just executed page-wise), in *each* of
Map() and Unmap().

Maybe we can do better: what if you perform the CopyMem() from the
buffer right back to the same buffer? CopyMem() is *required* to work
with overlapping source and target areas (similarly to memmove() in
standard C).

This would result in *one* CopyMem (for in-place de-/encryption) in each
of Map() and Unmap(), and thereby it would have identical performance
impact to the BusMasterRead and BusMasterWrite Map() operations (where
copying / crypting takes place between distinct memory areas).

The OVMF DSC files resolve "BaseMemoryLib" -- which provides CopyMem()
-- to "MdePkg/Library/BaseMemoryLibRepStr/BaseMemoryLibRepStr.inf";
regardless of module type. The actual implementation appears to reside
in "MdePkg/Library/BaseMemoryLibRepStr/X64/CopyMem.nasm":

> global ASM_PFX(InternalMemCopyMem)
> ASM_PFX(InternalMemCopyMem):
>     push    rsi
>     push    rdi
>     mov     rsi, rdx                    ; rsi <- Source
>     mov     rdi, rcx                    ; rdi <- Destination
>     lea     r9, [rsi + r8 - 1]          ; r9 <- End of Source
>     cmp     rsi, rdi
>     mov     rax, rdi                    ; rax <- Destination as return value
>     jae     .0
>     cmp     r9, rdi
>     jae     @CopyBackward               ; Copy backward if overlapped
> .0:
>     mov     rcx, r8
>     and     r8, 7
>     shr     rcx, 3
>     rep     movsq                       ; Copy as many Qwords as possible
>     jmp     @CopyBytes
> @CopyBackward:
>     mov     rsi, r9                     ; rsi <- End of Source
>     lea     rdi, [rdi + r8 - 1]         ; esi <- End of Destination
>     std                                 ; set direction flag
> @CopyBytes:
>     mov     rcx, r8
>     rep     movsb                       ; Copy bytes backward
>     cld
>     pop     rdi
>     pop     rsi
>     ret
>

However, I'm afraid even if this works on SEV (which I certainly
expect!), this code won't be reached, due to the following CopyMem()
wrapper implementation in
"MdePkg/Library/BaseMemoryLibRepStr/CopyMemWrapper.c":

> VOID *
> EFIAPI
> CopyMem (
>   OUT VOID       *DestinationBuffer,
>   IN CONST VOID  *SourceBuffer,
>   IN UINTN       Length
>   )
> {
>   if (Length == 0) {
>     return DestinationBuffer;
>   }
>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)DestinationBuffer));
>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)SourceBuffer));
>
>   if (DestinationBuffer == SourceBuffer) {
>     return DestinationBuffer;
>   }
>   return InternalMemCopyMem (DestinationBuffer, SourceBuffer, Length);
> }

As you see, (DestinationBuffer == SourceBuffer) is handled as a no-op
(quite justifiedly, except in the case of SEV).

Personally I think it would be OK to copy the wrapper function and the
assembly code to OvmfPkg/IoMmuDxe/X64, under the names SevCopyMem() and
InternalSevCopyMem(), and call SevCopyMem() in the CommonBuffer cases of
Map() and Unmap(), for the in-place flipping.

For the 32-bit case (OvmfPkgIa32.dsc), my understanding is that guests
cannot control the C bit at all (there is no C bit in the PTEs), and
memory is always encrypted. Is that correct? If so, then we only need to
ensure that SevCopyMem() compile, as it will never be called -- in the
entry point function of OvmfPkg/IoMmuDxe, MemEncryptSevIsEnabled() will
return FALSE, and so the IOMMU protocol will not be installed. Therefore
the 32-bit version (under OvmfPkg/IoMmuDxe/Ia32) of SevCopyMem() can be
stubbed out as an ASSERT(FALSE)+CpuDeadLoop().

If you can think of a better location for SevCopyMem(), that's fine as
well. For example, you could add it to
"OvmfPkg/Library/BaseMemEncryptSevLib" as well.

... I don't think this functionality should be added under MdePkg,
because it is *very* special to the IOMMU implementation, and
practically no other module should use a "busy" in-place CopyMem().

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-28 19:59               ` Laszlo Ersek
@ 2017-07-29  0:52                 ` Brijesh Singh
  2017-07-29  1:37                   ` Brijesh Singh
  0 siblings, 1 reply; 28+ messages in thread
From: Brijesh Singh @ 2017-07-29  0:52 UTC (permalink / raw)
  To: Laszlo Ersek, Ard Biesheuvel
  Cc: brijesh.singh, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann



On 7/28/17 2:59 PM, Laszlo Ersek wrote:
> On 07/28/17 18:00, Brijesh Singh wrote:
>> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
> snip
>
>>> (b) Plus, approaching the question from the Map() direction, we need
>>> to consider two scenarios:
>>>
>>> - Client code calls AllocateBuffer(), then Map(), and it writes to
>>>   the buffer only then. This should be safe.
>>> - client code calls AllocateBuffer(), writes to it, and then calls
>>>   Map(). This will result in memory contents that look like garbage
>>>   to the hypervisor. Bad.
>>>
>>> I can imagine the following to handle these cases: in the Map() and
>>> Unmap() functions, we have to decrypt and encrypt the memory contents
>>> in-place, after changing the C bit (without allocating additional
>>> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes
>>> (this will always remain in encrypted memory). Update the C bit with
>>> a single function call for the entire range (like now) -- this will
>>> not affect the guest-readability of the pages --, then bounce each
>>> page within the range to the static buffer and back to its original
>>> place. In effect this will in-place encrypt or decrypt the memory,
>>> and will be faster than a byte-wise rewrite.
> snip
>
>>> * BusMasterCommonBuffer:
>>>    - Client calls AllocateBuffer(), and places some data in the
>>>      returned memory.
>>>    - Client calls Map(). Map() clears the C bit in one fell swoop,
>>>      and then decrypts the buffer in-place (by bouncing it page-wise
>>>      to the static array and back).
>>>    - Client communicates with the device.
>>>    - Client calls Unmap(). Unmap() restores the C bit in one fell
>>>      swoop, and encrypts the buffer in-place (by bouncing it
>>>      page-wise to the static array and back).
>>>    - Client reads some residual data from the buffer.
>>>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
>>>
>> Yes this works fine as long as the client uses
>> EFI_PCI_IO_PROTOCOL.AllocateBuffer() to allocate the buffer.
> Again, a performance-oriented thought:
>
> Above I suggested using a statically allocated page-sized buffer, for
> the in-place encryption/decryption. Ultimately this means *two*
> CopyMem()s for the entire buffer (just executed page-wise), in *each* of
> Map() and Unmap().
>
> Maybe we can do better: what if you perform the CopyMem() from the
> buffer right back to the same buffer? CopyMem() is *required* to work
> with overlapping source and target areas (similarly to memmove() in
> standard C).
>
> This would result in *one* CopyMem (for in-place de-/encryption) in each
> of Map() and Unmap(), and thereby it would have identical performance
> impact to the BusMasterRead and BusMasterWrite Map() operations (where
> copying / crypting takes place between distinct memory areas).
>
> The OVMF DSC files resolve "BaseMemoryLib" -- which provides CopyMem()
> -- to "MdePkg/Library/BaseMemoryLibRepStr/BaseMemoryLibRepStr.inf";
> regardless of module type. The actual implementation appears to reside
> in "MdePkg/Library/BaseMemoryLibRepStr/X64/CopyMem.nasm":
>

AMD APM document a procedure which must be used to perform in-place
encryption/decryption. We must follow those steps to ensure that data is
flush into memory using the correct C-bit. Not doing so  may result in
unpredictable results.

http://support.amd.com/TechDocs/24593.pdf (Section 7.10.8)

>> global ASM_PFX(InternalMemCopyMem)
>> ASM_PFX(InternalMemCopyMem):
>>     push    rsi
>>     push    rdi
>>     mov     rsi, rdx                    ; rsi <- Source
>>     mov     rdi, rcx                    ; rdi <- Destination
>>     lea     r9, [rsi + r8 - 1]          ; r9 <- End of Source
>>     cmp     rsi, rdi
>>     mov     rax, rdi                    ; rax <- Destination as return value
>>     jae     .0
>>     cmp     r9, rdi
>>     jae     @CopyBackward               ; Copy backward if overlapped
>> .0:
>>     mov     rcx, r8
>>     and     r8, 7
>>     shr     rcx, 3
>>     rep     movsq                       ; Copy as many Qwords as possible
>>     jmp     @CopyBytes
>> @CopyBackward:
>>     mov     rsi, r9                     ; rsi <- End of Source
>>     lea     rdi, [rdi + r8 - 1]         ; esi <- End of Destination
>>     std                                 ; set direction flag
>> @CopyBytes:
>>     mov     rcx, r8
>>     rep     movsb                       ; Copy bytes backward
>>     cld
>>     pop     rdi
>>     pop     rsi
>>     ret
>>
> However, I'm afraid even if this works on SEV (which I certainly
> expect!), this code won't be reached, due to the following CopyMem()
> wrapper implementation in
> "MdePkg/Library/BaseMemoryLibRepStr/CopyMemWrapper.c":
>
>> VOID *
>> EFIAPI
>> CopyMem (
>>   OUT VOID       *DestinationBuffer,
>>   IN CONST VOID  *SourceBuffer,
>>   IN UINTN       Length
>>   )
>> {
>>   if (Length == 0) {
>>     return DestinationBuffer;
>>   }
>>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)DestinationBuffer));
>>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)SourceBuffer));
>>
>>   if (DestinationBuffer == SourceBuffer) {
>>     return DestinationBuffer;
>>   }
>>   return InternalMemCopyMem (DestinationBuffer, SourceBuffer, Length);
>> }
> As you see, (DestinationBuffer == SourceBuffer) is handled as a no-op
> (quite justifiedly, except in the case of SEV).
>
> Personally I think it would be OK to copy the wrapper function and the
> assembly code to OvmfPkg/IoMmuDxe/X64, under the names SevCopyMem() and
> InternalSevCopyMem(), and call SevCopyMem() in the CommonBuffer cases of
> Map() and Unmap(), for the in-place flipping.
>
> For the 32-bit case (OvmfPkgIa32.dsc), my understanding is that guests
> cannot control the C bit at all (there is no C bit in the PTEs), and
> memory is always encrypted. Is that correct? If so, then we only need to
> ensure that SevCopyMem() compile, as it will never be called -- in the
> entry point function of OvmfPkg/IoMmuDxe, MemEncryptSevIsEnabled() will
> return FALSE, and so the IOMMU protocol will not be installed. Therefore
> the 32-bit version (under OvmfPkg/IoMmuDxe/Ia32) of SevCopyMem() can be
> stubbed out as an ASSERT(FALSE)+CpuDeadLoop().
>
> If you can think of a better location for SevCopyMem(), that's fine as
> well. For example, you could add it to
> "OvmfPkg/Library/BaseMemEncryptSevLib" as well.
>
> ... I don't think this functionality should be added under MdePkg,
> because it is *very* special to the IOMMU implementation, and
> practically no other module should use a "busy" in-place CopyMem().
>
> Thanks
> Laszlo



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-29  0:52                 ` Brijesh Singh
@ 2017-07-29  1:37                   ` Brijesh Singh
  2017-07-31 18:20                     ` Laszlo Ersek
  0 siblings, 1 reply; 28+ messages in thread
From: Brijesh Singh @ 2017-07-29  1:37 UTC (permalink / raw)
  To: Laszlo Ersek, Ard Biesheuvel
  Cc: brijesh.singh, edk2-devel@lists.01.org, Tom Lendacky,
	Jordan Justen, Jason Wang, Michael S . Tsirkin, Gerd Hoffmann

On 7/28/17 7:52 PM, Brijesh Singh wrote:
[snip]
> AMD APM document a procedure which must be used to perform in-place
> encryption/decryption. We must follow those steps to ensure that data is
> flush into memory using the correct C-bit. Not doing so  may result in
> unpredictable results.
>
> http://support.amd.com/TechDocs/24593.pdf (Section 7.10.8)
I am wondering if UEFI provides APIs to  get two linear mapping of the
same physical address. The steps says we create two mapping of same
physical address with different C-bits. I will look into UEFI docs to
see if something like this exist otherwise we have to consider two
memcpy. Since its just for CommandBuffers (which usually are smaller
hence we may be okay from performance point of view. Also its just a
boot time thing, does not get used when guest OS is takes over. I will
investigate and see what works best without adding extra complexity in
the code :)

-Brijesh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
  2017-07-29  1:37                   ` Brijesh Singh
@ 2017-07-31 18:20                     ` Laszlo Ersek
  0 siblings, 0 replies; 28+ messages in thread
From: Laszlo Ersek @ 2017-07-31 18:20 UTC (permalink / raw)
  To: Brijesh Singh, Ard Biesheuvel
  Cc: edk2-devel@lists.01.org, Tom Lendacky, Jordan Justen, Jason Wang,
	Michael S . Tsirkin, Gerd Hoffmann

On 07/29/17 03:37, Brijesh Singh wrote:
> On 7/28/17 7:52 PM, Brijesh Singh wrote:
> [snip]
>> AMD APM document a procedure which must be used to perform in-place
>> encryption/decryption. We must follow those steps to ensure that data is
>> flush into memory using the correct C-bit. Not doing so  may result in
>> unpredictable results.
>>
>> http://support.amd.com/TechDocs/24593.pdf (Section 7.10.8)
> I am wondering if UEFI provides APIs to  get two linear mapping of the
> same physical address. The steps says we create two mapping of same
> physical address with different C-bits. I will look into UEFI docs to
> see if something like this exist otherwise we have to consider two
> memcpy. Since its just for CommandBuffers (which usually are smaller
> hence we may be okay from performance point of view.

Right, the separate static bounce buffer (4KB in size) and the two
matching CopyMem()s look easier to understand and safer. Hopefully there
won't be a performance issue in practice.

> Also its just a
> boot time thing, does not get used when guest OS is takes over. I will
> investigate and see what works best without adding extra complexity in
> the code :)

Thanks! (Sorry about the late reply.)
Laszlo


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-07-31 18:18 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-19 22:09 [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 1/3] OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit Brijesh Singh
2017-07-19 22:09 ` [RFC v1 2/3] OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support Brijesh Singh
2017-07-19 22:09 ` [RFC v1 3/3] OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support Brijesh Singh
     [not found] ` <62320c1a-0cec-947c-8c63-5eb0416e4e33@redhat.com>
2017-07-21 11:17   ` [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support Brijesh Singh
     [not found]     ` <20170722024318-mutt-send-email-mst@kernel.org>
2017-07-24  8:25       ` Gerd Hoffmann
2017-07-25 18:17 ` Laszlo Ersek
2017-07-25 23:42   ` Brijesh Singh
     [not found]   ` <904dae9f-e515-01ba-e16f-6561616c78af@redhat.com>
2017-07-26 15:30     ` Laszlo Ersek
2017-07-27 14:21   ` Brijesh Singh
2017-07-27 17:16     ` Laszlo Ersek
2017-07-27 17:56       ` Ard Biesheuvel
2017-07-27 19:00         ` Brijesh Singh
2017-07-27 20:55           ` Brijesh Singh
2017-07-27 21:31             ` Ard Biesheuvel
2017-07-27 21:38               ` Andrew Fish
2017-07-27 22:13                 ` Brijesh Singh
2017-07-27 22:10               ` Brijesh Singh
2017-07-28  8:39                 ` Ard Biesheuvel
2017-07-28 15:27                   ` Laszlo Ersek
2017-07-28 13:38           ` Laszlo Ersek
2017-07-28 16:00             ` Brijesh Singh
2017-07-28 16:16               ` Laszlo Ersek
2017-07-28 19:21               ` Laszlo Ersek
2017-07-28 19:59               ` Laszlo Ersek
2017-07-29  0:52                 ` Brijesh Singh
2017-07-29  1:37                   ` Brijesh Singh
2017-07-31 18:20                     ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox