public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
@ 2017-09-03 19:54 Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 1/4] MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot services Laszlo Ersek
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-03 19:54 UTC (permalink / raw)
  To: edk2-devel-01; +Cc: Brijesh Singh, Eric Dong, Star Zeng

Repo:   https://github.com/lersek/edk2.git
Branch: pci_host_controllers_unmap

This series updates four PCI Host Controller drivers so that they don't
leave CommonBuffer mappings hanging when control is transfered to the
OS.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Star Zeng <star.zeng@intel.com>

Thanks
Laszlo

Laszlo Ersek (4):
  MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot
    services
  MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot
    services
  MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot
    services
  MdeModulePkg/AtaAtapiPassThru: unmap common buffers at
    ExitBootServices()

 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++
 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c                      | 25 +++++++-
 MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c                      | 25 +++++++-
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c                      | 31 +++++++++
 6 files changed, 168 insertions(+), 3 deletions(-)

-- 
2.14.1.3.gb7cf6e02401b



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot services
  2017-09-03 19:54 [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Laszlo Ersek
@ 2017-09-03 19:54 ` Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 2/4] MdeModulePkg/EhciDxe: " Laszlo Ersek
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-03 19:54 UTC (permalink / raw)
  To: edk2-devel-01; +Cc: Brijesh Singh, Eric Dong, Star Zeng

Section

  7.7 Adding the Exit Boot Services feature

in

  Driver Writer's Guide for UEFI 2.3.1 (v1.01)

writes,

> Examples from the EDK II that use this feature are the PCI device
> drivers for USB Host Controllers. Some USB Host Controllers are PCI Bus
> Masters that continuously access a memory buffer to poll for operation
> requests. Access to this memory buffer by a USB Host Controller may be
> required to boot an operation system, but this activity must be
> terminated when the OS calls ExitBootServices() . The typical action in
> the Exit Boot Services Event for these types of drivers is to disable
> the PCI bus master and place the USB Host Controller into a halted
> state.

UhcExitBootService() already resets the host controller, which causes the
host controller to forget about all common buffers configured previously.

However, if the system has a component that translates device addresses to
system memory addresses (such as an IOMMU), then the translations put in
place when the common buffers were originally mapped should also be
undone, before handing control to the OS.

UhciDxe maps two kinds of common buffers:

(1) The Frame List is initially mapped in

      UhciDriverBindingStart()
        UhciInitFrameList()

    and it is unmapped and mapped again, for an unspecified number of
    times, in

      Uhci2Reset()
        UhciDestoryFrameList()
        UhciInitFrameList()

(2) Memory blocks in the USB Host Controller memory pool are first added
    in

      UhciDriverBindingStart()
        UhciAllocateDev()
          UsbHcInitMemPool()
            UsbHcAllocMemBlock()

    and then for an unspecified number of times in

      UsbHcAllocateMem()
        UsbHcAllocMemBlock()

    whenever the pool has to be extended.

Both kinds of common buffers are unmapped in the following call tree:

  UhciDriverBindingStop()
    UhciCleanDevUp()
      UhciDestoryFrameList() <-
      UhciFreeDev()
        UsbHcFreeMemPool()
          UsbHcFreeMemBlock() <-

The common buffers should be unmapped at ExitBootServices() the same way.
In that context however, we must not free memory (because that could alter
the UEFI memory map, which is forbidden for ExitBootServices()
notification functions). So call PciIo->Unmap() without calling
PciIo->FreeBuffer().

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c | 25 +++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c b/MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c
index 1fcc8b5f320f..cd9407faf22a 100644
--- a/MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c
+++ b/MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c
@@ -1594,7 +1594,9 @@ UhcExitBootService (
   VOID                           *Context
   )
 {
-  USB_HC_DEV   *Uhc;
+  USB_HC_DEV      *Uhc;
+  USBHC_MEM_POOL  *MemPool;
+  USBHC_MEM_BLOCK *MemBlock;
 
   Uhc = (USB_HC_DEV *) Context;
 
@@ -1608,6 +1610,27 @@ UhcExitBootService (
   //
   UhciSetRegBit (Uhc->PciIo, USBCMD_OFFSET, USBCMD_HCRESET);
   gBS->Stall (UHC_ROOT_PORT_RECOVERY_STALL);
+
+  //
+  // Unmap the frame list.
+  //
+  if (Uhc->FrameBase != NULL) {
+    Uhc->PciIo->Unmap (Uhc->PciIo, Uhc->FrameMapping);
+  }
+
+  //
+  // Unmap the memory pool.
+  //
+  MemPool = Uhc->MemPool;
+  if (MemPool != NULL) {
+    for (MemBlock = MemPool->Head;
+         MemBlock != NULL;
+         MemBlock = MemBlock->Next) {
+      if (MemBlock->BufHost != NULL) {
+        MemPool->PciIo->Unmap (MemPool->PciIo, MemBlock->Mapping);
+      }
+    }
+  }
 }
 
 /**
-- 
2.14.1.3.gb7cf6e02401b




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot services
  2017-09-03 19:54 [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 1/4] MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot services Laszlo Ersek
@ 2017-09-03 19:54 ` Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 3/4] MdeModulePkg/XhciDxe: " Laszlo Ersek
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-03 19:54 UTC (permalink / raw)
  To: edk2-devel-01; +Cc: Brijesh Singh, Eric Dong, Star Zeng

Section

  7.7 Adding the Exit Boot Services feature

in

  Driver Writer's Guide for UEFI 2.3.1 (v1.01)

writes,

> Examples from the EDK II that use this feature are the PCI device
> drivers for USB Host Controllers. Some USB Host Controllers are PCI Bus
> Masters that continuously access a memory buffer to poll for operation
> requests. Access to this memory buffer by a USB Host Controller may be
> required to boot an operation system, but this activity must be
> terminated when the OS calls ExitBootServices() . The typical action in
> the Exit Boot Services Event for these types of drivers is to disable
> the PCI bus master and place the USB Host Controller into a halted
> state.

EhcExitBootService() already resets the host controller, which causes the
host controller to forget about all common buffers configured previously.

However, if the system has a component that translates device addresses to
system memory addresses (such as an IOMMU), then the translations put in
place when the common buffers were originally mapped should also be
undone, before handing control to the OS.

EhciDxe maps two kinds of common buffers:

(1) The Frame List is initially mapped in

      EhcDriverBindingStart()
        EhcInitHC()
          EhcInitSched()

    and it is unmapped and mapped again, for an unspecified number of
    times, in

      EhcReset()
        EhcFreeSched()
        EhcInitHC()
          EhcInitSched()

(2) Memory blocks in the USB Host Controller memory pool are first added
    in

      UhciDriverBindingStart()
        EhcInitHC()
          EhcInitSched()
            UsbHcInitMemPool()
              UsbHcAllocMemBlock()

    then for an unspecified number of times in

      UsbHcAllocateMem()
        UsbHcAllocMemBlock()

    whenever the pool has to be extended. Memory blocks are also unmapped
    and mapped again in

      EhcReset()
        EhcFreeSched()
          UsbHcFreeMemPool()
            UsbHcFreeMemBlock()
        EhcInitHC()
          EhcInitSched()
            UsbHcInitMemPool()
              UsbHcAllocMemBlock()

Both kinds of common buffers are unmapped in the following call tree:

  EhcDriverBindingStop()
    EhcFreeSched() <-
      UsbHcFreeMemPool()
        UsbHcFreeMemBlock()

The common buffers should be unmapped at ExitBootServices() the same way.
In that context however, we must not free memory (because that could alter
the UEFI memory map, which is forbidden for ExitBootServices()
notification functions). So call PciIo->Unmap() without calling
PciIo->FreeBuffer().

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 25 +++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
index 5173a9d599c5..441366244c9a 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
@@ -1644,7 +1644,9 @@ EhcExitBootService (
   )
 
 {
-  USB2_HC_DEV   *Ehc;
+  USB2_HC_DEV     *Ehc;
+  USBHC_MEM_POOL  *MemPool;
+  USBHC_MEM_BLOCK *MemBlock;
 
   Ehc = (USB2_HC_DEV *) Context;
 
@@ -1652,6 +1654,27 @@ EhcExitBootService (
   // Reset the Host Controller
   //
   EhcResetHC (Ehc, EHC_RESET_TIMEOUT);
+
+  //
+  // Unmap the frame list.
+  //
+  if (Ehc->PeriodFrame != NULL) {
+    Ehc->PciIo->Unmap (Ehc->PciIo, Ehc->PeriodFrameMap);
+  }
+
+  //
+  // Unmap the memory pool.
+  //
+  MemPool = Ehc->MemPool;
+  if (MemPool != NULL) {
+    for (MemBlock = MemPool->Head;
+         MemBlock != NULL;
+         MemBlock = MemBlock->Next) {
+      if (MemBlock->BufHost != NULL) {
+        MemPool->PciIo->Unmap (MemPool->PciIo, MemBlock->Mapping);
+      }
+    }
+  }
 }
 
 
-- 
2.14.1.3.gb7cf6e02401b




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot services
  2017-09-03 19:54 [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 1/4] MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot services Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 2/4] MdeModulePkg/EhciDxe: " Laszlo Ersek
@ 2017-09-03 19:54 ` Laszlo Ersek
  2017-09-03 19:54 ` [PATCH 4/4] MdeModulePkg/AtaAtapiPassThru: unmap common buffers at ExitBootServices() Laszlo Ersek
  2017-09-04 10:36 ` [PATCH 0/4] MdeModulePkg: some PCI HC drivers: " Zeng, Star
  4 siblings, 0 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-03 19:54 UTC (permalink / raw)
  To: edk2-devel-01; +Cc: Brijesh Singh, Eric Dong, Star Zeng

Section

  7.7 Adding the Exit Boot Services feature

in

  Driver Writer's Guide for UEFI 2.3.1 (v1.01)

writes,

> Examples from the EDK II that use this feature are the PCI device
> drivers for USB Host Controllers. Some USB Host Controllers are PCI Bus
> Masters that continuously access a memory buffer to poll for operation
> requests. Access to this memory buffer by a USB Host Controller may be
> required to boot an operation system, but this activity must be
> terminated when the OS calls ExitBootServices() . The typical action in
> the Exit Boot Services Event for these types of drivers is to disable
> the PCI bus master and place the USB Host Controller into a halted
> state.

XhcExitBootService() already halts the host controller, which causes the
host controller to forget about all common buffers configured previously.

However, if the system has a component that translates device addresses to
system memory addresses (such as an IOMMU), then the translations put in
place when the common buffers were originally mapped should also be
undone, before handing control to the OS.

XhciDxe maps three kinds of common buffers:

(1) The Scratchpad Buffer Array, and

(2) the Scratchpad Buffers pointed-to by elements of the Scratchpad Buffer
    Array, are initially mapped in

      XhcDriverBindingStart()
        XhcInitSched()
          UsbHcAllocateAlignedPages()

    and they are unmapped and mapped again, for an unspecified number of
    times, in

      XhcReset()
        XhcFreeSched()
          UsbHcFreeAlignedPages()
        XhcInitSched()
          UsbHcAllocateAlignedPages()

(3) Memory blocks in the USB Host Controller memory pool are first added
    in

      XhcDriverBindingStart()
        XhcInitSched()
          UsbHcInitMemPool()
            UsbHcAllocMemBlock()

    and then for an unspecified number of times in

      UsbHcAllocateMem()
        UsbHcAllocMemBlock()

    whenever the pool has to be extended. Memory blocks are also unmapped
    and mapped again in

      XhcReset()
        XhcFreeSched()
          UsbHcFreeMemPool()
            UsbHcFreeMemBlock()
        XhcInitSched()
          UsbHcInitMemPool()
            UsbHcAllocMemBlock()

All three kinds of common buffers are unmapped in the following call tree:

  XhcDriverBindingStop()
    XhcFreeSched()
      UsbHcFreeAlignedPages() <-
      UsbHcFreeMemPool()
        UsbHcFreeMemBlock() <-

The common buffers should be unmapped at ExitBootServices() the same way.
In that context however, we must not free memory (because that could alter
the UEFI memory map, which is forbidden for ExitBootServices()
notification functions). So call PciIo->Unmap() without calling
PciIo->FreeBuffer().

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 31 ++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
index c884f4c3146c..d2a04492510f 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
@@ -1859,6 +1859,9 @@ XhcExitBootService (
 {
   USB_XHCI_INSTANCE    *Xhc;
   EFI_PCI_IO_PROTOCOL  *PciIo;
+  UINT32               Index;
+  USBHC_MEM_POOL       *MemPool;
+  USBHC_MEM_BLOCK      *MemBlock;
 
   Xhc = (USB_XHCI_INSTANCE*) Context;
   PciIo = Xhc->PciIo;
@@ -1885,6 +1888,34 @@ XhcExitBootService (
                   Xhc->OriginalPciAttributes,
                   NULL
                   );
+
+  //
+  // Unmap Scratchpad Buffers and Scratchpad Buffer Array.
+  //
+  if (Xhc->MaxScratchpadBufs > 0) {
+    for (Index = 0; Index < Xhc->MaxScratchpadBufs; Index++) {
+      if ((VOID *)(UINTN)Xhc->ScratchEntry[Index] != NULL) {
+        PciIo->Unmap (PciIo, (VOID *)Xhc->ScratchEntryMap[Index]);
+      }
+    }
+    if (Xhc->ScratchBuf != NULL) {
+      PciIo->Unmap (PciIo, Xhc->ScratchMap);
+    }
+  }
+
+  //
+  // Unmap the memory pool.
+  //
+  MemPool = Xhc->MemPool;
+  if (MemPool != NULL) {
+    for (MemBlock = MemPool->Head;
+         MemBlock != NULL;
+         MemBlock = MemBlock->Next) {
+      if (MemBlock->BufHost != NULL) {
+        MemPool->PciIo->Unmap (MemPool->PciIo, MemBlock->Mapping);
+      }
+    }
+  }
 }
 
 /**
-- 
2.14.1.3.gb7cf6e02401b




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] MdeModulePkg/AtaAtapiPassThru: unmap common buffers at ExitBootServices()
  2017-09-03 19:54 [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Laszlo Ersek
                   ` (2 preceding siblings ...)
  2017-09-03 19:54 ` [PATCH 3/4] MdeModulePkg/XhciDxe: " Laszlo Ersek
@ 2017-09-03 19:54 ` Laszlo Ersek
  2017-09-04 10:36 ` [PATCH 0/4] MdeModulePkg: some PCI HC drivers: " Zeng, Star
  4 siblings, 0 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-03 19:54 UTC (permalink / raw)
  To: edk2-devel-01; +Cc: Brijesh Singh, Eric Dong, Star Zeng

The AtaAtapiPassThru() driver maps three system memory regions for Bus
Master Common Buffer operation on the following call path, if the
controller being bound has PCI_CLASS_MASS_STORAGE_SATADPA class code:

  AtaAtapiPassThruStart()
    EnumerateAttachedDevice()
      AhciModeInitialization()
        AhciCreateTransferDescriptor()

The regions are unmapped when the controller is unbound, in
AtaAtapiPassThruStop().

The common buffers should also be de-programmed when we exit the boot
services and the OS gains ownership of the system memory. Introduce an
ExitBootServices() notification function that

- resets the controller so that it forgets the device addresses for the
  common buffers,

- unmaps the common buffers so that device-to-RAM address translations (in
  a potential IOMMU or another translator) are torn down.

ExitBootServices() notification functions must not alter the UEFI memory
map, thus call only PciIo->Unmap(), and not PciIo->FreeBuffer(). (Unlike
AtaAtapiPassThruStop() does.)

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++
 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
 3 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h
index 809bcc307fc4..d085597fbdfe 100644
--- a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h
+++ b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h
@@ -367,5 +367,23 @@ AhciStopCommand (
   IN  UINT64                    Timeout
   );
 
+/**
+  Do AHCI HBA reset.
+
+  @param  PciIo              The PCI IO protocol instance.
+  @param  Timeout            The timeout value of reset, uses 100ns as a unit.
+
+  @retval EFI_DEVICE_ERROR   AHCI controller is failed to complete hardware
+                             reset.
+  @retval EFI_TIMEOUT        The reset operation is time out.
+  @retval EFI_SUCCESS        AHCI controller is reset successfully.
+
+**/
+EFI_STATUS
+EFIAPI
+AhciReset (
+  IN  EFI_PCI_IO_PROTOCOL       *PciIo,
+  IN  UINT64                    Timeout
+  );
 #endif
 
diff --git a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h
index 4f327dc30b60..e51c66f392d2 100644
--- a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h
+++ b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h
@@ -120,6 +120,11 @@ typedef struct {
   //
   EFI_EVENT                         TimerEvent;
   LIST_ENTRY                        NonBlockingTaskList;
+
+  //
+  // For resetting AHCI and unmapping CommonBuffer DMA at ExitBootServices().
+  //
+  EFI_EVENT                         ExitBootEvent;
 } ATA_ATAPI_PASS_THRU_INSTANCE;
 
 //
diff --git a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c
index 795443ef74f6..6972349b2b8f 100644
--- a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c
+++ b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c
@@ -103,7 +103,8 @@ ATA_ATAPI_PASS_THRU_INSTANCE gAtaAtapiPassThruInstanceTemplate = {
   {                   // NonBlocking TaskList
     NULL,
     NULL
-  }
+  },
+  NULL,               // ExitBootEvent
 };
 
 ATAPI_DEVICE_PATH    mAtapiDevicePathTemplate = {
@@ -477,6 +478,50 @@ InitializeAtaAtapiPassThru (
   return Status;
 }
 
+/**
+  If operating in AHCI mode, then reset the HBA and unmap CommonBuffer Bus
+  Master DMA when exiting the boot services.
+
+  @param[in] Event    Event for which this notification function is being
+                      called.
+  @param[in] Context  Pointer to the ATA_ATAPI_PASS_THRU_INSTANCE that
+                      represents the HBA.
+**/
+VOID
+EFIAPI
+AtaPassThruExitBootServices (
+  IN EFI_EVENT Event,
+  IN VOID      *Context
+  )
+{
+  ATA_ATAPI_PASS_THRU_INSTANCE *Instance;
+  EFI_PCI_IO_PROTOCOL          *PciIo;
+  EFI_AHCI_REGISTERS           *AhciRegisters;
+
+  Instance = Context;
+
+  if (Instance->Mode == EfiAtaAhciMode) {
+    PciIo = Instance->PciIo;
+    AhciRegisters = &Instance->AhciRegisters;
+    //
+    // De-program common buffer references from the HBA by resetting the HBA.
+    //
+    AhciReset (PciIo, EFI_AHCI_BUS_RESET_TIMEOUT);
+    //
+    // Unmap long-lived common buffers.
+    //
+    if (AhciRegisters->AhciRFis != NULL) {
+      PciIo->Unmap (PciIo, AhciRegisters->MapRFis);
+    }
+    if (AhciRegisters->AhciCmdList != NULL) {
+      PciIo->Unmap (PciIo, AhciRegisters->MapCmdList);
+    }
+    if (AhciRegisters->AhciCommandTable != NULL) {
+      PciIo->Unmap (PciIo, AhciRegisters->MapCommandTable);
+    }
+  }
+}
+
 /**
   Tests to see if this driver supports a given controller. If a child device is provided,
   it further tests to see if this driver supports creating a handle for the specified child device.
@@ -755,6 +800,17 @@ AtaAtapiPassThruStart (
   InitializeListHead(&Instance->DeviceList);
   InitializeListHead(&Instance->NonBlockingTaskList);
 
+  Status = gBS->CreateEvent (
+                  EVT_SIGNAL_EXIT_BOOT_SERVICES,
+                  TPL_CALLBACK,
+                  AtaPassThruExitBootServices,
+                  Instance,
+                  &Instance->ExitBootEvent
+                  );
+  if (EFI_ERROR (Status)) {
+    goto ErrorExit;
+  }
+
   Instance->TimerEvent = NULL;
 
   Status = gBS->CreateEvent (
@@ -808,6 +864,10 @@ ErrorExit:
     gBS->CloseEvent (Instance->TimerEvent);
   }
 
+  if ((Instance != NULL) && (Instance->ExitBootEvent != NULL)) {
+    gBS->CloseEvent (Instance->ExitBootEvent);
+  }
+
   //
   // Remove all inserted ATA devices.
   //
@@ -912,6 +972,11 @@ AtaAtapiPassThruStop (
   //
   DestroyDeviceInfoList (Instance);
 
+  if (Instance->ExitBootEvent != NULL) {
+    gBS->CloseEvent (Instance->ExitBootEvent);
+    Instance->ExitBootEvent = NULL;
+  }
+
   //
   // If the current working mode is AHCI mode, then pre-allocated resource
   // for AHCI initialization should be released.
-- 
2.14.1.3.gb7cf6e02401b



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-03 19:54 [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Laszlo Ersek
                   ` (3 preceding siblings ...)
  2017-09-03 19:54 ` [PATCH 4/4] MdeModulePkg/AtaAtapiPassThru: unmap common buffers at ExitBootServices() Laszlo Ersek
@ 2017-09-04 10:36 ` Zeng, Star
  2017-09-04 21:19   ` Laszlo Ersek
  4 siblings, 1 reply; 21+ messages in thread
From: Zeng, Star @ 2017-09-04 10:36 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel-01; +Cc: Dong, Eric, Zeng, Star

Curious about one question when I am reviewing this patch series.

Does/Should the IOMMU component disable the address translation at ExitBootServices?

Thanks,
Star
-----Original Message-----
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Monday, September 4, 2017 3:55 AM
To: edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Dong, Eric <eric.dong@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

Repo:   https://github.com/lersek/edk2.git
Branch: pci_host_controllers_unmap

This series updates four PCI Host Controller drivers so that they don't leave CommonBuffer mappings hanging when control is transfered to the OS.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Star Zeng <star.zeng@intel.com>

Thanks
Laszlo

Laszlo Ersek (4):
  MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot
    services
  MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot
    services
  MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot
    services
  MdeModulePkg/AtaAtapiPassThru: unmap common buffers at
    ExitBootServices()

 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
 MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c                      | 25 +++++++-
 MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c                      | 25 +++++++-
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c                      | 31 +++++++++
 6 files changed, 168 insertions(+), 3 deletions(-)

--
2.14.1.3.gb7cf6e02401b

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-04 10:36 ` [PATCH 0/4] MdeModulePkg: some PCI HC drivers: " Zeng, Star
@ 2017-09-04 21:19   ` Laszlo Ersek
  2017-09-05  2:18     ` Yao, Jiewen
  0 siblings, 1 reply; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-04 21:19 UTC (permalink / raw)
  To: Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

On 09/04/17 12:36, Zeng, Star wrote:
> Curious about one question when I am reviewing this patch series.
> 
> Does/Should the IOMMU component disable the address translation at ExitBootServices?

It is a very good and interesting question.

My answer is, I don't know. The IOMMU protocol abstraction is specific
to edk2, and its behavior is not documented in the PI or UEFI specs (as
far as I know).

For one example, the VT-d IOMMU implementation in

  IntelSiliconPkg/IntelVTdDxe

zaps all translations at ExitBootServices():

VOID
EFIAPI
OnExitBootServices (
  IN EFI_EVENT                               Event,
  IN VOID                                    *Context
  )
{
  DEBUG ((DEBUG_INFO, "Vtd OnExitBootServices\n"));
  DumpVtdRegsAll ();
  DisableDmar ();
  DumpVtdRegsAll ();
}

Now, in case someone suggested that the same approach should be followed
by all IOMMU implementations, I'd have three counter-arguments:

(1) My series states the problem (and seeks to solve it) on the PciIo
protocol level, which is standardized. The IOMMU protocol is edk2-only.
I feel that this kind of "unmap on ExitBootServices" should be done by
all UEFI drivers that map DMA common buffers, regardless of
platform-specific PciIo implementations.

(2) ExitBootServices() notify functions are called in unspecified order
(there's no way to express dependencies between them). This means that,
following the above pattern, an IO address translation could be torn
down before a reliant PCI device is de-configured in its own UEFI_DRIVER
ExitBootServices() handler. The result can be a series of IOMMU faults.
I'm not sure if that's bad in practice, but it does look like a layering
violation. (We shouldn't tear down resources from the bottom up.)

(3) The DisableDmar() function is relatively simple. It uses VT-d
registers that exist independently of any PCI devices, and of any "live"
mappings.

Some IOMMU protocol implementations might be different -- for them the
live configuration might only be known by constantly tracking the full
set of Map() and Unmap() operations. That's a hurdle for the implementation.

Furthermore, what is supposed to happen if both the IOMMU driver and the
individual PCI driver destroy the mapping at ExitBootServices()? If the
PCI driver's notification function runs first, then the IOMMU driver can
mark the mapping as "unmapped", and ignore it in its own notification
function. If the IOMMU driver's notification function runs first, then
it can again only mark the mapping as "unmapped", so that when the PCI
driver tries to unmap it as well, things don't blow up. This seems to
imply that an unmapped mapping data structure can never be recycled,
because we never know what action might follow. This sounds very
confusing to me.


Now, there are two arguments in favor of the IOMMU ExitBootServices()
callback as well:

- Security. It doesn't matter if some driver forgets to de-configure a
translation, the IOMMU driver won't. Layering violations be damned.

- In ExitBootServices() notification functions, the UEFI memory map must
not be altered (memory cannot be released). This means that Bus Master
Write and Bus Master Read operations, for pending transfers, cannot be
unmapped (only CommonBuffer operations can be), because unmapping
Write/Read might free memory (bounce buffers) *implicitly*. For
CommonBuffer operations, separate PciIo.FreeBuffer() calls are necessary
for releasing memory, thus the notification functions can simply *not*
call FreeBuffer(), to stay safe. But this is not possible for pending
Read/Write operations.


My current thinking is that (a) PCI drivers should unmap pending Common
Buffer operations at ExitBootServices; (b) PCI drivers should never
return from their protocol member functions with pending Write/Read
operations, only CommonBuffer operations (put differently, if the
operation is asynchronous on the higher protocol level, then make the
underlying BMDMA operation CommonBuffer); (c) IOMMU protocol
implementations should not be required to clean up translations, only
allowed to, if they can do it without interfering with (a).

Basically, the question is, who *owns* the translations. Based on the
PciIo interfaces, and on the requirement that PCI drivers Map() and
Unmap() buffers, my opinion is that the PCI drivers own the translations.

If you think that we should take this question to the USWG, I'm OK with
that.

Thanks,
Laszlo



> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Monday, September 4, 2017 3:55 AM
> To: edk2-devel-01 <edk2-devel@lists.01.org>
> Cc: Dong, Eric <eric.dong@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
> 
> Repo:   https://github.com/lersek/edk2.git
> Branch: pci_host_controllers_unmap
> 
> This series updates four PCI Host Controller drivers so that they don't leave CommonBuffer mappings hanging when control is transfered to the OS.
> 
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: Eric Dong <eric.dong@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> 
> Thanks
> Laszlo
> 
> Laszlo Ersek (4):
>   MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot
>     services
>   MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot
>     services
>   MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot
>     services
>   MdeModulePkg/AtaAtapiPassThru: unmap common buffers at
>     ExitBootServices()
> 
>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
>  MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c                      | 25 +++++++-
>  MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c                      | 25 +++++++-
>  MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c                      | 31 +++++++++
>  6 files changed, 168 insertions(+), 3 deletions(-)
> 
> --
> 2.14.1.3.gb7cf6e02401b
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-04 21:19   ` Laszlo Ersek
@ 2017-09-05  2:18     ` Yao, Jiewen
  2017-09-05  9:15       ` Laszlo Ersek
  0 siblings, 1 reply; 21+ messages in thread
From: Yao, Jiewen @ 2017-09-05  2:18 UTC (permalink / raw)
  To: Laszlo Ersek, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

HI Laszlo
Thank you very much to raise this DMA topic.

I have a chat with Star. We have couples of question, from high level to low level.
Hope you can clarify:


1)       About disable DMA

As you mentioned, in UEFI_Driver_Writer_Guide_V1.0.1_120308.pdf, 7.7 Adding the Exit Boot Services feature
=========================
Examples from the EDK II that use this feature are the PCI device drivers for USB Host
Controllers. Some USB Host Controllers are PCI Bus Masters that continuously access a
memory buffer to poll for operation requests. Access to this memory buffer by a USB
Host Controller may be required to boot an operation system, but this activity must be
terminated when the OS calls ExitBootServices(). The typical action in the Exit Boot
Services Event for these types of drivers is to disable the PCI bus master and place the
USB Host Controller into a halted state
=========================

I believe the fundamental requirement is to "disable DMA".
As such, the simplest action is to clear PCI Bus Master Enable bit in the command register.

Do you think clearing BME is enough?
Is there any special reason that we must unmap the DMA buffer?


2)       About common buffer
If there is a special requirement that we must unmap the DMA buffer, why we only unmap the common buffer?

I understand your statement that unmapping a read buffer or write buffer may cause free memory action. But that is implementation choice.

If we have strong requirement to unmap the DMA buffer, we should achieve that.


3)       About interaction with IOMMU
I am not worried about IOMMU interaction.
I believe an proper IOMMU implementation should not block any action taken by a PCI device driver.
Current IntelVTdDxe driver disables VTd engine in ExitBootService, which means the DMA access is no longer managed by VTd engine. It is NOT related to disable DMA buffer.


4)       About the driver you modified
I believe we have some other PCI device driver (NVMe/SDMMC/UFS), besides the ones you modified in this patch (ATA/UCHI/EHCI/XHCI).
If we need take this approach, I assume they also need update, right?

Thank you
Yao Jiewen


From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Tuesday, September 5, 2017 5:20 AM
To: Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Dong, Eric <eric.dong@intel.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

On 09/04/17 12:36, Zeng, Star wrote:
> Curious about one question when I am reviewing this patch series.
>
> Does/Should the IOMMU component disable the address translation at ExitBootServices?

It is a very good and interesting question.

My answer is, I don't know. The IOMMU protocol abstraction is specific
to edk2, and its behavior is not documented in the PI or UEFI specs (as
far as I know).

For one example, the VT-d IOMMU implementation in

  IntelSiliconPkg/IntelVTdDxe

zaps all translations at ExitBootServices():

VOID
EFIAPI
OnExitBootServices (
  IN EFI_EVENT                               Event,
  IN VOID                                    *Context
  )
{
  DEBUG ((DEBUG_INFO, "Vtd OnExitBootServices\n"));
  DumpVtdRegsAll ();
  DisableDmar ();
  DumpVtdRegsAll ();
}

Now, in case someone suggested that the same approach should be followed
by all IOMMU implementations, I'd have three counter-arguments:

(1) My series states the problem (and seeks to solve it) on the PciIo
protocol level, which is standardized. The IOMMU protocol is edk2-only.
I feel that this kind of "unmap on ExitBootServices" should be done by
all UEFI drivers that map DMA common buffers, regardless of
platform-specific PciIo implementations.

(2) ExitBootServices() notify functions are called in unspecified order
(there's no way to express dependencies between them). This means that,
following the above pattern, an IO address translation could be torn
down before a reliant PCI device is de-configured in its own UEFI_DRIVER
ExitBootServices() handler. The result can be a series of IOMMU faults.
I'm not sure if that's bad in practice, but it does look like a layering
violation. (We shouldn't tear down resources from the bottom up.)

(3) The DisableDmar() function is relatively simple. It uses VT-d
registers that exist independently of any PCI devices, and of any "live"
mappings.

Some IOMMU protocol implementations might be different -- for them the
live configuration might only be known by constantly tracking the full
set of Map() and Unmap() operations. That's a hurdle for the implementation.

Furthermore, what is supposed to happen if both the IOMMU driver and the
individual PCI driver destroy the mapping at ExitBootServices()? If the
PCI driver's notification function runs first, then the IOMMU driver can
mark the mapping as "unmapped", and ignore it in its own notification
function. If the IOMMU driver's notification function runs first, then
it can again only mark the mapping as "unmapped", so that when the PCI
driver tries to unmap it as well, things don't blow up. This seems to
imply that an unmapped mapping data structure can never be recycled,
because we never know what action might follow. This sounds very
confusing to me.


Now, there are two arguments in favor of the IOMMU ExitBootServices()
callback as well:

- Security. It doesn't matter if some driver forgets to de-configure a
translation, the IOMMU driver won't. Layering violations be damned.

- In ExitBootServices() notification functions, the UEFI memory map must
not be altered (memory cannot be released). This means that Bus Master
Write and Bus Master Read operations, for pending transfers, cannot be
unmapped (only CommonBuffer operations can be), because unmapping
Write/Read might free memory (bounce buffers) *implicitly*. For
CommonBuffer operations, separate PciIo.FreeBuffer() calls are necessary
for releasing memory, thus the notification functions can simply *not*
call FreeBuffer(), to stay safe. But this is not possible for pending
Read/Write operations.


My current thinking is that (a) PCI drivers should unmap pending Common
Buffer operations at ExitBootServices; (b) PCI drivers should never
return from their protocol member functions with pending Write/Read
operations, only CommonBuffer operations (put differently, if the
operation is asynchronous on the higher protocol level, then make the
underlying BMDMA operation CommonBuffer); (c) IOMMU protocol
implementations should not be required to clean up translations, only
allowed to, if they can do it without interfering with (a).

Basically, the question is, who *owns* the translations. Based on the
PciIo interfaces, and on the requirement that PCI drivers Map() and
Unmap() buffers, my opinion is that the PCI drivers own the translations.

If you think that we should take this question to the USWG, I'm OK with
that.

Thanks,
Laszlo



> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Monday, September 4, 2017 3:55 AM
> To: edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Subject: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>
> Repo:   https://github.com/lersek/edk2.git
> Branch: pci_host_controllers_unmap
>
> This series updates four PCI Host Controller drivers so that they don't leave CommonBuffer mappings hanging when control is transfered to the OS.
>
> Cc: Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>>
> Cc: Eric Dong <eric.dong@intel.com<mailto:eric.dong@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
>
> Thanks
> Laszlo
>
> Laszlo Ersek (4):
>   MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot
>     services
>   MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot
>     services
>   MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot
>     services
>   MdeModulePkg/AtaAtapiPassThru: unmap common buffers at
>     ExitBootServices()
>
>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
>  MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c                      | 25 +++++++-
>  MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c                      | 25 +++++++-
>  MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c                      | 31 +++++++++
>  6 files changed, 168 insertions(+), 3 deletions(-)
>
> --
> 2.14.1.3.gb7cf6e02401b
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
>

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-05  2:18     ` Yao, Jiewen
@ 2017-09-05  9:15       ` Laszlo Ersek
  2017-09-05 13:44         ` Yao, Jiewen
  0 siblings, 1 reply; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-05  9:15 UTC (permalink / raw)
  To: Yao, Jiewen, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

On 09/05/17 04:18, Yao, Jiewen wrote:
> HI Laszlo
> Thank you very much to raise this DMA topic.
> 
> I have a chat with Star. We have couples of question, from high level to low level.
> Hope you can clarify:
> 
> 
> 1)       About disable DMA
> 
> As you mentioned, in UEFI_Driver_Writer_Guide_V1.0.1_120308.pdf, 7.7 Adding the Exit Boot Services feature
> =========================
> Examples from the EDK II that use this feature are the PCI device drivers for USB Host
> Controllers. Some USB Host Controllers are PCI Bus Masters that continuously access a
> memory buffer to poll for operation requests. Access to this memory buffer by a USB
> Host Controller may be required to boot an operation system, but this activity must be
> terminated when the OS calls ExitBootServices(). The typical action in the Exit Boot
> Services Event for these types of drivers is to disable the PCI bus master and place the
> USB Host Controller into a halted state
> =========================
> 
> I believe the fundamental requirement is to "disable DMA".
> As such, the simplest action is to clear PCI Bus Master Enable bit in the command register.
> 
> Do you think clearing BME is enough?
> Is there any special reason that we must unmap the DMA buffer?

Modifying the device state is enough to ask the device *nicely* to stop
doing DMA.

But, I think, if we are on an IOMMU-protected system, where part of
PciIo.Map() is to add a translation (= system memory access permission)
to the IOMMU, then we should also revoke this permission explicitly,
*after* we asked the device nicely to stop doing DMA.

Basically, ask the device nicely first, and then enforce the protection.

... BTW, I mentioned "IOMMU faults" earlier, and I have some further
thoughts on them. This is about tearing down IOMMU translations *first*,
in the IOMMU driver's ExitBootServices() handler, *before* the PCI
drivers get a chance -- in *their* ExitBootServices() handlers -- to ask
the PCI devices nicely to stop doing DMA.

I think a quite problematic failure mode is possible here. Assume that a
disk controller has some write commands queued (which are mapped with
Bus Master Read or Bus Master CommonBuffer operations). If the
ExitBootServices() handler for the IOMMU driver runs first, then the
device could act upon these queued commands with missing IOMMU
translations, before the device's own PCI driver asked the device to
shut down DMA. In some cases, I guess this could result in those queued
write commands simply failing, without ill effects. However, dependent
on the IOMMU implementation, the write commands might not fail
explicitly, but read garbage from system memory, and write garbage to
disk blocks. That would be bad.

I think this is exactly what could happen with the SEV IOMMU we have.
The memory would be first re-encrypted (this is equivalent to revoking a
translation), and then the write commands (queued in the emulated AHCI
controller, in the hypervisor) would read garbage from guest memory, and
write garbage to the virtual disk.

> 
> 
> 2)       About common buffer
> If there is a special requirement that we must unmap the DMA buffer, why we only unmap the common buffer?
> 
> I understand your statement that unmapping a read buffer or write buffer may cause free memory action. But that is implementation choice.

I believe I disagree with this (i.e., I think it is not an
implementation choice).

In the general case, we can assume that PciIo.Map() allocates a bounce
buffer internally, when Read or Write action is requested.

In turn, PciIo.Unmap() has to release the same buffer, internally.
PciIo.Unmap() does not know if it is called from an ExitBootServices()
notification function, or from a normal context. It cannot decide if it
is allowed to release memory, or not. There's no additional parameter to
provide this hint either. If PciIo.Unmap() never releases buffers, then
ExitBootServices() will be happy, but we're going to leak memory.

Now, PciIo.Unmap() could move the bounce buffer to some internal "free
list" instead, rather than freeing it. And, the next PciIo.Map() call
could try to reuse a buffer from the "free list". But this is difficult
to do, because the buffers may have any size, and recycling them would
require an internal, general "page pool", in parallel to the Boot
Services page pool (AllocatePages, FreePages).

If PciIo.Unmap() could be informed that it is running in
ExitBootServices() notification context -- called from PCI drivers --,
then unmapping Read and Write operations would be simple.

> If we have strong requirement to unmap the DMA buffer, we should achieve that.

I totally agree that we *should* achieve that, but I don't see how the
current PciIo.Map() and PciIo.Unmap() APIs make it possible, for Read
and Write operations.

For CommonBuffer operations, it is possible, because AllocateBuffer()
and FreeBuffer() are separate actions, and an ExitBootServices()
notification function can *avoid* calling FreeBuffer. (But only for
CommonBuffer operations.) PciIo.Unmap() cannot be told *not* to release
a bounce buffer for Read/Write.

> 
> 
> 3)       About interaction with IOMMU
> I am not worried about IOMMU interaction.
> I believe an proper IOMMU implementation should not block any action taken by a PCI device driver.
> Current IntelVTdDxe driver disables VTd engine in ExitBootService, which means the DMA access is no longer managed by VTd engine. It is NOT related to disable DMA buffer.

Ah, I understand!

So, on one hand, that is totally safe, for PCI device drivers. It means
all pending DMA can get through, until the PCI drivers ask the devices
(nicely) to stop doing DMA.

On the other hand, it is the opposite of the security goal of SEV. With
SEV, the default state is "all DMA forbidden" (= all guest memory
encrypted). This is also the status that we'd like to achieve when
ExitBootServices() completes. When control is transferred to the OS, all
guest memory should be encrypted again, even those areas that were once
used as CommonBuffers.

So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
are opposites -- VT-d permits all DMA unless configured otherwise, while
SEV forbids all DMA unless configured otherwise.

> 4)       About the driver you modified
> I believe we have some other PCI device driver (NVMe/SDMMC/UFS), besides the ones you modified in this patch (ATA/UCHI/EHCI/XHCI).
> If we need take this approach, I assume they also need update, right?

Yes, they should be updated similarly.

SDMMC and UFS are not included in OVMF, so I couldn't test their behavior.

NVMe is included in OVMF, but it is used very rarely in practice (we can
call it a corner case), so I thought I would first submit patches for
the four main drivers (UHCI, EHCI, XHCI, AHCI) that are frequently used
with OVMF and QEMU. Tracking down the CommonBuffer lifecycles was
difficult :)

Thank you!
Laszlo


> 
> Thank you
> Yao Jiewen
> 
> 
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Tuesday, September 5, 2017 5:20 AM
> To: Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
> Cc: Dong, Eric <eric.dong@intel.com>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
> 
> On 09/04/17 12:36, Zeng, Star wrote:
>> Curious about one question when I am reviewing this patch series.
>>
>> Does/Should the IOMMU component disable the address translation at ExitBootServices?
> 
> It is a very good and interesting question.
> 
> My answer is, I don't know. The IOMMU protocol abstraction is specific
> to edk2, and its behavior is not documented in the PI or UEFI specs (as
> far as I know).
> 
> For one example, the VT-d IOMMU implementation in
> 
>   IntelSiliconPkg/IntelVTdDxe
> 
> zaps all translations at ExitBootServices():
> 
> VOID
> EFIAPI
> OnExitBootServices (
>   IN EFI_EVENT                               Event,
>   IN VOID                                    *Context
>   )
> {
>   DEBUG ((DEBUG_INFO, "Vtd OnExitBootServices\n"));
>   DumpVtdRegsAll ();
>   DisableDmar ();
>   DumpVtdRegsAll ();
> }
> 
> Now, in case someone suggested that the same approach should be followed
> by all IOMMU implementations, I'd have three counter-arguments:
> 
> (1) My series states the problem (and seeks to solve it) on the PciIo
> protocol level, which is standardized. The IOMMU protocol is edk2-only.
> I feel that this kind of "unmap on ExitBootServices" should be done by
> all UEFI drivers that map DMA common buffers, regardless of
> platform-specific PciIo implementations.
> 
> (2) ExitBootServices() notify functions are called in unspecified order
> (there's no way to express dependencies between them). This means that,
> following the above pattern, an IO address translation could be torn
> down before a reliant PCI device is de-configured in its own UEFI_DRIVER
> ExitBootServices() handler. The result can be a series of IOMMU faults.
> I'm not sure if that's bad in practice, but it does look like a layering
> violation. (We shouldn't tear down resources from the bottom up.)
> 
> (3) The DisableDmar() function is relatively simple. It uses VT-d
> registers that exist independently of any PCI devices, and of any "live"
> mappings.
> 
> Some IOMMU protocol implementations might be different -- for them the
> live configuration might only be known by constantly tracking the full
> set of Map() and Unmap() operations. That's a hurdle for the implementation.
> 
> Furthermore, what is supposed to happen if both the IOMMU driver and the
> individual PCI driver destroy the mapping at ExitBootServices()? If the
> PCI driver's notification function runs first, then the IOMMU driver can
> mark the mapping as "unmapped", and ignore it in its own notification
> function. If the IOMMU driver's notification function runs first, then
> it can again only mark the mapping as "unmapped", so that when the PCI
> driver tries to unmap it as well, things don't blow up. This seems to
> imply that an unmapped mapping data structure can never be recycled,
> because we never know what action might follow. This sounds very
> confusing to me.
> 
> 
> Now, there are two arguments in favor of the IOMMU ExitBootServices()
> callback as well:
> 
> - Security. It doesn't matter if some driver forgets to de-configure a
> translation, the IOMMU driver won't. Layering violations be damned.
> 
> - In ExitBootServices() notification functions, the UEFI memory map must
> not be altered (memory cannot be released). This means that Bus Master
> Write and Bus Master Read operations, for pending transfers, cannot be
> unmapped (only CommonBuffer operations can be), because unmapping
> Write/Read might free memory (bounce buffers) *implicitly*. For
> CommonBuffer operations, separate PciIo.FreeBuffer() calls are necessary
> for releasing memory, thus the notification functions can simply *not*
> call FreeBuffer(), to stay safe. But this is not possible for pending
> Read/Write operations.
> 
> 
> My current thinking is that (a) PCI drivers should unmap pending Common
> Buffer operations at ExitBootServices; (b) PCI drivers should never
> return from their protocol member functions with pending Write/Read
> operations, only CommonBuffer operations (put differently, if the
> operation is asynchronous on the higher protocol level, then make the
> underlying BMDMA operation CommonBuffer); (c) IOMMU protocol
> implementations should not be required to clean up translations, only
> allowed to, if they can do it without interfering with (a).
> 
> Basically, the question is, who *owns* the translations. Based on the
> PciIo interfaces, and on the requirement that PCI drivers Map() and
> Unmap() buffers, my opinion is that the PCI drivers own the translations.
> 
> If you think that we should take this question to the USWG, I'm OK with
> that.
> 
> Thanks,
> Laszlo
> 
> 
> 
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Monday, September 4, 2017 3:55 AM
>> To: edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
>> Subject: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>> Repo:   https://github.com/lersek/edk2.git
>> Branch: pci_host_controllers_unmap
>>
>> This series updates four PCI Host Controller drivers so that they don't leave CommonBuffer mappings hanging when control is transfered to the OS.
>>
>> Cc: Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>>
>> Cc: Eric Dong <eric.dong@intel.com<mailto:eric.dong@intel.com>>
>> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
>>
>> Thanks
>> Laszlo
>>
>> Laszlo Ersek (4):
>>   MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot
>>     services
>>   MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot
>>     services
>>   MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot
>>     services
>>   MdeModulePkg/AtaAtapiPassThru: unmap common buffers at
>>     ExitBootServices()
>>
>>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
>>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
>>  MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c                      | 25 +++++++-
>>  MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c                      | 25 +++++++-
>>  MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c                      | 31 +++++++++
>>  6 files changed, 168 insertions(+), 3 deletions(-)
>>
>> --
>> 2.14.1.3.gb7cf6e02401b
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-05  9:15       ` Laszlo Ersek
@ 2017-09-05 13:44         ` Yao, Jiewen
  2017-09-05 17:57           ` Laszlo Ersek
  0 siblings, 1 reply; 21+ messages in thread
From: Yao, Jiewen @ 2017-09-05 13:44 UTC (permalink / raw)
  To: Laszlo Ersek, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

Good discussion. Comment in line.

From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Tuesday, September 5, 2017 5:16 PM
To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Dong, Eric <eric.dong@intel.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

On 09/05/17 04:18, Yao, Jiewen wrote:
> HI Laszlo
> Thank you very much to raise this DMA topic.
>
> I have a chat with Star. We have couples of question, from high level to low level.
> Hope you can clarify:
>
>
> 1)       About disable DMA
>
> As you mentioned, in UEFI_Driver_Writer_Guide_V1.0.1_120308.pdf, 7.7 Adding the Exit Boot Services feature
> =========================
> Examples from the EDK II that use this feature are the PCI device drivers for USB Host
> Controllers. Some USB Host Controllers are PCI Bus Masters that continuously access a
> memory buffer to poll for operation requests. Access to this memory buffer by a USB
> Host Controller may be required to boot an operation system, but this activity must be
> terminated when the OS calls ExitBootServices(). The typical action in the Exit Boot
> Services Event for these types of drivers is to disable the PCI bus master and place the
> USB Host Controller into a halted state
> =========================
>
> I believe the fundamental requirement is to "disable DMA".
> As such, the simplest action is to clear PCI Bus Master Enable bit in the command register.
>
> Do you think clearing BME is enough?
> Is there any special reason that we must unmap the DMA buffer?

Modifying the device state is enough to ask the device *nicely* to stop
doing DMA.

But, I think, if we are on an IOMMU-protected system, where part of
PciIo.Map() is to add a translation (= system memory access permission)
to the IOMMU, then we should also revoke this permission explicitly,
*after* we asked the device nicely to stop doing DMA.

Basically, ask the device nicely first, and then enforce the protection.

... BTW, I mentioned "IOMMU faults" earlier, and I have some further
thoughts on them. This is about tearing down IOMMU translations *first*,
in the IOMMU driver's ExitBootServices() handler, *before* the PCI
drivers get a chance -- in *their* ExitBootServices() handlers -- to ask
the PCI devices nicely to stop doing DMA.

I think a quite problematic failure mode is possible here. Assume that a
disk controller has some write commands queued (which are mapped with
Bus Master Read or Bus Master CommonBuffer operations). If the
ExitBootServices() handler for the IOMMU driver runs first, then the
device could act upon these queued commands with missing IOMMU
translations, before the device's own PCI driver asked the device to
shut down DMA. In some cases, I guess this could result in those queued
write commands simply failing, without ill effects. However, dependent
on the IOMMU implementation, the write commands might not fail
explicitly, but read garbage from system memory, and write garbage to
disk blocks. That would be bad.

I think this is exactly what could happen with the SEV IOMMU we have.
The memory would be first re-encrypted (this is equivalent to revoking a
translation), and then the write commands (queued in the emulated AHCI
controller, in the hypervisor) would read garbage from guest memory, and
write garbage to the virtual disk.

[Jiewen] I am not clear on how it happens in SEV here.
Below is my thought, please correct me if I am wrong.

First, the fundamental requirement is to disable BME at exit boot service,
or make sure the controller is in halt state.
If a device driver fails to meet such requirement, it is a bug we need to fix at first.

Now, let's assume all device driver is in halt state at ExitBootService. - that is my
understanding for all rest discussion.

I checked OVMF and I do not see any code in IOMMU driver to register exit boot services event,
which means the DMA buffer is still decrypted.

If that is the case, the DMA transfer before BME disable is safety (DMA buffer is decrypted) and
there will be no DMA transfer any more at ExitBootService (BME disabled).
Then how does the write command queued in the emulated AHCI controller start read garbage?

The SEV IOMMU driver only modifies the page table.
Then after ExitBootService, the OS will take control of CR3 and set correct
encryption bit.
NOTE: The device should still be halt state, because the device is
also controlled by OS.







>
>
> 2)       About common buffer
> If there is a special requirement that we must unmap the DMA buffer, why we only unmap the common buffer?
>
> I understand your statement that unmapping a read buffer or write buffer may cause free memory action. But that is implementation choice.

I believe I disagree with this (i.e., I think it is not an
implementation choice).

In the general case, we can assume that PciIo.Map() allocates a bounce
buffer internally, when Read or Write action is requested.

[Jiewen] When I say implementation choice, I mean Map() *may* allocates
a bounce buffer or it just uses the same address, if the platform is capable of
using DRAM as DMA buffer directly.




In turn, PciIo.Unmap() has to release the same buffer, internally.
PciIo.Unmap() does not know if it is called from an ExitBootServices()
notification function, or from a normal context. It cannot decide if it
is allowed to release memory, or not. There's no additional parameter to
provide this hint either. If PciIo.Unmap() never releases buffers, then
ExitBootServices() will be happy, but we're going to leak memory.

Now, PciIo.Unmap() could move the bounce buffer to some internal "free
list" instead, rather than freeing it. And, the next PciIo.Map() call
could try to reuse a buffer from the "free list". But this is difficult
to do, because the buffers may have any size, and recycling them would
require an internal, general "page pool", in parallel to the Boot
Services page pool (AllocatePages, FreePages).

If PciIo.Unmap() could be informed that it is running in
ExitBootServices() notification context -- called from PCI drivers --,
then unmapping Read and Write operations would be simple.

> If we have strong requirement to unmap the DMA buffer, we should achieve that.

I totally agree that we *should* achieve that, but I don't see how the
current PciIo.Map() and PciIo.Unmap() APIs make it possible, for Read
and Write operations.

For CommonBuffer operations, it is possible, because AllocateBuffer()
and FreeBuffer() are separate actions, and an ExitBootServices()
notification function can *avoid* calling FreeBuffer. (But only for
CommonBuffer operations.) PciIo.Unmap() cannot be told *not* to release
a bounce buffer for Read/Write.

[Jiewen] Again, it is an implementation choice. It is possible that a host
bridge allocate common buffer without calling any Allocate().
But another implementation may also allocate internal data structure to
record every map() action, including common buffer. (For event log, or
tracing purpose, et etc.)

I do not think Free() should bring difference between common buffer or
read/write buffer.
I am a little worried that we have too many assumption here:
e.g. Map() common buffer never involves Allocate() even for driver internal
data structure , and Map() read/write buffer always involves Allocate() even
there is no need to allocate anything.
With that assumption, we purposely avoid Unmap() read/write buffer,
but force Unmap() common buffer.

I am not sure if that assumption is correct.





>
>
> 3)       About interaction with IOMMU
> I am not worried about IOMMU interaction.
> I believe an proper IOMMU implementation should not block any action taken by a PCI device driver.
> Current IntelVTdDxe driver disables VTd engine in ExitBootService, which means the DMA access is no longer managed by VTd engine. It is NOT related to disable DMA buffer.

Ah, I understand!

So, on one hand, that is totally safe, for PCI device drivers. It means
all pending DMA can get through, until the PCI drivers ask the devices
(nicely) to stop doing DMA.
[Jiewen] Yes. I think any IOMMU implementation should take that
into consideration.



On the other hand, it is the opposite of the security goal of SEV. With
SEV, the default state is "all DMA forbidden" (= all guest memory
encrypted). This is also the status that we'd like to achieve when
ExitBootServices() completes. When control is transferred to the OS, all
guest memory should be encrypted again, even those areas that were once
used as CommonBuffers.
[Jiewen] I am not sure who will control "When control is transferred to the OS, all
guest memory should be encrypted again, even those areas that were once
used as CommonBuffers."
For SEV case, I think it should be controlled by OS, because it is just CR3.
If so, we should not meet any problem, because we already disable BME
in device driver's ExitBootService event.





So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
are opposites -- VT-d permits all DMA unless configured otherwise, while
SEV forbids all DMA unless configured otherwise.
[Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
I setup translation table, but mark all entry to be NOT-PRESENT.
I grant DMA access only if there is a specific request by a device.

I open DMA access in ExitBootServices, just want to make sure it is compatible to
the existing OS, which might not have knowledge on IOMMU.
I will provide a patch to make it a policy very soon. As such VTd IOMMU may
turn off IOMMU, or keep it enabled at ExitBootService.
An IOMMU aware OS may take over IOMMU directly and reprogram it.





> 4)       About the driver you modified
> I believe we have some other PCI device driver (NVMe/SDMMC/UFS), besides the ones you modified in this patch (ATA/UCHI/EHCI/XHCI).
> If we need take this approach, I assume they also need update, right?

Yes, they should be updated similarly.

SDMMC and UFS are not included in OVMF, so I couldn't test their behavior.

NVMe is included in OVMF, but it is used very rarely in practice (we can
call it a corner case), so I thought I would first submit patches for
the four main drivers (UHCI, EHCI, XHCI, AHCI) that are frequently used
with OVMF and QEMU. Tracking down the CommonBuffer lifecycles was
difficult :)
[Jiewen] Understood. That is OK. We can handle that in separate patch.


Thank you!
Laszlo


>
> Thank you
> Yao Jiewen
>
>
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Tuesday, September 5, 2017 5:20 AM
> To: Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>
> On 09/04/17 12:36, Zeng, Star wrote:
>> Curious about one question when I am reviewing this patch series.
>>
>> Does/Should the IOMMU component disable the address translation at ExitBootServices?
>
> It is a very good and interesting question.
>
> My answer is, I don't know. The IOMMU protocol abstraction is specific
> to edk2, and its behavior is not documented in the PI or UEFI specs (as
> far as I know).
>
> For one example, the VT-d IOMMU implementation in
>
>   IntelSiliconPkg/IntelVTdDxe
>
> zaps all translations at ExitBootServices():
>
> VOID
> EFIAPI
> OnExitBootServices (
>   IN EFI_EVENT                               Event,
>   IN VOID                                    *Context
>   )
> {
>   DEBUG ((DEBUG_INFO, "Vtd OnExitBootServices\n"));
>   DumpVtdRegsAll ();
>   DisableDmar ();
>   DumpVtdRegsAll ();
> }
>
> Now, in case someone suggested that the same approach should be followed
> by all IOMMU implementations, I'd have three counter-arguments:
>
> (1) My series states the problem (and seeks to solve it) on the PciIo
> protocol level, which is standardized. The IOMMU protocol is edk2-only.
> I feel that this kind of "unmap on ExitBootServices" should be done by
> all UEFI drivers that map DMA common buffers, regardless of
> platform-specific PciIo implementations.
>
> (2) ExitBootServices() notify functions are called in unspecified order
> (there's no way to express dependencies between them). This means that,
> following the above pattern, an IO address translation could be torn
> down before a reliant PCI device is de-configured in its own UEFI_DRIVER
> ExitBootServices() handler. The result can be a series of IOMMU faults.
> I'm not sure if that's bad in practice, but it does look like a layering
> violation. (We shouldn't tear down resources from the bottom up.)
>
> (3) The DisableDmar() function is relatively simple. It uses VT-d
> registers that exist independently of any PCI devices, and of any "live"
> mappings.
>
> Some IOMMU protocol implementations might be different -- for them the
> live configuration might only be known by constantly tracking the full
> set of Map() and Unmap() operations. That's a hurdle for the implementation.
>
> Furthermore, what is supposed to happen if both the IOMMU driver and the
> individual PCI driver destroy the mapping at ExitBootServices()? If the
> PCI driver's notification function runs first, then the IOMMU driver can
> mark the mapping as "unmapped", and ignore it in its own notification
> function. If the IOMMU driver's notification function runs first, then
> it can again only mark the mapping as "unmapped", so that when the PCI
> driver tries to unmap it as well, things don't blow up. This seems to
> imply that an unmapped mapping data structure can never be recycled,
> because we never know what action might follow. This sounds very
> confusing to me.
>
>
> Now, there are two arguments in favor of the IOMMU ExitBootServices()
> callback as well:
>
> - Security. It doesn't matter if some driver forgets to de-configure a
> translation, the IOMMU driver won't. Layering violations be damned.
>
> - In ExitBootServices() notification functions, the UEFI memory map must
> not be altered (memory cannot be released). This means that Bus Master
> Write and Bus Master Read operations, for pending transfers, cannot be
> unmapped (only CommonBuffer operations can be), because unmapping
> Write/Read might free memory (bounce buffers) *implicitly*. For
> CommonBuffer operations, separate PciIo.FreeBuffer() calls are necessary
> for releasing memory, thus the notification functions can simply *not*
> call FreeBuffer(), to stay safe. But this is not possible for pending
> Read/Write operations.
>
>
> My current thinking is that (a) PCI drivers should unmap pending Common
> Buffer operations at ExitBootServices; (b) PCI drivers should never
> return from their protocol member functions with pending Write/Read
> operations, only CommonBuffer operations (put differently, if the
> operation is asynchronous on the higher protocol level, then make the
> underlying BMDMA operation CommonBuffer); (c) IOMMU protocol
> implementations should not be required to clean up translations, only
> allowed to, if they can do it without interfering with (a).
>
> Basically, the question is, who *owns* the translations. Based on the
> PciIo interfaces, and on the requirement that PCI drivers Map() and
> Unmap() buffers, my opinion is that the PCI drivers own the translations.
>
> If you think that we should take this question to the USWG, I'm OK with
> that.
>
> Thanks,
> Laszlo
>
>
>
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Monday, September 4, 2017 3:55 AM
>> To: edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org%3cmailto:edk2-devel@lists.01.org>>>
>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com<mailto:eric.dong@intel.com%3cmailto:eric.dong@intel.com>>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com<mailto:star.zeng@intel.com%3cmailto:star.zeng@intel.com>>>
>> Subject: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>> Repo:   https://github.com/lersek/edk2.git
>> Branch: pci_host_controllers_unmap
>>
>> This series updates four PCI Host Controller drivers so that they don't leave CommonBuffer mappings hanging when control is transfered to the OS.
>>
>> Cc: Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com<mailto:brijesh.singh@amd.com%3cmailto:brijesh.singh@amd.com>>>
>> Cc: Eric Dong <eric.dong@intel.com<mailto:eric.dong@intel.com<mailto:eric.dong@intel.com%3cmailto:eric.dong@intel.com>>>
>> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com<mailto:star.zeng@intel.com%3cmailto:star.zeng@intel.com>>>
>>
>> Thanks
>> Laszlo
>>
>> Laszlo Ersek (4):
>>   MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot
>>     services
>>   MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot
>>     services
>>   MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot
>>     services
>>   MdeModulePkg/AtaAtapiPassThru: unmap common buffers at
>>     ExitBootServices()
>>
>>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h         | 18 ++++++
>>  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h |  5 ++  MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 +++++++++++++++++++-
>>  MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c                      | 25 +++++++-
>>  MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c                      | 25 +++++++-
>>  MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c                      | 31 +++++++++
>>  6 files changed, 168 insertions(+), 3 deletions(-)
>>
>> --
>> 2.14.1.3.gb7cf6e02401b
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org%3cmailto:edk2-devel@lists.01.org>>
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org%3cmailto:edk2-devel@lists.01.org>>
> https://lists.01.org/mailman/listinfo/edk2-devel
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-05 13:44         ` Yao, Jiewen
@ 2017-09-05 17:57           ` Laszlo Ersek
  2017-09-06  4:37             ` Yao, Jiewen
  0 siblings, 1 reply; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-05 17:57 UTC (permalink / raw)
  To: Yao, Jiewen, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric, Brijesh Singh

On 09/05/17 15:44, Yao, Jiewen wrote:
> Good discussion. Comment in line.
> 
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, September 5, 2017 5:16 PM
> To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
> Cc: Dong, Eric <eric.dong@intel.com>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
> 
> On 09/05/17 04:18, Yao, Jiewen wrote:
>> HI Laszlo
>> Thank you very much to raise this DMA topic.
>>
>> I have a chat with Star. We have couples of question, from high level to low level.
>> Hope you can clarify:
>>
>>
>> 1)       About disable DMA
>>
>> As you mentioned, in UEFI_Driver_Writer_Guide_V1.0.1_120308.pdf, 7.7 Adding the Exit Boot Services feature
>> =========================
>> Examples from the EDK II that use this feature are the PCI device drivers for USB Host
>> Controllers. Some USB Host Controllers are PCI Bus Masters that continuously access a
>> memory buffer to poll for operation requests. Access to this memory buffer by a USB
>> Host Controller may be required to boot an operation system, but this activity must be
>> terminated when the OS calls ExitBootServices(). The typical action in the Exit Boot
>> Services Event for these types of drivers is to disable the PCI bus master and place the
>> USB Host Controller into a halted state
>> =========================
>>
>> I believe the fundamental requirement is to "disable DMA".
>> As such, the simplest action is to clear PCI Bus Master Enable bit in the command register.
>>
>> Do you think clearing BME is enough?
>> Is there any special reason that we must unmap the DMA buffer?
> 
> Modifying the device state is enough to ask the device *nicely* to stop
> doing DMA.
> 
> But, I think, if we are on an IOMMU-protected system, where part of
> PciIo.Map() is to add a translation (= system memory access permission)
> to the IOMMU, then we should also revoke this permission explicitly,
> *after* we asked the device nicely to stop doing DMA.
> 
> Basically, ask the device nicely first, and then enforce the protection.
> 
> ... BTW, I mentioned "IOMMU faults" earlier, and I have some further
> thoughts on them. This is about tearing down IOMMU translations *first*,
> in the IOMMU driver's ExitBootServices() handler, *before* the PCI
> drivers get a chance -- in *their* ExitBootServices() handlers -- to ask
> the PCI devices nicely to stop doing DMA.
> 
> I think a quite problematic failure mode is possible here. Assume that a
> disk controller has some write commands queued (which are mapped with
> Bus Master Read or Bus Master CommonBuffer operations). If the
> ExitBootServices() handler for the IOMMU driver runs first, then the
> device could act upon these queued commands with missing IOMMU
> translations, before the device's own PCI driver asked the device to
> shut down DMA. In some cases, I guess this could result in those queued
> write commands simply failing, without ill effects. However, dependent
> on the IOMMU implementation, the write commands might not fail
> explicitly, but read garbage from system memory, and write garbage to
> disk blocks. That would be bad.
> 
> I think this is exactly what could happen with the SEV IOMMU we have.
> The memory would be first re-encrypted (this is equivalent to revoking a
> translation), and then the write commands (queued in the emulated AHCI
> controller, in the hypervisor) would read garbage from guest memory, and
> write garbage to the virtual disk.
> 
> [Jiewen] I am not clear on how it happens in SEV here.
> Below is my thought, please correct me if I am wrong.
> 
> First, the fundamental requirement is to disable BME at exit boot service,
> or make sure the controller is in halt state.
> If a device driver fails to meet such requirement, it is a bug we need to fix at first.
> 
> Now, let's assume all device driver is in halt state at ExitBootService. - that is my
> understanding for all rest discussion.

OK.

> I checked OVMF and I do not see any code in IOMMU driver to register exit boot services event,
> which means the DMA buffer is still decrypted.

More precisely, the IOMMU driver does not take it upon itself to
re-encrypt DMA buffers at ExitBootServices().

Instead it relies on the PCI drivers to Unmap() CommonBuffers at
ExitBootServices().

> If that is the case, the DMA transfer before BME disable is safety (DMA buffer is decrypted) and
> there will be no DMA transfer any more at ExitBootService (BME disabled).
> Then how does the write command queued in the emulated AHCI controller start read garbage?

It is not happening right now. When I wrote that, I was just speculating
about possible consequences; i.e., what *could* happen if the IOMMU
driver itself re-encrypted memory at ExitBootServices().

*Then* the garbage case could happen.

In other words, the case I described does not happen right now; I only
described it as a counter-argument against *adding* an
ExitBootServices() handler to the IOMMU driver.

Under OvmfPkg, the IOMMU behavior for SEV, and the PCI drivers for the
virtio devices, are fully in synch. The PCI drivers *alone* are
responsible for unmapping CommonBuffers at ExitBootServices(), and so
the IOMMU driver doesn't do it.

The question is if the same distribution of responsibilities applies to
PCI drivers that live outside of OvmfPkg, such as MdeModulePkg.

The current patch set suggests "yes".

> The SEV IOMMU driver only modifies the page table.

Not exactly. It modifies page table entries, and then re-writes the
memory covered by those page table entries.

Simply flipping the C-bit in page table entries does not reencrypt
memory contents, it only sets up the next read and/or write that happens
through that page mapping, for "decrypted view" or "encrypted view",
from the virtualization host side.

> Then after ExitBootService, the OS will take control of CR3 and set correct
> encryption bit.

This is true, the guest OS will set up new page tables, and in those
PTEs, the C-bit ("encrypt") will likely be set by default.

However, the guest OS will not *rewrite* all of the memory, with the
C-bit set. This means that any memory that the firmware didn't actually
re-encrypt (in the IOMMU driver) will still expose stale data to the
hypervisor.

> NOTE: The device should still be halt state, because the device is
> also controlled by OS.

This is the key point -- the device is *not* controlled by the guest OS,
in the worst case.

The virtual device is ultimately implemented by the hypervisor. We don't
expect the virtual device (= the hypervisor) to mis-behave on purpose.
However, if the hypervisor is compromised by an external attacker (for
example over the network, or via privilege escalation from another
guest), then all guest RAM that has not been encrypted up to that point
becomes visible to the attacker. In other words, the hypervisor ("the
device") becomes retro-actively malicious.

The point of SEV is to keep as much guest data encrypted at all times as
possible, so that *whenever* the hypervisor is compromised by an
attacker, the guest memory -- which the attacker can inevitably dump
from the host side -- remains un-intellegible to the attacker.


>> 2)       About common buffer
>> If there is a special requirement that we must unmap the DMA buffer, why we only unmap the common buffer?
>>
>> I understand your statement that unmapping a read buffer or write buffer may cause free memory action. But that is implementation choice.
> 
> I believe I disagree with this (i.e., I think it is not an
> implementation choice).
> 
> In the general case, we can assume that PciIo.Map() allocates a bounce
> buffer internally, when Read or Write action is requested.
> 
> [Jiewen] When I say implementation choice, I mean Map() *may* allocates
> a bounce buffer or it just uses the same address, if the platform is capable of
> using DRAM as DMA buffer directly.

I have two counter-arguments:

(a) As long as we don't want to reimplement PciIo.Map() from the
scratch, the current layering of protocols in edk2 implies that even if
the PCI device driver sets DUAL_ADDRESS_CYCLE, this flag can be cleared
on the way to the IOMMU protocol, in PciHostBridgeDxe, if
PciHostBridgeLib says "no DMA above 4GB".

OvmfPkg's PciHostBridgeLib currently says "no DMA above 4GB", simply
because that's always been the case, and I'm not sure if we can lift the
limitation without regressions.


(b) The more important counter-argument is that Map() and Unmap() need
auxiliary memory (a temporary buffer) for actually encrypting and
decrypting memory, so that the virtual device (implemented by the
hypervisor) can read it or write it (temporarily).

As I wrote above, encrypting or decrypting the memory contents works by
writing the original data through a new page mapping that has the
desired C-bit setting.

In a guest operating system, in-place encryption is possible by having
two virtual mappings to the same physical address range, one with the
C-bit set, and another with the C-bit clear. For encrypting, the guest
OS reads a cache-line sized chunk of data through the "C-bit clear"
mapping, and writes it back through the "C-bit set" mapping. For
decrypting, it does the inverse.

In UEFI, we don't have two virtual page mappings for the same physical
address, we only have a 1:1 mapping, on which we can flip the C-bit.
This means that, for decryption for example, we have to copy the data
from its original location (currently encrypted) to a "stash" (which is
always encrypted), modify the PTEs for the original location (clear the
C-bit, flush the TLB), then finally copy the data back from the "stash"
(still encrypted) to the original location (which will effectively
decrypt it, for the hypervisor to read).

This "stash" requires room. Currently the "stash" has an equal number of
pages to the actual region being mapped. For CommonBuffer operations,
the stash is allocated in AllocateBuffer(), together with the normal
buffer's allocation. It is freed similarly in FreeBuffer().

For Read/Write BMDMA operations, for which the PCI driver only calls
Map()/Unmap(), we cannot avoid allocating the stash in Map(), and
freeing it in Unmap(). We are practically forced to allocate a bounce
buffer not due to address width limitations (32 vs 64), but due to C-bit
status.

Now, earlier I had an idea to add a 1-page sized global array variable
to the IOMMU driver, and do the encryption/decryption page by page. In
other words, "unroll" the PTE management and the copying into page-sized
chunks. This would eliminate the need to allocate a "stash" dynamically.

However, the drawback of this approach is that the PTE's must be toggled
one by one even for a larger region. It could cause undesirably high
page table splitting and bad performance.


So, the SEV IOMMU could only eliminate dynamic memory alloc/dealloc for
Read/Write DMA operations allocation if both conditions were changed:

- if we knew for sure that 64-bit DMA was safe to advertize in
  PciHostBridgeLib,

- if we replaced the dynamically allocated stash, and the "batch" PTE
  management + copying, with a single-page static buffer, and "unrolled"
  PTE management + copying.

More generally speaking, 64-bit clean DMA might simply not be possible
on a given hardware platform, and on that platform, PciIo.Map() could
not avoid allocating bounce buffers for Read/Write DMA operations,
generally speaking. Thus, we shouldn't require a PCI driver to call
Unmap() at ExitBootServices().


> In turn, PciIo.Unmap() has to release the same buffer, internally.
> PciIo.Unmap() does not know if it is called from an ExitBootServices()
> notification function, or from a normal context. It cannot decide if it
> is allowed to release memory, or not. There's no additional parameter to
> provide this hint either. If PciIo.Unmap() never releases buffers, then
> ExitBootServices() will be happy, but we're going to leak memory.
> 
> Now, PciIo.Unmap() could move the bounce buffer to some internal "free
> list" instead, rather than freeing it. And, the next PciIo.Map() call
> could try to reuse a buffer from the "free list". But this is difficult
> to do, because the buffers may have any size, and recycling them would
> require an internal, general "page pool", in parallel to the Boot
> Services page pool (AllocatePages, FreePages).
> 
> If PciIo.Unmap() could be informed that it is running in
> ExitBootServices() notification context -- called from PCI drivers --,
> then unmapping Read and Write operations would be simple.
> 
>> If we have strong requirement to unmap the DMA buffer, we should achieve that.
> 
> I totally agree that we *should* achieve that, but I don't see how the
> current PciIo.Map() and PciIo.Unmap() APIs make it possible, for Read
> and Write operations.
> 
> For CommonBuffer operations, it is possible, because AllocateBuffer()
> and FreeBuffer() are separate actions, and an ExitBootServices()
> notification function can *avoid* calling FreeBuffer. (But only for
> CommonBuffer operations.) PciIo.Unmap() cannot be told *not* to release
> a bounce buffer for Read/Write.
> 
> [Jiewen] Again, it is an implementation choice. It is possible that a host
> bridge allocate common buffer without calling any Allocate().
> But another implementation may also allocate internal data structure to
> record every map() action, including common buffer. (For event log, or
> tracing purpose, et etc.)

True. My point is that
- we could reasonably require that Unmap() work for CommonBuffer
  operations without releasing memory,
- but we couldn't reasonably require the same for Read and Write
  operations.

This is because for CommonBuffer, a platform that right now allocates
extra stuff in Map(), could move the same allocations into
AllocateBuffer(). Similarly, move the current de-allocations from
Unmap() into FreeBuffer(). The same is not possible for Read/Write, so
we shouldn't require that.

> I do not think Free() should bring difference between common buffer or
> read/write buffer.
> I am a little worried that we have too many assumption here:
> e.g. Map() common buffer never involves Allocate() even for driver internal
> data structure , and Map() read/write buffer always involves Allocate() even
> there is no need to allocate anything.
> With that assumption, we purposely avoid Unmap() read/write buffer,
> but force Unmap() common buffer.

Exactly.

> I am not sure if that assumption is correct.

I'm just trying to operate with the APIs currently defined by the UEFI
spec, and these assumptions were the best I could come up with.

For example, if PciIo.Unmap() took one more parameter:

  IN BOOLEAN  CalledAtExitBootServices

then all this would be solved at once. Namely, PciIo implementations
would be required to call Unmap() for *all* DMA operations (read, write,
commonbuffer) in their ExitBootServices() notification functions, with
the parameter set to TRUE. All other contexts should pass FALSE.

In turn, CalledAtExitBootServices==TRUE would require PciIo.Unmap() to
work exactly as otherwise, but all memory allocated earlier with
gBS->AllocatePool() and gBS->AllocatePages() would have to be leaked on
purpose.

This is all.

Given that this parameter does not exist, I'm trying to find other ways to:
- perform the unmapping (re-encryption) at ExitBootServices,
- without releasing memory,
- and not in the IOMMU driver (because that could remove the translation
  before the PCI driver's own callback could disable the device, and
  that would be wrong)


>> 3)       About interaction with IOMMU
>> I am not worried about IOMMU interaction.
>> I believe an proper IOMMU implementation should not block any action taken by a PCI device driver.
>> Current IntelVTdDxe driver disables VTd engine in ExitBootService, which means the DMA access is no longer managed by VTd engine. It is NOT related to disable DMA buffer.
> 
> Ah, I understand!
> 
> So, on one hand, that is totally safe, for PCI device drivers. It means
> all pending DMA can get through, until the PCI drivers ask the devices
> (nicely) to stop doing DMA.
> [Jiewen] Yes. I think any IOMMU implementation should take that
> into consideration.
> 
> 
> 
> On the other hand, it is the opposite of the security goal of SEV. With
> SEV, the default state is "all DMA forbidden" (= all guest memory
> encrypted). This is also the status that we'd like to achieve when
> ExitBootServices() completes. When control is transferred to the OS, all
> guest memory should be encrypted again, even those areas that were once
> used as CommonBuffers.
> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
> guest memory should be encrypted again, even those areas that were once
> used as CommonBuffers."
> For SEV case, I think it should be controlled by OS, because it is just CR3.

If it was indeed just CR3, then I would fully agree with you.

But -- to my understanding --, ensuring that the memory is actually
encrypted requires that the guest *write* to the memory through a
virtual address whose PTE has the C-bit set.

And I don't think the guest OS can be expected to rewrite all of its
memory at launch. :(

> If so, we should not meet any problem, because we already disable BME
> in device driver's ExitBootService event.

I agree, but installing new page tables is not enough.

> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
> are opposites -- VT-d permits all DMA unless configured otherwise, while
> SEV forbids all DMA unless configured otherwise.
> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
> I setup translation table, but mark all entry to be NOT-PRESENT.
> I grant DMA access only if there is a specific request by a device.
> 
> I open DMA access in ExitBootServices, just want to make sure it is compatible to
> the existing OS, which might not have knowledge on IOMMU.
> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
> turn off IOMMU, or keep it enabled at ExitBootService.
> An IOMMU aware OS may take over IOMMU directly and reprogram it.

Thanks for the clarification.

But, again, will this not lead to the possibility that the VT-d IOMMU's
ExitBootServices() handler disables all DMA *before* the PCI drivers get
a chance to run their own ExitBootServices() handlers, disabling the
individual devices?

If that happens, then a series of IOMMU faults could be generated, which
I described above. (I.e., such IOMMU faults could result, at least in
the case of SEV, in garbage being written to disk, via queued commands.)

>> 4)       About the driver you modified
>> I believe we have some other PCI device driver (NVMe/SDMMC/UFS), besides the ones you modified in this patch (ATA/UCHI/EHCI/XHCI).
>> If we need take this approach, I assume they also need update, right?
> 
> Yes, they should be updated similarly.
> 
> SDMMC and UFS are not included in OVMF, so I couldn't test their behavior.
> 
> NVMe is included in OVMF, but it is used very rarely in practice (we can
> call it a corner case), so I thought I would first submit patches for
> the four main drivers (UHCI, EHCI, XHCI, AHCI) that are frequently used
> with OVMF and QEMU. Tracking down the CommonBuffer lifecycles was
> difficult :)
> [Jiewen] Understood. That is OK. We can handle that in separate patch.

Thanks!

Now, to finish up, here's an idea I just had.

Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
notification function?

If we are, then we could do the following:

* PciIo.Unmap() and friends would work as always (no restrictions on
  dynamic memory allocation or freeing, for any kind of DMA operation).

* PCI drivers would not be expected to call PciIo.Unmap() in their
  ExitBootServices() handlers.

* The IOMMU driver would have an ExitBootServices() notification
  function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
  (doesn't matter which).

* In this notification function, the IOMMU driver would signal *another*
  event (a private one). The notification function for this event would
  be enqueued strictly at the TPL_CALLBACK level.

* The notification function for the second event (private to the IOMMU)
  would iterate over all existent mappings, and unmap them, without
  allocating or freeing memory.

The key point is that by signaling the second event, the IOMMU driver
could order its own cleanup code after all other ExitBootServices()
callbacks. (Assuming, at least, that no other ExitBootServices()
callback employs the same trick to defer itself!) Therefore by the time
the second callback runs, all PCI devices have been halted, and it is
safe to tear down the translations.

The ordering would be ensured by the following:

- The UEFI spec says under CreateEventEx(), "All events are guaranteed
  to be signaled before the first notification action is taken." This
  means that, by the time the IOMMU's ExitBootServices() handler is
  entered, all other ExitBootServices() handlers have been *queued* at
  least, at TPL_CALLBACK or TPL_NOTIFY.

- Therefore, when we signal our second callback from there, for
  TPL_CALLBACK, the second callback will be queued at the end of the
  TPL_CALLBACK queue.

- CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
  functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
  for the Type argument of CreateEvent." So it wouldn't matter if a
  driver used CreateEvent() or CreateEventEx() for setting up its own
  handler, the handler would be queued just the same.

I think this is ugly and brittle, but perhaps the only way to clean up
*all* translations safely, with regard to PciIo.Unmap() +
ExitBootServices(), without updating the UEFI spec.

What do you think?

Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-05 17:57           ` Laszlo Ersek
@ 2017-09-06  4:37             ` Yao, Jiewen
  2017-09-06 12:14               ` Laszlo Ersek
  0 siblings, 1 reply; 21+ messages in thread
From: Yao, Jiewen @ 2017-09-06  4:37 UTC (permalink / raw)
  To: 'Laszlo Ersek', Zeng, Star, edk2-devel-01
  Cc: Dong, Eric, Brijesh Singh

Thanks for the clarification. Comment in line.

From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Wednesday, September 6, 2017 1:57 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Dong, Eric <eric.dong@intel.com>; Brijesh Singh <brijesh.singh@amd.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

On 09/05/17 15:44, Yao, Jiewen wrote:
> Good discussion. Comment in line.
>
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, September 5, 2017 5:16 PM
> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>
> On 09/05/17 04:18, Yao, Jiewen wrote:
>> HI Laszlo
>> Thank you very much to raise this DMA topic.
>>
>> I have a chat with Star. We have couples of question, from high level to low level.
>> Hope you can clarify:
>>
>>
>> 1)       About disable DMA
>>
>> As you mentioned, in UEFI_Driver_Writer_Guide_V1.0.1_120308.pdf, 7.7 Adding the Exit Boot Services feature
>> =========================
>> Examples from the EDK II that use this feature are the PCI device drivers for USB Host
>> Controllers. Some USB Host Controllers are PCI Bus Masters that continuously access a
>> memory buffer to poll for operation requests. Access to this memory buffer by a USB
>> Host Controller may be required to boot an operation system, but this activity must be
>> terminated when the OS calls ExitBootServices(). The typical action in the Exit Boot
>> Services Event for these types of drivers is to disable the PCI bus master and place the
>> USB Host Controller into a halted state
>> =========================
>>
>> I believe the fundamental requirement is to "disable DMA".
>> As such, the simplest action is to clear PCI Bus Master Enable bit in the command register.
>>
>> Do you think clearing BME is enough?
>> Is there any special reason that we must unmap the DMA buffer?
>
> Modifying the device state is enough to ask the device *nicely* to stop
> doing DMA.
>
> But, I think, if we are on an IOMMU-protected system, where part of
> PciIo.Map() is to add a translation (= system memory access permission)
> to the IOMMU, then we should also revoke this permission explicitly,
> *after* we asked the device nicely to stop doing DMA.
>
> Basically, ask the device nicely first, and then enforce the protection.
>
> ... BTW, I mentioned "IOMMU faults" earlier, and I have some further
> thoughts on them. This is about tearing down IOMMU translations *first*,
> in the IOMMU driver's ExitBootServices() handler, *before* the PCI
> drivers get a chance -- in *their* ExitBootServices() handlers -- to ask
> the PCI devices nicely to stop doing DMA.
>
> I think a quite problematic failure mode is possible here. Assume that a
> disk controller has some write commands queued (which are mapped with
> Bus Master Read or Bus Master CommonBuffer operations). If the
> ExitBootServices() handler for the IOMMU driver runs first, then the
> device could act upon these queued commands with missing IOMMU
> translations, before the device's own PCI driver asked the device to
> shut down DMA. In some cases, I guess this could result in those queued
> write commands simply failing, without ill effects. However, dependent
> on the IOMMU implementation, the write commands might not fail
> explicitly, but read garbage from system memory, and write garbage to
> disk blocks. That would be bad.
>
> I think this is exactly what could happen with the SEV IOMMU we have.
> The memory would be first re-encrypted (this is equivalent to revoking a
> translation), and then the write commands (queued in the emulated AHCI
> controller, in the hypervisor) would read garbage from guest memory, and
> write garbage to the virtual disk.
>
> [Jiewen] I am not clear on how it happens in SEV here.
> Below is my thought, please correct me if I am wrong.
>
> First, the fundamental requirement is to disable BME at exit boot service,
> or make sure the controller is in halt state.
> If a device driver fails to meet such requirement, it is a bug we need to fix at first.
>
> Now, let's assume all device driver is in halt state at ExitBootService. - that is my
> understanding for all rest discussion.

OK.

> I checked OVMF and I do not see any code in IOMMU driver to register exit boot services event,
> which means the DMA buffer is still decrypted.

More precisely, the IOMMU driver does not take it upon itself to
re-encrypt DMA buffers at ExitBootServices().

Instead it relies on the PCI drivers to Unmap() CommonBuffers at
ExitBootServices().

> If that is the case, the DMA transfer before BME disable is safety (DMA buffer is decrypted) and
> there will be no DMA transfer any more at ExitBootService (BME disabled).
> Then how does the write command queued in the emulated AHCI controller start read garbage?

It is not happening right now. When I wrote that, I was just speculating
about possible consequences; i.e., what *could* happen if the IOMMU
driver itself re-encrypted memory at ExitBootServices().
[Jiewen] OK. Got it.





*Then* the garbage case could happen.

In other words, the case I described does not happen right now; I only
described it as a counter-argument against *adding* an
ExitBootServices() handler to the IOMMU driver.
[Jiewen] I agree with you.
A IOMMU driver should NOT use ExitBootServices to stop DMA transfer.




Under OvmfPkg, the IOMMU behavior for SEV, and the PCI drivers for the
virtio devices, are fully in synch. The PCI drivers *alone* are
responsible for unmapping CommonBuffers at ExitBootServices(), and so
the IOMMU driver doesn't do it.
[Jiewen] I agree that IOMMU does not.
But I hold my opinion on if the PCI driver is responsible for unmapping
CommonBuffer at ExitBootServices().




The question is if the same distribution of responsibilities applies to
PCI drivers that live outside of OvmfPkg, such as MdeModulePkg.
[Jiewen] Yes. If we want to define a rule, the rule should be for *any* PCI device driver,
no matter where it is from. It could be in OVMF package, MdeModulePkg,
a private repo from Independent BIOS Vendor (IBV) or Original Equipment Manufacturer
(OEM), and even from Independent Hardware Vendor (IHV).

A special notice: The IHV driver should only follow the interface defined
in UEFI specification, such as PCI_IO. The IHV driver should NOT consume
any interface defined in MdeModulePkg, such as EDKII_IOMMU.





The current patch set suggests "yes".

> The SEV IOMMU driver only modifies the page table.

Not exactly. It modifies page table entries, and then re-writes the
memory covered by those page table entries.
[Jiewen] Yes, you are right.
There is StashBuffer associated with every common buffer.




Simply flipping the C-bit in page table entries does not reencrypt
memory contents, it only sets up the next read and/or write that happens
through that page mapping, for "decrypted view" or "encrypted view",
from the virtualization host side.
[Jiewen] Thanks for the info.



> Then after ExitBootService, the OS will take control of CR3 and set correct
> encryption bit.

This is true, the guest OS will set up new page tables, and in those
PTEs, the C-bit ("encrypt") will likely be set by default.

However, the guest OS will not *rewrite* all of the memory, with the
C-bit set. This means that any memory that the firmware didn't actually
re-encrypt (in the IOMMU driver) will still expose stale data to the
hypervisor.
[Jiewen] That is an interesting question.
Is there any security concern associated?

If the C-bit is cleared for a read/write buffer, is the content in the
read/write buffer also exposed to hypervisor?

I means if there is security concern, the concern should be applied to
both common buffer and read/write buffer.
Then we have to figure a way to resolve both buffer.

To be honest, that is exactly my biggest question on this patch:
why do we only handle the common buffer specially?





> NOTE: The device should still be halt state, because the device is
> also controlled by OS.

This is the key point -- the device is *not* controlled by the guest OS,
in the worst case.

The virtual device is ultimately implemented by the hypervisor. We don't
expect the virtual device (= the hypervisor) to mis-behave on purpose.
However, if the hypervisor is compromised by an external attacker (for
example over the network, or via privilege escalation from another
guest), then all guest RAM that has not been encrypted up to that point
becomes visible to the attacker. In other words, the hypervisor ("the
device") becomes retro-actively malicious.
[Jiewen] If that is the threat model, I have a question on the exposure:
1) If the concern is for existing data, it means all DMA data already written.
However, the DMA data is already consumed or produced by virtual device
or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
2) if the concern is for future data, it means all user data to be written.
However, the C-bit already prevent that.

Maybe I do not fully understand the threat model defined.
If you can explain more, that would be great.





The point of SEV is to keep as much guest data encrypted at all times as
possible, so that *whenever* the hypervisor is compromised by an
attacker, the guest memory -- which the attacker can inevitably dump
from the host side -- remains un-intellegible to the attacker.
[Jiewen] OK. If this is a security question, I suggest we define a clear
threat model at first.
Or what we are doing might be unnecessary or insufficient.






>> 2)       About common buffer
>> If there is a special requirement that we must unmap the DMA buffer, why we only unmap the common buffer?
>>
>> I understand your statement that unmapping a read buffer or write buffer may cause free memory action. But that is implementation choice.
>
> I believe I disagree with this (i.e., I think it is not an
> implementation choice).
>
> In the general case, we can assume that PciIo.Map() allocates a bounce
> buffer internally, when Read or Write action is requested.
>
> [Jiewen] When I say implementation choice, I mean Map() *may* allocates
> a bounce buffer or it just uses the same address, if the platform is capable of
> using DRAM as DMA buffer directly.

I have two counter-arguments:

(a) As long as we don't want to reimplement PciIo.Map() from the
scratch, the current layering of protocols in edk2 implies that even if
the PCI device driver sets DUAL_ADDRESS_CYCLE, this flag can be cleared
on the way to the IOMMU protocol, in PciHostBridgeDxe, if
PciHostBridgeLib says "no DMA above 4GB".

OvmfPkg's PciHostBridgeLib currently says "no DMA above 4GB", simply
because that's always been the case, and I'm not sure if we can lift the
limitation without regressions.


(b) The more important counter-argument is that Map() and Unmap() need
auxiliary memory (a temporary buffer) for actually encrypting and
decrypting memory, so that the virtual device (implemented by the
hypervisor) can read it or write it (temporarily).

As I wrote above, encrypting or decrypting the memory contents works by
writing the original data through a new page mapping that has the
desired C-bit setting.

In a guest operating system, in-place encryption is possible by having
two virtual mappings to the same physical address range, one with the
C-bit set, and another with the C-bit clear. For encrypting, the guest
OS reads a cache-line sized chunk of data through the "C-bit clear"
mapping, and writes it back through the "C-bit set" mapping. For
decrypting, it does the inverse.

In UEFI, we don't have two virtual page mappings for the same physical
address, we only have a 1:1 mapping, on which we can flip the C-bit.
This means that, for decryption for example, we have to copy the data
from its original location (currently encrypted) to a "stash" (which is
always encrypted), modify the PTEs for the original location (clear the
C-bit, flush the TLB), then finally copy the data back from the "stash"
(still encrypted) to the original location (which will effectively
decrypt it, for the hypervisor to read).

This "stash" requires room. Currently the "stash" has an equal number of
pages to the actual region being mapped. For CommonBuffer operations,
the stash is allocated in AllocateBuffer(), together with the normal
buffer's allocation. It is freed similarly in FreeBuffer().

For Read/Write BMDMA operations, for which the PCI driver only calls
Map()/Unmap(), we cannot avoid allocating the stash in Map(), and
freeing it in Unmap(). We are practically forced to allocate a bounce
buffer not due to address width limitations (32 vs 64), but due to C-bit
status.

Now, earlier I had an idea to add a 1-page sized global array variable
to the IOMMU driver, and do the encryption/decryption page by page. In
other words, "unroll" the PTE management and the copying into page-sized
chunks. This would eliminate the need to allocate a "stash" dynamically.

However, the drawback of this approach is that the PTE's must be toggled
one by one even for a larger region. It could cause undesirably high
page table splitting and bad performance.


So, the SEV IOMMU could only eliminate dynamic memory alloc/dealloc for
Read/Write DMA operations allocation if both conditions were changed:

- if we knew for sure that 64-bit DMA was safe to advertize in
  PciHostBridgeLib,

- if we replaced the dynamically allocated stash, and the "batch" PTE
  management + copying, with a single-page static buffer, and "unrolled"
  PTE management + copying.

More generally speaking, 64-bit clean DMA might simply not be possible
on a given hardware platform, and on that platform, PciIo.Map() could
not avoid allocating bounce buffers for Read/Write DMA operations,
generally speaking. Thus, we shouldn't require a PCI driver to call
Unmap() at ExitBootServices().

[Jiewen] I agree what you said above.
And I did not treat that to be counter-arguments. :-)





> In turn, PciIo.Unmap() has to release the same buffer, internally.
> PciIo.Unmap() does not know if it is called from an ExitBootServices()
> notification function, or from a normal context. It cannot decide if it
> is allowed to release memory, or not. There's no additional parameter to
> provide this hint either. If PciIo.Unmap() never releases buffers, then
> ExitBootServices() will be happy, but we're going to leak memory.
>
> Now, PciIo.Unmap() could move the bounce buffer to some internal "free
> list" instead, rather than freeing it. And, the next PciIo.Map() call
> could try to reuse a buffer from the "free list". But this is difficult
> to do, because the buffers may have any size, and recycling them would
> require an internal, general "page pool", in parallel to the Boot
> Services page pool (AllocatePages, FreePages).
>
> If PciIo.Unmap() could be informed that it is running in
> ExitBootServices() notification context -- called from PCI drivers --,
> then unmapping Read and Write operations would be simple.
>
>> If we have strong requirement to unmap the DMA buffer, we should achieve that.
>
> I totally agree that we *should* achieve that, but I don't see how the
> current PciIo.Map() and PciIo.Unmap() APIs make it possible, for Read
> and Write operations.
>
> For CommonBuffer operations, it is possible, because AllocateBuffer()
> and FreeBuffer() are separate actions, and an ExitBootServices()
> notification function can *avoid* calling FreeBuffer. (But only for
> CommonBuffer operations.) PciIo.Unmap() cannot be told *not* to release
> a bounce buffer for Read/Write.
>
> [Jiewen] Again, it is an implementation choice. It is possible that a host
> bridge allocate common buffer without calling any Allocate().
> But another implementation may also allocate internal data structure to
> record every map() action, including common buffer. (For event log, or
> tracing purpose, et etc.)

True. My point is that
- we could reasonably require that Unmap() work for CommonBuffer
  operations without releasing memory,
- but we couldn't reasonably require the same for Read and Write
  operations.

This is because for CommonBuffer, a platform that right now allocates
extra stuff in Map(), could move the same allocations into
AllocateBuffer(). Similarly, move the current de-allocations from
Unmap() into FreeBuffer(). The same is not possible for Read/Write, so
we shouldn't require that.

[Jiewen] For "require that Unmap() work for CommonBuffer
operations without releasing memory", I would hold my opinion until
it is documented clearly in UEFI spec.

My current concern is:
First, this sentence is NOT defined in UEFI specification.
Second, it might break an existing implementation or existing feature, such as tracing.

Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
to call memory services.
The only restriction is
1) TPL restriction, where memory service must be <= TPL_NOTIFY.
2) AP restriction, where no UEFI service/protocol is allowed for AP.





> I do not think Free() should bring difference between common buffer or
> read/write buffer.
> I am a little worried that we have too many assumption here:
> e.g. Map() common buffer never involves Allocate() even for driver internal
> data structure , and Map() read/write buffer always involves Allocate() even
> there is no need to allocate anything.
> With that assumption, we purposely avoid Unmap() read/write buffer,
> but force Unmap() common buffer.

Exactly.

> I am not sure if that assumption is correct.

I'm just trying to operate with the APIs currently defined by the UEFI
spec, and these assumptions were the best I could come up with.
[Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
Especially the IHV Option ROM should not consume any private API.






For example, if PciIo.Unmap() took one more parameter:

  IN BOOLEAN  CalledAtExitBootServices

then all this would be solved at once. Namely, PciIo implementations
would be required to call Unmap() for *all* DMA operations (read, write,
commonbuffer) in their ExitBootServices() notification functions, with
the parameter set to TRUE. All other contexts should pass FALSE.

In turn, CalledAtExitBootServices==TRUE would require PciIo.Unmap() to
work exactly as otherwise, but all memory allocated earlier with
gBS->AllocatePool() and gBS->AllocatePages() would have to be leaked on
purpose.

This is all.

Given that this parameter does not exist, I'm trying to find other ways to:
- perform the unmapping (re-encryption) at ExitBootServices,
- without releasing memory,
- and not in the IOMMU driver (because that could remove the translation
  before the PCI driver's own callback could disable the device, and
  that would be wrong)


>> 3)       About interaction with IOMMU
>> I am not worried about IOMMU interaction.
>> I believe an proper IOMMU implementation should not block any action taken by a PCI device driver.
>> Current IntelVTdDxe driver disables VTd engine in ExitBootService, which means the DMA access is no longer managed by VTd engine. It is NOT related to disable DMA buffer.
>
> Ah, I understand!
>
> So, on one hand, that is totally safe, for PCI device drivers. It means
> all pending DMA can get through, until the PCI drivers ask the devices
> (nicely) to stop doing DMA.
> [Jiewen] Yes. I think any IOMMU implementation should take that
> into consideration.
>
>
>
> On the other hand, it is the opposite of the security goal of SEV. With
> SEV, the default state is "all DMA forbidden" (= all guest memory
> encrypted). This is also the status that we'd like to achieve when
> ExitBootServices() completes. When control is transferred to the OS, all
> guest memory should be encrypted again, even those areas that were once
> used as CommonBuffers.
> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
> guest memory should be encrypted again, even those areas that were once
> used as CommonBuffers."
> For SEV case, I think it should be controlled by OS, because it is just CR3.

If it was indeed just CR3, then I would fully agree with you.

But -- to my understanding --, ensuring that the memory is actually
encrypted requires that the guest *write* to the memory through a
virtual address whose PTE has the C-bit set.

And I don't think the guest OS can be expected to rewrite all of its
memory at launch. :(

[Jiewen] That is fine.
I suggest we get clear on the threat model as the first step.
The threat model for SEV usage in OVMF.

I am sorry if that is already discussed before. I might ignore the conversation.
If you or any SEV owner can share the insight, that will be great.
See my question above "If that is the threat model, I have a question on the exposure:..."





> If so, we should not meet any problem, because we already disable BME
> in device driver's ExitBootService event.

I agree, but installing new page tables is not enough.

> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
> are opposites -- VT-d permits all DMA unless configured otherwise, while
> SEV forbids all DMA unless configured otherwise.
> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
> I setup translation table, but mark all entry to be NOT-PRESENT.
> I grant DMA access only if there is a specific request by a device.
>
> I open DMA access in ExitBootServices, just want to make sure it is compatible to
> the existing OS, which might not have knowledge on IOMMU.
> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
> turn off IOMMU, or keep it enabled at ExitBootService.
> An IOMMU aware OS may take over IOMMU directly and reprogram it.

Thanks for the clarification.

But, again, will this not lead to the possibility that the VT-d IOMMU's
ExitBootServices() handler disables all DMA *before* the PCI drivers get
a chance to run their own ExitBootServices() handlers, disabling the
individual devices?
[Jiewen] Sorry for the confusing. Let me explain:
No, VTd never disables *all* DMA buffer at ExitBootService.

"disable VTd" means to "disable IOMMU protection" and "allow all DMA".
"Keep VTd enabled" means to "keep IOMMU protection enabled" and
"only allow the DMA from Map() request".




If that happens, then a series of IOMMU faults could be generated, which
I described above. (I.e., such IOMMU faults could result, at least in
the case of SEV, in garbage being written to disk, via queued commands.)
[Jiewen] You are right. That would not happen.
IOMMU driver should not bring any compatibility problem for the PCI driver,
iff the PCI driver follows the UEFI specification.





>> 4)       About the driver you modified
>> I believe we have some other PCI device driver (NVMe/SDMMC/UFS), besides the ones you modified in this patch (ATA/UCHI/EHCI/XHCI).
>> If we need take this approach, I assume they also need update, right?
>
> Yes, they should be updated similarly.
>
> SDMMC and UFS are not included in OVMF, so I couldn't test their behavior.
>
> NVMe is included in OVMF, but it is used very rarely in practice (we can
> call it a corner case), so I thought I would first submit patches for
> the four main drivers (UHCI, EHCI, XHCI, AHCI) that are frequently used
> with OVMF and QEMU. Tracking down the CommonBuffer lifecycles was
> difficult :)
> [Jiewen] Understood. That is OK. We can handle that in separate patch.

Thanks!

Now, to finish up, here's an idea I just had.

Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
notification function?

If we are, then we could do the following:

* PciIo.Unmap() and friends would work as always (no restrictions on
  dynamic memory allocation or freeing, for any kind of DMA operation).

* PCI drivers would not be expected to call PciIo.Unmap() in their
  ExitBootServices() handlers.

* The IOMMU driver would have an ExitBootServices() notification
  function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
  (doesn't matter which).

* In this notification function, the IOMMU driver would signal *another*
  event (a private one). The notification function for this event would
  be enqueued strictly at the TPL_CALLBACK level.

* The notification function for the second event (private to the IOMMU)
  would iterate over all existent mappings, and unmap them, without
  allocating or freeing memory.

The key point is that by signaling the second event, the IOMMU driver
could order its own cleanup code after all other ExitBootServices()
callbacks. (Assuming, at least, that no other ExitBootServices()
callback employs the same trick to defer itself!) Therefore by the time
the second callback runs, all PCI devices have been halted, and it is
safe to tear down the translations.

The ordering would be ensured by the following:

- The UEFI spec says under CreateEventEx(), "All events are guaranteed
  to be signaled before the first notification action is taken." This
  means that, by the time the IOMMU's ExitBootServices() handler is
  entered, all other ExitBootServices() handlers have been *queued* at
  least, at TPL_CALLBACK or TPL_NOTIFY.

- Therefore, when we signal our second callback from there, for
  TPL_CALLBACK, the second callback will be queued at the end of the
  TPL_CALLBACK queue.

- CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
  functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
  for the Type argument of CreateEvent." So it wouldn't matter if a
  driver used CreateEvent() or CreateEventEx() for setting up its own
  handler, the handler would be queued just the same.

I think this is ugly and brittle, but perhaps the only way to clean up
*all* translations safely, with regard to PciIo.Unmap() +
ExitBootServices(), without updating the UEFI spec.

What do you think?

[Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
We do not need update PCI driver and we do not need update UEFI spec.
We only need update IOMMU driver which is concerned, based upon the threat model.
That probably is best solution. :-)

I assume you want to handle both common buffer and read/write buffer, right?
And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
If you or any SEV owner can share ...




Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-06  4:37             ` Yao, Jiewen
@ 2017-09-06 12:14               ` Laszlo Ersek
  2017-09-06 15:39                 ` Brijesh Singh
  2017-09-07  4:34                 ` Yao, Jiewen
  0 siblings, 2 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-06 12:14 UTC (permalink / raw)
  To: Yao, Jiewen, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric, Brijesh Singh

On 09/06/17 06:37, Yao, Jiewen wrote:
> Thanks for the clarification. Comment in line.
> 
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Wednesday, September 6, 2017 1:57 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
> Cc: Dong, Eric <eric.dong@intel.com>; Brijesh Singh <brijesh.singh@amd.com>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

>> Then after ExitBootService, the OS will take control of CR3 and set correct
>> encryption bit.
> 
> This is true, the guest OS will set up new page tables, and in those
> PTEs, the C-bit ("encrypt") will likely be set by default.
> 
> However, the guest OS will not *rewrite* all of the memory, with the
> C-bit set. This means that any memory that the firmware didn't actually
> re-encrypt (in the IOMMU driver) will still expose stale data to the
> hypervisor.
> [Jiewen] That is an interesting question.
> Is there any security concern associated?

I can't tell for sure. Answering this question is up to the proponents
of SEV, who have designed the threat model for SEV.

My operating assumption is that we should keep memory as tightly
encrypted as possible at the firmware --> OS control transfer. It could
be an exaggerated expectation from my side; I'd just better be safe than
sorry :)


> If the C-bit is cleared for a read/write buffer, is the content in the
> read/write buffer also exposed to hypervisor?

Not immediately. Only when the guest also rewrites the memory through
the appropriate virtual address.

Basically, in the virtual machine, the C-bit in a PTE that covers a
particular guest virtual address (GVA) controls whether a guest write
through that GVA will result in memory encryption, and if a gues read
through that GVA will result in memory decryption.

The current state of the C-bit doesn't matter for the hypervisor, what
matters is the last guest write through a GVA whose PTE had the C-bit
set, or clear. If the C-bit was clear when the guest last wrote the
area, then the hypervisor can read the data. If the C-bit was set, then
the hypervisor can only read garbage.


> I means if there is security concern, the concern should be applied to
> both common buffer and read/write buffer.
> Then we have to figure a way to resolve both buffer.

Yes, this is correct. The PciIo.Unmap operation, as currently
implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
correctly for all operations, but it can only guarantee *not* freeing
memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
is safe, while Unmap()ing Read/Write is not. The encryption would be
re-established just fine for Read/Write as well, but we would change the
UEFI memmap.

In OVMF, we currently manage this problem by making all asynchronous DMA
mappings CommonBuffer, even if they could othewise be Read or Write. We
use Read and Write only if the DMA operation completes before the
higher-level protocol function returns (so we can immediately Unmap the
operation, and the ExitBootServices() handler doesn't have to care).


> To be honest, that is exactly my biggest question on this patch:
> why do we only handle the common buffer specially?

For the following reason:

- Given that CommonBuffer mappings take a separate AllocateBuffer() /
FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
implementors -- beyond what the UEFI spec requries -- to keep *all*
dynamic memory management out of Map/Unmap. If they need dynamic memory
management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
This way Unmap is safe in ExitBootServices handlers.

- We cannot *reasonably ask* PciIo implementors to provide the same
guarantee for Read and Write mappings, simply because there are no other
APIs that could manage the dynamic memory for Map and Unmap --
AllocateBuffer and FreeBuffer are not required for Read and Write
mappings. PciIo implementors couldn't agree to our request even if they
wanted to. Therefore Unmapping Read/Write operations is inherently
unsafe in EBS handlers.


>> NOTE: The device should still be halt state, because the device is
>> also controlled by OS.
> 
> This is the key point -- the device is *not* controlled by the guest OS,
> in the worst case.
> 
> The virtual device is ultimately implemented by the hypervisor. We don't
> expect the virtual device (= the hypervisor) to mis-behave on purpose.
> However, if the hypervisor is compromised by an external attacker (for
> example over the network, or via privilege escalation from another
> guest), then all guest RAM that has not been encrypted up to that point
> becomes visible to the attacker. In other words, the hypervisor ("the
> device") becomes retro-actively malicious.
> [Jiewen] If that is the threat model, I have a question on the exposure:
> 1) If the concern is for existing data, it means all DMA data already written.
> However, the DMA data is already consumed or produced by virtual device
> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.

Maybe, maybe not. The data may have been sent to a different host over
the network, and wiped from memory.

Or, the data may have been written to a disk image that is separately
encrypted by the host OS, and has been detached (unplugged) from the
guest, and also unmounted from the host, with the disk key securely
wiped from host memory.

We can also imagine the reverse direction. Assume that the user of the
virtual machine goes into the UEFI shell in OVMF, starts the EDIT
program, and types some secret information into a text file on the ESP.
Further assume that the disk image is encrypted on the host OS. It is
conceivable that fragments of the secret could remain stuck in the AHCI
(disk) and USB (keyboard) DMA buffers.

Maybe you think that these are "made up" examples, and I agree, I just
made them up. The point is, it is not my place to discount possible
attack vectors (I haven't designed SEV). Attackers will be limited by
their *own* imaginations only, not by mine :)


> 2) if the concern is for future data, it means all user data to be written.
> However, the C-bit already prevent that.

As far as I understand SEV, future data is out of scope. The goal is to
prevent *retroactive* guest data leaks (= whatever is currently in host
OS memory) if an attacker compromises an otherwise non-malicious hypervisor.

If an attacker compromises a virtualization host, then they can just sit
silent and watch data flow into and out of guests from that point
onward, because they can then monitor all DMA (which always happens in
clear-text) real-time.


> Maybe I do not fully understand the threat model defined.
> If you can explain more, that would be great.

Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
correct me if I'm wrong.


> The point of SEV is to keep as much guest data encrypted at all times as
> possible, so that *whenever* the hypervisor is compromised by an
> attacker, the guest memory -- which the attacker can inevitably dump
> from the host side -- remains un-intellegible to the attacker.
> [Jiewen] OK. If this is a security question, I suggest we define a clear
> threat model at first.
> Or what we are doing might be unnecessary or insufficient.

I agree.

As I said above, my operating principle has been to re-encrypt all DMA
buffers as soon as possible. For long-standing DMA buffers, re-encrypt
them at ExitBootServices at the latest.


> [Jiewen] For "require that Unmap() work for CommonBuffer
> operations without releasing memory", I would hold my opinion until
> it is documented clearly in UEFI spec.

You are right, of course.

It's just that we cannot avoid exploring ways, for this feature, that
currently fall outside of the spec. Whatever we do here will be outside
of the spec, one way or another. I suggested what I thought could be a
reasonable extension to the spec, one that could be implemented by PciIo
implementors even before the spec is modified.

edk2's PciIo.Unmap already behaves like this, without breaking the spec.

If there's a better way -- for example modifying CoreExitBootServices()
in edk2, to signal IOMMU drivers separately, *after* notifying other
ExitBootServices() handlers --, that might work as well.


> My current concern is:
> First, this sentence is NOT defined in UEFI specification.

Correct.


> Second, it might break an existing implementation or existing feature, such as tracing.

Correct.

> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
> to call memory services.
> The only restriction is
> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
> 2) AP restriction, where no UEFI service/protocol is allowed for AP.

I agree.


> I'm just trying to operate with the APIs currently defined by the UEFI
> spec, and these assumptions were the best I could come up with.
> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
> Especially the IHV Option ROM should not consume any private API.

I disagree about "only follow". If there are additional requirements
that can be satisfied without breaking the spec, driver implementors can
choose to conform to both sets of requirements.

For example (if I understand correctly), Microsoft / Windows present a
bunch of requirements *beyond* the UEFI spec, for both platform and
add-on card firmware. Most vendors conform :)


>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>> guest memory should be encrypted again, even those areas that were once
>> used as CommonBuffers."
>> For SEV case, I think it should be controlled by OS, because it is just CR3.
> 
> If it was indeed just CR3, then I would fully agree with you.
> 
> But -- to my understanding --, ensuring that the memory is actually
> encrypted requires that the guest *write* to the memory through a
> virtual address whose PTE has the C-bit set.
> 
> And I don't think the guest OS can be expected to rewrite all of its
> memory at launch. :(
> 
> [Jiewen] That is fine.
> I suggest we get clear on the threat model as the first step.
> The threat model for SEV usage in OVMF.
> 
> I am sorry if that is already discussed before. I might ignore the conversation.

No problem; it's always good to re-investigate our assumptions and
operating principles.


> If you or any SEV owner can share the insight, that will be great.
> See my question above "If that is the threat model, I have a question on the exposure:..."

I shared *my* impressions of the threat model (which are somewhat
unclear, admittedly, and thus could make me overly cautious).

I hope Brijesh can extend and/or correct my description.


>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>> SEV forbids all DMA unless configured otherwise.
>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>> I setup translation table, but mark all entry to be NOT-PRESENT.
>> I grant DMA access only if there is a specific request by a device.
>>
>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>> the existing OS, which might not have knowledge on IOMMU.
>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>> turn off IOMMU, or keep it enabled at ExitBootService.
>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
> 
> Thanks for the clarification.
> 
> But, again, will this not lead to the possibility that the VT-d IOMMU's
> ExitBootServices() handler disables all DMA *before* the PCI drivers get
> a chance to run their own ExitBootServices() handlers, disabling the
> individual devices?
> [Jiewen] Sorry for the confusing. Let me explain:
> No, VTd never disables *all* DMA buffer at ExitBootService.
> 
> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
> "only allow the DMA from Map() request".

Okay, but this interpretation was exactly what I thought of first (see
above): "VT-d permits all DMA unless configured otherwise". You answered
that it wasn't the case.

So basically, if I understand it correctly now, at ExitBootServices the
VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?

That is equivalent to decrypting all memory under SEV, and is the exact
opposite of what we want. (As I understand it.)


> If that happens, then a series of IOMMU faults could be generated, which
> I described above. (I.e., such IOMMU faults could result, at least in
> the case of SEV, in garbage being written to disk, via queued commands.)
> [Jiewen] You are right. That would not happen.
> IOMMU driver should not bring any compatibility problem for the PCI driver,
> iff the PCI driver follows the UEFI specification.

Agreed.


> Now, to finish up, here's an idea I just had.
> 
> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
> notification function?
> 
> If we are, then we could do the following:
> 
> * PciIo.Unmap() and friends would work as always (no restrictions on
>   dynamic memory allocation or freeing, for any kind of DMA operation).
> 
> * PCI drivers would not be expected to call PciIo.Unmap() in their
>   ExitBootServices() handlers.
> 
> * The IOMMU driver would have an ExitBootServices() notification
>   function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>   (doesn't matter which).
> 
> * In this notification function, the IOMMU driver would signal *another*
>   event (a private one). The notification function for this event would
>   be enqueued strictly at the TPL_CALLBACK level.
> 
> * The notification function for the second event (private to the IOMMU)
>   would iterate over all existent mappings, and unmap them, without
>   allocating or freeing memory.
> 
> The key point is that by signaling the second event, the IOMMU driver
> could order its own cleanup code after all other ExitBootServices()
> callbacks. (Assuming, at least, that no other ExitBootServices()
> callback employs the same trick to defer itself!) Therefore by the time
> the second callback runs, all PCI devices have been halted, and it is
> safe to tear down the translations.
> 
> The ordering would be ensured by the following:
> 
> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>   to be signaled before the first notification action is taken." This
>   means that, by the time the IOMMU's ExitBootServices() handler is
>   entered, all other ExitBootServices() handlers have been *queued* at
>   least, at TPL_CALLBACK or TPL_NOTIFY.
> 
> - Therefore, when we signal our second callback from there, for
>   TPL_CALLBACK, the second callback will be queued at the end of the
>   TPL_CALLBACK queue.
> 
> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>   functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>   for the Type argument of CreateEvent." So it wouldn't matter if a
>   driver used CreateEvent() or CreateEventEx() for setting up its own
>   handler, the handler would be queued just the same.
> 
> I think this is ugly and brittle, but perhaps the only way to clean up
> *all* translations safely, with regard to PciIo.Unmap() +
> ExitBootServices(), without updating the UEFI spec.
> 
> What do you think?
> 
> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
> We do not need update PCI driver and we do not need update UEFI spec.
> We only need update IOMMU driver which is concerned, based upon the threat model.
> That probably is best solution. :-)

I'm very glad to hear that you like the idea.

However, it depends on whether we are permitted, by the UEFI spec, to
signal another event in an ExitBootServices() notification function.

Are we permitted to do that?

Does the UEFI spec guarantee that the notification function for the
*second* event will be queued just like it would be under "normal"
circumstances?

(I know we must not allocate or free memory in the notification function
of the *second* event either; I just want to know if the second event's
handler is *queued* like it would normally be.)


> I assume you want to handle both common buffer and read/write buffer, right?

Yes. Under this idea, all kinds of operations would be cleaned up.


> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
> If you or any SEV owner can share ...

Absolutely. Please see above.

Thank you!
Laszlo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-06 12:14               ` Laszlo Ersek
@ 2017-09-06 15:39                 ` Brijesh Singh
  2017-09-07  4:46                   ` Yao, Jiewen
  2017-09-07  4:34                 ` Yao, Jiewen
  1 sibling, 1 reply; 21+ messages in thread
From: Brijesh Singh @ 2017-09-06 15:39 UTC (permalink / raw)
  To: Laszlo Ersek, Yao, Jiewen, Zeng, Star, edk2-devel-01
  Cc: brijesh.singh, Dong, Eric



On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
> On 09/06/17 06:37, Yao, Jiewen wrote:
>> Thanks for the clarification. Comment in line.
>>
>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: Wednesday, September 6, 2017 1:57 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
>> Cc: Dong, Eric <eric.dong@intel.com>; Brijesh Singh <brijesh.singh@amd.com>
>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
> 
>>> Then after ExitBootService, the OS will take control of CR3 and set correct
>>> encryption bit.
>>
>> This is true, the guest OS will set up new page tables, and in those
>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>
>> However, the guest OS will not *rewrite* all of the memory, with the
>> C-bit set. This means that any memory that the firmware didn't actually
>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>> hypervisor.
>> [Jiewen] That is an interesting question.
>> Is there any security concern associated?
> 
> I can't tell for sure. Answering this question is up to the proponents
> of SEV, who have designed the threat model for SEV.
> 
> My operating assumption is that we should keep memory as tightly
> encrypted as possible at the firmware --> OS control transfer. It could
> be an exaggerated expectation from my side; I'd just better be safe than
> sorry :)
> 
> 

Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
we can discuss a security model (if needed).

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to the
unencrypted version. Each encrypted VMs is associated with a unique encryption
key; if its data is accessed by a different entity using a different key the
encrypted guest data will be incorrectly decrypted, leading to unintelligible data.
You can also find some detail for SEV in white paper [1].

SEV encrypted Vs have the concept of private and shared memory. The private memory
is encrypted with the guest-specific key, while shared memory may be encrypted
with hypervisor key. SEV guest VMs can choose which pages they would like to
be private. But the instruction pages and guest page tables are always treated
as private by the hardware. The DMA operation inside the guest must be performed
on shared pages -- this is mainly because in virtualization world the device
DMA needs some assistance from the hypervisor.

The security model provided by the SEV ensure that hypervisor will no longer able
to inspect or alter any guest code or data. The guest OS controls what it want to
share with outside world (in this case with Hypervisor).

In software implementation we took approach to encrypt everything possible starting
early in boot. In this approach whenever OVMF wants to perform certain DMA operations
it allocate a shared page, issues the command, free the shared page after the command
is completed (in other word we use sw bounce buffer to complete the DMA operation).

We have implemented IOMMU driver to provide the following functions:

AllocateBuffer():
--------------------
it allocate a private pages, as per UEFI spec the driver will map the buffer allocated
from this routine using BusMasterCommonBuffer hence we allocate extra stash pages
in addition to requested pages.


Map
----
If requested operation is BusMasterRead or BusMasterWrite then we allocate a shared buffer
and DeviceAddress point to shared buffer.

If requested operation is BusMasterCommonBuffer then we perform in-place decryption of the
contents and update the page-table to clear the C-bit (basically make it shared page)

Unmap
------
If operation was BusMasterRead or BusMasterWrite then we complete the unmapping and free
the shared buffer allocated in Map().

If operation was BusMasterCommonBuffer then we perform in-place encryption and set the C-bit
(basically make the page private)

FreeBuffer:
-----------
Free the pages


Lets run with the below scenario:

1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA read)
2) hypervisor completes the request operation
3) hypervisor caches the shared physical address and monitor it for information leak
4) sometime later if guest write data in its "shared" physical address then hypervisor can
    easily read it without guest knowledge.

SEV hardware can protect us against the attack where someone tries to inject or alter the
guest code. As I noted above any instruction fetch is forced as private hence if attacker
write some code into a shared buffer and point the RIP to his/her code then instruction
fetch will try to decrypt it and get unintelligible opcode. If a page was marked as "shared"
then its guest responsibility to make it "private" after its done communicating with
hypervisor.

[1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf


>> If the C-bit is cleared for a read/write buffer, is the content in the
>> read/write buffer also exposed to hypervisor?
> 
> Not immediately. Only when the guest also rewrites the memory through
> the appropriate virtual address.
> 
> Basically, in the virtual machine, the C-bit in a PTE that covers a
> particular guest virtual address (GVA) controls whether a guest write
> through that GVA will result in memory encryption, and if a gues read
> through that GVA will result in memory decryption.
> 
> The current state of the C-bit doesn't matter for the hypervisor, what
> matters is the last guest write through a GVA whose PTE had the C-bit
> set, or clear. If the C-bit was clear when the guest last wrote the
> area, then the hypervisor can read the data. If the C-bit was set, then
> the hypervisor can only read garbage.
> 
> 
>> I means if there is security concern, the concern should be applied to
>> both common buffer and read/write buffer.
>> Then we have to figure a way to resolve both buffer.
> 
> Yes, this is correct. The PciIo.Unmap operation, as currently
> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
> correctly for all operations, but it can only guarantee *not* freeing
> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
> is safe, while Unmap()ing Read/Write is not. The encryption would be
> re-established just fine for Read/Write as well, but we would change the
> UEFI memmap.
> 
> In OVMF, we currently manage this problem by making all asynchronous DMA
> mappings CommonBuffer, even if they could othewise be Read or Write. We
> use Read and Write only if the DMA operation completes before the
> higher-level protocol function returns (so we can immediately Unmap the
> operation, and the ExitBootServices() handler doesn't have to care).
> 
> 
>> To be honest, that is exactly my biggest question on this patch:
>> why do we only handle the common buffer specially?
> 
> For the following reason:
> 
> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
> implementors -- beyond what the UEFI spec requries -- to keep *all*
> dynamic memory management out of Map/Unmap. If they need dynamic memory
> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
> This way Unmap is safe in ExitBootServices handlers.
> 
> - We cannot *reasonably ask* PciIo implementors to provide the same
> guarantee for Read and Write mappings, simply because there are no other
> APIs that could manage the dynamic memory for Map and Unmap --
> AllocateBuffer and FreeBuffer are not required for Read and Write
> mappings. PciIo implementors couldn't agree to our request even if they
> wanted to. Therefore Unmapping Read/Write operations is inherently
> unsafe in EBS handlers.
> 
> 
>>> NOTE: The device should still be halt state, because the device is
>>> also controlled by OS.
>>
>> This is the key point -- the device is *not* controlled by the guest OS,
>> in the worst case.
>>
>> The virtual device is ultimately implemented by the hypervisor. We don't
>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>> However, if the hypervisor is compromised by an external attacker (for
>> example over the network, or via privilege escalation from another
>> guest), then all guest RAM that has not been encrypted up to that point
>> becomes visible to the attacker. In other words, the hypervisor ("the
>> device") becomes retro-actively malicious.
>> [Jiewen] If that is the threat model, I have a question on the exposure:
>> 1) If the concern is for existing data, it means all DMA data already written.
>> However, the DMA data is already consumed or produced by virtual device
>> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
> 
> Maybe, maybe not. The data may have been sent to a different host over
> the network, and wiped from memory.
> 
> Or, the data may have been written to a disk image that is separately
> encrypted by the host OS, and has been detached (unplugged) from the
> guest, and also unmounted from the host, with the disk key securely
> wiped from host memory.
> 
> We can also imagine the reverse direction. Assume that the user of the
> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
> program, and types some secret information into a text file on the ESP.
> Further assume that the disk image is encrypted on the host OS. It is
> conceivable that fragments of the secret could remain stuck in the AHCI
> (disk) and USB (keyboard) DMA buffers.
> 
> Maybe you think that these are "made up" examples, and I agree, I just
> made them up. The point is, it is not my place to discount possible
> attack vectors (I haven't designed SEV). Attackers will be limited by
> their *own* imaginations only, not by mine :)
> 
> 
>> 2) if the concern is for future data, it means all user data to be written.
>> However, the C-bit already prevent that.
> 
> As far as I understand SEV, future data is out of scope. The goal is to
> prevent *retroactive* guest data leaks (= whatever is currently in host
> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
> 
> If an attacker compromises a virtualization host, then they can just sit
> silent and watch data flow into and out of guests from that point
> onward, because they can then monitor all DMA (which always happens in
> clear-text) real-time.
> 
> 
>> Maybe I do not fully understand the threat model defined.
>> If you can explain more, that would be great.
> 
> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
> correct me if I'm wrong.
> 
> 
>> The point of SEV is to keep as much guest data encrypted at all times as
>> possible, so that *whenever* the hypervisor is compromised by an
>> attacker, the guest memory -- which the attacker can inevitably dump
>> from the host side -- remains un-intellegible to the attacker.
>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>> threat model at first.
>> Or what we are doing might be unnecessary or insufficient.
> 
> I agree.
> 
> As I said above, my operating principle has been to re-encrypt all DMA
> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
> them at ExitBootServices at the latest.
> 
> 
>> [Jiewen] For "require that Unmap() work for CommonBuffer
>> operations without releasing memory", I would hold my opinion until
>> it is documented clearly in UEFI spec.
> 
> You are right, of course.
> 
> It's just that we cannot avoid exploring ways, for this feature, that
> currently fall outside of the spec. Whatever we do here will be outside
> of the spec, one way or another. I suggested what I thought could be a
> reasonable extension to the spec, one that could be implemented by PciIo
> implementors even before the spec is modified.
> 
> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
> 
> If there's a better way -- for example modifying CoreExitBootServices()
> in edk2, to signal IOMMU drivers separately, *after* notifying other
> ExitBootServices() handlers --, that might work as well.
> 
> 
>> My current concern is:
>> First, this sentence is NOT defined in UEFI specification.
> 
> Correct.
> 
> 
>> Second, it might break an existing implementation or existing feature, such as tracing.
> 
> Correct.
> 
>> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
>> to call memory services.
>> The only restriction is
>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
> 
> I agree.
> 
> 
>> I'm just trying to operate with the APIs currently defined by the UEFI
>> spec, and these assumptions were the best I could come up with.
>> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
>> Especially the IHV Option ROM should not consume any private API.
> 
> I disagree about "only follow". If there are additional requirements
> that can be satisfied without breaking the spec, driver implementors can
> choose to conform to both sets of requirements.
> 
> For example (if I understand correctly), Microsoft / Windows present a
> bunch of requirements *beyond* the UEFI spec, for both platform and
> add-on card firmware. Most vendors conform :)
> 
> 
>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>>> guest memory should be encrypted again, even those areas that were once
>>> used as CommonBuffers."
>>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>>
>> If it was indeed just CR3, then I would fully agree with you.
>>
>> But -- to my understanding --, ensuring that the memory is actually
>> encrypted requires that the guest *write* to the memory through a
>> virtual address whose PTE has the C-bit set.
>>
>> And I don't think the guest OS can be expected to rewrite all of its
>> memory at launch. :(
>>
>> [Jiewen] That is fine.
>> I suggest we get clear on the threat model as the first step.
>> The threat model for SEV usage in OVMF.
>>
>> I am sorry if that is already discussed before. I might ignore the conversation.
> 
> No problem; it's always good to re-investigate our assumptions and
> operating principles.
> 
> 
>> If you or any SEV owner can share the insight, that will be great.
>> See my question above "If that is the threat model, I have a question on the exposure:..."
> 
> I shared *my* impressions of the threat model (which are somewhat
> unclear, admittedly, and thus could make me overly cautious).
> 
> I hope Brijesh can extend and/or correct my description.
> 
> 
>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>> SEV forbids all DMA unless configured otherwise.
>>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>>> I setup translation table, but mark all entry to be NOT-PRESENT.
>>> I grant DMA access only if there is a specific request by a device.
>>>
>>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>>> the existing OS, which might not have knowledge on IOMMU.
>>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>>> turn off IOMMU, or keep it enabled at ExitBootService.
>>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>>
>> Thanks for the clarification.
>>
>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>> a chance to run their own ExitBootServices() handlers, disabling the
>> individual devices?
>> [Jiewen] Sorry for the confusing. Let me explain:
>> No, VTd never disables *all* DMA buffer at ExitBootService.
>>
>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>> "only allow the DMA from Map() request".
> 
> Okay, but this interpretation was exactly what I thought of first (see
> above): "VT-d permits all DMA unless configured otherwise". You answered
> that it wasn't the case.
> 
> So basically, if I understand it correctly now, at ExitBootServices the
> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
> 
> That is equivalent to decrypting all memory under SEV, and is the exact
> opposite of what we want. (As I understand it.)
> 
> 
>> If that happens, then a series of IOMMU faults could be generated, which
>> I described above. (I.e., such IOMMU faults could result, at least in
>> the case of SEV, in garbage being written to disk, via queued commands.)
>> [Jiewen] You are right. That would not happen.
>> IOMMU driver should not bring any compatibility problem for the PCI driver,
>> iff the PCI driver follows the UEFI specification.
> 
> Agreed.
> 
> 
>> Now, to finish up, here's an idea I just had.
>>
>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>> notification function?
>>
>> If we are, then we could do the following:
>>
>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>    dynamic memory allocation or freeing, for any kind of DMA operation).
>>
>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>    ExitBootServices() handlers.
>>
>> * The IOMMU driver would have an ExitBootServices() notification
>>    function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>    (doesn't matter which).
>>
>> * In this notification function, the IOMMU driver would signal *another*
>>    event (a private one). The notification function for this event would
>>    be enqueued strictly at the TPL_CALLBACK level.
>>
>> * The notification function for the second event (private to the IOMMU)
>>    would iterate over all existent mappings, and unmap them, without
>>    allocating or freeing memory.
>>
>> The key point is that by signaling the second event, the IOMMU driver
>> could order its own cleanup code after all other ExitBootServices()
>> callbacks. (Assuming, at least, that no other ExitBootServices()
>> callback employs the same trick to defer itself!) Therefore by the time
>> the second callback runs, all PCI devices have been halted, and it is
>> safe to tear down the translations.
>>
>> The ordering would be ensured by the following:
>>
>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>    to be signaled before the first notification action is taken." This
>>    means that, by the time the IOMMU's ExitBootServices() handler is
>>    entered, all other ExitBootServices() handlers have been *queued* at
>>    least, at TPL_CALLBACK or TPL_NOTIFY.
>>
>> - Therefore, when we signal our second callback from there, for
>>    TPL_CALLBACK, the second callback will be queued at the end of the
>>    TPL_CALLBACK queue.
>>
>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>    functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>    for the Type argument of CreateEvent." So it wouldn't matter if a
>>    driver used CreateEvent() or CreateEventEx() for setting up its own
>>    handler, the handler would be queued just the same.
>>
>> I think this is ugly and brittle, but perhaps the only way to clean up
>> *all* translations safely, with regard to PciIo.Unmap() +
>> ExitBootServices(), without updating the UEFI spec.
>>
>> What do you think?
>>
>> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
>> We do not need update PCI driver and we do not need update UEFI spec.
>> We only need update IOMMU driver which is concerned, based upon the threat model.
>> That probably is best solution. :-)
> 
> I'm very glad to hear that you like the idea.
> 
> However, it depends on whether we are permitted, by the UEFI spec, to
> signal another event in an ExitBootServices() notification function.
> 
> Are we permitted to do that?
> 
> Does the UEFI spec guarantee that the notification function for the
> *second* event will be queued just like it would be under "normal"
> circumstances?
> 
> (I know we must not allocate or free memory in the notification function
> of the *second* event either; I just want to know if the second event's
> handler is *queued* like it would normally be.)
> 
> 
>> I assume you want to handle both common buffer and read/write buffer, right?
> 
> Yes. Under this idea, all kinds of operations would be cleaned up.
> 
> 
>> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
>> If you or any SEV owner can share ...
> 
> Absolutely. Please see above.
> 
> Thank you!
> Laszlo
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-06 12:14               ` Laszlo Ersek
  2017-09-06 15:39                 ` Brijesh Singh
@ 2017-09-07  4:34                 ` Yao, Jiewen
  2017-09-07 12:11                   ` Laszlo Ersek
  1 sibling, 1 reply; 21+ messages in thread
From: Yao, Jiewen @ 2017-09-07  4:34 UTC (permalink / raw)
  To: Laszlo Ersek, Zeng, Star, edk2-devel-01; +Cc: Brijesh Singh, Dong, Eric

Thanks.
It seems the discussion becomes too long. I try to make it short.


1)       As long as we use same mechanism to handle both common buffer and read/write buffer. I have no concern at all. :)


2)       I am sorry that I do not have an immediate answer for your question on event handling in ExitBootServices.
Although it seems tricky, I believe it works. Can you have a try?


3)       Looking at the code (DxeMain.c), I have another idea for your consideration only.

  //
  // Notify other drivers that we are exiting boot services.
  //
  CoreNotifySignalList (&gEfiEventExitBootServicesGuid);

  //
  // Report that ExitBootServices() has been called
  //
  REPORT_STATUS_CODE (
    EFI_PROGRESS_CODE,
    (EFI_SOFTWARE_EFI_BOOT_SERVICE | EFI_SW_BS_PC_EXIT_BOOT_SERVICES)
    );

The cleanup driver may register a report status code handle to do the cleanup. :)

We already have such example in MdeModulePkg\Universal\Acpi\FirmwarePerformanceDataTableDxe\ FirmwarePerformanceDxe.c.
FpdtStatusCodeListenerDxe()

  } else if (Value == (EFI_SOFTWARE_EFI_BOOT_SERVICE | EFI_SW_BS_PC_EXIT_BOOT_SERVICES)) {
......

The requirement will become: If a platform wants to add cleanup, it must use a real report status code library.

Thank you
Yao Jiewen



From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Wednesday, September 6, 2017 8:15 PM
To: Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>; Dong, Eric <eric.dong@intel.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

On 09/06/17 06:37, Yao, Jiewen wrote:
> Thanks for the clarification. Comment in line.
>
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Wednesday, September 6, 2017 1:57 AM
> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>; Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

>> Then after ExitBootService, the OS will take control of CR3 and set correct
>> encryption bit.
>
> This is true, the guest OS will set up new page tables, and in those
> PTEs, the C-bit ("encrypt") will likely be set by default.
>
> However, the guest OS will not *rewrite* all of the memory, with the
> C-bit set. This means that any memory that the firmware didn't actually
> re-encrypt (in the IOMMU driver) will still expose stale data to the
> hypervisor.
> [Jiewen] That is an interesting question.
> Is there any security concern associated?

I can't tell for sure. Answering this question is up to the proponents
of SEV, who have designed the threat model for SEV.

My operating assumption is that we should keep memory as tightly
encrypted as possible at the firmware --> OS control transfer. It could
be an exaggerated expectation from my side; I'd just better be safe than
sorry :)


> If the C-bit is cleared for a read/write buffer, is the content in the
> read/write buffer also exposed to hypervisor?

Not immediately. Only when the guest also rewrites the memory through
the appropriate virtual address.

Basically, in the virtual machine, the C-bit in a PTE that covers a
particular guest virtual address (GVA) controls whether a guest write
through that GVA will result in memory encryption, and if a gues read
through that GVA will result in memory decryption.

The current state of the C-bit doesn't matter for the hypervisor, what
matters is the last guest write through a GVA whose PTE had the C-bit
set, or clear. If the C-bit was clear when the guest last wrote the
area, then the hypervisor can read the data. If the C-bit was set, then
the hypervisor can only read garbage.


> I means if there is security concern, the concern should be applied to
> both common buffer and read/write buffer.
> Then we have to figure a way to resolve both buffer.

Yes, this is correct. The PciIo.Unmap operation, as currently
implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
correctly for all operations, but it can only guarantee *not* freeing
memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
is safe, while Unmap()ing Read/Write is not. The encryption would be
re-established just fine for Read/Write as well, but we would change the
UEFI memmap.

In OVMF, we currently manage this problem by making all asynchronous DMA
mappings CommonBuffer, even if they could othewise be Read or Write. We
use Read and Write only if the DMA operation completes before the
higher-level protocol function returns (so we can immediately Unmap the
operation, and the ExitBootServices() handler doesn't have to care).


> To be honest, that is exactly my biggest question on this patch:
> why do we only handle the common buffer specially?

For the following reason:

- Given that CommonBuffer mappings take a separate AllocateBuffer() /
FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
implementors -- beyond what the UEFI spec requries -- to keep *all*
dynamic memory management out of Map/Unmap. If they need dynamic memory
management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
This way Unmap is safe in ExitBootServices handlers.

- We cannot *reasonably ask* PciIo implementors to provide the same
guarantee for Read and Write mappings, simply because there are no other
APIs that could manage the dynamic memory for Map and Unmap --
AllocateBuffer and FreeBuffer are not required for Read and Write
mappings. PciIo implementors couldn't agree to our request even if they
wanted to. Therefore Unmapping Read/Write operations is inherently
unsafe in EBS handlers.


>> NOTE: The device should still be halt state, because the device is
>> also controlled by OS.
>
> This is the key point -- the device is *not* controlled by the guest OS,
> in the worst case.
>
> The virtual device is ultimately implemented by the hypervisor. We don't
> expect the virtual device (= the hypervisor) to mis-behave on purpose.
> However, if the hypervisor is compromised by an external attacker (for
> example over the network, or via privilege escalation from another
> guest), then all guest RAM that has not been encrypted up to that point
> becomes visible to the attacker. In other words, the hypervisor ("the
> device") becomes retro-actively malicious.
> [Jiewen] If that is the threat model, I have a question on the exposure:
> 1) If the concern is for existing data, it means all DMA data already written.
> However, the DMA data is already consumed or produced by virtual device
> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.

Maybe, maybe not. The data may have been sent to a different host over
the network, and wiped from memory.

Or, the data may have been written to a disk image that is separately
encrypted by the host OS, and has been detached (unplugged) from the
guest, and also unmounted from the host, with the disk key securely
wiped from host memory.

We can also imagine the reverse direction. Assume that the user of the
virtual machine goes into the UEFI shell in OVMF, starts the EDIT
program, and types some secret information into a text file on the ESP.
Further assume that the disk image is encrypted on the host OS. It is
conceivable that fragments of the secret could remain stuck in the AHCI
(disk) and USB (keyboard) DMA buffers.

Maybe you think that these are "made up" examples, and I agree, I just
made them up. The point is, it is not my place to discount possible
attack vectors (I haven't designed SEV). Attackers will be limited by
their *own* imaginations only, not by mine :)


> 2) if the concern is for future data, it means all user data to be written.
> However, the C-bit already prevent that.

As far as I understand SEV, future data is out of scope. The goal is to
prevent *retroactive* guest data leaks (= whatever is currently in host
OS memory) if an attacker compromises an otherwise non-malicious hypervisor.

If an attacker compromises a virtualization host, then they can just sit
silent and watch data flow into and out of guests from that point
onward, because they can then monitor all DMA (which always happens in
clear-text) real-time.


> Maybe I do not fully understand the threat model defined.
> If you can explain more, that would be great.

Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
correct me if I'm wrong.


> The point of SEV is to keep as much guest data encrypted at all times as
> possible, so that *whenever* the hypervisor is compromised by an
> attacker, the guest memory -- which the attacker can inevitably dump
> from the host side -- remains un-intellegible to the attacker.
> [Jiewen] OK. If this is a security question, I suggest we define a clear
> threat model at first.
> Or what we are doing might be unnecessary or insufficient.

I agree.

As I said above, my operating principle has been to re-encrypt all DMA
buffers as soon as possible. For long-standing DMA buffers, re-encrypt
them at ExitBootServices at the latest.


> [Jiewen] For "require that Unmap() work for CommonBuffer
> operations without releasing memory", I would hold my opinion until
> it is documented clearly in UEFI spec.

You are right, of course.

It's just that we cannot avoid exploring ways, for this feature, that
currently fall outside of the spec. Whatever we do here will be outside
of the spec, one way or another. I suggested what I thought could be a
reasonable extension to the spec, one that could be implemented by PciIo
implementors even before the spec is modified.

edk2's PciIo.Unmap already behaves like this, without breaking the spec.

If there's a better way -- for example modifying CoreExitBootServices()
in edk2, to signal IOMMU drivers separately, *after* notifying other
ExitBootServices() handlers --, that might work as well.


> My current concern is:
> First, this sentence is NOT defined in UEFI specification.

Correct.


> Second, it might break an existing implementation or existing feature, such as tracing.

Correct.

> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
> to call memory services.
> The only restriction is
> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
> 2) AP restriction, where no UEFI service/protocol is allowed for AP.

I agree.


> I'm just trying to operate with the APIs currently defined by the UEFI
> spec, and these assumptions were the best I could come up with.
> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
> Especially the IHV Option ROM should not consume any private API.

I disagree about "only follow". If there are additional requirements
that can be satisfied without breaking the spec, driver implementors can
choose to conform to both sets of requirements.

For example (if I understand correctly), Microsoft / Windows present a
bunch of requirements *beyond* the UEFI spec, for both platform and
add-on card firmware. Most vendors conform :)


>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>> guest memory should be encrypted again, even those areas that were once
>> used as CommonBuffers."
>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>
> If it was indeed just CR3, then I would fully agree with you.
>
> But -- to my understanding --, ensuring that the memory is actually
> encrypted requires that the guest *write* to the memory through a
> virtual address whose PTE has the C-bit set.
>
> And I don't think the guest OS can be expected to rewrite all of its
> memory at launch. :(
>
> [Jiewen] That is fine.
> I suggest we get clear on the threat model as the first step.
> The threat model for SEV usage in OVMF.
>
> I am sorry if that is already discussed before. I might ignore the conversation.

No problem; it's always good to re-investigate our assumptions and
operating principles.


> If you or any SEV owner can share the insight, that will be great.
> See my question above "If that is the threat model, I have a question on the exposure:..."

I shared *my* impressions of the threat model (which are somewhat
unclear, admittedly, and thus could make me overly cautious).

I hope Brijesh can extend and/or correct my description.


>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>> SEV forbids all DMA unless configured otherwise.
>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>> I setup translation table, but mark all entry to be NOT-PRESENT.
>> I grant DMA access only if there is a specific request by a device.
>>
>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>> the existing OS, which might not have knowledge on IOMMU.
>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>> turn off IOMMU, or keep it enabled at ExitBootService.
>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>
> Thanks for the clarification.
>
> But, again, will this not lead to the possibility that the VT-d IOMMU's
> ExitBootServices() handler disables all DMA *before* the PCI drivers get
> a chance to run their own ExitBootServices() handlers, disabling the
> individual devices?
> [Jiewen] Sorry for the confusing. Let me explain:
> No, VTd never disables *all* DMA buffer at ExitBootService.
>
> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
> "only allow the DMA from Map() request".

Okay, but this interpretation was exactly what I thought of first (see
above): "VT-d permits all DMA unless configured otherwise". You answered
that it wasn't the case.

So basically, if I understand it correctly now, at ExitBootServices the
VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?

That is equivalent to decrypting all memory under SEV, and is the exact
opposite of what we want. (As I understand it.)


> If that happens, then a series of IOMMU faults could be generated, which
> I described above. (I.e., such IOMMU faults could result, at least in
> the case of SEV, in garbage being written to disk, via queued commands.)
> [Jiewen] You are right. That would not happen.
> IOMMU driver should not bring any compatibility problem for the PCI driver,
> iff the PCI driver follows the UEFI specification.

Agreed.


> Now, to finish up, here's an idea I just had.
>
> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
> notification function?
>
> If we are, then we could do the following:
>
> * PciIo.Unmap() and friends would work as always (no restrictions on
>   dynamic memory allocation or freeing, for any kind of DMA operation).
>
> * PCI drivers would not be expected to call PciIo.Unmap() in their
>   ExitBootServices() handlers.
>
> * The IOMMU driver would have an ExitBootServices() notification
>   function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>   (doesn't matter which).
>
> * In this notification function, the IOMMU driver would signal *another*
>   event (a private one). The notification function for this event would
>   be enqueued strictly at the TPL_CALLBACK level.
>
> * The notification function for the second event (private to the IOMMU)
>   would iterate over all existent mappings, and unmap them, without
>   allocating or freeing memory.
>
> The key point is that by signaling the second event, the IOMMU driver
> could order its own cleanup code after all other ExitBootServices()
> callbacks. (Assuming, at least, that no other ExitBootServices()
> callback employs the same trick to defer itself!) Therefore by the time
> the second callback runs, all PCI devices have been halted, and it is
> safe to tear down the translations.
>
> The ordering would be ensured by the following:
>
> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>   to be signaled before the first notification action is taken." This
>   means that, by the time the IOMMU's ExitBootServices() handler is
>   entered, all other ExitBootServices() handlers have been *queued* at
>   least, at TPL_CALLBACK or TPL_NOTIFY.
>
> - Therefore, when we signal our second callback from there, for
>   TPL_CALLBACK, the second callback will be queued at the end of the
>   TPL_CALLBACK queue.
>
> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>   functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>   for the Type argument of CreateEvent." So it wouldn't matter if a
>   driver used CreateEvent() or CreateEventEx() for setting up its own
>   handler, the handler would be queued just the same.
>
> I think this is ugly and brittle, but perhaps the only way to clean up
> *all* translations safely, with regard to PciIo.Unmap() +
> ExitBootServices(), without updating the UEFI spec.
>
> What do you think?
>
> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
> We do not need update PCI driver and we do not need update UEFI spec.
> We only need update IOMMU driver which is concerned, based upon the threat model.
> That probably is best solution. :-)

I'm very glad to hear that you like the idea.

However, it depends on whether we are permitted, by the UEFI spec, to
signal another event in an ExitBootServices() notification function.

Are we permitted to do that?

Does the UEFI spec guarantee that the notification function for the
*second* event will be queued just like it would be under "normal"
circumstances?

(I know we must not allocate or free memory in the notification function
of the *second* event either; I just want to know if the second event's
handler is *queued* like it would normally be.)


> I assume you want to handle both common buffer and read/write buffer, right?

Yes. Under this idea, all kinds of operations would be cleaned up.


> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
> If you or any SEV owner can share ...

Absolutely. Please see above.

Thank you!
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-06 15:39                 ` Brijesh Singh
@ 2017-09-07  4:46                   ` Yao, Jiewen
  2017-09-07 11:46                     ` Laszlo Ersek
  0 siblings, 1 reply; 21+ messages in thread
From: Yao, Jiewen @ 2017-09-07  4:46 UTC (permalink / raw)
  To: Brijesh Singh, Laszlo Ersek, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

Thanks for the sharing, Brijesh.

"If a page was marked as "shared"
then its guest responsibility to make it "private" after its done communicating with
hypervisor."

I believe I have same understanding - a *guest* should guarantee that.

My question is: During the *transition* from firmware to OS, *which guest* should make the shared buffer to be private? Is it "guest firmware" or "guest OS"?

Maybe I can ask the specific question to get it more clear.

1)       If the common DMA buffer is not unmapped at ExitBootService, is that treated as an issue?

2)       If the read/write DMA buffer is not unmapped at ExitBootService, is that treated as an issue?

Thank you
Yao Jiewen


From: Brijesh Singh [mailto:brijesh.singh@amd.com]
Sent: Wednesday, September 6, 2017 11:40 PM
To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()



On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
> On 09/06/17 06:37, Yao, Jiewen wrote:
>> Thanks for the clarification. Comment in line.
>>
>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: Wednesday, September 6, 2017 1:57 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>; Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>>
>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>
>>> Then after ExitBootService, the OS will take control of CR3 and set correct
>>> encryption bit.
>>
>> This is true, the guest OS will set up new page tables, and in those
>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>
>> However, the guest OS will not *rewrite* all of the memory, with the
>> C-bit set. This means that any memory that the firmware didn't actually
>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>> hypervisor.
>> [Jiewen] That is an interesting question.
>> Is there any security concern associated?
>
> I can't tell for sure. Answering this question is up to the proponents
> of SEV, who have designed the threat model for SEV.
>
> My operating assumption is that we should keep memory as tightly
> encrypted as possible at the firmware --> OS control transfer. It could
> be an exaggerated expectation from my side; I'd just better be safe than
> sorry :)
>
>

Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
we can discuss a security model (if needed).

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to the
unencrypted version. Each encrypted VMs is associated with a unique encryption
key; if its data is accessed by a different entity using a different key the
encrypted guest data will be incorrectly decrypted, leading to unintelligible data.
You can also find some detail for SEV in white paper [1].

SEV encrypted Vs have the concept of private and shared memory. The private memory
is encrypted with the guest-specific key, while shared memory may be encrypted
with hypervisor key. SEV guest VMs can choose which pages they would like to
be private. But the instruction pages and guest page tables are always treated
as private by the hardware. The DMA operation inside the guest must be performed
on shared pages -- this is mainly because in virtualization world the device
DMA needs some assistance from the hypervisor.

The security model provided by the SEV ensure that hypervisor will no longer able
to inspect or alter any guest code or data. The guest OS controls what it want to
share with outside world (in this case with Hypervisor).

In software implementation we took approach to encrypt everything possible starting
early in boot. In this approach whenever OVMF wants to perform certain DMA operations
it allocate a shared page, issues the command, free the shared page after the command
is completed (in other word we use sw bounce buffer to complete the DMA operation).

We have implemented IOMMU driver to provide the following functions:

AllocateBuffer():
--------------------
it allocate a private pages, as per UEFI spec the driver will map the buffer allocated
from this routine using BusMasterCommonBuffer hence we allocate extra stash pages
in addition to requested pages.


Map
----
If requested operation is BusMasterRead or BusMasterWrite then we allocate a shared buffer
and DeviceAddress point to shared buffer.

If requested operation is BusMasterCommonBuffer then we perform in-place decryption of the
contents and update the page-table to clear the C-bit (basically make it shared page)

Unmap
------
If operation was BusMasterRead or BusMasterWrite then we complete the unmapping and free
the shared buffer allocated in Map().

If operation was BusMasterCommonBuffer then we perform in-place encryption and set the C-bit
(basically make the page private)

FreeBuffer:
-----------
Free the pages


Lets run with the below scenario:

1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA read)
2) hypervisor completes the request operation
3) hypervisor caches the shared physical address and monitor it for information leak
4) sometime later if guest write data in its "shared" physical address then hypervisor can
    easily read it without guest knowledge.

SEV hardware can protect us against the attack where someone tries to inject or alter the
guest code. As I noted above any instruction fetch is forced as private hence if attacker
write some code into a shared buffer and point the RIP to his/her code then instruction
fetch will try to decrypt it and get unintelligible opcode. If a page was marked as "shared"
then its guest responsibility to make it "private" after its done communicating with
hypervisor.

[1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf


>> If the C-bit is cleared for a read/write buffer, is the content in the
>> read/write buffer also exposed to hypervisor?
>
> Not immediately. Only when the guest also rewrites the memory through
> the appropriate virtual address.
>
> Basically, in the virtual machine, the C-bit in a PTE that covers a
> particular guest virtual address (GVA) controls whether a guest write
> through that GVA will result in memory encryption, and if a gues read
> through that GVA will result in memory decryption.
>
> The current state of the C-bit doesn't matter for the hypervisor, what
> matters is the last guest write through a GVA whose PTE had the C-bit
> set, or clear. If the C-bit was clear when the guest last wrote the
> area, then the hypervisor can read the data. If the C-bit was set, then
> the hypervisor can only read garbage.
>
>
>> I means if there is security concern, the concern should be applied to
>> both common buffer and read/write buffer.
>> Then we have to figure a way to resolve both buffer.
>
> Yes, this is correct. The PciIo.Unmap operation, as currently
> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
> correctly for all operations, but it can only guarantee *not* freeing
> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
> is safe, while Unmap()ing Read/Write is not. The encryption would be
> re-established just fine for Read/Write as well, but we would change the
> UEFI memmap.
>
> In OVMF, we currently manage this problem by making all asynchronous DMA
> mappings CommonBuffer, even if they could othewise be Read or Write. We
> use Read and Write only if the DMA operation completes before the
> higher-level protocol function returns (so we can immediately Unmap the
> operation, and the ExitBootServices() handler doesn't have to care).
>
>
>> To be honest, that is exactly my biggest question on this patch:
>> why do we only handle the common buffer specially?
>
> For the following reason:
>
> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
> implementors -- beyond what the UEFI spec requries -- to keep *all*
> dynamic memory management out of Map/Unmap. If they need dynamic memory
> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
> This way Unmap is safe in ExitBootServices handlers.
>
> - We cannot *reasonably ask* PciIo implementors to provide the same
> guarantee for Read and Write mappings, simply because there are no other
> APIs that could manage the dynamic memory for Map and Unmap --
> AllocateBuffer and FreeBuffer are not required for Read and Write
> mappings. PciIo implementors couldn't agree to our request even if they
> wanted to. Therefore Unmapping Read/Write operations is inherently
> unsafe in EBS handlers.
>
>
>>> NOTE: The device should still be halt state, because the device is
>>> also controlled by OS.
>>
>> This is the key point -- the device is *not* controlled by the guest OS,
>> in the worst case.
>>
>> The virtual device is ultimately implemented by the hypervisor. We don't
>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>> However, if the hypervisor is compromised by an external attacker (for
>> example over the network, or via privilege escalation from another
>> guest), then all guest RAM that has not been encrypted up to that point
>> becomes visible to the attacker. In other words, the hypervisor ("the
>> device") becomes retro-actively malicious.
>> [Jiewen] If that is the threat model, I have a question on the exposure:
>> 1) If the concern is for existing data, it means all DMA data already written.
>> However, the DMA data is already consumed or produced by virtual device
>> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
>
> Maybe, maybe not. The data may have been sent to a different host over
> the network, and wiped from memory.
>
> Or, the data may have been written to a disk image that is separately
> encrypted by the host OS, and has been detached (unplugged) from the
> guest, and also unmounted from the host, with the disk key securely
> wiped from host memory.
>
> We can also imagine the reverse direction. Assume that the user of the
> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
> program, and types some secret information into a text file on the ESP.
> Further assume that the disk image is encrypted on the host OS. It is
> conceivable that fragments of the secret could remain stuck in the AHCI
> (disk) and USB (keyboard) DMA buffers.
>
> Maybe you think that these are "made up" examples, and I agree, I just
> made them up. The point is, it is not my place to discount possible
> attack vectors (I haven't designed SEV). Attackers will be limited by
> their *own* imaginations only, not by mine :)
>
>
>> 2) if the concern is for future data, it means all user data to be written.
>> However, the C-bit already prevent that.
>
> As far as I understand SEV, future data is out of scope. The goal is to
> prevent *retroactive* guest data leaks (= whatever is currently in host
> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>
> If an attacker compromises a virtualization host, then they can just sit
> silent and watch data flow into and out of guests from that point
> onward, because they can then monitor all DMA (which always happens in
> clear-text) real-time.
>
>
>> Maybe I do not fully understand the threat model defined.
>> If you can explain more, that would be great.
>
> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
> correct me if I'm wrong.
>
>
>> The point of SEV is to keep as much guest data encrypted at all times as
>> possible, so that *whenever* the hypervisor is compromised by an
>> attacker, the guest memory -- which the attacker can inevitably dump
>> from the host side -- remains un-intellegible to the attacker.
>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>> threat model at first.
>> Or what we are doing might be unnecessary or insufficient.
>
> I agree.
>
> As I said above, my operating principle has been to re-encrypt all DMA
> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
> them at ExitBootServices at the latest.
>
>
>> [Jiewen] For "require that Unmap() work for CommonBuffer
>> operations without releasing memory", I would hold my opinion until
>> it is documented clearly in UEFI spec.
>
> You are right, of course.
>
> It's just that we cannot avoid exploring ways, for this feature, that
> currently fall outside of the spec. Whatever we do here will be outside
> of the spec, one way or another. I suggested what I thought could be a
> reasonable extension to the spec, one that could be implemented by PciIo
> implementors even before the spec is modified.
>
> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>
> If there's a better way -- for example modifying CoreExitBootServices()
> in edk2, to signal IOMMU drivers separately, *after* notifying other
> ExitBootServices() handlers --, that might work as well.
>
>
>> My current concern is:
>> First, this sentence is NOT defined in UEFI specification.
>
> Correct.
>
>
>> Second, it might break an existing implementation or existing feature, such as tracing.
>
> Correct.
>
>> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
>> to call memory services.
>> The only restriction is
>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
>
> I agree.
>
>
>> I'm just trying to operate with the APIs currently defined by the UEFI
>> spec, and these assumptions were the best I could come up with.
>> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
>> Especially the IHV Option ROM should not consume any private API.
>
> I disagree about "only follow". If there are additional requirements
> that can be satisfied without breaking the spec, driver implementors can
> choose to conform to both sets of requirements.
>
> For example (if I understand correctly), Microsoft / Windows present a
> bunch of requirements *beyond* the UEFI spec, for both platform and
> add-on card firmware. Most vendors conform :)
>
>
>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>>> guest memory should be encrypted again, even those areas that were once
>>> used as CommonBuffers."
>>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>>
>> If it was indeed just CR3, then I would fully agree with you.
>>
>> But -- to my understanding --, ensuring that the memory is actually
>> encrypted requires that the guest *write* to the memory through a
>> virtual address whose PTE has the C-bit set.
>>
>> And I don't think the guest OS can be expected to rewrite all of its
>> memory at launch. :(
>>
>> [Jiewen] That is fine.
>> I suggest we get clear on the threat model as the first step.
>> The threat model for SEV usage in OVMF.
>>
>> I am sorry if that is already discussed before. I might ignore the conversation.
>
> No problem; it's always good to re-investigate our assumptions and
> operating principles.
>
>
>> If you or any SEV owner can share the insight, that will be great.
>> See my question above "If that is the threat model, I have a question on the exposure:..."
>
> I shared *my* impressions of the threat model (which are somewhat
> unclear, admittedly, and thus could make me overly cautious).
>
> I hope Brijesh can extend and/or correct my description.
>
>
>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>> SEV forbids all DMA unless configured otherwise.
>>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>>> I setup translation table, but mark all entry to be NOT-PRESENT.
>>> I grant DMA access only if there is a specific request by a device.
>>>
>>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>>> the existing OS, which might not have knowledge on IOMMU.
>>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>>> turn off IOMMU, or keep it enabled at ExitBootService.
>>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>>
>> Thanks for the clarification.
>>
>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>> a chance to run their own ExitBootServices() handlers, disabling the
>> individual devices?
>> [Jiewen] Sorry for the confusing. Let me explain:
>> No, VTd never disables *all* DMA buffer at ExitBootService.
>>
>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>> "only allow the DMA from Map() request".
>
> Okay, but this interpretation was exactly what I thought of first (see
> above): "VT-d permits all DMA unless configured otherwise". You answered
> that it wasn't the case.
>
> So basically, if I understand it correctly now, at ExitBootServices the
> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>
> That is equivalent to decrypting all memory under SEV, and is the exact
> opposite of what we want. (As I understand it.)
>
>
>> If that happens, then a series of IOMMU faults could be generated, which
>> I described above. (I.e., such IOMMU faults could result, at least in
>> the case of SEV, in garbage being written to disk, via queued commands.)
>> [Jiewen] You are right. That would not happen.
>> IOMMU driver should not bring any compatibility problem for the PCI driver,
>> iff the PCI driver follows the UEFI specification.
>
> Agreed.
>
>
>> Now, to finish up, here's an idea I just had.
>>
>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>> notification function?
>>
>> If we are, then we could do the following:
>>
>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>    dynamic memory allocation or freeing, for any kind of DMA operation).
>>
>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>    ExitBootServices() handlers.
>>
>> * The IOMMU driver would have an ExitBootServices() notification
>>    function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>    (doesn't matter which).
>>
>> * In this notification function, the IOMMU driver would signal *another*
>>    event (a private one). The notification function for this event would
>>    be enqueued strictly at the TPL_CALLBACK level.
>>
>> * The notification function for the second event (private to the IOMMU)
>>    would iterate over all existent mappings, and unmap them, without
>>    allocating or freeing memory.
>>
>> The key point is that by signaling the second event, the IOMMU driver
>> could order its own cleanup code after all other ExitBootServices()
>> callbacks. (Assuming, at least, that no other ExitBootServices()
>> callback employs the same trick to defer itself!) Therefore by the time
>> the second callback runs, all PCI devices have been halted, and it is
>> safe to tear down the translations.
>>
>> The ordering would be ensured by the following:
>>
>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>    to be signaled before the first notification action is taken." This
>>    means that, by the time the IOMMU's ExitBootServices() handler is
>>    entered, all other ExitBootServices() handlers have been *queued* at
>>    least, at TPL_CALLBACK or TPL_NOTIFY.
>>
>> - Therefore, when we signal our second callback from there, for
>>    TPL_CALLBACK, the second callback will be queued at the end of the
>>    TPL_CALLBACK queue.
>>
>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>    functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>    for the Type argument of CreateEvent." So it wouldn't matter if a
>>    driver used CreateEvent() or CreateEventEx() for setting up its own
>>    handler, the handler would be queued just the same.
>>
>> I think this is ugly and brittle, but perhaps the only way to clean up
>> *all* translations safely, with regard to PciIo.Unmap() +
>> ExitBootServices(), without updating the UEFI spec.
>>
>> What do you think?
>>
>> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
>> We do not need update PCI driver and we do not need update UEFI spec.
>> We only need update IOMMU driver which is concerned, based upon the threat model.
>> That probably is best solution. :-)
>
> I'm very glad to hear that you like the idea.
>
> However, it depends on whether we are permitted, by the UEFI spec, to
> signal another event in an ExitBootServices() notification function.
>
> Are we permitted to do that?
>
> Does the UEFI spec guarantee that the notification function for the
> *second* event will be queued just like it would be under "normal"
> circumstances?
>
> (I know we must not allocate or free memory in the notification function
> of the *second* event either; I just want to know if the second event's
> handler is *queued* like it would normally be.)
>
>
>> I assume you want to handle both common buffer and read/write buffer, right?
>
> Yes. Under this idea, all kinds of operations would be cleaned up.
>
>
>> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
>> If you or any SEV owner can share ...
>
> Absolutely. Please see above.
>
> Thank you!
> Laszlo
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-07  4:46                   ` Yao, Jiewen
@ 2017-09-07 11:46                     ` Laszlo Ersek
  2017-09-07 14:40                       ` Brijesh Singh
  0 siblings, 1 reply; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-07 11:46 UTC (permalink / raw)
  To: Yao, Jiewen, Brijesh Singh, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

On 09/07/17 06:46, Yao, Jiewen wrote:
> Thanks for the sharing, Brijesh.
> 
> "If a page was marked as "shared"
> then its guest responsibility to make it "private" after its done communicating with
> hypervisor."
> 
> I believe I have same understanding - a *guest* should guarantee that.
> 
> My question is: During the *transition* from firmware to OS, *which guest* should make the shared buffer to be private? Is it "guest firmware" or "guest OS"?
> 
> Maybe I can ask the specific question to get it more clear.
> 
> 1)       If the common DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
> 
> 2)       If the read/write DMA buffer is not unmapped at ExitBootService, is that treated as an issue?

Very good questions, totally to the point.

On the authoritative answer, I defer to Brijesh.

(

My personal opinion is that both situations (#1 and #2) are problems,
because they break the *practical* security invariant for SEV guests:

- most memory should be encrypted at all times, *and*

- any memory that is decrypted must have an owner that is responsible
  for re-encrypting the memory eventually.

Therefore, *either* the firmware has to re-encrypt all remaining DMA
buffers at ExitBootServices(), *or* a new information channel must be
designed, from firmware to OS, to carry over the decryption status.

I strongly prefer the first option, for the following reason: the same
questions apply to all EDK2 IOMMU protocol interfaces, not just the one
exported by the SEV driver.

)

Thanks,
Laszlo

> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
> Sent: Wednesday, September 6, 2017 11:40 PM
> To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
> Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
> 
> 
> 
> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>> Thanks for the clarification. Comment in line.
>>>
>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
>>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>; Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>>
>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>>>> Then after ExitBootService, the OS will take control of CR3 and set correct
>>>> encryption bit.
>>>
>>> This is true, the guest OS will set up new page tables, and in those
>>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>>
>>> However, the guest OS will not *rewrite* all of the memory, with the
>>> C-bit set. This means that any memory that the firmware didn't actually
>>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>>> hypervisor.
>>> [Jiewen] That is an interesting question.
>>> Is there any security concern associated?
>>
>> I can't tell for sure. Answering this question is up to the proponents
>> of SEV, who have designed the threat model for SEV.
>>
>> My operating assumption is that we should keep memory as tightly
>> encrypted as possible at the firmware --> OS control transfer. It could
>> be an exaggerated expectation from my side; I'd just better be safe than
>> sorry :)
>>
>>
> 
> Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
> we can discuss a security model (if needed).
> 
> SEV is an extension to the AMD-V architecture which supports running encrypted
> virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
> pages (code and data) secured such that only the guest itself has access to the
> unencrypted version. Each encrypted VMs is associated with a unique encryption
> key; if its data is accessed by a different entity using a different key the
> encrypted guest data will be incorrectly decrypted, leading to unintelligible data.
> You can also find some detail for SEV in white paper [1].
> 
> SEV encrypted Vs have the concept of private and shared memory. The private memory
> is encrypted with the guest-specific key, while shared memory may be encrypted
> with hypervisor key. SEV guest VMs can choose which pages they would like to
> be private. But the instruction pages and guest page tables are always treated
> as private by the hardware. The DMA operation inside the guest must be performed
> on shared pages -- this is mainly because in virtualization world the device
> DMA needs some assistance from the hypervisor.
> 
> The security model provided by the SEV ensure that hypervisor will no longer able
> to inspect or alter any guest code or data. The guest OS controls what it want to
> share with outside world (in this case with Hypervisor).
> 
> In software implementation we took approach to encrypt everything possible starting
> early in boot. In this approach whenever OVMF wants to perform certain DMA operations
> it allocate a shared page, issues the command, free the shared page after the command
> is completed (in other word we use sw bounce buffer to complete the DMA operation).
> 
> We have implemented IOMMU driver to provide the following functions:
> 
> AllocateBuffer():
> --------------------
> it allocate a private pages, as per UEFI spec the driver will map the buffer allocated
> from this routine using BusMasterCommonBuffer hence we allocate extra stash pages
> in addition to requested pages.
> 
> 
> Map
> ----
> If requested operation is BusMasterRead or BusMasterWrite then we allocate a shared buffer
> and DeviceAddress point to shared buffer.
> 
> If requested operation is BusMasterCommonBuffer then we perform in-place decryption of the
> contents and update the page-table to clear the C-bit (basically make it shared page)
> 
> Unmap
> ------
> If operation was BusMasterRead or BusMasterWrite then we complete the unmapping and free
> the shared buffer allocated in Map().
> 
> If operation was BusMasterCommonBuffer then we perform in-place encryption and set the C-bit
> (basically make the page private)
> 
> FreeBuffer:
> -----------
> Free the pages
> 
> 
> Lets run with the below scenario:
> 
> 1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA read)
> 2) hypervisor completes the request operation
> 3) hypervisor caches the shared physical address and monitor it for information leak
> 4) sometime later if guest write data in its "shared" physical address then hypervisor can
>     easily read it without guest knowledge.
> 
> SEV hardware can protect us against the attack where someone tries to inject or alter the
> guest code. As I noted above any instruction fetch is forced as private hence if attacker
> write some code into a shared buffer and point the RIP to his/her code then instruction
> fetch will try to decrypt it and get unintelligible opcode. If a page was marked as "shared"
> then its guest responsibility to make it "private" after its done communicating with
> hypervisor.
> 
> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
> 
> 
>>> If the C-bit is cleared for a read/write buffer, is the content in the
>>> read/write buffer also exposed to hypervisor?
>>
>> Not immediately. Only when the guest also rewrites the memory through
>> the appropriate virtual address.
>>
>> Basically, in the virtual machine, the C-bit in a PTE that covers a
>> particular guest virtual address (GVA) controls whether a guest write
>> through that GVA will result in memory encryption, and if a gues read
>> through that GVA will result in memory decryption.
>>
>> The current state of the C-bit doesn't matter for the hypervisor, what
>> matters is the last guest write through a GVA whose PTE had the C-bit
>> set, or clear. If the C-bit was clear when the guest last wrote the
>> area, then the hypervisor can read the data. If the C-bit was set, then
>> the hypervisor can only read garbage.
>>
>>
>>> I means if there is security concern, the concern should be applied to
>>> both common buffer and read/write buffer.
>>> Then we have to figure a way to resolve both buffer.
>>
>> Yes, this is correct. The PciIo.Unmap operation, as currently
>> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
>> correctly for all operations, but it can only guarantee *not* freeing
>> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
>> is safe, while Unmap()ing Read/Write is not. The encryption would be
>> re-established just fine for Read/Write as well, but we would change the
>> UEFI memmap.
>>
>> In OVMF, we currently manage this problem by making all asynchronous DMA
>> mappings CommonBuffer, even if they could othewise be Read or Write. We
>> use Read and Write only if the DMA operation completes before the
>> higher-level protocol function returns (so we can immediately Unmap the
>> operation, and the ExitBootServices() handler doesn't have to care).
>>
>>
>>> To be honest, that is exactly my biggest question on this patch:
>>> why do we only handle the common buffer specially?
>>
>> For the following reason:
>>
>> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
>> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
>> implementors -- beyond what the UEFI spec requries -- to keep *all*
>> dynamic memory management out of Map/Unmap. If they need dynamic memory
>> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
>> This way Unmap is safe in ExitBootServices handlers.
>>
>> - We cannot *reasonably ask* PciIo implementors to provide the same
>> guarantee for Read and Write mappings, simply because there are no other
>> APIs that could manage the dynamic memory for Map and Unmap --
>> AllocateBuffer and FreeBuffer are not required for Read and Write
>> mappings. PciIo implementors couldn't agree to our request even if they
>> wanted to. Therefore Unmapping Read/Write operations is inherently
>> unsafe in EBS handlers.
>>
>>
>>>> NOTE: The device should still be halt state, because the device is
>>>> also controlled by OS.
>>>
>>> This is the key point -- the device is *not* controlled by the guest OS,
>>> in the worst case.
>>>
>>> The virtual device is ultimately implemented by the hypervisor. We don't
>>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>>> However, if the hypervisor is compromised by an external attacker (for
>>> example over the network, or via privilege escalation from another
>>> guest), then all guest RAM that has not been encrypted up to that point
>>> becomes visible to the attacker. In other words, the hypervisor ("the
>>> device") becomes retro-actively malicious.
>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>> 1) If the concern is for existing data, it means all DMA data already written.
>>> However, the DMA data is already consumed or produced by virtual device
>>> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
>>
>> Maybe, maybe not. The data may have been sent to a different host over
>> the network, and wiped from memory.
>>
>> Or, the data may have been written to a disk image that is separately
>> encrypted by the host OS, and has been detached (unplugged) from the
>> guest, and also unmounted from the host, with the disk key securely
>> wiped from host memory.
>>
>> We can also imagine the reverse direction. Assume that the user of the
>> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
>> program, and types some secret information into a text file on the ESP.
>> Further assume that the disk image is encrypted on the host OS. It is
>> conceivable that fragments of the secret could remain stuck in the AHCI
>> (disk) and USB (keyboard) DMA buffers.
>>
>> Maybe you think that these are "made up" examples, and I agree, I just
>> made them up. The point is, it is not my place to discount possible
>> attack vectors (I haven't designed SEV). Attackers will be limited by
>> their *own* imaginations only, not by mine :)
>>
>>
>>> 2) if the concern is for future data, it means all user data to be written.
>>> However, the C-bit already prevent that.
>>
>> As far as I understand SEV, future data is out of scope. The goal is to
>> prevent *retroactive* guest data leaks (= whatever is currently in host
>> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>>
>> If an attacker compromises a virtualization host, then they can just sit
>> silent and watch data flow into and out of guests from that point
>> onward, because they can then monitor all DMA (which always happens in
>> clear-text) real-time.
>>
>>
>>> Maybe I do not fully understand the threat model defined.
>>> If you can explain more, that would be great.
>>
>> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
>> correct me if I'm wrong.
>>
>>
>>> The point of SEV is to keep as much guest data encrypted at all times as
>>> possible, so that *whenever* the hypervisor is compromised by an
>>> attacker, the guest memory -- which the attacker can inevitably dump
>>> from the host side -- remains un-intellegible to the attacker.
>>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>>> threat model at first.
>>> Or what we are doing might be unnecessary or insufficient.
>>
>> I agree.
>>
>> As I said above, my operating principle has been to re-encrypt all DMA
>> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
>> them at ExitBootServices at the latest.
>>
>>
>>> [Jiewen] For "require that Unmap() work for CommonBuffer
>>> operations without releasing memory", I would hold my opinion until
>>> it is documented clearly in UEFI spec.
>>
>> You are right, of course.
>>
>> It's just that we cannot avoid exploring ways, for this feature, that
>> currently fall outside of the spec. Whatever we do here will be outside
>> of the spec, one way or another. I suggested what I thought could be a
>> reasonable extension to the spec, one that could be implemented by PciIo
>> implementors even before the spec is modified.
>>
>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>
>> If there's a better way -- for example modifying CoreExitBootServices()
>> in edk2, to signal IOMMU drivers separately, *after* notifying other
>> ExitBootServices() handlers --, that might work as well.
>>
>>
>>> My current concern is:
>>> First, this sentence is NOT defined in UEFI specification.
>>
>> Correct.
>>
>>
>>> Second, it might break an existing implementation or existing feature, such as tracing.
>>
>> Correct.
>>
>>> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
>>> to call memory services.
>>> The only restriction is
>>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
>>
>> I agree.
>>
>>
>>> I'm just trying to operate with the APIs currently defined by the UEFI
>>> spec, and these assumptions were the best I could come up with.
>>> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
>>> Especially the IHV Option ROM should not consume any private API.
>>
>> I disagree about "only follow". If there are additional requirements
>> that can be satisfied without breaking the spec, driver implementors can
>> choose to conform to both sets of requirements.
>>
>> For example (if I understand correctly), Microsoft / Windows present a
>> bunch of requirements *beyond* the UEFI spec, for both platform and
>> add-on card firmware. Most vendors conform :)
>>
>>
>>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>>>> guest memory should be encrypted again, even those areas that were once
>>>> used as CommonBuffers."
>>>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>>>
>>> If it was indeed just CR3, then I would fully agree with you.
>>>
>>> But -- to my understanding --, ensuring that the memory is actually
>>> encrypted requires that the guest *write* to the memory through a
>>> virtual address whose PTE has the C-bit set.
>>>
>>> And I don't think the guest OS can be expected to rewrite all of its
>>> memory at launch. :(
>>>
>>> [Jiewen] That is fine.
>>> I suggest we get clear on the threat model as the first step.
>>> The threat model for SEV usage in OVMF.
>>>
>>> I am sorry if that is already discussed before. I might ignore the conversation.
>>
>> No problem; it's always good to re-investigate our assumptions and
>> operating principles.
>>
>>
>>> If you or any SEV owner can share the insight, that will be great.
>>> See my question above "If that is the threat model, I have a question on the exposure:..."
>>
>> I shared *my* impressions of the threat model (which are somewhat
>> unclear, admittedly, and thus could make me overly cautious).
>>
>> I hope Brijesh can extend and/or correct my description.
>>
>>
>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>>> SEV forbids all DMA unless configured otherwise.
>>>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>>>> I setup translation table, but mark all entry to be NOT-PRESENT.
>>>> I grant DMA access only if there is a specific request by a device.
>>>>
>>>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>>>> the existing OS, which might not have knowledge on IOMMU.
>>>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>>>> turn off IOMMU, or keep it enabled at ExitBootService.
>>>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>>>
>>> Thanks for the clarification.
>>>
>>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>>> a chance to run their own ExitBootServices() handlers, disabling the
>>> individual devices?
>>> [Jiewen] Sorry for the confusing. Let me explain:
>>> No, VTd never disables *all* DMA buffer at ExitBootService.
>>>
>>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>>> "only allow the DMA from Map() request".
>>
>> Okay, but this interpretation was exactly what I thought of first (see
>> above): "VT-d permits all DMA unless configured otherwise". You answered
>> that it wasn't the case.
>>
>> So basically, if I understand it correctly now, at ExitBootServices the
>> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>
>> That is equivalent to decrypting all memory under SEV, and is the exact
>> opposite of what we want. (As I understand it.)
>>
>>
>>> If that happens, then a series of IOMMU faults could be generated, which
>>> I described above. (I.e., such IOMMU faults could result, at least in
>>> the case of SEV, in garbage being written to disk, via queued commands.)
>>> [Jiewen] You are right. That would not happen.
>>> IOMMU driver should not bring any compatibility problem for the PCI driver,
>>> iff the PCI driver follows the UEFI specification.
>>
>> Agreed.
>>
>>
>>> Now, to finish up, here's an idea I just had.
>>>
>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>>> notification function?
>>>
>>> If we are, then we could do the following:
>>>
>>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>>    dynamic memory allocation or freeing, for any kind of DMA operation).
>>>
>>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>>    ExitBootServices() handlers.
>>>
>>> * The IOMMU driver would have an ExitBootServices() notification
>>>    function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>>    (doesn't matter which).
>>>
>>> * In this notification function, the IOMMU driver would signal *another*
>>>    event (a private one). The notification function for this event would
>>>    be enqueued strictly at the TPL_CALLBACK level.
>>>
>>> * The notification function for the second event (private to the IOMMU)
>>>    would iterate over all existent mappings, and unmap them, without
>>>    allocating or freeing memory.
>>>
>>> The key point is that by signaling the second event, the IOMMU driver
>>> could order its own cleanup code after all other ExitBootServices()
>>> callbacks. (Assuming, at least, that no other ExitBootServices()
>>> callback employs the same trick to defer itself!) Therefore by the time
>>> the second callback runs, all PCI devices have been halted, and it is
>>> safe to tear down the translations.
>>>
>>> The ordering would be ensured by the following:
>>>
>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>>    to be signaled before the first notification action is taken." This
>>>    means that, by the time the IOMMU's ExitBootServices() handler is
>>>    entered, all other ExitBootServices() handlers have been *queued* at
>>>    least, at TPL_CALLBACK or TPL_NOTIFY.
>>>
>>> - Therefore, when we signal our second callback from there, for
>>>    TPL_CALLBACK, the second callback will be queued at the end of the
>>>    TPL_CALLBACK queue.
>>>
>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>>    functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>>    for the Type argument of CreateEvent." So it wouldn't matter if a
>>>    driver used CreateEvent() or CreateEventEx() for setting up its own
>>>    handler, the handler would be queued just the same.
>>>
>>> I think this is ugly and brittle, but perhaps the only way to clean up
>>> *all* translations safely, with regard to PciIo.Unmap() +
>>> ExitBootServices(), without updating the UEFI spec.
>>>
>>> What do you think?
>>>
>>> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
>>> We do not need update PCI driver and we do not need update UEFI spec.
>>> We only need update IOMMU driver which is concerned, based upon the threat model.
>>> That probably is best solution. :-)
>>
>> I'm very glad to hear that you like the idea.
>>
>> However, it depends on whether we are permitted, by the UEFI spec, to
>> signal another event in an ExitBootServices() notification function.
>>
>> Are we permitted to do that?
>>
>> Does the UEFI spec guarantee that the notification function for the
>> *second* event will be queued just like it would be under "normal"
>> circumstances?
>>
>> (I know we must not allocate or free memory in the notification function
>> of the *second* event either; I just want to know if the second event's
>> handler is *queued* like it would normally be.)
>>
>>
>>> I assume you want to handle both common buffer and read/write buffer, right?
>>
>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>
>>
>>> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
>>> If you or any SEV owner can share ...
>>
>> Absolutely. Please see above.
>>
>> Thank you!
>> Laszlo
>>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-07  4:34                 ` Yao, Jiewen
@ 2017-09-07 12:11                   ` Laszlo Ersek
  0 siblings, 0 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-07 12:11 UTC (permalink / raw)
  To: Yao, Jiewen, Zeng, Star, edk2-devel-01; +Cc: Brijesh Singh, Dong, Eric

On 09/07/17 06:34, Yao, Jiewen wrote:
> Thanks.
> It seems the discussion becomes too long. I try to make it short.
> 
> 
> 1)       As long as we use same mechanism to handle both common buffer and read/write buffer. I have no concern at all. :)
> 
> 
> 2)       I am sorry that I do not have an immediate answer for your question on event handling in ExitBootServices.
> Although it seems tricky, I believe it works. Can you have a try?
> 
> 
> 3)       Looking at the code (DxeMain.c), I have another idea for your consideration only.
> 
>   //
>   // Notify other drivers that we are exiting boot services.
>   //
>   CoreNotifySignalList (&gEfiEventExitBootServicesGuid);
> 
>   //
>   // Report that ExitBootServices() has been called
>   //
>   REPORT_STATUS_CODE (
>     EFI_PROGRESS_CODE,
>     (EFI_SOFTWARE_EFI_BOOT_SERVICE | EFI_SW_BS_PC_EXIT_BOOT_SERVICES)
>     );
> 
> The cleanup driver may register a report status code handle to do the cleanup. :)
> 
> We already have such example in MdeModulePkg\Universal\Acpi\FirmwarePerformanceDataTableDxe\ FirmwarePerformanceDxe.c.
> FpdtStatusCodeListenerDxe()
> 
>   } else if (Value == (EFI_SOFTWARE_EFI_BOOT_SERVICE | EFI_SW_BS_PC_EXIT_BOOT_SERVICES)) {
> ......
> 
> The requirement will become: If a platform wants to add cleanup, it must use a real report status code library.

I think I prefer option #2; it presents fewer requirements for the platform.

In addition: I've just realized that we don't need the UEFI spec to
promise us that option #2 works. We only have to care whether it works
in edk2.

The reason is that EDKII_IOMMU_PROTOCOL is *already* edk2-specific; it
is a platform (DXE) driver, not a UEFI driver. In other words, we would
use option #2 only inside edk2.

So, I'll try option #2.

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-07 11:46                     ` Laszlo Ersek
@ 2017-09-07 14:40                       ` Brijesh Singh
  2017-09-07 14:48                         ` Yao, Jiewen
  0 siblings, 1 reply; 21+ messages in thread
From: Brijesh Singh @ 2017-09-07 14:40 UTC (permalink / raw)
  To: Laszlo Ersek, Yao, Jiewen, Zeng, Star, edk2-devel-01
  Cc: brijesh.singh, Dong, Eric

Hi Jiewen,

On 09/07/2017 06:46 AM, Laszlo Ersek wrote:
> On 09/07/17 06:46, Yao, Jiewen wrote:
>> Thanks for the sharing, Brijesh.
>>
>> "If a page was marked as "shared"
>> then its guest responsibility to make it "private" after its done communicating with
>> hypervisor."
>>
>> I believe I have same understanding - a *guest* should guarantee that.
>>
>> My question is: During the *transition* from firmware to OS, *which guest* should make the shared buffer to be private? Is it "guest firmware" or "guest OS"?
>>
>> Maybe I can ask the specific question to get it more clear.
>>
>> 1)       If the common DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
>>
>> 2)       If the read/write DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
> 
> Very good questions, totally to the point.
> 
> On the authoritative answer, I defer to Brijesh.
> 


Both the above cases (#1 and #2) are problems. Since buffers was owned by firmware
and firmware made it "shared" hence firmware is responsible to mark them as private
after its done with the buffer. In other words, we must call Unmap() from ExitBootServices
to ensure that buffers mapped using BusMasterCommonBuffer/BusMasterRead/BusMasterWrite
is marked as "private" before we make it available to the guest OS. (we do similar thing
in Linux OS).

Having any kind of side channel to communicate the encryption status of a page
will not work -- we should be able to support a usecase where we boot OVMF in
64-bit but launch 32-bit Linux guest. When Linux boots in 32-bit mode it does not have
access to encryption bit (C-bit is bit-47 in page table) and can't mark the page as
private (even if we provide some kind of side-channel).

thank you very much for all your help.

> (
> 
> My personal opinion is that both situations (#1 and #2) are problems,
> because they break the *practical* security invariant for SEV guests:
> 
> - most memory should be encrypted at all times, *and*
> 
> - any memory that is decrypted must have an owner that is responsible
>    for re-encrypting the memory eventually.
> 
> Therefore, *either* the firmware has to re-encrypt all remaining DMA
> buffers at ExitBootServices(), *or* a new information channel must be
> designed, from firmware to OS, to carry over the decryption status.
> 
> I strongly prefer the first option, for the following reason: the same
> questions apply to all EDK2 IOMMU protocol interfaces, not just the one
> exported by the SEV driver.
> 
> )
> 
> Thanks,
> Laszlo
> 
>> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
>> Sent: Wednesday, September 6, 2017 11:40 PM
>> To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
>> Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>>
>>
>> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>>> Thanks for the clarification. Comment in line.
>>>>
>>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>>> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
>>>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>; Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>>
>>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>
>>>>> Then after ExitBootService, the OS will take control of CR3 and set correct
>>>>> encryption bit.
>>>>
>>>> This is true, the guest OS will set up new page tables, and in those
>>>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>>>
>>>> However, the guest OS will not *rewrite* all of the memory, with the
>>>> C-bit set. This means that any memory that the firmware didn't actually
>>>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>>>> hypervisor.
>>>> [Jiewen] That is an interesting question.
>>>> Is there any security concern associated?
>>>
>>> I can't tell for sure. Answering this question is up to the proponents
>>> of SEV, who have designed the threat model for SEV.
>>>
>>> My operating assumption is that we should keep memory as tightly
>>> encrypted as possible at the firmware --> OS control transfer. It could
>>> be an exaggerated expectation from my side; I'd just better be safe than
>>> sorry :)
>>>
>>>
>>
>> Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
>> we can discuss a security model (if needed).
>>
>> SEV is an extension to the AMD-V architecture which supports running encrypted
>> virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
>> pages (code and data) secured such that only the guest itself has access to the
>> unencrypted version. Each encrypted VMs is associated with a unique encryption
>> key; if its data is accessed by a different entity using a different key the
>> encrypted guest data will be incorrectly decrypted, leading to unintelligible data.
>> You can also find some detail for SEV in white paper [1].
>>
>> SEV encrypted Vs have the concept of private and shared memory. The private memory
>> is encrypted with the guest-specific key, while shared memory may be encrypted
>> with hypervisor key. SEV guest VMs can choose which pages they would like to
>> be private. But the instruction pages and guest page tables are always treated
>> as private by the hardware. The DMA operation inside the guest must be performed
>> on shared pages -- this is mainly because in virtualization world the device
>> DMA needs some assistance from the hypervisor.
>>
>> The security model provided by the SEV ensure that hypervisor will no longer able
>> to inspect or alter any guest code or data. The guest OS controls what it want to
>> share with outside world (in this case with Hypervisor).
>>
>> In software implementation we took approach to encrypt everything possible starting
>> early in boot. In this approach whenever OVMF wants to perform certain DMA operations
>> it allocate a shared page, issues the command, free the shared page after the command
>> is completed (in other word we use sw bounce buffer to complete the DMA operation).
>>
>> We have implemented IOMMU driver to provide the following functions:
>>
>> AllocateBuffer():
>> --------------------
>> it allocate a private pages, as per UEFI spec the driver will map the buffer allocated
>> from this routine using BusMasterCommonBuffer hence we allocate extra stash pages
>> in addition to requested pages.
>>
>>
>> Map
>> ----
>> If requested operation is BusMasterRead or BusMasterWrite then we allocate a shared buffer
>> and DeviceAddress point to shared buffer.
>>
>> If requested operation is BusMasterCommonBuffer then we perform in-place decryption of the
>> contents and update the page-table to clear the C-bit (basically make it shared page)
>>
>> Unmap
>> ------
>> If operation was BusMasterRead or BusMasterWrite then we complete the unmapping and free
>> the shared buffer allocated in Map().
>>
>> If operation was BusMasterCommonBuffer then we perform in-place encryption and set the C-bit
>> (basically make the page private)
>>
>> FreeBuffer:
>> -----------
>> Free the pages
>>
>>
>> Lets run with the below scenario:
>>
>> 1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA read)
>> 2) hypervisor completes the request operation
>> 3) hypervisor caches the shared physical address and monitor it for information leak
>> 4) sometime later if guest write data in its "shared" physical address then hypervisor can
>>      easily read it without guest knowledge.
>>
>> SEV hardware can protect us against the attack where someone tries to inject or alter the
>> guest code. As I noted above any instruction fetch is forced as private hence if attacker
>> write some code into a shared buffer and point the RIP to his/her code then instruction
>> fetch will try to decrypt it and get unintelligible opcode. If a page was marked as "shared"
>> then its guest responsibility to make it "private" after its done communicating with
>> hypervisor.
>>
>> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>>
>>
>>>> If the C-bit is cleared for a read/write buffer, is the content in the
>>>> read/write buffer also exposed to hypervisor?
>>>
>>> Not immediately. Only when the guest also rewrites the memory through
>>> the appropriate virtual address.
>>>
>>> Basically, in the virtual machine, the C-bit in a PTE that covers a
>>> particular guest virtual address (GVA) controls whether a guest write
>>> through that GVA will result in memory encryption, and if a gues read
>>> through that GVA will result in memory decryption.
>>>
>>> The current state of the C-bit doesn't matter for the hypervisor, what
>>> matters is the last guest write through a GVA whose PTE had the C-bit
>>> set, or clear. If the C-bit was clear when the guest last wrote the
>>> area, then the hypervisor can read the data. If the C-bit was set, then
>>> the hypervisor can only read garbage.
>>>
>>>
>>>> I means if there is security concern, the concern should be applied to
>>>> both common buffer and read/write buffer.
>>>> Then we have to figure a way to resolve both buffer.
>>>
>>> Yes, this is correct. The PciIo.Unmap operation, as currently
>>> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
>>> correctly for all operations, but it can only guarantee *not* freeing
>>> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
>>> is safe, while Unmap()ing Read/Write is not. The encryption would be
>>> re-established just fine for Read/Write as well, but we would change the
>>> UEFI memmap.
>>>
>>> In OVMF, we currently manage this problem by making all asynchronous DMA
>>> mappings CommonBuffer, even if they could othewise be Read or Write. We
>>> use Read and Write only if the DMA operation completes before the
>>> higher-level protocol function returns (so we can immediately Unmap the
>>> operation, and the ExitBootServices() handler doesn't have to care).
>>>
>>>
>>>> To be honest, that is exactly my biggest question on this patch:
>>>> why do we only handle the common buffer specially?
>>>
>>> For the following reason:
>>>
>>> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
>>> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
>>> implementors -- beyond what the UEFI spec requries -- to keep *all*
>>> dynamic memory management out of Map/Unmap. If they need dynamic memory
>>> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
>>> This way Unmap is safe in ExitBootServices handlers.
>>>
>>> - We cannot *reasonably ask* PciIo implementors to provide the same
>>> guarantee for Read and Write mappings, simply because there are no other
>>> APIs that could manage the dynamic memory for Map and Unmap --
>>> AllocateBuffer and FreeBuffer are not required for Read and Write
>>> mappings. PciIo implementors couldn't agree to our request even if they
>>> wanted to. Therefore Unmapping Read/Write operations is inherently
>>> unsafe in EBS handlers.
>>>
>>>
>>>>> NOTE: The device should still be halt state, because the device is
>>>>> also controlled by OS.
>>>>
>>>> This is the key point -- the device is *not* controlled by the guest OS,
>>>> in the worst case.
>>>>
>>>> The virtual device is ultimately implemented by the hypervisor. We don't
>>>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>>>> However, if the hypervisor is compromised by an external attacker (for
>>>> example over the network, or via privilege escalation from another
>>>> guest), then all guest RAM that has not been encrypted up to that point
>>>> becomes visible to the attacker. In other words, the hypervisor ("the
>>>> device") becomes retro-actively malicious.
>>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>>> 1) If the concern is for existing data, it means all DMA data already written.
>>>> However, the DMA data is already consumed or produced by virtual device
>>>> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
>>>
>>> Maybe, maybe not. The data may have been sent to a different host over
>>> the network, and wiped from memory.
>>>
>>> Or, the data may have been written to a disk image that is separately
>>> encrypted by the host OS, and has been detached (unplugged) from the
>>> guest, and also unmounted from the host, with the disk key securely
>>> wiped from host memory.
>>>
>>> We can also imagine the reverse direction. Assume that the user of the
>>> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
>>> program, and types some secret information into a text file on the ESP.
>>> Further assume that the disk image is encrypted on the host OS. It is
>>> conceivable that fragments of the secret could remain stuck in the AHCI
>>> (disk) and USB (keyboard) DMA buffers.
>>>
>>> Maybe you think that these are "made up" examples, and I agree, I just
>>> made them up. The point is, it is not my place to discount possible
>>> attack vectors (I haven't designed SEV). Attackers will be limited by
>>> their *own* imaginations only, not by mine :)
>>>
>>>
>>>> 2) if the concern is for future data, it means all user data to be written.
>>>> However, the C-bit already prevent that.
>>>
>>> As far as I understand SEV, future data is out of scope. The goal is to
>>> prevent *retroactive* guest data leaks (= whatever is currently in host
>>> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>>>
>>> If an attacker compromises a virtualization host, then they can just sit
>>> silent and watch data flow into and out of guests from that point
>>> onward, because they can then monitor all DMA (which always happens in
>>> clear-text) real-time.
>>>
>>>
>>>> Maybe I do not fully understand the threat model defined.
>>>> If you can explain more, that would be great.
>>>
>>> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
>>> correct me if I'm wrong.
>>>
>>>
>>>> The point of SEV is to keep as much guest data encrypted at all times as
>>>> possible, so that *whenever* the hypervisor is compromised by an
>>>> attacker, the guest memory -- which the attacker can inevitably dump
>>>> from the host side -- remains un-intellegible to the attacker.
>>>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>>>> threat model at first.
>>>> Or what we are doing might be unnecessary or insufficient.
>>>
>>> I agree.
>>>
>>> As I said above, my operating principle has been to re-encrypt all DMA
>>> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
>>> them at ExitBootServices at the latest.
>>>
>>>
>>>> [Jiewen] For "require that Unmap() work for CommonBuffer
>>>> operations without releasing memory", I would hold my opinion until
>>>> it is documented clearly in UEFI spec.
>>>
>>> You are right, of course.
>>>
>>> It's just that we cannot avoid exploring ways, for this feature, that
>>> currently fall outside of the spec. Whatever we do here will be outside
>>> of the spec, one way or another. I suggested what I thought could be a
>>> reasonable extension to the spec, one that could be implemented by PciIo
>>> implementors even before the spec is modified.
>>>
>>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>>
>>> If there's a better way -- for example modifying CoreExitBootServices()
>>> in edk2, to signal IOMMU drivers separately, *after* notifying other
>>> ExitBootServices() handlers --, that might work as well.
>>>
>>>
>>>> My current concern is:
>>>> First, this sentence is NOT defined in UEFI specification.
>>>
>>> Correct.
>>>
>>>
>>>> Second, it might break an existing implementation or existing feature, such as tracing.
>>>
>>> Correct.
>>>
>>>> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
>>>> to call memory services.
>>>> The only restriction is
>>>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>>>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
>>>
>>> I agree.
>>>
>>>
>>>> I'm just trying to operate with the APIs currently defined by the UEFI
>>>> spec, and these assumptions were the best I could come up with.
>>>> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
>>>> Especially the IHV Option ROM should not consume any private API.
>>>
>>> I disagree about "only follow". If there are additional requirements
>>> that can be satisfied without breaking the spec, driver implementors can
>>> choose to conform to both sets of requirements.
>>>
>>> For example (if I understand correctly), Microsoft / Windows present a
>>> bunch of requirements *beyond* the UEFI spec, for both platform and
>>> add-on card firmware. Most vendors conform :)
>>>
>>>
>>>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>>>>> guest memory should be encrypted again, even those areas that were once
>>>>> used as CommonBuffers."
>>>>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>>>>
>>>> If it was indeed just CR3, then I would fully agree with you.
>>>>
>>>> But -- to my understanding --, ensuring that the memory is actually
>>>> encrypted requires that the guest *write* to the memory through a
>>>> virtual address whose PTE has the C-bit set.
>>>>
>>>> And I don't think the guest OS can be expected to rewrite all of its
>>>> memory at launch. :(
>>>>
>>>> [Jiewen] That is fine.
>>>> I suggest we get clear on the threat model as the first step.
>>>> The threat model for SEV usage in OVMF.
>>>>
>>>> I am sorry if that is already discussed before. I might ignore the conversation.
>>>
>>> No problem; it's always good to re-investigate our assumptions and
>>> operating principles.
>>>
>>>
>>>> If you or any SEV owner can share the insight, that will be great.
>>>> See my question above "If that is the threat model, I have a question on the exposure:..."
>>>
>>> I shared *my* impressions of the threat model (which are somewhat
>>> unclear, admittedly, and thus could make me overly cautious).
>>>
>>> I hope Brijesh can extend and/or correct my description.
>>>
>>>
>>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>>>> SEV forbids all DMA unless configured otherwise.
>>>>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>>>>> I setup translation table, but mark all entry to be NOT-PRESENT.
>>>>> I grant DMA access only if there is a specific request by a device.
>>>>>
>>>>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>>>>> the existing OS, which might not have knowledge on IOMMU.
>>>>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>>>>> turn off IOMMU, or keep it enabled at ExitBootService.
>>>>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>>>>
>>>> Thanks for the clarification.
>>>>
>>>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>>>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>>>> a chance to run their own ExitBootServices() handlers, disabling the
>>>> individual devices?
>>>> [Jiewen] Sorry for the confusing. Let me explain:
>>>> No, VTd never disables *all* DMA buffer at ExitBootService.
>>>>
>>>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>>>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>>>> "only allow the DMA from Map() request".
>>>
>>> Okay, but this interpretation was exactly what I thought of first (see
>>> above): "VT-d permits all DMA unless configured otherwise". You answered
>>> that it wasn't the case.
>>>
>>> So basically, if I understand it correctly now, at ExitBootServices the
>>> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>>
>>> That is equivalent to decrypting all memory under SEV, and is the exact
>>> opposite of what we want. (As I understand it.)
>>>
>>>
>>>> If that happens, then a series of IOMMU faults could be generated, which
>>>> I described above. (I.e., such IOMMU faults could result, at least in
>>>> the case of SEV, in garbage being written to disk, via queued commands.)
>>>> [Jiewen] You are right. That would not happen.
>>>> IOMMU driver should not bring any compatibility problem for the PCI driver,
>>>> iff the PCI driver follows the UEFI specification.
>>>
>>> Agreed.
>>>
>>>
>>>> Now, to finish up, here's an idea I just had.
>>>>
>>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>>>> notification function?
>>>>
>>>> If we are, then we could do the following:
>>>>
>>>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>>>     dynamic memory allocation or freeing, for any kind of DMA operation).
>>>>
>>>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>>>     ExitBootServices() handlers.
>>>>
>>>> * The IOMMU driver would have an ExitBootServices() notification
>>>>     function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>>>     (doesn't matter which).
>>>>
>>>> * In this notification function, the IOMMU driver would signal *another*
>>>>     event (a private one). The notification function for this event would
>>>>     be enqueued strictly at the TPL_CALLBACK level.
>>>>
>>>> * The notification function for the second event (private to the IOMMU)
>>>>     would iterate over all existent mappings, and unmap them, without
>>>>     allocating or freeing memory.
>>>>
>>>> The key point is that by signaling the second event, the IOMMU driver
>>>> could order its own cleanup code after all other ExitBootServices()
>>>> callbacks. (Assuming, at least, that no other ExitBootServices()
>>>> callback employs the same trick to defer itself!) Therefore by the time
>>>> the second callback runs, all PCI devices have been halted, and it is
>>>> safe to tear down the translations.
>>>>
>>>> The ordering would be ensured by the following:
>>>>
>>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>>>     to be signaled before the first notification action is taken." This
>>>>     means that, by the time the IOMMU's ExitBootServices() handler is
>>>>     entered, all other ExitBootServices() handlers have been *queued* at
>>>>     least, at TPL_CALLBACK or TPL_NOTIFY.
>>>>
>>>> - Therefore, when we signal our second callback from there, for
>>>>     TPL_CALLBACK, the second callback will be queued at the end of the
>>>>     TPL_CALLBACK queue.
>>>>
>>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>>>     functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>>>     for the Type argument of CreateEvent." So it wouldn't matter if a
>>>>     driver used CreateEvent() or CreateEventEx() for setting up its own
>>>>     handler, the handler would be queued just the same.
>>>>
>>>> I think this is ugly and brittle, but perhaps the only way to clean up
>>>> *all* translations safely, with regard to PciIo.Unmap() +
>>>> ExitBootServices(), without updating the UEFI spec.
>>>>
>>>> What do you think?
>>>>
>>>> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
>>>> We do not need update PCI driver and we do not need update UEFI spec.
>>>> We only need update IOMMU driver which is concerned, based upon the threat model.
>>>> That probably is best solution. :-)
>>>
>>> I'm very glad to hear that you like the idea.
>>>
>>> However, it depends on whether we are permitted, by the UEFI spec, to
>>> signal another event in an ExitBootServices() notification function.
>>>
>>> Are we permitted to do that?
>>>
>>> Does the UEFI spec guarantee that the notification function for the
>>> *second* event will be queued just like it would be under "normal"
>>> circumstances?
>>>
>>> (I know we must not allocate or free memory in the notification function
>>> of the *second* event either; I just want to know if the second event's
>>> handler is *queued* like it would normally be.)
>>>
>>>
>>>> I assume you want to handle both common buffer and read/write buffer, right?
>>>
>>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>>
>>>
>>>> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
>>>> If you or any SEV owner can share ...
>>>
>>> Absolutely. Please see above.
>>>
>>> Thank you!
>>> Laszlo
>>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-07 14:40                       ` Brijesh Singh
@ 2017-09-07 14:48                         ` Yao, Jiewen
  2017-09-07 16:40                           ` Laszlo Ersek
  0 siblings, 1 reply; 21+ messages in thread
From: Yao, Jiewen @ 2017-09-07 14:48 UTC (permalink / raw)
  To: Brijesh Singh, Laszlo Ersek, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

Great. Thanks to confirm that. Now it is clear to me.

Then let's wait Laszlo's new patch to make all DMA buffer to private. :)

Thank you
Yao, Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Brijesh Singh
Sent: Thursday, September 7, 2017 10:40 PM
To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

Hi Jiewen,

On 09/07/2017 06:46 AM, Laszlo Ersek wrote:
> On 09/07/17 06:46, Yao, Jiewen wrote:
>> Thanks for the sharing, Brijesh.
>>
>> "If a page was marked as "shared"
>> then its guest responsibility to make it "private" after its done communicating with
>> hypervisor."
>>
>> I believe I have same understanding - a *guest* should guarantee that.
>>
>> My question is: During the *transition* from firmware to OS, *which guest* should make the shared buffer to be private? Is it "guest firmware" or "guest OS"?
>>
>> Maybe I can ask the specific question to get it more clear.
>>
>> 1)       If the common DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
>>
>> 2)       If the read/write DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
>
> Very good questions, totally to the point.
>
> On the authoritative answer, I defer to Brijesh.
>


Both the above cases (#1 and #2) are problems. Since buffers was owned by firmware
and firmware made it "shared" hence firmware is responsible to mark them as private
after its done with the buffer. In other words, we must call Unmap() from ExitBootServices
to ensure that buffers mapped using BusMasterCommonBuffer/BusMasterRead/BusMasterWrite
is marked as "private" before we make it available to the guest OS. (we do similar thing
in Linux OS).

Having any kind of side channel to communicate the encryption status of a page
will not work -- we should be able to support a usecase where we boot OVMF in
64-bit but launch 32-bit Linux guest. When Linux boots in 32-bit mode it does not have
access to encryption bit (C-bit is bit-47 in page table) and can't mark the page as
private (even if we provide some kind of side-channel).

thank you very much for all your help.

> (
>
> My personal opinion is that both situations (#1 and #2) are problems,
> because they break the *practical* security invariant for SEV guests:
>
> - most memory should be encrypted at all times, *and*
>
> - any memory that is decrypted must have an owner that is responsible
>    for re-encrypting the memory eventually.
>
> Therefore, *either* the firmware has to re-encrypt all remaining DMA
> buffers at ExitBootServices(), *or* a new information channel must be
> designed, from firmware to OS, to carry over the decryption status.
>
> I strongly prefer the first option, for the following reason: the same
> questions apply to all EDK2 IOMMU protocol interfaces, not just the one
> exported by the SEV driver.
>
> )
>
> Thanks,
> Laszlo
>
>> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
>> Sent: Wednesday, September 6, 2017 11:40 PM
>> To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
>> Cc: brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>; Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>
>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>
>>
>>
>> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>>> Thanks for the clarification. Comment in line.
>>>>
>>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>>> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com<mailto:jiewen.yao@intel.com%3cmailto:jiewen.yao@intel.com>>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com<mailto:star.zeng@intel.com%3cmailto:star.zeng@intel.com>>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org%3cmailto:edk2-devel@lists.01.org>>>
>>>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com<mailto:eric.dong@intel.com%3cmailto:eric.dong@intel.com>>>; Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com<mailto:brijesh.singh@amd.com%3cmailto:brijesh.singh@amd.com>>>
>>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>
>>>>> Then after ExitBootService, the OS will take control of CR3 and set correct
>>>>> encryption bit.
>>>>
>>>> This is true, the guest OS will set up new page tables, and in those
>>>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>>>
>>>> However, the guest OS will not *rewrite* all of the memory, with the
>>>> C-bit set. This means that any memory that the firmware didn't actually
>>>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>>>> hypervisor.
>>>> [Jiewen] That is an interesting question.
>>>> Is there any security concern associated?
>>>
>>> I can't tell for sure. Answering this question is up to the proponents
>>> of SEV, who have designed the threat model for SEV.
>>>
>>> My operating assumption is that we should keep memory as tightly
>>> encrypted as possible at the firmware --> OS control transfer. It could
>>> be an exaggerated expectation from my side; I'd just better be safe than
>>> sorry :)
>>>
>>>
>>
>> Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
>> we can discuss a security model (if needed).
>>
>> SEV is an extension to the AMD-V architecture which supports running encrypted
>> virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
>> pages (code and data) secured such that only the guest itself has access to the
>> unencrypted version. Each encrypted VMs is associated with a unique encryption
>> key; if its data is accessed by a different entity using a different key the
>> encrypted guest data will be incorrectly decrypted, leading to unintelligible data.
>> You can also find some detail for SEV in white paper [1].
>>
>> SEV encrypted Vs have the concept of private and shared memory. The private memory
>> is encrypted with the guest-specific key, while shared memory may be encrypted
>> with hypervisor key. SEV guest VMs can choose which pages they would like to
>> be private. But the instruction pages and guest page tables are always treated
>> as private by the hardware. The DMA operation inside the guest must be performed
>> on shared pages -- this is mainly because in virtualization world the device
>> DMA needs some assistance from the hypervisor.
>>
>> The security model provided by the SEV ensure that hypervisor will no longer able
>> to inspect or alter any guest code or data. The guest OS controls what it want to
>> share with outside world (in this case with Hypervisor).
>>
>> In software implementation we took approach to encrypt everything possible starting
>> early in boot. In this approach whenever OVMF wants to perform certain DMA operations
>> it allocate a shared page, issues the command, free the shared page after the command
>> is completed (in other word we use sw bounce buffer to complete the DMA operation).
>>
>> We have implemented IOMMU driver to provide the following functions:
>>
>> AllocateBuffer():
>> --------------------
>> it allocate a private pages, as per UEFI spec the driver will map the buffer allocated
>> from this routine using BusMasterCommonBuffer hence we allocate extra stash pages
>> in addition to requested pages.
>>
>>
>> Map
>> ----
>> If requested operation is BusMasterRead or BusMasterWrite then we allocate a shared buffer
>> and DeviceAddress point to shared buffer.
>>
>> If requested operation is BusMasterCommonBuffer then we perform in-place decryption of the
>> contents and update the page-table to clear the C-bit (basically make it shared page)
>>
>> Unmap
>> ------
>> If operation was BusMasterRead or BusMasterWrite then we complete the unmapping and free
>> the shared buffer allocated in Map().
>>
>> If operation was BusMasterCommonBuffer then we perform in-place encryption and set the C-bit
>> (basically make the page private)
>>
>> FreeBuffer:
>> -----------
>> Free the pages
>>
>>
>> Lets run with the below scenario:
>>
>> 1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA read)
>> 2) hypervisor completes the request operation
>> 3) hypervisor caches the shared physical address and monitor it for information leak
>> 4) sometime later if guest write data in its "shared" physical address then hypervisor can
>>      easily read it without guest knowledge.
>>
>> SEV hardware can protect us against the attack where someone tries to inject or alter the
>> guest code. As I noted above any instruction fetch is forced as private hence if attacker
>> write some code into a shared buffer and point the RIP to his/her code then instruction
>> fetch will try to decrypt it and get unintelligible opcode. If a page was marked as "shared"
>> then its guest responsibility to make it "private" after its done communicating with
>> hypervisor.
>>
>> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>>
>>
>>>> If the C-bit is cleared for a read/write buffer, is the content in the
>>>> read/write buffer also exposed to hypervisor?
>>>
>>> Not immediately. Only when the guest also rewrites the memory through
>>> the appropriate virtual address.
>>>
>>> Basically, in the virtual machine, the C-bit in a PTE that covers a
>>> particular guest virtual address (GVA) controls whether a guest write
>>> through that GVA will result in memory encryption, and if a gues read
>>> through that GVA will result in memory decryption.
>>>
>>> The current state of the C-bit doesn't matter for the hypervisor, what
>>> matters is the last guest write through a GVA whose PTE had the C-bit
>>> set, or clear. If the C-bit was clear when the guest last wrote the
>>> area, then the hypervisor can read the data. If the C-bit was set, then
>>> the hypervisor can only read garbage.
>>>
>>>
>>>> I means if there is security concern, the concern should be applied to
>>>> both common buffer and read/write buffer.
>>>> Then we have to figure a way to resolve both buffer.
>>>
>>> Yes, this is correct. The PciIo.Unmap operation, as currently
>>> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
>>> correctly for all operations, but it can only guarantee *not* freeing
>>> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
>>> is safe, while Unmap()ing Read/Write is not. The encryption would be
>>> re-established just fine for Read/Write as well, but we would change the
>>> UEFI memmap.
>>>
>>> In OVMF, we currently manage this problem by making all asynchronous DMA
>>> mappings CommonBuffer, even if they could othewise be Read or Write. We
>>> use Read and Write only if the DMA operation completes before the
>>> higher-level protocol function returns (so we can immediately Unmap the
>>> operation, and the ExitBootServices() handler doesn't have to care).
>>>
>>>
>>>> To be honest, that is exactly my biggest question on this patch:
>>>> why do we only handle the common buffer specially?
>>>
>>> For the following reason:
>>>
>>> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
>>> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
>>> implementors -- beyond what the UEFI spec requries -- to keep *all*
>>> dynamic memory management out of Map/Unmap. If they need dynamic memory
>>> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
>>> This way Unmap is safe in ExitBootServices handlers.
>>>
>>> - We cannot *reasonably ask* PciIo implementors to provide the same
>>> guarantee for Read and Write mappings, simply because there are no other
>>> APIs that could manage the dynamic memory for Map and Unmap --
>>> AllocateBuffer and FreeBuffer are not required for Read and Write
>>> mappings. PciIo implementors couldn't agree to our request even if they
>>> wanted to. Therefore Unmapping Read/Write operations is inherently
>>> unsafe in EBS handlers.
>>>
>>>
>>>>> NOTE: The device should still be halt state, because the device is
>>>>> also controlled by OS.
>>>>
>>>> This is the key point -- the device is *not* controlled by the guest OS,
>>>> in the worst case.
>>>>
>>>> The virtual device is ultimately implemented by the hypervisor. We don't
>>>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>>>> However, if the hypervisor is compromised by an external attacker (for
>>>> example over the network, or via privilege escalation from another
>>>> guest), then all guest RAM that has not been encrypted up to that point
>>>> becomes visible to the attacker. In other words, the hypervisor ("the
>>>> device") becomes retro-actively malicious.
>>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>>> 1) If the concern is for existing data, it means all DMA data already written.
>>>> However, the DMA data is already consumed or produced by virtual device
>>>> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
>>>
>>> Maybe, maybe not. The data may have been sent to a different host over
>>> the network, and wiped from memory.
>>>
>>> Or, the data may have been written to a disk image that is separately
>>> encrypted by the host OS, and has been detached (unplugged) from the
>>> guest, and also unmounted from the host, with the disk key securely
>>> wiped from host memory.
>>>
>>> We can also imagine the reverse direction. Assume that the user of the
>>> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
>>> program, and types some secret information into a text file on the ESP.
>>> Further assume that the disk image is encrypted on the host OS. It is
>>> conceivable that fragments of the secret could remain stuck in the AHCI
>>> (disk) and USB (keyboard) DMA buffers.
>>>
>>> Maybe you think that these are "made up" examples, and I agree, I just
>>> made them up. The point is, it is not my place to discount possible
>>> attack vectors (I haven't designed SEV). Attackers will be limited by
>>> their *own* imaginations only, not by mine :)
>>>
>>>
>>>> 2) if the concern is for future data, it means all user data to be written.
>>>> However, the C-bit already prevent that.
>>>
>>> As far as I understand SEV, future data is out of scope. The goal is to
>>> prevent *retroactive* guest data leaks (= whatever is currently in host
>>> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>>>
>>> If an attacker compromises a virtualization host, then they can just sit
>>> silent and watch data flow into and out of guests from that point
>>> onward, because they can then monitor all DMA (which always happens in
>>> clear-text) real-time.
>>>
>>>
>>>> Maybe I do not fully understand the threat model defined.
>>>> If you can explain more, that would be great.
>>>
>>> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
>>> correct me if I'm wrong.
>>>
>>>
>>>> The point of SEV is to keep as much guest data encrypted at all times as
>>>> possible, so that *whenever* the hypervisor is compromised by an
>>>> attacker, the guest memory -- which the attacker can inevitably dump
>>>> from the host side -- remains un-intellegible to the attacker.
>>>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>>>> threat model at first.
>>>> Or what we are doing might be unnecessary or insufficient.
>>>
>>> I agree.
>>>
>>> As I said above, my operating principle has been to re-encrypt all DMA
>>> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
>>> them at ExitBootServices at the latest.
>>>
>>>
>>>> [Jiewen] For "require that Unmap() work for CommonBuffer
>>>> operations without releasing memory", I would hold my opinion until
>>>> it is documented clearly in UEFI spec.
>>>
>>> You are right, of course.
>>>
>>> It's just that we cannot avoid exploring ways, for this feature, that
>>> currently fall outside of the spec. Whatever we do here will be outside
>>> of the spec, one way or another. I suggested what I thought could be a
>>> reasonable extension to the spec, one that could be implemented by PciIo
>>> implementors even before the spec is modified.
>>>
>>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>>
>>> If there's a better way -- for example modifying CoreExitBootServices()
>>> in edk2, to signal IOMMU drivers separately, *after* notifying other
>>> ExitBootServices() handlers --, that might work as well.
>>>
>>>
>>>> My current concern is:
>>>> First, this sentence is NOT defined in UEFI specification.
>>>
>>> Correct.
>>>
>>>
>>>> Second, it might break an existing implementation or existing feature, such as tracing.
>>>
>>> Correct.
>>>
>>>> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
>>>> to call memory services.
>>>> The only restriction is
>>>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>>>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
>>>
>>> I agree.
>>>
>>>
>>>> I'm just trying to operate with the APIs currently defined by the UEFI
>>>> spec, and these assumptions were the best I could come up with.
>>>> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
>>>> Especially the IHV Option ROM should not consume any private API.
>>>
>>> I disagree about "only follow". If there are additional requirements
>>> that can be satisfied without breaking the spec, driver implementors can
>>> choose to conform to both sets of requirements.
>>>
>>> For example (if I understand correctly), Microsoft / Windows present a
>>> bunch of requirements *beyond* the UEFI spec, for both platform and
>>> add-on card firmware. Most vendors conform :)
>>>
>>>
>>>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>>>>> guest memory should be encrypted again, even those areas that were once
>>>>> used as CommonBuffers."
>>>>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>>>>
>>>> If it was indeed just CR3, then I would fully agree with you.
>>>>
>>>> But -- to my understanding --, ensuring that the memory is actually
>>>> encrypted requires that the guest *write* to the memory through a
>>>> virtual address whose PTE has the C-bit set.
>>>>
>>>> And I don't think the guest OS can be expected to rewrite all of its
>>>> memory at launch. :(
>>>>
>>>> [Jiewen] That is fine.
>>>> I suggest we get clear on the threat model as the first step.
>>>> The threat model for SEV usage in OVMF.
>>>>
>>>> I am sorry if that is already discussed before. I might ignore the conversation.
>>>
>>> No problem; it's always good to re-investigate our assumptions and
>>> operating principles.
>>>
>>>
>>>> If you or any SEV owner can share the insight, that will be great.
>>>> See my question above "If that is the threat model, I have a question on the exposure:..."
>>>
>>> I shared *my* impressions of the threat model (which are somewhat
>>> unclear, admittedly, and thus could make me overly cautious).
>>>
>>> I hope Brijesh can extend and/or correct my description.
>>>
>>>
>>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>>>> SEV forbids all DMA unless configured otherwise.
>>>>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>>>>> I setup translation table, but mark all entry to be NOT-PRESENT.
>>>>> I grant DMA access only if there is a specific request by a device.
>>>>>
>>>>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>>>>> the existing OS, which might not have knowledge on IOMMU.
>>>>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>>>>> turn off IOMMU, or keep it enabled at ExitBootService.
>>>>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>>>>
>>>> Thanks for the clarification.
>>>>
>>>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>>>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>>>> a chance to run their own ExitBootServices() handlers, disabling the
>>>> individual devices?
>>>> [Jiewen] Sorry for the confusing. Let me explain:
>>>> No, VTd never disables *all* DMA buffer at ExitBootService.
>>>>
>>>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>>>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>>>> "only allow the DMA from Map() request".
>>>
>>> Okay, but this interpretation was exactly what I thought of first (see
>>> above): "VT-d permits all DMA unless configured otherwise". You answered
>>> that it wasn't the case.
>>>
>>> So basically, if I understand it correctly now, at ExitBootServices the
>>> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>>
>>> That is equivalent to decrypting all memory under SEV, and is the exact
>>> opposite of what we want. (As I understand it.)
>>>
>>>
>>>> If that happens, then a series of IOMMU faults could be generated, which
>>>> I described above. (I.e., such IOMMU faults could result, at least in
>>>> the case of SEV, in garbage being written to disk, via queued commands.)
>>>> [Jiewen] You are right. That would not happen.
>>>> IOMMU driver should not bring any compatibility problem for the PCI driver,
>>>> iff the PCI driver follows the UEFI specification.
>>>
>>> Agreed.
>>>
>>>
>>>> Now, to finish up, here's an idea I just had.
>>>>
>>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>>>> notification function?
>>>>
>>>> If we are, then we could do the following:
>>>>
>>>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>>>     dynamic memory allocation or freeing, for any kind of DMA operation).
>>>>
>>>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>>>     ExitBootServices() handlers.
>>>>
>>>> * The IOMMU driver would have an ExitBootServices() notification
>>>>     function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>>>     (doesn't matter which).
>>>>
>>>> * In this notification function, the IOMMU driver would signal *another*
>>>>     event (a private one). The notification function for this event would
>>>>     be enqueued strictly at the TPL_CALLBACK level.
>>>>
>>>> * The notification function for the second event (private to the IOMMU)
>>>>     would iterate over all existent mappings, and unmap them, without
>>>>     allocating or freeing memory.
>>>>
>>>> The key point is that by signaling the second event, the IOMMU driver
>>>> could order its own cleanup code after all other ExitBootServices()
>>>> callbacks. (Assuming, at least, that no other ExitBootServices()
>>>> callback employs the same trick to defer itself!) Therefore by the time
>>>> the second callback runs, all PCI devices have been halted, and it is
>>>> safe to tear down the translations.
>>>>
>>>> The ordering would be ensured by the following:
>>>>
>>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>>>     to be signaled before the first notification action is taken." This
>>>>     means that, by the time the IOMMU's ExitBootServices() handler is
>>>>     entered, all other ExitBootServices() handlers have been *queued* at
>>>>     least, at TPL_CALLBACK or TPL_NOTIFY.
>>>>
>>>> - Therefore, when we signal our second callback from there, for
>>>>     TPL_CALLBACK, the second callback will be queued at the end of the
>>>>     TPL_CALLBACK queue.
>>>>
>>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>>>     functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>>>     for the Type argument of CreateEvent." So it wouldn't matter if a
>>>>     driver used CreateEvent() or CreateEventEx() for setting up its own
>>>>     handler, the handler would be queued just the same.
>>>>
>>>> I think this is ugly and brittle, but perhaps the only way to clean up
>>>> *all* translations safely, with regard to PciIo.Unmap() +
>>>> ExitBootServices(), without updating the UEFI spec.
>>>>
>>>> What do you think?
>>>>
>>>> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
>>>> We do not need update PCI driver and we do not need update UEFI spec.
>>>> We only need update IOMMU driver which is concerned, based upon the threat model.
>>>> That probably is best solution. :-)
>>>
>>> I'm very glad to hear that you like the idea.
>>>
>>> However, it depends on whether we are permitted, by the UEFI spec, to
>>> signal another event in an ExitBootServices() notification function.
>>>
>>> Are we permitted to do that?
>>>
>>> Does the UEFI spec guarantee that the notification function for the
>>> *second* event will be queued just like it would be under "normal"
>>> circumstances?
>>>
>>> (I know we must not allocate or free memory in the notification function
>>> of the *second* event either; I just want to know if the second event's
>>> handler is *queued* like it would normally be.)
>>>
>>>
>>>> I assume you want to handle both common buffer and read/write buffer, right?
>>>
>>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>>
>>>
>>>> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
>>>> If you or any SEV owner can share ...
>>>
>>> Absolutely. Please see above.
>>>
>>> Thank you!
>>> Laszlo
>>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
  2017-09-07 14:48                         ` Yao, Jiewen
@ 2017-09-07 16:40                           ` Laszlo Ersek
  0 siblings, 0 replies; 21+ messages in thread
From: Laszlo Ersek @ 2017-09-07 16:40 UTC (permalink / raw)
  To: Yao, Jiewen, Brijesh Singh, Zeng, Star, edk2-devel-01; +Cc: Dong, Eric

On 09/07/17 16:48, Yao, Jiewen wrote:
> Great. Thanks to confirm that. Now it is clear to me.
> 
> Then let's wait Laszlo's new patch to make all DMA buffer to private. :)

I've got the patches ready and smoke-tested. Let me look a bit more at
the logs, and re-review the patches. I hope I can post the series still
today (well, in my timezone anyway :) )

Thanks!
Laszlo

> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Brijesh Singh
> Sent: Thursday, September 7, 2017 10:40 PM
> To: Laszlo Ersek <lersek@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Zeng, Star <star.zeng@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
> Cc: brijesh.singh@amd.com; Dong, Eric <eric.dong@intel.com>
> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
> 
> Hi Jiewen,
> 
> On 09/07/2017 06:46 AM, Laszlo Ersek wrote:
>> On 09/07/17 06:46, Yao, Jiewen wrote:
>>> Thanks for the sharing, Brijesh.
>>>
>>> "If a page was marked as "shared"
>>> then its guest responsibility to make it "private" after its done communicating with
>>> hypervisor."
>>>
>>> I believe I have same understanding - a *guest* should guarantee that.
>>>
>>> My question is: During the *transition* from firmware to OS, *which guest* should make the shared buffer to be private? Is it "guest firmware" or "guest OS"?
>>>
>>> Maybe I can ask the specific question to get it more clear.
>>>
>>> 1)       If the common DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
>>>
>>> 2)       If the read/write DMA buffer is not unmapped at ExitBootService, is that treated as an issue?
>>
>> Very good questions, totally to the point.
>>
>> On the authoritative answer, I defer to Brijesh.
>>
> 
> 
> Both the above cases (#1 and #2) are problems. Since buffers was owned by firmware
> and firmware made it "shared" hence firmware is responsible to mark them as private
> after its done with the buffer. In other words, we must call Unmap() from ExitBootServices
> to ensure that buffers mapped using BusMasterCommonBuffer/BusMasterRead/BusMasterWrite
> is marked as "private" before we make it available to the guest OS. (we do similar thing
> in Linux OS).
> 
> Having any kind of side channel to communicate the encryption status of a page
> will not work -- we should be able to support a usecase where we boot OVMF in
> 64-bit but launch 32-bit Linux guest. When Linux boots in 32-bit mode it does not have
> access to encryption bit (C-bit is bit-47 in page table) and can't mark the page as
> private (even if we provide some kind of side-channel).
> 
> thank you very much for all your help.
> 
>> (
>>
>> My personal opinion is that both situations (#1 and #2) are problems,
>> because they break the *practical* security invariant for SEV guests:
>>
>> - most memory should be encrypted at all times, *and*
>>
>> - any memory that is decrypted must have an owner that is responsible
>>    for re-encrypting the memory eventually.
>>
>> Therefore, *either* the firmware has to re-encrypt all remaining DMA
>> buffers at ExitBootServices(), *or* a new information channel must be
>> designed, from firmware to OS, to carry over the decryption status.
>>
>> I strongly prefer the first option, for the following reason: the same
>> questions apply to all EDK2 IOMMU protocol interfaces, not just the one
>> exported by the SEV driver.
>>
>> )
>>
>> Thanks,
>> Laszlo
>>
>>> From: Brijesh Singh [mailto:brijesh.singh@amd.com]
>>> Sent: Wednesday, September 6, 2017 11:40 PM
>>> To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>>
>>> Cc: brijesh.singh@amd.com<mailto:brijesh.singh@amd.com>; Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com>>
>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>
>>>
>>>
>>> On 09/06/2017 07:14 AM, Laszlo Ersek wrote:
>>>> On 09/06/17 06:37, Yao, Jiewen wrote:
>>>>> Thanks for the clarification. Comment in line.
>>>>>
>>>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>>>> Sent: Wednesday, September 6, 2017 1:57 AM
>>>>> To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com<mailto:jiewen.yao@intel.com%3cmailto:jiewen.yao@intel.com>>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com<mailto:star.zeng@intel.com%3cmailto:star.zeng@intel.com>>>; edk2-devel-01 <edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org%3cmailto:edk2-devel@lists.01.org>>>
>>>>> Cc: Dong, Eric <eric.dong@intel.com<mailto:eric.dong@intel.com<mailto:eric.dong@intel.com%3cmailto:eric.dong@intel.com>>>; Brijesh Singh <brijesh.singh@amd.com<mailto:brijesh.singh@amd.com<mailto:brijesh.singh@amd.com%3cmailto:brijesh.singh@amd.com>>>
>>>>> Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()
>>>>
>>>>>> Then after ExitBootService, the OS will take control of CR3 and set correct
>>>>>> encryption bit.
>>>>>
>>>>> This is true, the guest OS will set up new page tables, and in those
>>>>> PTEs, the C-bit ("encrypt") will likely be set by default.
>>>>>
>>>>> However, the guest OS will not *rewrite* all of the memory, with the
>>>>> C-bit set. This means that any memory that the firmware didn't actually
>>>>> re-encrypt (in the IOMMU driver) will still expose stale data to the
>>>>> hypervisor.
>>>>> [Jiewen] That is an interesting question.
>>>>> Is there any security concern associated?
>>>>
>>>> I can't tell for sure. Answering this question is up to the proponents
>>>> of SEV, who have designed the threat model for SEV.
>>>>
>>>> My operating assumption is that we should keep memory as tightly
>>>> encrypted as possible at the firmware --> OS control transfer. It could
>>>> be an exaggerated expectation from my side; I'd just better be safe than
>>>> sorry :)
>>>>
>>>>
>>>
>>> Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
>>> we can discuss a security model (if needed).
>>>
>>> SEV is an extension to the AMD-V architecture which supports running encrypted
>>> virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
>>> pages (code and data) secured such that only the guest itself has access to the
>>> unencrypted version. Each encrypted VMs is associated with a unique encryption
>>> key; if its data is accessed by a different entity using a different key the
>>> encrypted guest data will be incorrectly decrypted, leading to unintelligible data.
>>> You can also find some detail for SEV in white paper [1].
>>>
>>> SEV encrypted Vs have the concept of private and shared memory. The private memory
>>> is encrypted with the guest-specific key, while shared memory may be encrypted
>>> with hypervisor key. SEV guest VMs can choose which pages they would like to
>>> be private. But the instruction pages and guest page tables are always treated
>>> as private by the hardware. The DMA operation inside the guest must be performed
>>> on shared pages -- this is mainly because in virtualization world the device
>>> DMA needs some assistance from the hypervisor.
>>>
>>> The security model provided by the SEV ensure that hypervisor will no longer able
>>> to inspect or alter any guest code or data. The guest OS controls what it want to
>>> share with outside world (in this case with Hypervisor).
>>>
>>> In software implementation we took approach to encrypt everything possible starting
>>> early in boot. In this approach whenever OVMF wants to perform certain DMA operations
>>> it allocate a shared page, issues the command, free the shared page after the command
>>> is completed (in other word we use sw bounce buffer to complete the DMA operation).
>>>
>>> We have implemented IOMMU driver to provide the following functions:
>>>
>>> AllocateBuffer():
>>> --------------------
>>> it allocate a private pages, as per UEFI spec the driver will map the buffer allocated
>>> from this routine using BusMasterCommonBuffer hence we allocate extra stash pages
>>> in addition to requested pages.
>>>
>>>
>>> Map
>>> ----
>>> If requested operation is BusMasterRead or BusMasterWrite then we allocate a shared buffer
>>> and DeviceAddress point to shared buffer.
>>>
>>> If requested operation is BusMasterCommonBuffer then we perform in-place decryption of the
>>> contents and update the page-table to clear the C-bit (basically make it shared page)
>>>
>>> Unmap
>>> ------
>>> If operation was BusMasterRead or BusMasterWrite then we complete the unmapping and free
>>> the shared buffer allocated in Map().
>>>
>>> If operation was BusMasterCommonBuffer then we perform in-place encryption and set the C-bit
>>> (basically make the page private)
>>>
>>> FreeBuffer:
>>> -----------
>>> Free the pages
>>>
>>>
>>> Lets run with the below scenario:
>>>
>>> 1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA read)
>>> 2) hypervisor completes the request operation
>>> 3) hypervisor caches the shared physical address and monitor it for information leak
>>> 4) sometime later if guest write data in its "shared" physical address then hypervisor can
>>>      easily read it without guest knowledge.
>>>
>>> SEV hardware can protect us against the attack where someone tries to inject or alter the
>>> guest code. As I noted above any instruction fetch is forced as private hence if attacker
>>> write some code into a shared buffer and point the RIP to his/her code then instruction
>>> fetch will try to decrypt it and get unintelligible opcode. If a page was marked as "shared"
>>> then its guest responsibility to make it "private" after its done communicating with
>>> hypervisor.
>>>
>>> [1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>>>
>>>
>>>>> If the C-bit is cleared for a read/write buffer, is the content in the
>>>>> read/write buffer also exposed to hypervisor?
>>>>
>>>> Not immediately. Only when the guest also rewrites the memory through
>>>> the appropriate virtual address.
>>>>
>>>> Basically, in the virtual machine, the C-bit in a PTE that covers a
>>>> particular guest virtual address (GVA) controls whether a guest write
>>>> through that GVA will result in memory encryption, and if a gues read
>>>> through that GVA will result in memory decryption.
>>>>
>>>> The current state of the C-bit doesn't matter for the hypervisor, what
>>>> matters is the last guest write through a GVA whose PTE had the C-bit
>>>> set, or clear. If the C-bit was clear when the guest last wrote the
>>>> area, then the hypervisor can read the data. If the C-bit was set, then
>>>> the hypervisor can only read garbage.
>>>>
>>>>
>>>>> I means if there is security concern, the concern should be applied to
>>>>> both common buffer and read/write buffer.
>>>>> Then we have to figure a way to resolve both buffer.
>>>>
>>>> Yes, this is correct. The PciIo.Unmap operation, as currently
>>>> implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
>>>> correctly for all operations, but it can only guarantee *not* freeing
>>>> memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
>>>> is safe, while Unmap()ing Read/Write is not. The encryption would be
>>>> re-established just fine for Read/Write as well, but we would change the
>>>> UEFI memmap.
>>>>
>>>> In OVMF, we currently manage this problem by making all asynchronous DMA
>>>> mappings CommonBuffer, even if they could othewise be Read or Write. We
>>>> use Read and Write only if the DMA operation completes before the
>>>> higher-level protocol function returns (so we can immediately Unmap the
>>>> operation, and the ExitBootServices() handler doesn't have to care).
>>>>
>>>>
>>>>> To be honest, that is exactly my biggest question on this patch:
>>>>> why do we only handle the common buffer specially?
>>>>
>>>> For the following reason:
>>>>
>>>> - Given that CommonBuffer mappings take a separate AllocateBuffer() /
>>>> FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
>>>> implementors -- beyond what the UEFI spec requries -- to keep *all*
>>>> dynamic memory management out of Map/Unmap. If they need dynamic memory
>>>> management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
>>>> This way Unmap is safe in ExitBootServices handlers.
>>>>
>>>> - We cannot *reasonably ask* PciIo implementors to provide the same
>>>> guarantee for Read and Write mappings, simply because there are no other
>>>> APIs that could manage the dynamic memory for Map and Unmap --
>>>> AllocateBuffer and FreeBuffer are not required for Read and Write
>>>> mappings. PciIo implementors couldn't agree to our request even if they
>>>> wanted to. Therefore Unmapping Read/Write operations is inherently
>>>> unsafe in EBS handlers.
>>>>
>>>>
>>>>>> NOTE: The device should still be halt state, because the device is
>>>>>> also controlled by OS.
>>>>>
>>>>> This is the key point -- the device is *not* controlled by the guest OS,
>>>>> in the worst case.
>>>>>
>>>>> The virtual device is ultimately implemented by the hypervisor. We don't
>>>>> expect the virtual device (= the hypervisor) to mis-behave on purpose.
>>>>> However, if the hypervisor is compromised by an external attacker (for
>>>>> example over the network, or via privilege escalation from another
>>>>> guest), then all guest RAM that has not been encrypted up to that point
>>>>> becomes visible to the attacker. In other words, the hypervisor ("the
>>>>> device") becomes retro-actively malicious.
>>>>> [Jiewen] If that is the threat model, I have a question on the exposure:
>>>>> 1) If the concern is for existing data, it means all DMA data already written.
>>>>> However, the DMA data is already consumed or produced by virtual device
>>>>> or a hypervisor. If the hypervisor is attacked, it already gets all the data content.
>>>>
>>>> Maybe, maybe not. The data may have been sent to a different host over
>>>> the network, and wiped from memory.
>>>>
>>>> Or, the data may have been written to a disk image that is separately
>>>> encrypted by the host OS, and has been detached (unplugged) from the
>>>> guest, and also unmounted from the host, with the disk key securely
>>>> wiped from host memory.
>>>>
>>>> We can also imagine the reverse direction. Assume that the user of the
>>>> virtual machine goes into the UEFI shell in OVMF, starts the EDIT
>>>> program, and types some secret information into a text file on the ESP.
>>>> Further assume that the disk image is encrypted on the host OS. It is
>>>> conceivable that fragments of the secret could remain stuck in the AHCI
>>>> (disk) and USB (keyboard) DMA buffers.
>>>>
>>>> Maybe you think that these are "made up" examples, and I agree, I just
>>>> made them up. The point is, it is not my place to discount possible
>>>> attack vectors (I haven't designed SEV). Attackers will be limited by
>>>> their *own* imaginations only, not by mine :)
>>>>
>>>>
>>>>> 2) if the concern is for future data, it means all user data to be written.
>>>>> However, the C-bit already prevent that.
>>>>
>>>> As far as I understand SEV, future data is out of scope. The goal is to
>>>> prevent *retroactive* guest data leaks (= whatever is currently in host
>>>> OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
>>>>
>>>> If an attacker compromises a virtualization host, then they can just sit
>>>> silent and watch data flow into and out of guests from that point
>>>> onward, because they can then monitor all DMA (which always happens in
>>>> clear-text) real-time.
>>>>
>>>>
>>>>> Maybe I do not fully understand the threat model defined.
>>>>> If you can explain more, that would be great.
>>>>
>>>> Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
>>>> correct me if I'm wrong.
>>>>
>>>>
>>>>> The point of SEV is to keep as much guest data encrypted at all times as
>>>>> possible, so that *whenever* the hypervisor is compromised by an
>>>>> attacker, the guest memory -- which the attacker can inevitably dump
>>>>> from the host side -- remains un-intellegible to the attacker.
>>>>> [Jiewen] OK. If this is a security question, I suggest we define a clear
>>>>> threat model at first.
>>>>> Or what we are doing might be unnecessary or insufficient.
>>>>
>>>> I agree.
>>>>
>>>> As I said above, my operating principle has been to re-encrypt all DMA
>>>> buffers as soon as possible. For long-standing DMA buffers, re-encrypt
>>>> them at ExitBootServices at the latest.
>>>>
>>>>
>>>>> [Jiewen] For "require that Unmap() work for CommonBuffer
>>>>> operations without releasing memory", I would hold my opinion until
>>>>> it is documented clearly in UEFI spec.
>>>>
>>>> You are right, of course.
>>>>
>>>> It's just that we cannot avoid exploring ways, for this feature, that
>>>> currently fall outside of the spec. Whatever we do here will be outside
>>>> of the spec, one way or another. I suggested what I thought could be a
>>>> reasonable extension to the spec, one that could be implemented by PciIo
>>>> implementors even before the spec is modified.
>>>>
>>>> edk2's PciIo.Unmap already behaves like this, without breaking the spec.
>>>>
>>>> If there's a better way -- for example modifying CoreExitBootServices()
>>>> in edk2, to signal IOMMU drivers separately, *after* notifying other
>>>> ExitBootServices() handlers --, that might work as well.
>>>>
>>>>
>>>>> My current concern is:
>>>>> First, this sentence is NOT defined in UEFI specification.
>>>>
>>>> Correct.
>>>>
>>>>
>>>>> Second, it might break an existing implementation or existing feature, such as tracing.
>>>>
>>>> Correct.
>>>>
>>>>> Till now, I did not see any restriction in UEFI spec say: In this function, you are not allowed
>>>>> to call memory services.
>>>>> The only restriction is
>>>>> 1) TPL restriction, where memory service must be <= TPL_NOTIFY.
>>>>> 2) AP restriction, where no UEFI service/protocol is allowed for AP.
>>>>
>>>> I agree.
>>>>
>>>>
>>>>> I'm just trying to operate with the APIs currently defined by the UEFI
>>>>> spec, and these assumptions were the best I could come up with.
>>>>> [Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
>>>>> Especially the IHV Option ROM should not consume any private API.
>>>>
>>>> I disagree about "only follow". If there are additional requirements
>>>> that can be satisfied without breaking the spec, driver implementors can
>>>> choose to conform to both sets of requirements.
>>>>
>>>> For example (if I understand correctly), Microsoft / Windows present a
>>>> bunch of requirements *beyond* the UEFI spec, for both platform and
>>>> add-on card firmware. Most vendors conform :)
>>>>
>>>>
>>>>>> [Jiewen] I am not sure who will control "When control is transferred to the OS, all
>>>>>> guest memory should be encrypted again, even those areas that were once
>>>>>> used as CommonBuffers."
>>>>>> For SEV case, I think it should be controlled by OS, because it is just CR3.
>>>>>
>>>>> If it was indeed just CR3, then I would fully agree with you.
>>>>>
>>>>> But -- to my understanding --, ensuring that the memory is actually
>>>>> encrypted requires that the guest *write* to the memory through a
>>>>> virtual address whose PTE has the C-bit set.
>>>>>
>>>>> And I don't think the guest OS can be expected to rewrite all of its
>>>>> memory at launch. :(
>>>>>
>>>>> [Jiewen] That is fine.
>>>>> I suggest we get clear on the threat model as the first step.
>>>>> The threat model for SEV usage in OVMF.
>>>>>
>>>>> I am sorry if that is already discussed before. I might ignore the conversation.
>>>>
>>>> No problem; it's always good to re-investigate our assumptions and
>>>> operating principles.
>>>>
>>>>
>>>>> If you or any SEV owner can share the insight, that will be great.
>>>>> See my question above "If that is the threat model, I have a question on the exposure:..."
>>>>
>>>> I shared *my* impressions of the threat model (which are somewhat
>>>> unclear, admittedly, and thus could make me overly cautious).
>>>>
>>>> I hope Brijesh can extend and/or correct my description.
>>>>
>>>>
>>>>>> So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
>>>>>> are opposites -- VT-d permits all DMA unless configured otherwise, while
>>>>>> SEV forbids all DMA unless configured otherwise.
>>>>>> [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable all DMA access.
>>>>>> I setup translation table, but mark all entry to be NOT-PRESENT.
>>>>>> I grant DMA access only if there is a specific request by a device.
>>>>>>
>>>>>> I open DMA access in ExitBootServices, just want to make sure it is compatible to
>>>>>> the existing OS, which might not have knowledge on IOMMU.
>>>>>> I will provide a patch to make it a policy very soon. As such VTd IOMMU may
>>>>>> turn off IOMMU, or keep it enabled at ExitBootService.
>>>>>> An IOMMU aware OS may take over IOMMU directly and reprogram it.
>>>>>
>>>>> Thanks for the clarification.
>>>>>
>>>>> But, again, will this not lead to the possibility that the VT-d IOMMU's
>>>>> ExitBootServices() handler disables all DMA *before* the PCI drivers get
>>>>> a chance to run their own ExitBootServices() handlers, disabling the
>>>>> individual devices?
>>>>> [Jiewen] Sorry for the confusing. Let me explain:
>>>>> No, VTd never disables *all* DMA buffer at ExitBootService.
>>>>>
>>>>> "disable VTd" means to "disable IOMMU protection" and "allow all DMA".
>>>>> "Keep VTd enabled" means to "keep IOMMU protection enabled" and
>>>>> "only allow the DMA from Map() request".
>>>>
>>>> Okay, but this interpretation was exactly what I thought of first (see
>>>> above): "VT-d permits all DMA unless configured otherwise". You answered
>>>> that it wasn't the case.
>>>>
>>>> So basically, if I understand it correctly now, at ExitBootServices the
>>>> VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
>>>>
>>>> That is equivalent to decrypting all memory under SEV, and is the exact
>>>> opposite of what we want. (As I understand it.)
>>>>
>>>>
>>>>> If that happens, then a series of IOMMU faults could be generated, which
>>>>> I described above. (I.e., such IOMMU faults could result, at least in
>>>>> the case of SEV, in garbage being written to disk, via queued commands.)
>>>>> [Jiewen] You are right. That would not happen.
>>>>> IOMMU driver should not bring any compatibility problem for the PCI driver,
>>>>> iff the PCI driver follows the UEFI specification.
>>>>
>>>> Agreed.
>>>>
>>>>
>>>>> Now, to finish up, here's an idea I just had.
>>>>>
>>>>> Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
>>>>> notification function?
>>>>>
>>>>> If we are, then we could do the following:
>>>>>
>>>>> * PciIo.Unmap() and friends would work as always (no restrictions on
>>>>>     dynamic memory allocation or freeing, for any kind of DMA operation).
>>>>>
>>>>> * PCI drivers would not be expected to call PciIo.Unmap() in their
>>>>>     ExitBootServices() handlers.
>>>>>
>>>>> * The IOMMU driver would have an ExitBootServices() notification
>>>>>     function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
>>>>>     (doesn't matter which).
>>>>>
>>>>> * In this notification function, the IOMMU driver would signal *another*
>>>>>     event (a private one). The notification function for this event would
>>>>>     be enqueued strictly at the TPL_CALLBACK level.
>>>>>
>>>>> * The notification function for the second event (private to the IOMMU)
>>>>>     would iterate over all existent mappings, and unmap them, without
>>>>>     allocating or freeing memory.
>>>>>
>>>>> The key point is that by signaling the second event, the IOMMU driver
>>>>> could order its own cleanup code after all other ExitBootServices()
>>>>> callbacks. (Assuming, at least, that no other ExitBootServices()
>>>>> callback employs the same trick to defer itself!) Therefore by the time
>>>>> the second callback runs, all PCI devices have been halted, and it is
>>>>> safe to tear down the translations.
>>>>>
>>>>> The ordering would be ensured by the following:
>>>>>
>>>>> - The UEFI spec says under CreateEventEx(), "All events are guaranteed
>>>>>     to be signaled before the first notification action is taken." This
>>>>>     means that, by the time the IOMMU's ExitBootServices() handler is
>>>>>     entered, all other ExitBootServices() handlers have been *queued* at
>>>>>     least, at TPL_CALLBACK or TPL_NOTIFY.
>>>>>
>>>>> - Therefore, when we signal our second callback from there, for
>>>>>     TPL_CALLBACK, the second callback will be queued at the end of the
>>>>>     TPL_CALLBACK queue.
>>>>>
>>>>> - CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
>>>>>     functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
>>>>>     for the Type argument of CreateEvent." So it wouldn't matter if a
>>>>>     driver used CreateEvent() or CreateEventEx() for setting up its own
>>>>>     handler, the handler would be queued just the same.
>>>>>
>>>>> I think this is ugly and brittle, but perhaps the only way to clean up
>>>>> *all* translations safely, with regard to PciIo.Unmap() +
>>>>> ExitBootServices(), without updating the UEFI spec.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> [Jiewen] If the order is correct, and all PCI device driver is halt at ExitBootServices, that works.
>>>>> We do not need update PCI driver and we do not need update UEFI spec.
>>>>> We only need update IOMMU driver which is concerned, based upon the threat model.
>>>>> That probably is best solution. :-)
>>>>
>>>> I'm very glad to hear that you like the idea.
>>>>
>>>> However, it depends on whether we are permitted, by the UEFI spec, to
>>>> signal another event in an ExitBootServices() notification function.
>>>>
>>>> Are we permitted to do that?
>>>>
>>>> Does the UEFI spec guarantee that the notification function for the
>>>> *second* event will be queued just like it would be under "normal"
>>>> circumstances?
>>>>
>>>> (I know we must not allocate or free memory in the notification function
>>>> of the *second* event either; I just want to know if the second event's
>>>> handler is *queued* like it would normally be.)
>>>>
>>>>
>>>>> I assume you want to handle both common buffer and read/write buffer, right?
>>>>
>>>> Yes. Under this idea, all kinds of operations would be cleaned up.
>>>>
>>>>
>>>>> And if possible, I still have interest to get clear on the threat model for SEV in OVMF.
>>>>> If you or any SEV owner can share ...
>>>>
>>>> Absolutely. Please see above.
>>>>
>>>> Thank you!
>>>> Laszlo
>>>>
>>> _______________________________________________
>>> edk2-devel mailing list
>>> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
>>> https://lists.01.org/mailman/listinfo/edk2-devel
>>>
>>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-09-07 16:38 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-03 19:54 [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Laszlo Ersek
2017-09-03 19:54 ` [PATCH 1/4] MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot services Laszlo Ersek
2017-09-03 19:54 ` [PATCH 2/4] MdeModulePkg/EhciDxe: " Laszlo Ersek
2017-09-03 19:54 ` [PATCH 3/4] MdeModulePkg/XhciDxe: " Laszlo Ersek
2017-09-03 19:54 ` [PATCH 4/4] MdeModulePkg/AtaAtapiPassThru: unmap common buffers at ExitBootServices() Laszlo Ersek
2017-09-04 10:36 ` [PATCH 0/4] MdeModulePkg: some PCI HC drivers: " Zeng, Star
2017-09-04 21:19   ` Laszlo Ersek
2017-09-05  2:18     ` Yao, Jiewen
2017-09-05  9:15       ` Laszlo Ersek
2017-09-05 13:44         ` Yao, Jiewen
2017-09-05 17:57           ` Laszlo Ersek
2017-09-06  4:37             ` Yao, Jiewen
2017-09-06 12:14               ` Laszlo Ersek
2017-09-06 15:39                 ` Brijesh Singh
2017-09-07  4:46                   ` Yao, Jiewen
2017-09-07 11:46                     ` Laszlo Ersek
2017-09-07 14:40                       ` Brijesh Singh
2017-09-07 14:48                         ` Yao, Jiewen
2017-09-07 16:40                           ` Laszlo Ersek
2017-09-07  4:34                 ` Yao, Jiewen
2017-09-07 12:11                   ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox