From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 1426F21C9126E for ; Tue, 25 Jul 2017 11:15:51 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 38639C0641E7; Tue, 25 Jul 2017 18:17:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 38639C0641E7 Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=lersek@redhat.com Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-22.phx2.redhat.com [10.3.116.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8859A619D0; Tue, 25 Jul 2017 18:17:45 +0000 (UTC) To: Brijesh Singh , edk2-devel@lists.01.org Cc: Tom Lendacky , Jordan Justen , Jason Wang , "Michael S . Tsirkin" , Gerd Hoffmann , Ard Biesheuvel References: <1500502151-13508-1-git-send-email-brijesh.singh@amd.com> From: Laszlo Ersek Message-ID: <841bec5f-6f6e-8b1f-25ba-0fd37a915b72@redhat.com> Date: Tue, 25 Jul 2017 20:17:44 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <1500502151-13508-1-git-send-email-brijesh.singh@amd.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 25 Jul 2017 18:17:52 +0000 (UTC) Subject: Re: [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Jul 2017 18:15:51 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Adding Ard On 07/20/17 00:09, Brijesh Singh wrote: > I have found that OVMF fails to detect the disk when iommu_platform is set from > qemu cli. The failure occurs during the feature bit negotiation. > > Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced > a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to > allocate, free, map and unmap the DMA memory for the master bus devices > > In this patch series, I have tried to enable the IOMMU_PLATFORM feature for > VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support > for other Virtio devices. The patch has been tested in SEV guest - mainly because > IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can > extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests. > > qemu cli used for testing: > > # $QEMU \ > ... > -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \ > -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off > ... > > Cc: Jordan Justen > Cc: Laszlo Ersek > Cc: Jason Wang > Cc: Michael S. Tsirkin > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Brijesh Singh > > Brijesh Singh (3): > OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit > OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support > OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support In the course of processing my post-PTO email backlog, yesterday I skimmed this topic very quickly, and thought about it for an hour or so (while away for the computer). Here's my take on it. (1) The closest to any formal language that I found for VIRTIO_F_IOMMU_PLATFORM are: https://issues.oasis-open.org/browse/VIRTIO-154 https://lists.oasis-open.org/archives/virtio-dev/201610/msg00121.html Are these references up-to-date? The ticket also references SVN r587 as the resolution, but that link is dead. (2) A few weeks back I saw Jason's SeaBIOS patch (also pointed out to me more recently by Brijesh, in private): [SeaBIOS] [PATCH] virtio: IOMMU support I don't understand this patch. (I also don't understand the "iommu_platform" device property on the QEMU command line.) According to the spec language quoted from the mailing list above, we have four cases: (2.1) device does not offer VIRITO_F_IOMMU_PLATFORM --> everything works like before (2.2) device offers VIRITO_F_IOMMU_PLATFORM, but the driver does not negotiate it --> device is allowed to reject functioning * device offers VIRITO_F_IOMMU_PLATFORM and the driver negotiates it: there are two possibilities: (2.3) the driver *disables* the IOMMU, and works like before (2.4) the driver *configures* the IOMMU and works accordingly The SeaBIOS patch negotiates VIRITO_F_IOMMU_PLATFORM, but it *neither* disables *nor* configures the IOMMU. It simply *ignores* the IOMMU. I don't see how that is any different *in effect* from simply not negotiating VIRITO_F_IOMMU_PLATFORM -- case (2.2) --, and I don't understand why QEMU tolerates "ignoring the IOMMU" differently from "not negotiating the flag". (3) *If* we indeed just wanted to follow the SeaBIOS patch, then VIRTIO_F_IOMMU_PLATFORM should be treated in OVMF *precisely* in parallel with VIRTIO_F_VERSION_1. (4) I'm confused by the intersection of the SEV and VIRTIO_F_IOMMU_PLATFORM use cases. The IoMmu DXE driver added in edk2 commit f9d129e68a45 ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06) is specific to SEV. In SEV-less guests, the IoMmu DXE driver will not install the IOMMU protocol, but the QEMU command line may still contain the "iommu_platform=true" property. For example, as far as I recall, DPDK guest payloads use an emulated "viommu" device. For this OVMF, would have to provide a separate IOMMU DXE driver, one that could actually interact with the "viommu" device. And there should be some coordination that exactly one instance of gEdkiiIoMmuProtocolGuid OR gIoMmuAbsentProtocolGuid be installed. I see that the SEV references are only in the blurb of the patch set; no actual commit message refers to SEV. That's OK; I just think the blurb is confusing. (5) Now I'm coming to my main point. The virtio device drivers in OvmfPkg are UEFI_DRIVER modules that conform to the UEFI driver model. They consume our home-grown the VIRTIO_DEVICE_PROTOCOL, and produce whatever UEFI protocol is appropriate on top (for example, EFI_BLOCK_IO_PROTOCOL in VirtioBlkDxe). Such a driver has no business talking to the platform's IOMMU protocol directly (even if there is one). Instead: (5.1) The VIRTIO_DEVICE_PROTOCOL has to be extended with the necessary AllocateSharedPages, FreeSharedPages, MapSharedPages, UnmapSharedPages functions. (5.2) All top-level virtio driver code, including VirtioLib, has to be rebased to the new VIRTIO_DEVICE_PROTOCOL member functions. This covers the ring memory itself, the memory pointed-to by the descriptors placed on the ring(s) -- i.e., the device specific requests --, and all further memory that is referenced by those device specific requests. This will result in a larger memory footprint, as all current pool allocations will be turned into page allocations, but I guess that is tolerable. (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply in parallel with VIRTIO_F_VERSION_1, and don't act upon VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this is just my point (3) from above. (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2: - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional virtio-pci devices, and offers virtio 0.9.5 semantics. - "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices, and offers virtio 0.9.5 semantics. - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers virtio 1.0.0 semantics. The first two drivers should implement the AllocateSharedPages() and FreeSharedPages() member functions simply with the corresponding MemoryAllocationLib functions (using BootServicesData type memory), and implement the MapSharedPages() and UnmapSharedPages() member functions as no-ops (return the input addresses transparently). The third driver should implement all four new member functions by respectively delegating the job to: - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData -- - EFI_PCI_IO_PROTOCOL.FreeBuffer() - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 -- - EFI_PCI_IO_PROTOCOL.Unmap() The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the platform-specific PCI host bridge / root bridge driver, and *that* driver in turn is allowed to talk to an IOMMU protocol (if any). (This last step is already covered by the following edk2 commits: - generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU support.", 2017-04-29), - specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).) (5.5) There's a delicate question that has to be considered with care; namely the ExitBootServices() callbacks in the virtio drivers. Currently these functions perform virtio resets. They cause the devices to forget all their configuration (including guest RAM references). Aborting in-flight DMA (e.g. in VirtioNetDxe) is a practice that is specifically recommended in the UEFI Driver Writers' Guide; plus at some point the guest kernel will reclaim and overwrite BootServicesData type memory. If at that point QEMU is still looking at some (originally firmware-allocated) areas as virtio rings, the results won't be amusing. (Speaking from experience.) Hence the resetting of virtio devices upon ExitBootServices(). Now, remember that ExitBootServices() callbacks *must not* change the UEFI memory map, so *no* memory allocations *must* be released in the callbacks. The virtio resets performed in the callbacks are surgical for this reason; nothing else is being done. The memory will be reclaimed by the OS, later. With an IOMMU in the picture, further actions become necessary: *after* the virtio reset, any buffers previously in use (including rings, device specific requests pointed-to by descriptors, and any further memory referenced by those requests) must be *unmapped*, but *not freed*. (Speaking in SEV terms, this will result in those memory areas seeing their C bits restored, without changing the UEFI memmap.) This means that: - the new VIRTIO_DEVICE_PROTOCOL.UnmapSharedPages() function has to be called judiciously from these callbacks, after the virtio reset, - *and* that the *entire* call chain originating from UnmapSharedPages(), through PciIo, through PciRootBridgeIo, to EdkiiIoMmu, *must* not call gBS->FreePool() or gBS->FreePages() (or the equivalent MemoryAllocationLib functions). Note that if the last requirement is currently violated (outside of OvmfPkg), then that is a general problem for physical platforms as well -- IMO, a physical NIC driver too should be able to abort DMA in its exit-boot-services callback and then unmap any relevant IOMMU mappings (via PciIo->Unmap().) Thanks Laszlo