From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 321F020958BD2 for ; Tue, 5 Sep 2017 14:56:14 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C454F7F3F6; Tue, 5 Sep 2017 21:59:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com C454F7F3F6 Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lersek@redhat.com Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-185.rdu2.redhat.com [10.10.120.185]) by smtp.corp.redhat.com (Postfix) with ESMTP id 178507EF8F; Tue, 5 Sep 2017 21:59:00 +0000 (UTC) To: Ard Biesheuvel Cc: Brijesh Singh , "edk2-devel@lists.01.org" , Jordan Justen , Tom Lendacky , Jiewen Yao , Star Zeng References: <1504265045-19008-1-git-send-email-brijesh.singh@amd.com> <1504265045-19008-2-git-send-email-brijesh.singh@amd.com> <04fefb16-23d5-f6df-0657-9d4c5e96ac57@redhat.com> <4f286363-6f04-bee0-8b12-34b50a813e70@amd.com> <5992734a-1ded-9d4b-36bd-99c13f30ca32@redhat.com> From: Laszlo Ersek Message-ID: Date: Tue, 5 Sep 2017 23:59:00 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 05 Sep 2017 21:59:03 +0000 (UTC) Subject: Re: [PATCH 1/5] OvmfPkg/VirtioNetDxe: map VRING using VirtioRingMap() X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Sep 2017 21:56:14 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 09/05/17 23:11, Ard Biesheuvel wrote: > On 5 September 2017 at 21:17, Laszlo Ersek wrote: > [...] >> ... Based on the discussion in the other thread ("[edk2] [PATCH 0/4] >> MdeModulePkg: some PCI HC drivers: unmap common buffers at >> ExitBootServices()"), I'm worried that we might need another >> restructuring for the IOMMU driver, and for the ExitBootServices() >> handlers for the virtio drivers (removing the Unmap() calls). >> >> If that's the case, then it wouldn't be good if you wasted work on >> VirtioNetExitBoot() in v2 of this series. >> >> Also under patch v1 #4, I requested that you not use >> VirtioOperationBusMasterRead for DMA that remains pending after the >> protocol member function returns, because we cannot Unmap() >> BusMasterRead in VirtioNetExitBoot() without freeing memory. Dependent >> of the outcome of said other thread, this might not be good advice after >> all. >> >> I'd be pretty disappointed if we had to go back to the drawing board at >> this stage. In my opinion, the UEFI-2.7 spec doesn't offer any >> facilities to us that would let us reliably and safely Unmap() bus >> master operations (= re-encrypt memory) at ExitBootServices(), for such >> PCI device drivers that we cannot modify. Whatever we do at this point >> looks like a hack: >> >> >> * Option #1: modify Virtio and other PCI drivers to use only >> CommonBuffer operations for asynch DMA, and manually Unmap() those >> operations in the ExitBootServices() handlers of the drivers. In >> addition, guarantee that Unmap() for CommonBuffer will not release >> memory. >> >> This is the approach I've been supporting. We could implement it for >> OVMF, because the community controls most of the platform (QEMU, >> KVM, OVMF), OVMF is 100% open source, and we can propose patches for >> other (e.g. MdeModulePkg) PCI drivers in edk2, if necessary. >> >> Problem: PCI drivers are not required by the spec, or the Driver >> Writer's Guide, to Unmap() on ExitBootServices() (and then to Unmap() >> only CommonBuffer). Also, PciIo implementations are not required by >> the spec to behave like this (= not free memory when Unmap()ing >> CommonBuffer). We can satisfy those assumptions in OvmfPkg, but >> perhaps not in MdeModulePkg drivers. >> > > I think you will have much bigger problems with out of tree PCI > drivers that never use Map/Unmap in the first place, because stuff > just works on PCs if you omit them. I'm currently not worried about out-of-tree PCI drivers in the least :) I'm worried about finding common grounds with MdeModulePkg drivers, because they do use Map/Unmap, they seek to conform to the UEFI spec (mostly), and they are an example / reference implementation for other dirvers / driver writers. If out-of-tree drivers break, let them break (as first step); exactly for the reason you state. > This is actually the reason I am quite pleased with this SEV support: > it means x86 drivers will start breaking in the same way as they would > have on ARM systems with non-coherent DMA or non-1:1 mapped PCI, and > vendors will have to clean up their code anyway, not just for ARM > compatibility. I certainly agree this is a benefit! >>From the virtio conversion thus far (even from the mostly-reviewer side), I can say it is a lot of work. > The only requirement imposed on DMA capable devices is that they cease > to perform DMA after ExitBootServices. Anything beyond that should not > be the responsibility of the device driver, simply because that > requirement did not exist before, and so we cannot go back in time and > add it. Fine by me (as a general guideline, not necessarily as a practical one); what can we use instead, for closing down the IOMMU holes late enough, originally opened by the PciIo.Map() calls? > >> * Option #2: add an ExitBootServices() handler to the IOMMU driver, and >> clean up mappings (= re-encrypt memory) after the PCI / Virtio drivers >> are finalized in their ExitBootServices() handlers. Ignore mappings in >> the drivers' ExitBootServices() handlers. >> >> Problem: the keyword is "after". Under this approach, we *must* clean >> up the mappings in the IOMMU driver, but we *must not* do that before >> the device drivers are finished. And the UEFI spec does not allow us >> to express a dependency order between ExitBootServices() handlers. >> >> We could hack that around by deferring the actual cleanup to *another* >> event handler, signaled from the IOMMU's "normal" ExitBootServices() >> handler. >> >> Problem: the UEFI spec does not promise that signaling events from >> ExitBootServices() handlers is safe. >> >> Problem: if some PCI driver does the same trick for whatever reason in >> its ExitBootServices() handler, then we're back to square one. >> >> > > I think this is the only feasible option. Fortunately, we don't have > to solve this at the spec level, but only have to deal with the > implementation. > > What we need is a method in the IOMMU protocol that is invoked when > the ExitBootServices implementation is about to return EFI_SUCCESS Yes, after "everything else" is done. (Of course, this is the age-old problem with UEFI, when components that were originally meant to be independent now try to order themselves in some fashion. For example, "let me just install, or patch, this ACPI table or configuration table quickly at ReadyToBoot, *after* everyone else is done, and I get a good look at system state". Now combine two such DXE drivers into a firmware, and hilarity ensues as they fight for the last word.) No facility exists to my knowledge (on the spec level) that would enable such fine delaying or dependency ordering between EBS handlers. The only ordering is between notification functions enqueued at TPL_NOTIFY vs. TPL_CALLBACK, and there is a guarantee (I think?) that if you get signalled demonstrably later (i.e., not via an event group), then you get queued for later, within the same TPL. The problem (for this discussion) is that any random PCI driver can register its EBS event notification function at either TPL_NOTIFY or at TPL_CALLBACK, plus that all EBS events are signaled together as an event group. Consequently, if the IOMMU driver registers its EBS notifier at TPL_NOTIFY, it is guaranteed to run earlier than all notifiers with TPL_CALLBACK (which is exactly what we *don't* want). If the IOMMU registers its EBS handler at TPL_CALLBACK, then (due to being signaled through a large event group) the notifier will still be invoked somewhere in the middle of a bunch of other TPL_CALLBACK-level notifiers -- that is, not necessarily in the last position. Hence my floating of a hack to re-queue the notification (to another TPL_CALLBACK handler), so that we get to the "end of the low-prio queue". But calling SignalEvent() from an EBS handler is not explicitly permitted in the spec. (Below you even suggest that an EBS handler should not call any boot service.) Of course, if CoreExitBootServices() can be updated specifically for this use case, I won't protest. > > (which means it could be the second time it was called). Side remark: the CoreExitBootServices() implementation does not notice memory map changes incurred by the ExitBootServices() handlers, in my interpretation. CoreExitBootServices() has a MapKey (= "memmap generation") check early on, in CoreTerminateMemoryMap(). This check catches memmap changes between the last GetMemoryMap(), and the call to ExitBootServices(). After this check succeeds, the notify functions are kicked, and on this path, CoreExitBootServices() simply cannot return any other value than EFI_SUCCESS. So whatever mess the individual callbacks make, it doesn't notice or report. Perhaps this is better for your argument, actually. I'm not fully sure. > This severely > limits what the method is actually capable of doing, and I think it > should not be allowed to call any boot or DXE services at all, but it > should still be sufficient for some linked list traversals and > CopyMem() operations, and whatever else is needed to re-encrypt the > memory. Yes, the necessary actions are sufficiently low-level for this. The problem is the ordering. > > And actually, this is not only useful for SEV, other IOMMU drivers > will have to deal with the same issue: in most cases, you'll want to > disable it before handing over to the OS, but this can never be done > safely in a EBS event handler for precisely the reasons you pointed > out. Most PCI devices probably deal with this gracefully, but tearing > down mappings should simply be avoided if a device could still be > accessing it. > Fully agreed. Once Jiewen adds the policy option to lock down VT-d at EBS (I hope I understood that plan right), dependent on OS support for the IOMMU, the IOMMU faults can show up just the same at EBS, until the rest of the PCI driver EBS handlers finish up. Thanks! Laszlo