From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 3FFC321EA35D7 for ; Tue, 5 Sep 2017 06:41:20 -0700 (PDT) Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Sep 2017 06:44:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,480,1498546800"; d="scan'208,217";a="147661621" Received: from fmsmsx105.amr.corp.intel.com ([10.18.124.203]) by fmsmga005.fm.intel.com with ESMTP; 05 Sep 2017 06:44:08 -0700 Received: from fmsmsx156.amr.corp.intel.com (10.18.116.74) by FMSMSX105.amr.corp.intel.com (10.18.124.203) with Microsoft SMTP Server (TLS) id 14.3.319.2; Tue, 5 Sep 2017 06:44:08 -0700 Received: from shsmsx101.ccr.corp.intel.com (10.239.4.153) by fmsmsx156.amr.corp.intel.com (10.18.116.74) with Microsoft SMTP Server (TLS) id 14.3.319.2; Tue, 5 Sep 2017 06:44:07 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.39]) by SHSMSX101.ccr.corp.intel.com ([169.254.1.168]) with mapi id 14.03.0319.002; Tue, 5 Sep 2017 21:44:05 +0800 From: "Yao, Jiewen" To: Laszlo Ersek , "Zeng, Star" , edk2-devel-01 CC: "Dong, Eric" Thread-Topic: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() Thread-Index: AQHTJO6HlEgpQf73uk+WGvJRCtH+06KkA0GAgACz0wCAAMTM8IAAA0SAgADDw8A= Date: Tue, 5 Sep 2017 13:44:04 +0000 Message-ID: <74D8A39837DF1E4DA445A8C0B3885C503A9A7F10@shsmsx102.ccr.corp.intel.com> References: <20170903195449.30261-1-lersek@redhat.com> <0C09AFA07DD0434D9E2A0C6AEB0483103B93A125@shsmsx102.ccr.corp.intel.com> <4b24b1eb-362f-3b46-97e2-bdfda53f40c9@redhat.com> <74D8A39837DF1E4DA445A8C0B3885C503A9A79BD@shsmsx102.ccr.corp.intel.com> <5f1fdc84-5824-bee2-5a1a-fbd67adf5443@redhat.com> In-Reply-To: <5f1fdc84-5824-bee2-5a1a-fbd67adf5443@redhat.com> Accept-Language: zh-CN, en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [10.239.127.40] MIME-Version: 1.0 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 Subject: Re: [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices() X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Sep 2017 13:41:20 -0000 Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Good discussion. Comment in line. From: Laszlo Ersek [mailto:lersek@redhat.com] Sent: Tuesday, September 5, 2017 5:16 PM To: Yao, Jiewen ; Zeng, Star ; e= dk2-devel-01 Cc: Dong, Eric Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap co= mmon buffers at ExitBootServices() On 09/05/17 04:18, Yao, Jiewen wrote: > HI Laszlo > Thank you very much to raise this DMA topic. > > I have a chat with Star. We have couples of question, from high level to = low level. > Hope you can clarify: > > > 1) About disable DMA > > As you mentioned, in UEFI_Driver_Writer_Guide_V1.0.1_120308.pdf, 7.7 Addi= ng the Exit Boot Services feature > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > Examples from the EDK II that use this feature are the PCI device drivers= for USB Host > Controllers. Some USB Host Controllers are PCI Bus Masters that continuou= sly access a > memory buffer to poll for operation requests. Access to this memory buffe= r by a USB > Host Controller may be required to boot an operation system, but this act= ivity must be > terminated when the OS calls ExitBootServices(). The typical action in th= e Exit Boot > Services Event for these types of drivers is to disable the PCI bus maste= r and place the > USB Host Controller into a halted state > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > > I believe the fundamental requirement is to "disable DMA". > As such, the simplest action is to clear PCI Bus Master Enable bit in the= command register. > > Do you think clearing BME is enough? > Is there any special reason that we must unmap the DMA buffer? Modifying the device state is enough to ask the device *nicely* to stop doing DMA. But, I think, if we are on an IOMMU-protected system, where part of PciIo.Map() is to add a translation (=3D system memory access permission) to the IOMMU, then we should also revoke this permission explicitly, *after* we asked the device nicely to stop doing DMA. Basically, ask the device nicely first, and then enforce the protection. ... BTW, I mentioned "IOMMU faults" earlier, and I have some further thoughts on them. This is about tearing down IOMMU translations *first*, in the IOMMU driver's ExitBootServices() handler, *before* the PCI drivers get a chance -- in *their* ExitBootServices() handlers -- to ask the PCI devices nicely to stop doing DMA. I think a quite problematic failure mode is possible here. Assume that a disk controller has some write commands queued (which are mapped with Bus Master Read or Bus Master CommonBuffer operations). If the ExitBootServices() handler for the IOMMU driver runs first, then the device could act upon these queued commands with missing IOMMU translations, before the device's own PCI driver asked the device to shut down DMA. In some cases, I guess this could result in those queued write commands simply failing, without ill effects. However, dependent on the IOMMU implementation, the write commands might not fail explicitly, but read garbage from system memory, and write garbage to disk blocks. That would be bad. I think this is exactly what could happen with the SEV IOMMU we have. The memory would be first re-encrypted (this is equivalent to revoking a translation), and then the write commands (queued in the emulated AHCI controller, in the hypervisor) would read garbage from guest memory, and write garbage to the virtual disk. [Jiewen] I am not clear on how it happens in SEV here. Below is my thought, please correct me if I am wrong. First, the fundamental requirement is to disable BME at exit boot service, or make sure the controller is in halt state. If a device driver fails to meet such requirement, it is a bug we need to f= ix at first. Now, let's assume all device driver is in halt state at ExitBootService. - = that is my understanding for all rest discussion. I checked OVMF and I do not see any code in IOMMU driver to register exit b= oot services event, which means the DMA buffer is still decrypted. If that is the case, the DMA transfer before BME disable is safety (DMA buf= fer is decrypted) and there will be no DMA transfer any more at ExitBootService (BME disabled). Then how does the write command queued in the emulated AHCI controller star= t read garbage? The SEV IOMMU driver only modifies the page table. Then after ExitBootService, the OS will take control of CR3 and set correct encryption bit. NOTE: The device should still be halt state, because the device is also controlled by OS. > > > 2) About common buffer > If there is a special requirement that we must unmap the DMA buffer, why = we only unmap the common buffer? > > I understand your statement that unmapping a read buffer or write buffer = may cause free memory action. But that is implementation choice. I believe I disagree with this (i.e., I think it is not an implementation choice). In the general case, we can assume that PciIo.Map() allocates a bounce buffer internally, when Read or Write action is requested. [Jiewen] When I say implementation choice, I mean Map() *may* allocates a bounce buffer or it just uses the same address, if the platform is capabl= e of using DRAM as DMA buffer directly. In turn, PciIo.Unmap() has to release the same buffer, internally. PciIo.Unmap() does not know if it is called from an ExitBootServices() notification function, or from a normal context. It cannot decide if it is allowed to release memory, or not. There's no additional parameter to provide this hint either. If PciIo.Unmap() never releases buffers, then ExitBootServices() will be happy, but we're going to leak memory. Now, PciIo.Unmap() could move the bounce buffer to some internal "free list" instead, rather than freeing it. And, the next PciIo.Map() call could try to reuse a buffer from the "free list". But this is difficult to do, because the buffers may have any size, and recycling them would require an internal, general "page pool", in parallel to the Boot Services page pool (AllocatePages, FreePages). If PciIo.Unmap() could be informed that it is running in ExitBootServices() notification context -- called from PCI drivers --, then unmapping Read and Write operations would be simple. > If we have strong requirement to unmap the DMA buffer, we should achieve = that. I totally agree that we *should* achieve that, but I don't see how the current PciIo.Map() and PciIo.Unmap() APIs make it possible, for Read and Write operations. For CommonBuffer operations, it is possible, because AllocateBuffer() and FreeBuffer() are separate actions, and an ExitBootServices() notification function can *avoid* calling FreeBuffer. (But only for CommonBuffer operations.) PciIo.Unmap() cannot be told *not* to release a bounce buffer for Read/Write. [Jiewen] Again, it is an implementation choice. It is possible that a host bridge allocate common buffer without calling any Allocate(). But another implementation may also allocate internal data structure to record every map() action, including common buffer. (For event log, or tracing purpose, et etc.) I do not think Free() should bring difference between common buffer or read/write buffer. I am a little worried that we have too many assumption here: e.g. Map() common buffer never involves Allocate() even for driver internal data structure , and Map() read/write buffer always involves Allocate() eve= n there is no need to allocate anything. With that assumption, we purposely avoid Unmap() read/write buffer, but force Unmap() common buffer. I am not sure if that assumption is correct. > > > 3) About interaction with IOMMU > I am not worried about IOMMU interaction. > I believe an proper IOMMU implementation should not block any action take= n by a PCI device driver. > Current IntelVTdDxe driver disables VTd engine in ExitBootService, which = means the DMA access is no longer managed by VTd engine. It is NOT related = to disable DMA buffer. Ah, I understand! So, on one hand, that is totally safe, for PCI device drivers. It means all pending DMA can get through, until the PCI drivers ask the devices (nicely) to stop doing DMA. [Jiewen] Yes. I think any IOMMU implementation should take that into consideration. On the other hand, it is the opposite of the security goal of SEV. With SEV, the default state is "all DMA forbidden" (=3D all guest memory encrypted). This is also the status that we'd like to achieve when ExitBootServices() completes. When control is transferred to the OS, all guest memory should be encrypted again, even those areas that were once used as CommonBuffers. [Jiewen] I am not sure who will control "When control is transferred to the= OS, all guest memory should be encrypted again, even those areas that were once used as CommonBuffers." For SEV case, I think it should be controlled by OS, because it is just CR3= . If so, we should not meet any problem, because we already disable BME in device driver's ExitBootService event. So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU are opposites -- VT-d permits all DMA unless configured otherwise, while SEV forbids all DMA unless configured otherwise. [Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disab= le all DMA access. I setup translation table, but mark all entry to be NOT-PRESENT. I grant DMA access only if there is a specific request by a device. I open DMA access in ExitBootServices, just want to make sure it is compati= ble to the existing OS, which might not have knowledge on IOMMU. I will provide a patch to make it a policy very soon. As such VTd IOMMU may turn off IOMMU, or keep it enabled at ExitBootService. An IOMMU aware OS may take over IOMMU directly and reprogram it. > 4) About the driver you modified > I believe we have some other PCI device driver (NVMe/SDMMC/UFS), besides = the ones you modified in this patch (ATA/UCHI/EHCI/XHCI). > If we need take this approach, I assume they also need update, right? Yes, they should be updated similarly. SDMMC and UFS are not included in OVMF, so I couldn't test their behavior. NVMe is included in OVMF, but it is used very rarely in practice (we can call it a corner case), so I thought I would first submit patches for the four main drivers (UHCI, EHCI, XHCI, AHCI) that are frequently used with OVMF and QEMU. Tracking down the CommonBuffer lifecycles was difficult :) [Jiewen] Understood. That is OK. We can handle that in separate patch. Thank you! Laszlo > > Thank you > Yao Jiewen > > > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of La= szlo Ersek > Sent: Tuesday, September 5, 2017 5:20 AM > To: Zeng, Star >; edk2-de= vel-01 > > Cc: Dong, Eric > > Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap = common buffers at ExitBootServices() > > On 09/04/17 12:36, Zeng, Star wrote: >> Curious about one question when I am reviewing this patch series. >> >> Does/Should the IOMMU component disable the address translation at ExitB= ootServices? > > It is a very good and interesting question. > > My answer is, I don't know. The IOMMU protocol abstraction is specific > to edk2, and its behavior is not documented in the PI or UEFI specs (as > far as I know). > > For one example, the VT-d IOMMU implementation in > > IntelSiliconPkg/IntelVTdDxe > > zaps all translations at ExitBootServices(): > > VOID > EFIAPI > OnExitBootServices ( > IN EFI_EVENT Event, > IN VOID *Context > ) > { > DEBUG ((DEBUG_INFO, "Vtd OnExitBootServices\n")); > DumpVtdRegsAll (); > DisableDmar (); > DumpVtdRegsAll (); > } > > Now, in case someone suggested that the same approach should be followed > by all IOMMU implementations, I'd have three counter-arguments: > > (1) My series states the problem (and seeks to solve it) on the PciIo > protocol level, which is standardized. The IOMMU protocol is edk2-only. > I feel that this kind of "unmap on ExitBootServices" should be done by > all UEFI drivers that map DMA common buffers, regardless of > platform-specific PciIo implementations. > > (2) ExitBootServices() notify functions are called in unspecified order > (there's no way to express dependencies between them). This means that, > following the above pattern, an IO address translation could be torn > down before a reliant PCI device is de-configured in its own UEFI_DRIVER > ExitBootServices() handler. The result can be a series of IOMMU faults. > I'm not sure if that's bad in practice, but it does look like a layering > violation. (We shouldn't tear down resources from the bottom up.) > > (3) The DisableDmar() function is relatively simple. It uses VT-d > registers that exist independently of any PCI devices, and of any "live" > mappings. > > Some IOMMU protocol implementations might be different -- for them the > live configuration might only be known by constantly tracking the full > set of Map() and Unmap() operations. That's a hurdle for the implementati= on. > > Furthermore, what is supposed to happen if both the IOMMU driver and the > individual PCI driver destroy the mapping at ExitBootServices()? If the > PCI driver's notification function runs first, then the IOMMU driver can > mark the mapping as "unmapped", and ignore it in its own notification > function. If the IOMMU driver's notification function runs first, then > it can again only mark the mapping as "unmapped", so that when the PCI > driver tries to unmap it as well, things don't blow up. This seems to > imply that an unmapped mapping data structure can never be recycled, > because we never know what action might follow. This sounds very > confusing to me. > > > Now, there are two arguments in favor of the IOMMU ExitBootServices() > callback as well: > > - Security. It doesn't matter if some driver forgets to de-configure a > translation, the IOMMU driver won't. Layering violations be damned. > > - In ExitBootServices() notification functions, the UEFI memory map must > not be altered (memory cannot be released). This means that Bus Master > Write and Bus Master Read operations, for pending transfers, cannot be > unmapped (only CommonBuffer operations can be), because unmapping > Write/Read might free memory (bounce buffers) *implicitly*. For > CommonBuffer operations, separate PciIo.FreeBuffer() calls are necessary > for releasing memory, thus the notification functions can simply *not* > call FreeBuffer(), to stay safe. But this is not possible for pending > Read/Write operations. > > > My current thinking is that (a) PCI drivers should unmap pending Common > Buffer operations at ExitBootServices; (b) PCI drivers should never > return from their protocol member functions with pending Write/Read > operations, only CommonBuffer operations (put differently, if the > operation is asynchronous on the higher protocol level, then make the > underlying BMDMA operation CommonBuffer); (c) IOMMU protocol > implementations should not be required to clean up translations, only > allowed to, if they can do it without interfering with (a). > > Basically, the question is, who *owns* the translations. Based on the > PciIo interfaces, and on the requirement that PCI drivers Map() and > Unmap() buffers, my opinion is that the PCI drivers own the translations. > > If you think that we should take this question to the USWG, I'm OK with > that. > > Thanks, > Laszlo > > > >> -----Original Message----- >> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of L= aszlo Ersek >> Sent: Monday, September 4, 2017 3:55 AM >> To: edk2-devel-01 >> >> Cc: Dong, Eric >>; Zeng, Star >> >> Subject: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap com= mon buffers at ExitBootServices() >> >> Repo: https://github.com/lersek/edk2.git >> Branch: pci_host_controllers_unmap >> >> This series updates four PCI Host Controller drivers so that they don't = leave CommonBuffer mappings hanging when control is transfered to the OS. >> >> Cc: Brijesh Singh >> >> Cc: Eric Dong >> >> Cc: Star Zeng >> >> >> Thanks >> Laszlo >> >> Laszlo Ersek (4): >> MdeModulePkg/UhciDxe: unmap BM common buffers when exiting boot >> services >> MdeModulePkg/EhciDxe: unmap BM common buffers when exiting boot >> services >> MdeModulePkg/XhciDxe: unmap BM common buffers when exiting boot >> services >> MdeModulePkg/AtaAtapiPassThru: unmap common buffers at >> ExitBootServices() >> >> MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AhciMode.h | 18 ++++++ >> MdeModulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.h | 5 ++ MdeMo= dulePkg/Bus/Ata/AtaAtapiPassThru/AtaAtapiPassThru.c | 67 ++++++++++++++++++= +- >> MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 25 +++++++- >> MdeModulePkg/Bus/Pci/UhciDxe/Uhci.c | 25 +++++++- >> MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 31 +++++++++ >> 6 files changed, 168 insertions(+), 3 deletions(-) >> >> -- >> 2.14.1.3.gb7cf6e02401b >> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org> >> https://lists.01.org/mailman/listinfo/edk2-devel >> > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org> > https://lists.01.org/mailman/listinfo/edk2-devel >