Subject: Re: [PATCH v3 4/4] OvmfPkg: don't assign PCI BARs above 4GiB when CSM enabled
To: Alexander Graf
Cc: devel@edk2.groups.io, David Woodhouse, Ard Biesheuvel
References: <91d912c9-533a-22a3-4aa3-0fe114e1149f@amazon.com>
From: "Laszlo Ersek"
Date: Thu, 27 Jun 2019 20:43:56 +0200
In-Reply-To: <91d912c9-533a-22a3-4aa3-0fe114e1149f@amazon.com>

On 06/27/19 18:36, Alexander Graf wrote:
> Hi David and Laszlo,
>
> (with broken threading because gmane still mirrors the old ML ...)
>
>> Mostly, this is only necessary for devices that the CSM might have
>> native support for, such as VirtIO and NVMe; PciBusDxe will already
>> degrade devices to 32-bit if they have an OpROM.
>>
>> However, there doesn't seem to be a generic way of requesting PciBusDxe
>> to downgrade specific devices.
>>
>> There's IncompatiblePciDeviceSupportProtocol but that doesn't provide
>> the PCI class information or a handle to the device itself, so there's
>> no simple way to just match on all NVMe devices, for example.
>>
>> Just leave gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size set to zero for
>> CSM builds, until/unless that can be fixed.
>>
>> Signed-off-by: David Woodhouse
>> Reviewed-by: Laszlo Ersek
>> ---
>>  OvmfPkg/OvmfPkgIa32X64.dsc | 4 ++++
>>  OvmfPkg/OvmfPkgX64.dsc     | 4 ++++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/OvmfPkg/OvmfPkgIa32X64.dsc b/OvmfPkg/OvmfPkgIa32X64.dsc
>> index 639e33cb285f..ad20531ceb8b 100644
>> --- a/OvmfPkg/OvmfPkgIa32X64.dsc
>> +++ b/OvmfPkg/OvmfPkgIa32X64.dsc
>> @@ -543,7 +543,11 @@ [PcdsDynamicDefault]
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Base|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Size|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Base|0x0
>> +!ifdef $(CSM_ENABLE)
>> +  gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size|0x0
>> +!else
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size|0x800000000
>> +!endif
>>
>>    gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut|0
>>
>> diff --git a/OvmfPkg/OvmfPkgX64.dsc b/OvmfPkg/OvmfPkgX64.dsc
>> index 69a3497c2c9e..0542ac2235b4 100644
>> --- a/OvmfPkg/OvmfPkgX64.dsc
>> +++ b/OvmfPkg/OvmfPkgX64.dsc
>> @@ -542,7 +542,11 @@ [PcdsDynamicDefault]
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Base|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Size|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Base|0x0
>> +!ifdef $(CSM_ENABLE)
>> +  gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size|0x0
>
> IIRC x86 Linux just takes firmware provided BAR maps as they are and
> doesn't map on its own.

That's correct.

> Or does it map if a BAR was previously unmapped?

My understanding is that Linux re-maps the BARs if it dislikes
something (e.g.
root bridge apertures described in ACPI _CRS do not cover some ranges
programmed into actual BARs). IIUC reallocation can be requested on the
kernel cmdline as well, with pci=realloc.

I believe you could test your question with the "pci-testdev" QEMU
device model -- in QEMU commit 417463341e3e ("pci-testdev: add optional
memory bar", 2018-11-05), Gerd added the "membar" property for just that
(IIRC).

> In the former case, wouldn't that mean that we're breaking GPU
> passthrough (*big* BARs) for OVMF if the OVMF version happens to support
> CSM? So if a distro decides to turn on CSM, that would be a very subtle
> regression.

Yes, this is in theory a possible regression. It requires the user to
combine huge BARs with an OVMF build that includes the CSM. I've been
aware of this, but it seems like such a corner case that I didn't intend
to raise it.

To begin with, building OVMF with the CSM is a niche use case in itself.
David described (but I've forgotten the details, by now) some kind of
setup or service where a user cannot choose between pure SeaBIOS and
pure OVMF, for their virtual machine. They are given just one firmware,
and so in order to let users boot both legacy and UEFI OSes, it makes
sense for the service provider to offer OVMF+CSM.

Fine -- but, in that kind of service, where users are prevented from
picking one of two "pure" firmwares, do we really expect users to have
the configuration freedom to shove GPUs with huge BARs into their VMs?

> Would it be possible to change the PCI mapping logic to just simply
> *prefer* low BAR space if there's some available and the BAR is not big
> (<64MB for example)?

PciBusDxe in MdeModulePkg is practically untouchable, at such a
"strategy" level. We can fix bugs in it, but only surgically. (This is
not something that I endorse, I'm just observing it.)

Platforms are expected to influence the behavior of PciBusDxe through
implementing the "incompatible pci device support" protocol.
OVMF already does that (IncompatiblePciDeviceSupportDxe), but the
protocol interface (from the PI spec) is not flexible enough for what
David actually wanted. Otherwise, this restriction would have been
expressed per-controller.

If the problem that you describe above outweighs the issue that David
intends to mitigate with the patch, in a given service, then the vendor
can rebuild OVMF with a suitable "--pcd=..." option. Or else, they can
even use

  -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=32768

dynamically, on the QEMU command line. (Please see commit 7e5b1b670c38,
"OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64
DXE", 2016-03-23.)

> That way we could have CSM enabled OVMF for everyone ;)

Well, as long as we're discussing "everyone": we should forget about the
CSM altogether, in the long term. The CSM is a concession towards OSes
that are stuck in the past; a concession that is hugely complex and
difficult to debug & maintain. It is also incompatible with Secure Boot.
Over time, we should spend less and less time & energy on the CSM.

Just my opinion, of course. :)

Thanks
Laszlo
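
For reference, the 32768 in the fw_cfg example above appears to be the
non-CSM default from the quoted patch (0x800000000 bytes) expressed in
MiB. The sketch below only illustrates that arithmetic. It assumes the
"Mb" suffix of X-PciMmio64Mb means MiB, and the helper names
(mmio64_size_to_mb, fw_cfg_args) are made up for this illustration;
they are not part of QEMU or edk2.

```python
# Hypothetical helpers for illustration only; not part of QEMU or edk2.
MIB = 1024 * 1024


def mmio64_size_to_mb(size_bytes):
    """Convert a 64-bit MMIO aperture size in bytes to whole MiB."""
    if size_bytes < 0 or size_bytes % MIB != 0:
        raise ValueError("aperture size must be a non-negative multiple of 1 MiB")
    return size_bytes // MIB


def fw_cfg_args(size_mb):
    """Build the QEMU arguments for the fw_cfg override named in the mail."""
    return ["-fw_cfg", "name=opt/ovmf/X-PciMmio64Mb,string=%d" % size_mb]


# PcdPciMmio64Size value used for non-CSM builds in the quoted patch.
NON_CSM_DEFAULT_BYTES = 0x800000000

size_mb = mmio64_size_to_mb(NON_CSM_DEFAULT_BYTES)
print(size_mb)                         # 32768
print(" ".join(fw_cfg_args(size_mb)))  # -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=32768
```

Under that assumption, passing this argument to a CSM-enabled build
would ask for the same 32 GiB aperture that the non-CSM build gets by
default.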