From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: mx.groups.io;
 dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by groups.io with SMTP; Thu, 27 Jun 2019 11:44:00 -0700
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 84A8C308A951;
	Thu, 27 Jun 2019 18:43:59 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-118.ams2.redhat.com [10.36.116.118])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 42C8D5C257;
	Thu, 27 Jun 2019 18:43:57 +0000 (UTC)
Subject: Re: [PATCH v3 4/4] OvmfPkg: don't assign PCI BARs above 4GiB when CSM enabled
To: Alexander Graf <graf@amazon.com>
Cc: devel@edk2.groups.io, David Woodhouse <dwmw2@infradead.org>,
 Ard Biesheuvel <ard.biesheuvel@linaro.org>
References: <91d912c9-533a-22a3-4aa3-0fe114e1149f@amazon.com>
From: "Laszlo Ersek" <lersek@redhat.com>
Message-ID: <b47ec57d-2c6f-7766-c5c4-b55edeb0a703@redhat.com>
Date: Thu, 27 Jun 2019 20:43:56 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <91d912c9-533a-22a3-4aa3-0fe114e1149f@amazon.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Thu, 27 Jun 2019 18:43:59 +0000 (UTC)
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable

On 06/27/19 18:36, Alexander Graf wrote:
> Hi David and Laszlo,
>
> (with broken threading because gmane still mirrors the old ML ...)
>
>> Mostly, this is only necessary for devices that the CSM might have
>> native support for, such as VirtIO and NVMe; PciBusDxe will already
>> degrade devices to 32-bit if they have an OpROM.
>>
>> However, there doesn't seem to be a generic way of requesting PciBusDxe
>> to downgrade specific devices.
>>
>> There's IncompatiblePciDeviceSupportProtocol but that doesn't provide
>> the PCI class information or a handle to the device itself, so there's
>> no simple way to just match on all NVMe devices, for example.
>>
>> Just leave gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size set to zero for
>> CSM builds, until/unless that can be fixed.
>>
>> Signed-off-by: David Woodhouse <dwmw2@...>
>> Reviewed-by: Laszlo Ersek <lersek@...>
>> ---
>>  OvmfPkg/OvmfPkgIa32X64.dsc | 4 ++++
>>  OvmfPkg/OvmfPkgX64.dsc     | 4 ++++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/OvmfPkg/OvmfPkgIa32X64.dsc b/OvmfPkg/OvmfPkgIa32X64.dsc
>> index 639e33cb285f..ad20531ceb8b 100644
>> --- a/OvmfPkg/OvmfPkgIa32X64.dsc
>> +++ b/OvmfPkg/OvmfPkgIa32X64.dsc
>> @@ -543,7 +543,11 @@ [PcdsDynamicDefault]
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Base|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Size|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Base|0x0
>> +!ifdef $(CSM_ENABLE)
>> +  gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size|0x0
>> +!else
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size|0x800000000
>> +!endif
>>
>>    gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut|0
>>
>> diff --git a/OvmfPkg/OvmfPkgX64.dsc b/OvmfPkg/OvmfPkgX64.dsc
>> index 69a3497c2c9e..0542ac2235b4 100644
>> --- a/OvmfPkg/OvmfPkgX64.dsc
>> +++ b/OvmfPkg/OvmfPkgX64.dsc
>> @@ -542,7 +542,11 @@ [PcdsDynamicDefault]
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Base|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio32Size|0x0
>>    gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Base|0x0
>> +!ifdef $(CSM_ENABLE)
>> +  gUefiOvmfPkgTokenSpaceGuid.PcdPciMmio64Size|0x0
>
> IIRC x86 Linux just takes firmware provided BAR maps as they are and
> doesn't map on its own.

That's correct.


> Or does it map if a BAR was previously unmapped?

My understanding is that Linux re-maps the BARs if it dislikes something
(e.g. root bridge apertures described in ACPI _CRS do not cover some
ranges programmed into actual BARs).
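
To illustrate the kind of mismatch meant here, below is a minimal sketch of a "does some aperture cover this BAR" check. It is not the kernel's actual logic; the function name and all addresses are made up for illustration, and the aperture list merely stands in for the MMIO windows that _CRS would describe.

```python
def bar_is_covered(bar_base, bar_size, apertures):
    """Return True if [bar_base, bar_base + bar_size) fits inside one aperture.

    apertures is a list of (base, size) tuples; hypothetical stand-ins for
    the root bridge MMIO windows described in ACPI _CRS.
    """
    bar_end = bar_base + bar_size
    return any(base <= bar_base and bar_end <= base + size
               for base, size in apertures)


# Made-up example values: a single window from 2 GiB to 3 GiB.
apertures = [(0x80000000, 0x40000000)]

# A 16 MiB BAR inside the window is covered.
print(bar_is_covered(0x90000000, 0x1000000, apertures))    # True

# The same BAR programmed above 4 GiB is not covered by any window.
print(bar_is_covered(0x800000000, 0x1000000, apertures))   # False
```

In the second case the OS would have a reason to dislike the firmware's assignment and move the BAR.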

IIUC reallocation can be requested on the kernel cmdline as well, with
pci=realloc.

I believe you could test your question with the "pci-testdev" QEMU
device model -- in QEMU commit 417463341e3e ("pci-testdev: add optional
memory bar", 2018-11-05), Gerd added the "membar" property for just that
(IIRC).


> In the former case, wouldn't that mean that we're breaking GPU
> passthrough (*big* BARs) for OVMF if the OVMF version happens to support
> CSM? So if a distro decides to turn on CSM, that would be a very subtle
> regression.

Yes, this is in theory a possible regression. It requires the user to
combine huge BARs with an OVMF build that includes the CSM.

I've been aware of this, but it seems like such a corner case that I
didn't intend to raise it. To begin with, building OVMF with the CSM is
a niche use case in itself.

David described (but I've forgotten the details, by now) some kind of
setup or service where a user cannot choose between pure SeaBIOS and
pure OVMF, for their virtual machine. They are given just one firmware,
and so in order to let users boot both legacy and UEFI OSes, it makes
sense for the service provider to offer OVMF+CSM.

Fine -- but, in that kind of service, where users are prevented from
picking one of two "pure" firmwares, do we really expect users to have
the configuration freedom to shove GPUs with huge BARs into their VMs?


> Would it be possible to change the PCI mapping logic to just simply
> *prefer* low BAR space if there's some available and the BAR is not big
> (<64MB for example)?

PciBusDxe in MdeModulePkg is practically untouchable, at such a
"strategy" level. We can fix bugs in it, but only surgically. (This is
not something that I endorse, I'm just observing it.)
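
For concreteness, the "prefer low" strategy suggested above could be sketched roughly as follows. This is purely an illustration of the proposed policy, not PciBusDxe code; the function name, the window labels and the free-space parameter are all hypothetical, and only the 64MB threshold comes from the suggestion itself.

```python
MIB = 1 << 20
SMALL_BAR_LIMIT = 64 * MIB  # the "<64MB" threshold from the suggestion


def pick_window(bar_size, bar_is_64bit, low_free):
    """Choose an MMIO window for one BAR under a 'prefer low' policy.

    bar_size     -- size of the BAR in bytes
    bar_is_64bit -- whether the BAR can be placed above 4 GiB at all
    low_free     -- bytes still free in the 32-bit window (hypothetical input)
    """
    if not bar_is_64bit:
        # A 32-bit BAR has no choice.
        return "mmio32"
    if bar_size < SMALL_BAR_LIMIT and bar_size <= low_free:
        # Small BAR and there is room below 4 GiB: keep it low.
        return "mmio32"
    # Big BAR, or the low window is exhausted: go above 4 GiB.
    return "mmio64"


print(pick_window(16 * MIB, True, 512 * MIB))    # mmio32
print(pick_window(256 * MIB, True, 512 * MIB))   # mmio64
```

A real allocator would also have to deal with alignment and allocation order, which this sketch ignores.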

Platforms are expected to influence the behavior of PciBusDxe through
implementing the "incompatible pci device support" protocol. OVMF
already does that (IncompatiblePciDeviceSupportDxe), but the protocol
interface (from the PI spec) is not flexible enough for what David
actually wanted. Otherwise, this restriction would have been expressed
per-controller.

If the problem that you describe above outweighs the issue that David
intends to mitigate with the patch, in a given service, then the vendor
can rebuild OVMF with a suitable "--pcd=..." option. Or else, they can
even use

  -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=32768

dynamically, on the QEMU command line. (Please see commit 7e5b1b670c38,
"OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64
DXE", 2016-03-23.)
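
As a quick sanity check of the numbers, the sketch below shows that 32768 in the fw_cfg example corresponds to the 0x800000000 that the non-CSM builds keep for PcdPciMmio64Size. It assumes the fw_cfg value is counted in MiB (as the "Mb" suffix suggests); the helper name is made up.

```python
MIB = 1 << 20
GIB = 1 << 30


def mib_to_bytes(mib):
    """Convert a size in MiB to bytes (hypothetical helper)."""
    return mib * MIB


# PcdPciMmio64Size default for non-CSM builds, taken from the quoted patch.
default_size = 0x800000000

# Value passed via opt/ovmf/X-PciMmio64Mb in the example above.
fw_cfg_mb = 32768

print(hex(mib_to_bytes(fw_cfg_mb)))   # 0x800000000
print(default_size // GIB)            # 32
```

So the fw_cfg setting above restores the same 32 GiB aperture at runtime, without rebuilding the firmware.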


> That way we could have CSM enabled OVMF for everyone ;)

Well, as long as we're discussing "everyone": we should forget about the
CSM altogether, in the long term. The CSM is a concession towards OSes
that are stuck in the past; a concession that is hugely complex and
difficult to debug & maintain. It is also incompatible with Secure Boot.
Over time, we should spend less and less time & energy on the CSM. Just
my opinion, of course. :)

Thanks
Laszlo