From: "Laszlo Ersek" <lersek@redhat.com>
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: devel@edk2.groups.io, Pawel Polawski <ppolawsk@redhat.com>,
Jiewen Yao <jiewen.yao@intel.com>,
Oliver Steffen <osteffen@redhat.com>,
Jordan Justen <jordan.l.justen@intel.com>,
Ard Biesheuvel <ardb+tianocore@kernel.org>
Subject: Re: [PATCH v2 2/4] OvmfPkg/PlatformInitLib: Add PlatformGetLowMemoryCB
Date: Wed, 11 Jan 2023 14:56:05 +0100 [thread overview]
Message-ID: <4ace3789-7192-0c53-b4b2-f62f907176d0@redhat.com> (raw)
In-Reply-To: <20230111072925.l47t4ahgynqjyegq@sirius.home.kraxel.org>
On 1/11/23 08:29, Gerd Hoffmann wrote:
>> (9) BTW, still regarding commit 2a0bd3bffc80 ("OvmfPkg/PlatformInitLib:
>> q35 mtrr setup fix", 2022-09-28) -- does the following code comment
>> still apply?
>>
>> //
>> // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
>> // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
>> //
>>
>> Because, in case the new branch introduced by commit 2a0bd3bffc80 takes
>> effect (namely, "Uc32Base = BASE_2GB"), I'm not sure where the PCIEXBAR
>> starts, and then the above comment may no longer hold.
>
> PCIEXBAR location doesn't change, it's still at 0xb0000000.
>>> @@ -158,11 +157,11 @@ PlatformMemMapInitialization (
>>> // the base of the 32-bit PCI host aperture.
>>> //
>>> PciExBarBase = PcdGet64 (PcdPciExpressBaseAddress);
>>> - ASSERT (TopOfLowRam <= PciExBarBase);
>>> + ASSERT (PlatformInfoHob->LowMemory <= PciExBarBase);
>>> ASSERT (PciExBarBase <= MAX_UINT32 - SIZE_256MB);
>>> PciBase = (UINT32)(PciExBarBase + SIZE_256MB);
>>
>> This change looks OK, but, similarly to my question (9):
>>
>> (11) Is the following comment still up to date:
>>
>> //
>> // The MMCONFIG area is expected to fall between the top of low RAM and
>> // the base of the 32-bit PCI host aperture.
>> //
>>
>> with regard to the new branch introduced by commit 2a0bd3bffc80
>> ("OvmfPkg/PlatformInitLib: q35 mtrr setup fix", 2022-09-28)?
>
> root@fedora ~# cat /proc/iomem
> [ ... ]
> 7ebfe000-7effffff : System RAM
> 7f000000-7fffffff : Reserved
> 80000000-afffffff : PCI Bus 0000:00
> b0000000-bfffffff : PCI MMCONFIG 0000 [bus 00-ff]
> b0000000-bfffffff : Reserved
> b0000000-bfffffff : pnp 00:04
> c0000000-febfffff : PCI Bus 0000:00
> [ ... ]
> root@fedora ~# cat /proc/mtrr
> reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> reg01: base=0x800000000 (32768MB), size=32768MB, count=1: uncachable
Ugh, what? :)
I was about to point out a contradiction between (a) the following from
commit 2a0bd3bffc80:
+ // Newer qemu with gigabyte aligned memory,
+ // 32-bit pci mmio window is 2G -> 4G then.
and (b) your confirmation that the PCIEXBAR location does not change.
Namely, I was about to point out that PCIEXBAR -- *config space*
expressed as MMIO -- would then overlap the 32-bit MMIO aperture, the
one that's assignable to BARs as MMIO space.
But then your /proc/iomem quote actually confirms this is what happens
in practice -- and apparently works??? In Linux anyways?
FWIW I don't see how this is safe with regard to the firmware. Even if
QEMU is capable of generating a set of discontiguous resource
descriptors in the DSDT / _CRS, and Linux is capable of dealing with
that, I don't understand how the firmware does it.
Has it only been working by chance, perhaps? PciHostBridgeDxe uses this
intersection-based MMIO addition that we discussed in the BZ, so I
certainly don't expect a conflict there; after all, the MMIO space used
for MMCONFIG and the MMIO space used for BAR assignments should mostly
be the same; IOW there should be no conflict or compatibility problem at
the GCD level that PciHostBridgeDxe chokes on.
But when the actual BARs are assigned (allocated), what prevents them
from being allocated from the overlapping MMCONFIG "sub" interval? I
think there's a problem there, we just have not hit it yet.
OK, OK, let me review what the code actually does. Here's the relevant
part of the call tree:
InitializePlatform() [OvmfPkg/PlatformPei/Platform.c]
[1] PlatformQemuUc32BaseInitialization() [OvmfPkg/Library/PlatformInitLib/MemDetect.c]
InitializeRamRegions() [OvmfPkg/PlatformPei/MemDetect.c]
[2] PlatformQemuInitializeRam() [OvmfPkg/Library/PlatformInitLib/MemDetect.c]
MemMapInitialization() [OvmfPkg/PlatformPei/Platform.c]
[3] PlatformMemMapInitialization() [OvmfPkg/Library/PlatformInitLib/Platform.c]
MiscInitialization() [OvmfPkg/PlatformPei/Platform.c]
PlatformMiscInitialization() [OvmfPkg/Library/PlatformInitLib/Platform.c]
[4] PciExBarInitialization() [OvmfPkg/Library/PlatformInitLib/Platform.c]
[5] AmdSevDxeEntryPoint() [OvmfPkg/AmdSevDxe/AmdSevDxe.c]
- At [1], the most recent state is from commit 2a0bd3bffc80
("OvmfPkg/PlatformInitLib: q35 mtrr setup fix", 2022-09-28). What this
commit does is, it lowers (conditionally) the base of the area that
we'll mark as UC ("Uc32Base"), from 0xB0000000 (2816 MB, the start of
the EXBAR) to 2048 MB.
It does not change the location of MMCONFIG *or* the 32-bit MMIO
aperture. It only adds a *comment* like this:
+ // Newer qemu with gigabyte aligned memory,
+ // 32-bit pci mmio window is 2G -> 4G then.
which is a *natural language statement* about the 32-bit MMIO
aperture, but no code change.
- At [2], we mark the area [Uc32Base, 4GB) as uncacheable in the MTRRs.
There is a comment saying:
//
// Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
// MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
//
And this comment *conflicts* with the one from [1]. Because on Q35,
the new Uc32Base (2GB) may not actually match the PCIEXBAR start (2816
MB). Therefore commit 2a0bd3bffc80 made the comment at [2] stale. This
is issue {1}.
Still, this is not a functional error, as we're only marking a
*larger* (and simpler-to-express) area as UC than before. When this
new logic fires, we have *nothing* in the space between the 32-bit RAM
and the EXBAR base -- thus far, anyway!.
- At [3], we have a comment saying
//
// The MMCONFIG area is expected to fall between the top of low RAM and
// the base of the 32-bit PCI host aperture.
//
plus code that actually *implements* this. In spite of the *comment*
added in commit 2a0bd3bffc80, at [1], the logic at [3] *still* --
correctly -- places the 32-bit MMIO aperture *above* the PCIEXBAR.
This is what PciHostBridgeDxe will expose as aperture, and what
PciBusDxe will assign BARs from. No conflict is possible with the
MMCONFIG area.
This is in fact confirmed by the following log snippet, when booting a
pc-q35-7.2 guest with 1792 MB of RAM:
> PciHostBridgeUtilityInitRootBridge: populated root bus 0, with room for 255 subordinate bus(es)
> RootBridge: PciRoot(0x0)
> Support/Attr: 70069 / 70069
> DmaAbove4G: No
> NoExtConfSpace: No
> AllocAttr: 3 (CombineMemPMem Mem64Decode)
> Bus: 0 - FF Translation=0
> Io: 6000 - FFFF Translation=0
> Mem: C0000000 - FBFFFFFF Translation=0
> MemAbove4G: 800000000 - FFFFFFFFF Translation=0
> PMem: FFFFFFFFFFFFFFFF - 0 Translation=0
> PMemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0
Note "Mem: C0000000 - FBFFFFFF Translation=0".
- Still at [3], we cover the MMCONFIG area with a reserved memory HOB
(not MMIO HOB). This is alright; and again the reason we have no
conflict in PciHostBridgeDxe is that the 32-bit MMIO aperture *still*
starts above the EXBAR; it does not "surround" it.
- At [4], we actually program the EXBAR.
- At [5], early in the DXE phase, knowing that we marked the EXBAR as
reserved memory and not as MMIO, we explicitly decrypt (when on SEV)
the EXBAR.
OK, so what do we learn from the above?
- issue {1}: the statement in [2] PlatformQemuInitializeRam(), in the
following comment:
//
// Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
// MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
//
was broken by commit 2a0bd3bffc80.
- issue {2}: the statement from commit 2a0bd3bffc80
+ // 32-bit pci mmio window is 2G -> 4G then.
is not factual, as far as the firmware is concerned. I don't know
*how* Linux ends up extending the MMIO aperture so far down that it
streddles / embeds MMCONFIG, but AFAICT it has nothing to do with the
firmware.
Therefore, I also don't understand where the requirement comes (from
Linux? where?) that the firmware mark the "gap" between 2048 MB and
2816 MB as uncached. The firmware does not use it for anything, so why
does the Linux kernel do? And if the Linux kernel does, then why does
it not reprogram the MTRRs as well?
The commit message from commit 2a0bd3bffc80 states, "Which effectively
makes the 32-bit pci mmio window start at 0x80000000".
I'm precisely after that "effectively" adverb here: placing the 32-bit
MMIO aperture at 2048 MB is not *at all* what the firmware does.
... OK, I think I've found it! It's the following QEMU commit:
4a4418369d6d ("q35: fix mmconfig and PCI0._CRS", 2019-06-16).
I've even found the original discussion:
- v1: http://mid.mail-archive.com/20190528204331.5280-1-kraxel@redhat.com
- v2: http://mid.mail-archive.com/20190607073429.3436-1-kraxel@redhat.com
So, this seems to happen by QEMU moving the hole as low as possible in
the \SB.PCI0 _CRS, in the DSDT, punching gaps as neeed. And Linux
adheres to that.
I've tested it with said pc-q35-7.2 guest with 1792 MB of RAM, running
RHEL-7.9 (that's what I've had handy). The MTRR update from edk2 commit
2a0bd3bffc80 takes effect:
> [ 0.000000] MTRR variable ranges enabled:
> [ 0.000000] 0 base 0080000000 mask FF80000000 uncachable
*but* the start of the 32-bit MMIO range (which is discontiguous) appears
even lower, per QEMU commit 4a4418369d6d:
> [ 0.000000] e820: [mem 0x70000000-0xafffffff] available for PCI devices
> ...
> [ 0.281120] pci_bus 0000:00: root bus resource [mem 0x70000000-0xafffffff window]
> ...
> [ 0.529741] pci 0000:00:02.0: BAR 6: assigned [mem 0x70000000-0x7000ffff pref]
> ...
> [ 0.565903] pci_bus 0000:00: resource 7 [mem 0x70000000-0xafffffff window]
and
> 70000000-afffffff : PCI Bus 0000:00
> 70000000-7000ffff : 0000:00:02.0
> b0000000-bfffffff : PCI MMCONFIG 0000 [bus 00-ff]
> b0000000-bfffffff : reserved
> b0000000-bfffffff : pnp 00:04
> c0000000-febfffff : PCI Bus 0000:00
> c0000000-c0ffffff : 0000:00:02.0
> c0000000-c0ffffff : bochs-drm
There is *no sign* of 0x70000000 in the firmware log. So Linux
effectively ressigns some PCI resources, and utilizes the
(discontiguous) area that QEMU exposes in the DSDT, with the firmware
knowing nothing about it.
Note, again, that our UC settings in the MTRR start *higher*, at 2GB.
So not only am I unconvinced (or at least confused!) that edk2 commit
2a0bd3bffc80 was necessary, it even *seems* insufficient, for
UC-coverage of the 32-bit MMIO aperture.
In practice though, it doesn't seem to cause issues! (Which again
questions if the original change in 2a0bd3bffc80 was really necessary.)
I've filed a new TianoCore BZ about clarifying the comments please:
https://bugzilla.tianocore.org/show_bug.cgi?id=4289
> With gigabyte-alignment being the common case these days it might make
> sense to place the MMCONFIG area at 0xe0000000 instead ...
I feel really unsafe about complicating this code even further.
Laszlo
next prev parent reply other threads:[~2023-01-11 13:56 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-10 8:21 [PATCH v2 0/4] OvmfPkg: check 64bit mmio window for resource conflicts Gerd Hoffmann
2023-01-10 8:21 ` [PATCH v2 1/4] OvmfPkg/PlatformInitLib: Add PlatformScanE820 and GetFirstNonAddressCB Gerd Hoffmann
2023-01-10 15:41 ` Laszlo Ersek
2023-01-10 8:21 ` [PATCH v2 2/4] OvmfPkg/PlatformInitLib: Add PlatformGetLowMemoryCB Gerd Hoffmann
2023-01-10 16:53 ` Laszlo Ersek
2023-01-11 7:29 ` Gerd Hoffmann
2023-01-11 13:56 ` Laszlo Ersek [this message]
2023-01-11 15:23 ` Gerd Hoffmann
2023-01-12 11:09 ` Laszlo Ersek
2023-01-12 14:03 ` Gerd Hoffmann
2023-01-12 15:44 ` Laszlo Ersek
2023-01-10 8:21 ` [PATCH v2 3/4] OvmfPkg/PlatformInitLib: Add PlatformAddHobCB Gerd Hoffmann
2023-01-10 17:42 ` Laszlo Ersek
2023-01-11 8:06 ` Gerd Hoffmann
2023-01-11 14:08 ` Laszlo Ersek
2023-01-10 8:21 ` [PATCH v2 4/4] OvmfPkg/PlatformInitLib: Add PlatformReservationConflictCB Gerd Hoffmann
2023-01-10 17:55 ` Laszlo Ersek
2023-01-11 8:26 ` Gerd Hoffmann
2023-01-12 10:36 ` Laszlo Ersek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ace3789-7192-0c53-b4b2-f62f907176d0@redhat.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox