From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mx.groups.io with SMTP id smtpd.web10.24117.1673445370632223549 for ; Wed, 11 Jan 2023 05:56:10 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fC2qUQJR; spf=pass (domain: redhat.com, ip: 170.10.133.124, mailfrom: lersek@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673445369; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0CRLKTRsdaFsu4GYFDxXxlmoQ+2BZpF91PvXT88ETL8=; b=fC2qUQJRP0jhlpqyHC8rvg/YSmFxVOOUXy7UeD3zycueiyKOIiGPgd5Gwbg88dbnJdW/tK TzmvgqqC8rJuB0Z73bmFcqXZ9wgSzUlHxDvJquktLlo9cOmlJBbHpVeHPC2m8LRejmjJxV 9aRlW9iYAxOnAnQ+9hDRl9iKHoi09Sg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-252-UTRA0bedP5eejmR-WLbAow-1; Wed, 11 Jan 2023 08:56:08 -0500 X-MC-Unique: UTRA0bedP5eejmR-WLbAow-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0C38E802C1D; Wed, 11 Jan 2023 13:56:08 +0000 (UTC) Received: from [10.39.192.40] (unknown [10.39.192.40]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7579640C115E; Wed, 11 Jan 2023 13:56:06 +0000 (UTC) Message-ID: <4ace3789-7192-0c53-b4b2-f62f907176d0@redhat.com> Date: Wed, 11 Jan 2023 14:56:05 +0100 MIME-Version: 1.0 Subject: Re: [PATCH v2 2/4] OvmfPkg/PlatformInitLib: Add PlatformGetLowMemoryCB To: Gerd Hoffmann Cc: devel@edk2.groups.io, Pawel Polawski , Jiewen Yao , Oliver Steffen , Jordan Justen , Ard Biesheuvel References: <20230110082123.159521-1-kraxel@redhat.com> <20230110082123.159521-3-kraxel@redhat.com> <043b03d6-0a6b-c533-255b-24a7805d5cca@redhat.com> <20230111072925.l47t4ahgynqjyegq@sirius.home.kraxel.org> From: "Laszlo Ersek" In-Reply-To: <20230111072925.l47t4ahgynqjyegq@sirius.home.kraxel.org> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 1/11/23 08:29, Gerd Hoffmann wrote: >> (9) BTW, still regarding commit 2a0bd3bffc80 ("OvmfPkg/PlatformInitLib: >> q35 mtrr setup fix", 2022-09-28) -- does the following code comment >> still apply? >> >> // >> // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI >> // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable. >> // >> >> Because, in case the new branch introduced by commit 2a0bd3bffc80 takes >> effect (namely, "Uc32Base = BASE_2GB"), I'm not sure where the PCIEXBAR >> starts, and then the above comment may no longer hold. > > PCIEXBAR location doesn't change, it's still at 0xb0000000. >>> @@ -158,11 +157,11 @@ PlatformMemMapInitialization ( >>> // the base of the 32-bit PCI host aperture. >>> // >>> PciExBarBase = PcdGet64 (PcdPciExpressBaseAddress); >>> - ASSERT (TopOfLowRam <= PciExBarBase); >>> + ASSERT (PlatformInfoHob->LowMemory <= PciExBarBase); >>> ASSERT (PciExBarBase <= MAX_UINT32 - SIZE_256MB); >>> PciBase = (UINT32)(PciExBarBase + SIZE_256MB); >> >> This change looks OK, but, similarly to my question (9): >> >> (11) Is the following comment still up to date: >> >> // >> // The MMCONFIG area is expected to fall between the top of low RAM and >> // the base of the 32-bit PCI host aperture. >> // >> >> with regard to the new branch introduced by commit 2a0bd3bffc80 >> ("OvmfPkg/PlatformInitLib: q35 mtrr setup fix", 2022-09-28)? > > root@fedora ~# cat /proc/iomem > [ ... ] > 7ebfe000-7effffff : System RAM > 7f000000-7fffffff : Reserved > 80000000-afffffff : PCI Bus 0000:00 > b0000000-bfffffff : PCI MMCONFIG 0000 [bus 00-ff] > b0000000-bfffffff : Reserved > b0000000-bfffffff : pnp 00:04 > c0000000-febfffff : PCI Bus 0000:00 > [ ... ] > root@fedora ~# cat /proc/mtrr > reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable > reg01: base=0x800000000 (32768MB), size=32768MB, count=1: uncachable Ugh, what? :) I was about to point out a contradiction between (a) the following from commit 2a0bd3bffc80: + // Newer qemu with gigabyte aligned memory, + // 32-bit pci mmio window is 2G -> 4G then. and (b) your confirmation that the PCIEXBAR location does not change. Namely, I was about to point out that PCIEXBAR -- *config space* expressed as MMIO -- would then overlap the 32-bit MMIO aperture, the one that's assignable to BARs as MMIO space. But then your /proc/iomem quote actually confirms this is what happens in practice -- and apparently works??? In Linux anyways? FWIW I don't see how this is safe with regard to the firmware. Even if QEMU is capable of generating a set of discontiguous resource descriptors in the DSDT / _CRS, and Linux is capable of dealing with that, I don't understand how the firmware does it. Has it only been working by chance, perhaps? PciHostBridgeDxe uses this intersection-based MMIO addition that we discussed in the BZ, so I certainly don't expect a conflict there; after all, the MMIO space used for MMCONFIG and the MMIO space used for BAR assignments should mostly be the same; IOW there should be no conflict or compatibility problem at the GCD level that PciHostBridgeDxe chokes on. But when the actual BARs are assigned (allocated), what prevents them from being allocated from the overlapping MMCONFIG "sub" interval? I think there's a problem there, we just have not hit it yet. OK, OK, let me review what the code actually does. Here's the relevant part of the call tree: InitializePlatform() [OvmfPkg/PlatformPei/Platform.c] [1] PlatformQemuUc32BaseInitialization() [OvmfPkg/Library/PlatformInitLib/MemDetect.c] InitializeRamRegions() [OvmfPkg/PlatformPei/MemDetect.c] [2] PlatformQemuInitializeRam() [OvmfPkg/Library/PlatformInitLib/MemDetect.c] MemMapInitialization() [OvmfPkg/PlatformPei/Platform.c] [3] PlatformMemMapInitialization() [OvmfPkg/Library/PlatformInitLib/Platform.c] MiscInitialization() [OvmfPkg/PlatformPei/Platform.c] PlatformMiscInitialization() [OvmfPkg/Library/PlatformInitLib/Platform.c] [4] PciExBarInitialization() [OvmfPkg/Library/PlatformInitLib/Platform.c] [5] AmdSevDxeEntryPoint() [OvmfPkg/AmdSevDxe/AmdSevDxe.c] - At [1], the most recent state is from commit 2a0bd3bffc80 ("OvmfPkg/PlatformInitLib: q35 mtrr setup fix", 2022-09-28). What this commit does is, it lowers (conditionally) the base of the area that we'll mark as UC ("Uc32Base"), from 0xB0000000 (2816 MB, the start of the EXBAR) to 2048 MB. It does not change the location of MMCONFIG *or* the 32-bit MMIO aperture. It only adds a *comment* like this: + // Newer qemu with gigabyte aligned memory, + // 32-bit pci mmio window is 2G -> 4G then. which is a *natural language statement* about the 32-bit MMIO aperture, but no code change. - At [2], we mark the area [Uc32Base, 4GB) as uncacheable in the MTRRs. There is a comment saying: // // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable. // And this comment *conflicts* with the one from [1]. Because on Q35, the new Uc32Base (2GB) may not actually match the PCIEXBAR start (2816 MB). Therefore commit 2a0bd3bffc80 made the comment at [2] stale. This is issue {1}. Still, this is not a functional error, as we're only marking a *larger* (and simpler-to-express) area as UC than before. When this new logic fires, we have *nothing* in the space between the 32-bit RAM and the EXBAR base -- thus far, anyway!. - At [3], we have a comment saying // // The MMCONFIG area is expected to fall between the top of low RAM and // the base of the 32-bit PCI host aperture. // plus code that actually *implements* this. In spite of the *comment* added in commit 2a0bd3bffc80, at [1], the logic at [3] *still* -- correctly -- places the 32-bit MMIO aperture *above* the PCIEXBAR. This is what PciHostBridgeDxe will expose as aperture, and what PciBusDxe will assign BARs from. No conflict is possible with the MMCONFIG area. This is in fact confirmed by the following log snippet, when booting a pc-q35-7.2 guest with 1792 MB of RAM: > PciHostBridgeUtilityInitRootBridge: populated root bus 0, with room for 255 subordinate bus(es) > RootBridge: PciRoot(0x0) > Support/Attr: 70069 / 70069 > DmaAbove4G: No > NoExtConfSpace: No > AllocAttr: 3 (CombineMemPMem Mem64Decode) > Bus: 0 - FF Translation=0 > Io: 6000 - FFFF Translation=0 > Mem: C0000000 - FBFFFFFF Translation=0 > MemAbove4G: 800000000 - FFFFFFFFF Translation=0 > PMem: FFFFFFFFFFFFFFFF - 0 Translation=0 > PMemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0 Note "Mem: C0000000 - FBFFFFFF Translation=0". - Still at [3], we cover the MMCONFIG area with a reserved memory HOB (not MMIO HOB). This is alright; and again the reason we have no conflict in PciHostBridgeDxe is that the 32-bit MMIO aperture *still* starts above the EXBAR; it does not "surround" it. - At [4], we actually program the EXBAR. - At [5], early in the DXE phase, knowing that we marked the EXBAR as reserved memory and not as MMIO, we explicitly decrypt (when on SEV) the EXBAR. OK, so what do we learn from the above? - issue {1}: the statement in [2] PlatformQemuInitializeRam(), in the following comment: // // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable. // was broken by commit 2a0bd3bffc80. - issue {2}: the statement from commit 2a0bd3bffc80 + // 32-bit pci mmio window is 2G -> 4G then. is not factual, as far as the firmware is concerned. I don't know *how* Linux ends up extending the MMIO aperture so far down that it streddles / embeds MMCONFIG, but AFAICT it has nothing to do with the firmware. Therefore, I also don't understand where the requirement comes (from Linux? where?) that the firmware mark the "gap" between 2048 MB and 2816 MB as uncached. The firmware does not use it for anything, so why does the Linux kernel do? And if the Linux kernel does, then why does it not reprogram the MTRRs as well? The commit message from commit 2a0bd3bffc80 states, "Which effectively makes the 32-bit pci mmio window start at 0x80000000". I'm precisely after that "effectively" adverb here: placing the 32-bit MMIO aperture at 2048 MB is not *at all* what the firmware does. ... OK, I think I've found it! It's the following QEMU commit: 4a4418369d6d ("q35: fix mmconfig and PCI0._CRS", 2019-06-16). I've even found the original discussion: - v1: http://mid.mail-archive.com/20190528204331.5280-1-kraxel@redhat.com - v2: http://mid.mail-archive.com/20190607073429.3436-1-kraxel@redhat.com So, this seems to happen by QEMU moving the hole as low as possible in the \SB.PCI0 _CRS, in the DSDT, punching gaps as neeed. And Linux adheres to that. I've tested it with said pc-q35-7.2 guest with 1792 MB of RAM, running RHEL-7.9 (that's what I've had handy). The MTRR update from edk2 commit 2a0bd3bffc80 takes effect: > [ 0.000000] MTRR variable ranges enabled: > [ 0.000000] 0 base 0080000000 mask FF80000000 uncachable *but* the start of the 32-bit MMIO range (which is discontiguous) appears even lower, per QEMU commit 4a4418369d6d: > [ 0.000000] e820: [mem 0x70000000-0xafffffff] available for PCI devices > ... > [ 0.281120] pci_bus 0000:00: root bus resource [mem 0x70000000-0xafffffff window] > ... > [ 0.529741] pci 0000:00:02.0: BAR 6: assigned [mem 0x70000000-0x7000ffff pref] > ... > [ 0.565903] pci_bus 0000:00: resource 7 [mem 0x70000000-0xafffffff window] and > 70000000-afffffff : PCI Bus 0000:00 > 70000000-7000ffff : 0000:00:02.0 > b0000000-bfffffff : PCI MMCONFIG 0000 [bus 00-ff] > b0000000-bfffffff : reserved > b0000000-bfffffff : pnp 00:04 > c0000000-febfffff : PCI Bus 0000:00 > c0000000-c0ffffff : 0000:00:02.0 > c0000000-c0ffffff : bochs-drm There is *no sign* of 0x70000000 in the firmware log. So Linux effectively ressigns some PCI resources, and utilizes the (discontiguous) area that QEMU exposes in the DSDT, with the firmware knowing nothing about it. Note, again, that our UC settings in the MTRR start *higher*, at 2GB. So not only am I unconvinced (or at least confused!) that edk2 commit 2a0bd3bffc80 was necessary, it even *seems* insufficient, for UC-coverage of the 32-bit MMIO aperture. In practice though, it doesn't seem to cause issues! (Which again questions if the original change in 2a0bd3bffc80 was really necessary.) I've filed a new TianoCore BZ about clarifying the comments please: https://bugzilla.tianocore.org/show_bug.cgi?id=4289 > With gigabyte-alignment being the common case these days it might make > sense to place the MMCONFIG area at 0xe0000000 instead ... I feel really unsafe about complicating this code even further. Laszlo