From: julien@xen.org
To: Igor Druzhinin <igor.druzhinin@citrix.com>,
devel@edk2.groups.io, xen-devel@lists.xenproject.org
Cc: jordan.l.justen@intel.com, lersek@redhat.com,
ard.biesheuvel@arm.com, anthony.perard@citrix.com
Subject: Re: [PATCH] OvmfPkg/XenPlatformPei: Use CPUID to get physical address width on Xen
Date: Tue, 19 Jan 2021 09:37:57 +0000
Message-ID: <2a806f26-04f7-a96b-522c-118ac94df8c0@xen.org>
In-Reply-To: <1610509335-23314-1-git-send-email-igor.druzhinin@citrix.com>
Hi Igor,
On 13/01/2021 03:42, Igor Druzhinin wrote:
> We faced a problem when passing through a PCI device with a 64GB BAR to a
> UEFI guest. The BAR is, as expected, programmed into the 64-bit PCI
> aperture at the 64G address, which pushes the physical address space to
> 37 bits. That is above the 36-bit width that OVMF currently exposes to a
> guest without tweaking the PcdPciMmio64Size knob.
>
> The reverse calculation using this knob was inherited from the QEMU-KVM
> platform code, where it serves the purpose of finding the maximum
> accessible physical address without necessarily trusting the emulated
> CPUID physbits value (which could differ from the host physbits). On Xen
> we expect to use CPUID policy to level the data correctly, to prevent
> situations with guest physbits > host physbits, e.g. across migrations.
>
> The next aspect raising concern is resource consumption for DXE IPL page
> tables and the time required to map the whole address space when using
> CPUID bits directly. That can be mitigated by enabling support for 1G
> pages in the DXE IPL configuration. 1G pages are available on most CPUs
> produced in the last 10 years, and those without them don't have many
> physical address bits anyway.
>
> Remove all the redundant code now (including the PcdPciMmio64.. handling
> that's not used on Xen anyway) and grab physbits directly from CPUID,
> which should match what baremetal UEFI systems do.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
Cheers,
> ---
> OvmfPkg/OvmfXen.dsc | 3 +
> OvmfPkg/XenPlatformPei/MemDetect.c | 166 +++----------------------------------
> 2 files changed, 15 insertions(+), 154 deletions(-)
>
> diff --git a/OvmfPkg/OvmfXen.dsc b/OvmfPkg/OvmfXen.dsc
> index 7d31e88..8ae6ed0 100644
> --- a/OvmfPkg/OvmfXen.dsc
> +++ b/OvmfPkg/OvmfXen.dsc
> @@ -444,6 +444,9 @@
> ## Xen vlapic's frequence is 100 MHz
> gEfiMdePkgTokenSpaceGuid.PcdFSBClock|100000000
>
> + # We populate DXE IPL tables with 1G pages preferably on Xen
> + gEfiMdeModulePkgTokenSpaceGuid.PcdUse1GPageTable|TRUE
> +
> ################################################################################
> #
> # Pcd Dynamic Section - list of all EDK II PCD Entries defined by this Platform
> diff --git a/OvmfPkg/XenPlatformPei/MemDetect.c b/OvmfPkg/XenPlatformPei/MemDetect.c
> index 1f81eee..1970b63 100644
> --- a/OvmfPkg/XenPlatformPei/MemDetect.c
> +++ b/OvmfPkg/XenPlatformPei/MemDetect.c
> @@ -172,175 +172,33 @@ GetSystemMemorySizeBelow4gb (
> return (UINT32) (((UINTN)((Cmos0x35 << 8) + Cmos0x34) << 16) + SIZE_16MB);
> }
>
> -
> -STATIC
> -UINT64
> -GetSystemMemorySizeAbove4gb (
> - )
> -{
> - UINT32 Size;
> - UINTN CmosIndex;
> -
> - //
> - // In PVH case, there is no CMOS, we have to calculate the memory size
> - // from parsing the E820
> - //
> - if (XenPvhDetected ()) {
> - UINT64 HighestAddress;
> -
> - HighestAddress = GetHighestSystemMemoryAddress (FALSE);
> - ASSERT (HighestAddress == 0 || HighestAddress >= BASE_4GB);
> -
> - if (HighestAddress >= BASE_4GB) {
> - HighestAddress -= BASE_4GB;
> - }
> -
> - return HighestAddress;
> - }
> -
> - //
> - // CMOS 0x5b-0x5d specifies the system memory above 4GB MB.
> - // * CMOS(0x5d) is the most significant size byte
> - // * CMOS(0x5c) is the middle size byte
> - // * CMOS(0x5b) is the least significant size byte
> - // * The size is specified in 64kb chunks
> - //
> -
> - Size = 0;
> - for (CmosIndex = 0x5d; CmosIndex >= 0x5b; CmosIndex--) {
> - Size = (UINT32) (Size << 8) + (UINT32) CmosRead8 (CmosIndex);
> - }
> -
> - return LShiftU64 (Size, 16);
> -}
> -
> -
> -/**
> - Return the highest address that DXE could possibly use, plus one.
> -**/
> -STATIC
> -UINT64
> -GetFirstNonAddress (
> - VOID
> - )
> -{
> - UINT64 FirstNonAddress;
> - UINT64 Pci64Base, Pci64Size;
> - RETURN_STATUS PcdStatus;
> -
> - FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
> -
> - //
> - // If DXE is 32-bit, then we're done; PciBusDxe will degrade 64-bit MMIO
> - // resources to 32-bit anyway. See DegradeResource() in
> - // "PciResourceSupport.c".
> - //
> -#ifdef MDE_CPU_IA32
> - if (!FeaturePcdGet (PcdDxeIplSwitchToLongMode)) {
> - return FirstNonAddress;
> - }
> -#endif
> -
> - //
> - // Otherwise, in order to calculate the highest address plus one, we must
> - // consider the 64-bit PCI host aperture too. Fetch the default size.
> - //
> - Pci64Size = PcdGet64 (PcdPciMmio64Size);
> -
> - if (Pci64Size == 0) {
> - if (mBootMode != BOOT_ON_S3_RESUME) {
> - DEBUG ((DEBUG_INFO, "%a: disabling 64-bit PCI host aperture\n",
> - __FUNCTION__));
> - PcdStatus = PcdSet64S (PcdPciMmio64Size, 0);
> - ASSERT_RETURN_ERROR (PcdStatus);
> - }
> -
> - //
> - // There's nothing more to do; the amount of memory above 4GB fully
> - // determines the highest address plus one. The memory hotplug area (see
> - // below) plays no role for the firmware in this case.
> - //
> - return FirstNonAddress;
> - }
> -
> - //
> - // SeaBIOS aligns both boundaries of the 64-bit PCI host aperture to 1GB, so
> - // that the host can map it with 1GB hugepages. Follow suit.
> - //
> - Pci64Base = ALIGN_VALUE (FirstNonAddress, (UINT64)SIZE_1GB);
> - Pci64Size = ALIGN_VALUE (Pci64Size, (UINT64)SIZE_1GB);
> -
> - //
> - // The 64-bit PCI host aperture should also be "naturally" aligned. The
> - // alignment is determined by rounding the size of the aperture down to the
> - // next smaller or equal power of two. That is, align the aperture by the
> - // largest BAR size that can fit into it.
> - //
> - Pci64Base = ALIGN_VALUE (Pci64Base, GetPowerOfTwo64 (Pci64Size));
> -
> - if (mBootMode != BOOT_ON_S3_RESUME) {
> - //
> - // The core PciHostBridgeDxe driver will automatically add this range to
> - // the GCD memory space map through our PciHostBridgeLib instance; here we
> - // only need to set the PCDs.
> - //
> - PcdStatus = PcdSet64S (PcdPciMmio64Base, Pci64Base);
> - ASSERT_RETURN_ERROR (PcdStatus);
> - PcdStatus = PcdSet64S (PcdPciMmio64Size, Pci64Size);
> - ASSERT_RETURN_ERROR (PcdStatus);
> -
> - DEBUG ((DEBUG_INFO, "%a: Pci64Base=0x%Lx Pci64Size=0x%Lx\n",
> - __FUNCTION__, Pci64Base, Pci64Size));
> - }
> -
> - //
> - // The useful address space ends with the 64-bit PCI host aperture.
> - //
> - FirstNonAddress = Pci64Base + Pci64Size;
> - return FirstNonAddress;
> -}
> -
> -
> /**
> - Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
> + Initialize the mPhysMemAddressWidth variable, based on CPUID data.
> **/
> VOID
> AddressWidthInitialization (
> VOID
> )
> {
> - UINT64 FirstNonAddress;
> + UINT32 RegEax;
>
> - //
> - // As guest-physical memory size grows, the permanent PEI RAM requirements
> - // are dominated by the identity-mapping page tables built by the DXE IPL.
> - // The DXL IPL keys off of the physical address bits advertized in the CPU
> - // HOB. To conserve memory, we calculate the minimum address width here.
> - //
> - FirstNonAddress = GetFirstNonAddress ();
> - mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
> -
> - //
> - // If FirstNonAddress is not an integral power of two, then we need an
> - // additional bit.
> - //
> - if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
> - ++mPhysMemAddressWidth;
> + AsmCpuid (0x80000000, &RegEax, NULL, NULL, NULL);
> + if (RegEax >= 0x80000008) {
> + AsmCpuid (0x80000008, &RegEax, NULL, NULL, NULL);
> + mPhysMemAddressWidth = (UINT8) RegEax;
> + } else {
> + mPhysMemAddressWidth = 36;
> }
>
> //
> - // The minimum address width is 36 (covers up to and excluding 64 GB, which
> - // is the maximum for Ia32 + PAE). The theoretical architecture maximum for
> - // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. We
> - // can simply assert that here, since 48 bits are good enough for 256 TB.
> + // IA-32e paging translates 48-bit linear addresses to 52-bit physical addresses.
> //
> - if (mPhysMemAddressWidth <= 36) {
> - mPhysMemAddressWidth = 36;
> + ASSERT (mPhysMemAddressWidth <= 52);
> + if (mPhysMemAddressWidth > 48) {
> + mPhysMemAddressWidth = 48;
> }
> - ASSERT (mPhysMemAddressWidth <= 48);
> }
>
> -
> /**
> Calculate the cap for the permanent PEI memory.
> **/
>
--
Julien Grall
Thread overview: 6+ messages
2021-01-13 3:42 [PATCH] OvmfPkg/XenPlatformPei: Use CPUID to get physical address width on Xen Igor Druzhinin
2021-01-13 3:55 ` Laszlo Ersek
2021-01-19 9:37 ` julien [this message]
2021-01-19 14:49 ` [edk2-devel] " Laszlo Ersek
2021-01-19 15:52 ` Anthony PERARD
2021-01-19 17:23 ` Laszlo Ersek