public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Subject: [PATCH v2 1/1] OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
Date: Sat,  5 Aug 2017 01:00:43 +0200	[thread overview]
Message-ID: <20170804230043.12977-2-lersek@redhat.com> (raw)
In-Reply-To: <20170804230043.12977-1-lersek@redhat.com>

In OVMF we currently get the upper (>=4GB) memory size with the
GetSystemMemorySizeAbove4gb() function.

The GetSystemMemorySizeAbove4gb() function is used in two places:

(1) It is the starting point of the calculations in GetFirstNonAddress().
    GetFirstNonAddress() in turn
    - determines the placement of the 64-bit PCI MMIO aperture,
    - provides input for the GCD memory space map's sizing (see
      AddressWidthInitialization(), and the CPU HOB in
      MiscInitialization()),
    - influences the permanent PEI RAM cap (the DXE core's page tables,
      built in permanent PEI RAM, grow as the RAM to map grows).

(2) In QemuInitializeRam(), GetSystemMemorySizeAbove4gb() determines the
    single memory descriptor HOB that we produce for the upper memory.

Respectively, there are two problems with GetSystemMemorySizeAbove4gb():

(1) It reads a 24-bit count of 64KB RAM chunks from the CMOS, and
    therefore cannot return a larger value than one terabyte.

(2) It cannot express discontiguous high RAM.

Starting with version 1.7.0, QEMU has provided the fw_cfg file called
"etc/e820". Refer to the following QEMU commits:

- 0624c7f916b4 ("e820: pass high memory too.", 2013-10-10),
- 7d67110f2d9a ("pc: add etc/e820 fw_cfg file", 2013-10-18)
- 7db16f2480db ("pc: register e820 entries for ram", 2013-10-10)

Ever since these commits in v1.7.0 -- with the last QEMU release being
v2.9.0, and v2.10.0 under development --, the only two RAM entries added
to this E820 map correspond to the below-4GB RAM range, and the above-4GB
RAM range. And, the above-4GB range exactly matches the CMOS registers in
question; see the use of "pcms->above_4g_mem_size":

  pc_q35_init() | pc_init1()
    pc_memory_init()
      e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
    pc_cmos_init()
      val = pcms->above_4g_mem_size / 65536;
      rtc_set_memory(s, 0x5b, val);
      rtc_set_memory(s, 0x5c, val >> 8);
      rtc_set_memory(s, 0x5d, val >> 16);

Therefore, remedy the above OVMF limitations as follows:

(1) Start off GetFirstNonAddress() by scanning the E820 map for the
    highest exclusive >=4GB RAM address. Fall back to the CMOS if the E820
    map is unavailable. Base all further calculations (such as 64-bit PCI
    MMIO aperture placement, GCD sizing etc) on this value.

    At the moment, the only difference this change makes is that we can
    have more than 1TB above 4GB -- given that the sole "high RAM" entry
    in the E820 map matches the CMOS exactly, modulo the most significant
    bits (see above).

    However, Igor plans to add discontiguous (cold-plugged) high RAM to
    the fw_cfg E820 RAM map later on, and then this scanning will adapt
    automatically.

(2) In QemuInitializeRam(), describe the high RAM regions from the E820
    map one by one with memory HOBs. Fall back to the CMOS only if the
    E820 map is missing.

    Again, right now this change only makes a difference if there is at
    least 1TB high RAM. Later on it will adapt to discontiguous high RAM
    (regardless of its size) automatically.

-*-

Implementation details: introduce the ScanOrAdd64BitE820Ram() function,
which reads the E820 entries from fw_cfg, and finds the highest exclusive
>=4GB RAM address, or produces memory resource descriptor HOBs for RAM
entries that start at or above 4GB. The RAM map is not read in a single
go, because its size can vary, and in PlatformPei we should stay away from
dynamic memory allocation, for the following reasons:

- "Pool" allocations are limited to ~64KB, are served from HOBs, and
  cannot be released ever.

- "Page" allocations are seriously limited before PlatformPei installs the
  permanent PEI RAM. Furthermore, page allocations can only be released in
  DXE, with dedicated code (so the address would have to be passed on with
  a HOB or PCD).

- Raw memory allocation HOBs would require the same freeing in DXE.

Therefore we process each E820 entry as soon as it is read from fw_cfg.

-*-

Considering the impact of high RAM on the DXE core:

A few years ago, installing high RAM as *tested* would cause the DXE core
to inhabit such ranges rather than carving out its home from the permanent
PEI RAM. Fortunately, this was fixed in the following edk2 commit:

  3a05b13106d1, "MdeModulePkg DxeCore: Take the range in resource HOB for
                PHIT as higher priority", 2015-09-18

which I regression-tested at the time:

  http://mid.mail-archive.com/55FC27B0.4070807@redhat.com

Later on, OVMF was changed to install its high RAM as tested (effectively
"arming" the earlier DXE core change for OVMF), in the following edk2
commit:

  035ce3b37c90, "OvmfPkg/PlatformPei: Add memory above 4GB as tested",
                2016-04-21

which I also regression-tested at the time:

  http://mid.mail-archive.com/571E8B90.1020102@redhat.com

Therefore adding more "tested memory" HOBs is safe.

Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1468526
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    v2:
    
    - rename E820HighRamIterate() to ScanOrAdd64BitE820Ram() [Jordan]
    
    - fold E820HighRamFindHighestExclusiveAddress() and
      E820HighRamAddMemoryHob() into ScanOrAdd64BitE820Ram(), dependent on
      the optional "MaxAddress" output parameter [Jordan]

 OvmfPkg/PlatformPei/MemDetect.c | 129 +++++++++++++++++++-
 1 file changed, 127 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 97f3fa5afcf5..2b2f3e4bec55 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -19,6 +19,7 @@ Module Name:
 //
 // The package level header files this module uses
 //
+#include <IndustryStandard/E820.h>
 #include <IndustryStandard/Q35MchIch9.h>
 #include <PiPei.h>
 
@@ -103,6 +104,109 @@ Q35TsegMbytesInitialization (
 }
 
 
+/**
+  Iterate over the RAM entries in QEMU's fw_cfg E820 RAM map that start outside
+  of the 32-bit address range.
+
+  Find the highest exclusive >=4GB RAM address, or produce memory resource
+  descriptor HOBs for RAM entries that start at or above 4GB.
+
+  @param[out] MaxAddress  If MaxAddress is NULL, then ScanOrAdd64BitE820Ram()
+                          produces memory resource descriptor HOBs for RAM
+                          entries that start at or above 4GB.
+
+                          Otherwise, MaxAddress holds the highest exclusive
+                          >=4GB RAM address on output. If QEMU's fw_cfg E820
+                          RAM map contains no RAM entry that starts outside of
+                          the 32-bit address range, then MaxAddress is exactly
+                          4GB on output.
+
+  @retval EFI_SUCCESS         The fw_cfg E820 RAM map was found and processed.
+
+  @retval EFI_PROTOCOL_ERROR  The RAM map was found, but its size wasn't a
+                              whole multiple of sizeof(EFI_E820_ENTRY64). No
+                              RAM entry was processed.
+
+  @return                     Error codes from QemuFwCfgFindFile(). No RAM
+                              entry was processed.
+**/
+STATIC
+EFI_STATUS
+ScanOrAdd64BitE820Ram (
+  OUT UINT64 *MaxAddress OPTIONAL
+  )
+{
+  EFI_STATUS           Status;
+  FIRMWARE_CONFIG_ITEM FwCfgItem;
+  UINTN                FwCfgSize;
+  EFI_E820_ENTRY64     E820Entry;
+  UINTN                Processed;
+
+  Status = QemuFwCfgFindFile ("etc/e820", &FwCfgItem, &FwCfgSize);
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+  if (FwCfgSize % sizeof E820Entry != 0) {
+    return EFI_PROTOCOL_ERROR;
+  }
+
+  if (MaxAddress != NULL) {
+    *MaxAddress = BASE_4GB;
+  }
+
+  QemuFwCfgSelectItem (FwCfgItem);
+  for (Processed = 0; Processed < FwCfgSize; Processed += sizeof E820Entry) {
+    QemuFwCfgReadBytes (sizeof E820Entry, &E820Entry);
+    DEBUG ((
+      DEBUG_VERBOSE,
+      "%a: Base=0x%Lx Length=0x%Lx Type=%u\n",
+      __FUNCTION__,
+      E820Entry.BaseAddr,
+      E820Entry.Length,
+      E820Entry.Type
+      ));
+    if (E820Entry.Type == EfiAcpiAddressRangeMemory &&
+        E820Entry.BaseAddr >= BASE_4GB) {
+      if (MaxAddress == NULL) {
+        UINT64 Base;
+        UINT64 End;
+
+        //
+        // Round up the start address, and round down the end address.
+        //
+        Base = ALIGN_VALUE (E820Entry.BaseAddr, (UINT64)EFI_PAGE_SIZE);
+        End = (E820Entry.BaseAddr + E820Entry.Length) &
+              ~(UINT64)EFI_PAGE_MASK;
+        if (Base < End) {
+          AddMemoryRangeHob (Base, End);
+          DEBUG ((
+            DEBUG_VERBOSE,
+            "%a: AddMemoryRangeHob [0x%Lx, 0x%Lx)\n",
+            __FUNCTION__,
+            Base,
+            End
+            ));
+        }
+      } else {
+        UINT64 Candidate;
+
+        Candidate = E820Entry.BaseAddr + E820Entry.Length;
+        if (Candidate > *MaxAddress) {
+          *MaxAddress = Candidate;
+          DEBUG ((
+            DEBUG_VERBOSE,
+            "%a: MaxAddress=0x%Lx\n",
+            __FUNCTION__,
+            *MaxAddress
+            ));
+        }
+      }
+    }
+  }
+  return EFI_SUCCESS;
+}
+
+
 UINT32
 GetSystemMemorySizeBelow4gb (
   VOID
@@ -170,7 +274,22 @@ GetFirstNonAddress (
   UINT64               HotPlugMemoryEnd;
   RETURN_STATUS        PcdStatus;
 
-  FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+  //
+  // set FirstNonAddress to suppress incorrect compiler/analyzer warnings
+  //
+  FirstNonAddress = 0;
+
+  //
+  // If QEMU presents an E820 map, then get the highest exclusive >=4GB RAM
+  // address from it. This can express an address >= 4GB+1TB.
+  //
+  // Otherwise, get the flat size of the memory above 4GB from the CMOS (which
+  // can only express a size smaller than 1TB), and add it to 4GB.
+  //
+  Status = ScanOrAdd64BitE820Ram (&FirstNonAddress);
+  if (EFI_ERROR (Status)) {
+    FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+  }
 
   //
   // If DXE is 32-bit, then we're done; PciBusDxe will degrade 64-bit MMIO
@@ -525,7 +644,13 @@ QemuInitializeRam (
       AddMemoryRangeHob (BASE_1MB, LowerMemorySize);
     }
 
-    if (UpperMemorySize != 0) {
+    //
+    // If QEMU presents an E820 map, then create memory HOBs for the >=4GB RAM
+    // entries. Otherwise, create a single memory HOB with the flat >=4GB
+    // memory size read from the CMOS.
+    //
+    Status = ScanOrAdd64BitE820Ram (NULL);
+    if (EFI_ERROR (Status) && UpperMemorySize != 0) {
       AddMemoryBaseSizeHob (BASE_4GB, UpperMemorySize);
     }
   }
-- 
2.13.1.3.g8be5a757fa67



  reply	other threads:[~2017-08-04 22:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-04 23:00 [PATCH v2 0/1] OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM Laszlo Ersek
2017-08-04 23:00 ` Laszlo Ersek [this message]
2017-08-05  0:17   ` [PATCH v2 1/1] " Jordan Justen
2017-08-05  1:54     ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170804230043.12977-2-lersek@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox