From mboxrd@z Thu Jan  1 00:00:00 1970
From: Laszlo Ersek <lersek@redhat.com>
To: edk2-devel-01
Cc: "Dr. David Alan Gilbert", Gerd Hoffmann, Igor Mammedov, Jordan Justen
Date: Tue, 11 Jul 2017 05:22:31 +0200
Message-Id: <20170711032231.29280-2-lersek@redhat.com>
In-Reply-To: <20170711032231.29280-1-lersek@redhat.com>
References: <20170711032231.29280-1-lersek@redhat.com>
Subject: [PATCH 1/1] OvmfPkg/PlatformPei: support >=1TB high RAM, and
 discontiguous high RAM
List-Id: EDK II Development <edk2-devel@lists.01.org>

In OVMF we currently get the upper (>=4GB) memory size with the
GetSystemMemorySizeAbove4gb() function.

The GetSystemMemorySizeAbove4gb() function is used in two places:

(1) It is the starting point of the calculations in GetFirstNonAddress().
    GetFirstNonAddress() in turn

    - determines the placement of the 64-bit PCI MMIO aperture,

    - provides input for the GCD memory space map's sizing (see
      AddressWidthInitialization(), and the CPU HOB in
      MiscInitialization()),

    - influences the permanent PEI RAM cap (the DXE core's page tables,
      built in permanent PEI RAM, grow as the RAM to map grows).

(2) In QemuInitializeRam(), GetSystemMemorySizeAbove4gb() determines the
    single memory descriptor HOB that we produce for the upper memory.

Respectively, there are two problems with GetSystemMemorySizeAbove4gb():

(1) It reads a 24-bit count of 64KB RAM chunks from the CMOS, and
    therefore cannot return a larger value than one terabyte.

(2) It cannot express discontiguous high RAM.

Starting with version 1.7.0, QEMU has provided the fw_cfg file called
"etc/e820".
Refer to the following QEMU commits:

- 0624c7f916b4 ("e820: pass high memory too.", 2013-10-10),
- 7d67110f2d9a ("pc: add etc/e820 fw_cfg file", 2013-10-18),
- 7db16f2480db ("pc: register e820 entries for ram", 2013-10-10).

Ever since these commits in v1.7.0 -- with the last QEMU release being
v2.9.0, and v2.10.0 under development --, the only two RAM entries added
to this E820 map correspond to the below-4GB RAM range, and the above-4GB
RAM range. And, the above-4GB range exactly matches the CMOS registers in
question; see the use of "pcms->above_4g_mem_size":

  pc_q35_init() | pc_init1()
    pc_memory_init()
      e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
    pc_cmos_init()
      val = pcms->above_4g_mem_size / 65536;
      rtc_set_memory(s, 0x5b, val);
      rtc_set_memory(s, 0x5c, val >> 8);
      rtc_set_memory(s, 0x5d, val >> 16);

Therefore, remedy the above OVMF limitations as follows:

(1) Start off GetFirstNonAddress() by scanning the E820 map for the
    highest exclusive >=4GB RAM address. Fall back to the CMOS if the
    E820 map is unavailable. Base all further calculations (such as
    64-bit PCI MMIO aperture placement, GCD sizing etc) on this value.

    At the moment, the only difference this change makes is that we can
    have more than 1TB above 4GB -- given that the sole "high RAM" entry
    in the E820 map matches the CMOS exactly, modulo the most significant
    bits (see above).

    However, Igor plans to add discontiguous (cold-plugged) high RAM to
    the fw_cfg E820 RAM map later on, and then this scanning will adapt
    automatically.

(2) In QemuInitializeRam(), describe the high RAM regions from the E820
    map one by one with memory HOBs. Fall back to the CMOS only if the
    E820 map is missing. Again, right now this change only makes a
    difference if there is at least 1TB high RAM. Later on it will adapt
    to discontiguous high RAM (regardless of its size) automatically.
-*-

Implementation details: introduce the E820HighRamIterate() function,
which reads the E820 entries from fw_cfg, and calls the requested
callback function on each high RAM entry found.

The RAM map is not read in a single go, because its size can vary, and
in PlatformPei we should stay away from dynamic memory allocation, for
the following reasons:

- "Pool" allocations are limited to ~64KB, are served from HOBs, and
  cannot be released ever.

- "Page" allocations are seriously limited before PlatformPei installs
  the permanent PEI RAM. Furthermore, page allocations can only be
  released in DXE, with dedicated code (so the address would have to be
  passed on with a HOB or PCD).

- Raw memory allocation HOBs would require the same freeing in DXE.

Therefore we process each E820 entry as soon as it is read from fw_cfg.

-*-

Considering the impact of high RAM on the DXE core:

A few years ago, installing high RAM as *tested* would cause the DXE
core to inhabit such ranges rather than carving out its home from the
permanent PEI RAM. Fortunately, this was fixed in the following edk2
commit:

  3a05b13106d1, "MdeModulePkg DxeCore: Take the range in resource HOB
  for PHIT as higher priority", 2015-09-18

which I regression-tested at the time:

  http://mid.mail-archive.com/55FC27B0.4070807@redhat.com

Later on, OVMF was changed to install its high RAM as tested
(effectively "arming" the earlier DXE core change for OVMF), in the
following edk2 commit:

  035ce3b37c90, "OvmfPkg/PlatformPei: Add memory above 4GB as tested",
  2016-04-21

which I also regression-tested at the time:

  http://mid.mail-archive.com/571E8B90.1020102@redhat.com

Therefore adding more "tested memory" HOBs is safe.

Cc: "Dr. David Alan Gilbert"
Cc: Gerd Hoffmann
Cc: Igor Mammedov
Cc: Jordan Justen
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1468526
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 OvmfPkg/PlatformPei/MemDetect.c | 161 +++++++++++++++++++-
 1 file changed, 159 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 97f3fa5afcf5..67e136252e1f 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -19,6 +19,7 @@ Module Name:
 //
 // The package level header files this module uses
 //
+#include <IndustryStandard/E820.h>
 #include <IndustryStandard/Q35MchIch9.h>
 #include <PiPei.h>
 
@@ -103,6 +104,142 @@ Q35TsegMbytesInitialization (
 }
 
 
+/**
+  Callback function for the high RAM entries in QEMU's fw_cfg E820 RAM map.
+
+  @param[in] HighRamEntry  The EFI_E820_ENTRY64 structure to process.
+
+  @param[in,out] Context   Opaque context object used while looping over the
+                           RAM map.
+**/
+typedef
+VOID
+(*E820_HIGH_RAM_ENTRY_CALLBACK) (
+  IN     CONST EFI_E820_ENTRY64 *HighRamEntry,
+  IN OUT VOID                   *Context
+  );
+
+
+/**
+  Iterate over the high RAM entries in QEMU's fw_cfg E820 RAM map.
+
+  @param[in] Callback         The callback function to pass each high RAM
+                              entry to.
+
+  @param[in,out] Context      Context to pass to Callback invariably on each
+                              invocation.
+
+  @retval EFI_SUCCESS         The fw_cfg E820 RAM map was found and processed.
+
+  @retval EFI_PROTOCOL_ERROR  The RAM map was found, but its size wasn't a
+                              whole multiple of sizeof(EFI_E820_ENTRY64).
+                              Callback() was not invoked.
+
+  @return                     Error codes from QemuFwCfgFindFile(). Callback()
+                              was not invoked.
+**/
+STATIC
+EFI_STATUS
+E820HighRamIterate (
+  IN     E820_HIGH_RAM_ENTRY_CALLBACK Callback,
+  IN OUT VOID                         *Context
+  )
+{
+  EFI_STATUS           Status;
+  FIRMWARE_CONFIG_ITEM FwCfgItem;
+  UINTN                FwCfgSize;
+  EFI_E820_ENTRY64     E820Entry;
+  UINTN                Processed;
+
+  Status = QemuFwCfgFindFile ("etc/e820", &FwCfgItem, &FwCfgSize);
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+  if (FwCfgSize % sizeof E820Entry != 0) {
+    return EFI_PROTOCOL_ERROR;
+  }
+
+  QemuFwCfgSelectItem (FwCfgItem);
+  for (Processed = 0; Processed < FwCfgSize; Processed += sizeof E820Entry) {
+    QemuFwCfgReadBytes (sizeof E820Entry, &E820Entry);
+    DEBUG ((
+      DEBUG_VERBOSE,
+      "%a: Base=0x%Lx Length=0x%Lx Type=%u\n",
+      __FUNCTION__,
+      E820Entry.BaseAddr,
+      E820Entry.Length,
+      E820Entry.Type
+      ));
+    if (E820Entry.Type == EfiAcpiAddressRangeMemory &&
+        E820Entry.BaseAddr >= BASE_4GB) {
+      Callback (&E820Entry, Context);
+    }
+  }
+  return EFI_SUCCESS;
+}
+
+
+/**
+  Callback function for E820HighRamIterate() that finds the highest exclusive
+  >=4GB RAM address.
+
+  @param[in] HighRamEntry    The EFI_E820_ENTRY64 structure to process.
+
+  @param[in,out] MaxAddress  The highest exclusive >=4GB RAM address,
+                             represented as a UINT64, that has been found
+                             thus far in the search. Before calling
+                             E820HighRamIterate(), the caller shall set
+                             MaxAddress to BASE_4GB. When
+                             E820HighRamIterate() returns with success,
+                             MaxAddress holds the highest exclusive >=4GB RAM
+                             address.
+**/
+VOID
+E820HighRamFindHighestExclusiveAddress (
+  IN     CONST EFI_E820_ENTRY64 *HighRamEntry,
+  IN OUT VOID                   *MaxAddress
+  )
+{
+  UINT64 *Current;
+  UINT64 Candidate;
+
+  Current = MaxAddress;
+  Candidate = HighRamEntry->BaseAddr + HighRamEntry->Length;
+  if (Candidate > *Current) {
+    *Current = Candidate;
+    DEBUG ((DEBUG_VERBOSE, "%a: MaxAddress=0x%Lx\n", __FUNCTION__,
+      *Current));
+  }
+}
+
+
+/**
+  Callback function for E820HighRamIterate() that produces memory resource
+  descriptor HOBs.
+
+  @param[in] HighRamEntry  The EFI_E820_ENTRY64 structure to process.
+
+  @param[in,out] Context   Ignored.
+**/
+VOID
+E820HighRamAddMemoryHob (
+  IN     CONST EFI_E820_ENTRY64 *HighRamEntry,
+  IN OUT VOID                   *Context
+  )
+{
+  UINT64 Base;
+  UINT64 End;
+
+  //
+  // Round up the start address, and round down the end address.
+  //
+  Base = ALIGN_VALUE (HighRamEntry->BaseAddr, (UINT64)EFI_PAGE_SIZE);
+  End = (HighRamEntry->BaseAddr + HighRamEntry->Length) &
+        ~(UINT64)EFI_PAGE_MASK;
+  if (Base < End) {
+    AddMemoryRangeHob (Base, End);
+    DEBUG ((DEBUG_VERBOSE, "%a: [0x%Lx, 0x%Lx)\n", __FUNCTION__, Base, End));
+  }
+}
+
+
 UINT32
 GetSystemMemorySizeBelow4gb (
   VOID
@@ -170,7 +307,21 @@ GetFirstNonAddress (
   UINT64               HotPlugMemoryEnd;
   RETURN_STATUS        PcdStatus;
 
-  FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+  //
+  // If QEMU presents an E820 map, then get the highest exclusive >=4GB RAM
+  // address from it. This can express an address >= 4GB+1TB.
+  //
+  // Otherwise, get the flat size of the memory above 4GB from the CMOS (which
+  // can only express a size smaller than 1TB), and add it to 4GB.
+  //
+  FirstNonAddress = BASE_4GB;
+  Status = E820HighRamIterate (
+             E820HighRamFindHighestExclusiveAddress,
+             &FirstNonAddress
+             );
+  if (EFI_ERROR (Status)) {
+    FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+  }
 
   //
   // If DXE is 32-bit, then we're done; PciBusDxe will degrade 64-bit MMIO
@@ -525,7 +676,13 @@ QemuInitializeRam (
     AddMemoryRangeHob (BASE_1MB, LowerMemorySize);
   }
 
-  if (UpperMemorySize != 0) {
+  //
+  // If QEMU presents an E820 map, then create memory HOBs for the >=4GB RAM
+  // entries. Otherwise, create a single memory HOB with the flat >=4GB
+  // memory size read from the CMOS.
+  //
+  Status = E820HighRamIterate (E820HighRamAddMemoryHob, NULL);
+  if (EFI_ERROR (Status) && UpperMemorySize != 0) {
     AddMemoryBaseSizeHob (BASE_4GB, UpperMemorySize);
   }
 }
-- 
2.13.1.3.g8be5a757fa67