From: Laszlo Ersek <lersek@redhat.com>
To: edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Jordan Justen <jordan.l.justen@intel.com>,
Liming Gao <liming.gao@intel.com>,
Michael Kinney <michael.d.kinney@intel.com>,
Yonghong Zhu <yonghong.zhu@intel.com>
Subject: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
Date: Fri, 11 Aug 2017 02:34:26 +0200 [thread overview]
Message-ID: <20170811003426.2332-2-lersek@redhat.com> (raw)
In-Reply-To: <20170811003426.2332-1-lersek@redhat.com>
Several users have recently reported boot failures with OVMF. The symptoms
are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found
the commit (via bisection) that exposes the issue. In this patch I'll
attempt to analyze the symptoms and fix the root problem.
(1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that
provides multiprocessing services. It is built into the CpuMpPei
module (for the PEI phase) and the CpuDxe module (for the DXE and BDS
phases). When either of these modules starts up during boot, it
briefly boots up all of the CPUs, perfoms some counting /
synchronization, and then all the APs (application processors) are
quiesced until further work arrives.
The APs boot in real mode, and need assembly language init code (a
"reset vector") that gradually takes them into 64-bit mode, before
they can call back into normal C code. The reset vector code is
assembled at build time, but it is copied as data under 1MB (for real
mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe
launches.
Part of the reset vector code is FPU initialization. The reset vector
in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly
language function InitializeFloatingPointUnits(), which is however
provided by an external library. Normally, when the object files were
linked together, this function call would result in an absolute
reference (and matching relocation, performed at module loading), and
the instance of the reset vector that was copied under 1MB during boot
would refer to the same, unchanged, absolute address of
InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.
This didn't work with X64 XCODE5 builds however. XCODE5 prefers
RIP-relative addressing for X64 (i.e., the assembly language function
call depended on the distance between the call site and the callee).
When the call site was copied under 1MB but
InitializeFloatingPointUnits() would not move, the call broke.
This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987
("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",
2017-05-17). The solution was to add another function pointer to the
MP_CPU_EXCHANGE_INFO communication area. (The function pointer would
have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would
assign the absolute address of InitializeFloatingPointUnits() to the
new field, and the APs would retrieve it -- matching the pointer size
correctly --, and call it. The C assignment can be seen in the
following hunk (from commit 3b2928b46987):
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 407c44c95e62..735e099b32f2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -751,6 +751,8 @@ FillExchangeInfoData (
>
> ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>
> + ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
> +
> //
> // Get the BSP's data of GDT and IDT
> //
This moved the relocation of the InitializeFloatingPointUnits()
reference (or, with XCODE5, the RIP-relative
InitializeFloatingPointUnits() reference) from the assembly code
(which is copied around) to the C code (which is not copied around).
(2) Unfortunately, the C-language assignment to
"MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
miscompiled under the following circumstances:
- the edk2 build target is DEBUG (or RELEASE),
- the edk2 toolchain selection is GCC5,
- (from the above two it follows that -Os and LTO are enabled,)
- MpInitLib is built for X64,
- and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).
(If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)
Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
the correct assembly code generated by gcc for the assignment, and
flip the build target to DEBUG for showing the broken assembly code.
(3) Correct code for the assignment:
> 5406: 48 8d 15 73 24 00 00 lea 0x2473(%rip),%rdx # 7880 <InitializeFloatingPointUnits>
> 540d: 48 8b 45 f8 mov -0x8(%rbp),%rax
> 5411: 48 89 90 84 00 00 00 mov %rdx,0x84(%rax)
Here the absolute address of InitializeFloatingPointUnits() is loaded
into RDX with RIP-relative addressing. Then RAX is pointed to the
ExchangeInfo structure, and finally RDX is stored into
ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
0x84 in the structure).
Here the only relocation happens when CpuMpPei is linked. The distance
between the call site and the callee is calculated and encoded in the
LEA instruction. The same distance is valid when CpuMpPei runs, so the
PEI core doesn't have to relocate the LEA instruction when it loads
CpuMpPei.
(4) Incorrect code for the assignment:
> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx
> ...
> 2d7b: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp)
> ...
> 0000000000005860 <InitializeFloatingPointUnits>:
> 5860: 9b db e3 finit
This is not RIP-relative addressing. Therefore gcc generates a
relocation record for the address encoded in the MOV instruction, at
offset 0x2d69+0x3 -- initial value being 0x5860:
> Sections:
> Idx Name Size VMA LMA File off Algn
> 0 .text 00006e66 0000000000000240 0000000000000240 000000c0 2**6
> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> ...
> RELOCATION RECORDS FOR [.text]:
> OFFSET TYPE VALUE
> ...
> 0000000000002b2c R_X86_64_32S InitializeFloatingPointUnits
(The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
equals the .text section base plus the relocation offset, i.e.,
0x240+0x2b2c.)
This is the only R_X86_64_32S relocation record in the linked-together
CpuMpPei.debug file (still an ELF file). In the rest of the build
process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
(CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).
The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first
when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi
could theoretically execute from flash, and PEIFV has a nonzero base
address. (However, CpuMpPei has a DEPEX on permanent PEI RAM
discovery, so the PEI core will only launch it from permanent PEI RAM,
in practice.)
The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for
the second time when the PEI core loads CpuMpPei into permanent PEI
RAM, and performs the relocations. I confirmed (with debug messages)
that the relocated address (within the instruction) is correct:
> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0
> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0
> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi
The "Fixup32" pointer points to the address constant that should be
relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of
(*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that
I described above (as "first relocation").
After incrementing (*Fixup32) by "Adjust", the final value is
0xBFF28860. This is the absolute address that is patched into the MOV
instruction that sets RDX for the assignment. This value is correct;
the InitializeFloatingPointUnits() function really exists in CpuMpPei
at this address, in permanent PEI RAM. It is at an offset of 0x5860
from the load address 0x000BFF23000 (see third line).
So where do things go wrong? Again, the MOV instruction itself is
wrong. I modified the MpInitLib C code to print the value of the field
right after the assignment, and I got:
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860
The APs are ejected into outer space when they try to init their FPUs
-- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should
store 0x00000000_BFF28860!
(5) So open the Intel SDM, and decode the instruction by hand:
> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx
0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).
Looking up REX.W + 0xc7 (the next byte) under the MOV specification,
we get:
- opcode: REX.W + C7 /0
- instruction: MOV r/m64, imm32
- Description: Move imm32 sign extended to 64-bits to r/m64.
"Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860
-- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the
sign bit is set. And MOV sign-extends 0xBFF28860 to
0xFFFFFFFF_BFF28860 in the assignment to RDX.
So... why on earth generates gcc this MOV instruction, with sign
extension?
(6) It does because we told it to. From the gcc manual:
> -mcmodel=small
> Generate code for the small code model: the program and its symbols
> must be linked in the lower 2 GB of the address space. Pointers are
> 64 bits. Programs can be statically or dynamically linked. This is
> the default code model.
and please refer to the following commit (message slightly rewrapped
here):
> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0
> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Date: Sat Jul 16 00:16:11 2016 +0200
>
> BaseTools/tools_def: switch GCC/X64 to the PIE small model
>
> The ordinary small code model for x86_64 cannot be used in UEFI,
> since it assumes the executable is loaded in the first 2 GB of
> memory. Therefore, we use the large model instead, which can execute
> anywhere, but uses absolute 64-bit wide quantities for all symbol
> references, which is costly in terms of code size.
>
> So switch to the PIE small code model, this uses 32-bit relative
> references where possible, but does not make any assumptions about
> the load address (i.e., all absolute symbol references are 64-bits
> wide). Note that, due to the 'protected' visibility pragma
> introduced in an earlier patch, there is no need for the EDK2 build
> system to deal with GOT related ELF relocation types.
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
> Tested-by: Laszlo Ersek <lersek@redhat.com>
> Tested-By: Liming Gao <liming.gao@intel.com>
> Reviewed-by: Liming Gao <liming.gao@intel.com>
Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make
assumptions about the load address, despite the fact that we didn't
just replace "-mcmodel=large" with "-mcmodel=small", but we also added
"-fpie".
(7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do
not revert it whole-sale. Instead, undo it only for the LTO build env
where it might cause problems, by appending "-fno-pie -mcmodel=large".
The assignment is then compiled like this:
> 33d7: 48 ba 60 64 00 00 00 movabs $0x6460,%rdx
> 33de: 00 00 00
> ...
> 33e5: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp)
> ...
> 0000000000006460 <InitializeFloatingPointUnits>:
And the matching relocation is (note the target offset
0x240+0x3199=0x33d9):
> Sections:
> Idx Name Size VMA LMA File off Algn
> 0 .text 00007c5e 0000000000000240 0000000000000240 000000c0 2**6
> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> ...
> RELOCATION RECORDS FOR [.text]:
> OFFSET TYPE VALUE
> ...
> 0000000000003199 R_X86_64_64 InitializeFloatingPointUnits
(8) Size changes by this patch:
$ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
-t GCC5 -b DEBUG \
-D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE
Before:
> SECFV [10%Full] 212992 total, 21616 used, 191376 free
> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free
> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free
> PEIFV [21%Full] 917504 total, 194480 used, 723024 free
After:
> SECFV [11%Full] 212992 total, 23728 used, 189264 free
> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free
> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free
> PEIFV [24%Full] 917504 total, 224304 used, 693200 free
$ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
-t GCC5 -b DEBUG \
-D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \
-D SECURE_BOOT_ENABLE -D SMM_REQUIRE
Before:
> SECFV [10%Full] 212992 total, 21536 used, 191456 free
> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free
> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free
> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
After:
> SECFV [10%Full] 212992 total, 21536 used, 191456 free
> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free
> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free
> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Yonghong Zhu <yonghong.zhu@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
BaseTools/Conf/tools_def.template | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 1fa3ca3ceae9..2ef495540e5f 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
*_GCC5_X64_OBJCOPY_FLAGS =
*_GCC5_X64_NASM_FLAGS = -f elf64
- DEBUG_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
+ DEBUG_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
DEBUG_GCC5_X64_DLINK_FLAGS = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
-RELEASE_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
+RELEASE_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
RELEASE_GCC5_X64_DLINK_FLAGS = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
NOOPT_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -O0
--
2.13.1.3.g8be5a757fa67
next prev parent reply other threads:[~2017-08-11 0:32 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-11 0:34 [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO Laszlo Ersek
2017-08-11 0:34 ` Laszlo Ersek [this message]
2017-08-11 5:28 ` [PATCH 1/1] " Shi, Steven
2017-08-11 11:18 ` Laszlo Ersek
2017-08-12 3:05 ` Shi, Steven
2017-08-15 15:45 ` Laszlo Ersek
2017-08-22 8:00 ` Shi, Steven
2017-08-22 11:59 ` Laszlo Ersek
2017-08-22 12:23 ` Gao, Liming
2017-08-22 13:27 ` Paolo Bonzini
2017-08-22 14:03 ` Ard Biesheuvel
2017-08-22 14:15 ` Paolo Bonzini
2017-08-22 16:04 ` Laszlo Ersek
2017-08-22 16:06 ` Paolo Bonzini
2017-08-11 10:03 ` Ard Biesheuvel
2017-08-11 10:30 ` Laszlo Ersek
2017-08-11 22:21 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170811003426.2332-2-lersek@redhat.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox