From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Laszlo Ersek <lersek@redhat.com>
Cc: edk2-devel-01 <edk2-devel@lists.01.org>,
Alex Williamson <alex.williamson@redhat.com>,
Jordan Justen <jordan.l.justen@intel.com>,
Liming Gao <liming.gao@intel.com>,
Michael Kinney <michael.d.kinney@intel.com>,
Yonghong Zhu <yonghong.zhu@intel.com>
Subject: Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
Date: Fri, 11 Aug 2017 11:03:10 +0100 [thread overview]
Message-ID: <CAKv+Gu_sM7oueLzK-PYJO5Uah7px5+q1H5CEiByXN4kAYtgwUw@mail.gmail.com> (raw)
In-Reply-To: <20170811003426.2332-2-lersek@redhat.com>
On 11 August 2017 at 01:34, Laszlo Ersek <lersek@redhat.com> wrote:
> Several users have recently reported boot failures with OVMF. The symptoms
> are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found
> the commit (via bisection) that exposes the issue. In this patch I'll
> attempt to analyze the symptoms and fix the root problem.
>
> (1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that
> provides multiprocessing services. It is built into the CpuMpPei
> module (for the PEI phase) and the CpuDxe module (for the DXE and BDS
> phases). When either of these modules starts up during boot, it
> briefly boots up all of the CPUs, perfoms some counting /
> synchronization, and then all the APs (application processors) are
> quiesced until further work arrives.
>
> The APs boot in real mode, and need assembly language init code (a
> "reset vector") that gradually takes them into 64-bit mode, before
> they can call back into normal C code. The reset vector code is
> assembled at build time, but it is copied as data under 1MB (for real
> mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe
> launches.
>
> Part of the reset vector code is FPU initialization. The reset vector
> in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly
> language function InitializeFloatingPointUnits(), which is however
> provided by an external library. Normally, when the object files were
> linked together, this function call would result in an absolute
> reference (and matching relocation, performed at module loading), and
> the instance of the reset vector that was copied under 1MB during boot
> would refer to the same, unchanged, absolute address of
> InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.
>
> This didn't work with X64 XCODE5 builds however. XCODE5 prefers
> RIP-relative addressing for X64 (i.e., the assembly language function
> call depended on the distance between the call site and the callee).
> When the call site was copied under 1MB but
> InitializeFloatingPointUnits() would not move, the call broke.
>
> This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987
> ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",
> 2017-05-17). The solution was to add another function pointer to the
> MP_CPU_EXCHANGE_INFO communication area. (The function pointer would
> have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would
> assign the absolute address of InitializeFloatingPointUnits() to the
> new field, and the APs would retrieve it -- matching the pointer size
> correctly --, and call it. The C assignment can be seen in the
> following hunk (from commit 3b2928b46987):
>
>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> index 407c44c95e62..735e099b32f2 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> @@ -751,6 +751,8 @@ FillExchangeInfoData (
>>
>> ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>>
>> + ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
>> +
>> //
>> // Get the BSP's data of GDT and IDT
>> //
>
> This moved the relocation of the InitializeFloatingPointUnits()
> reference (or, with XCODE5, the RIP-relative
> InitializeFloatingPointUnits() reference) from the assembly code
> (which is copied around) to the C code (which is not copied around).
>
> (2) Unfortunately, the C-language assignment to
> "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
> miscompiled under the following circumstances:
>
> - the edk2 build target is DEBUG (or RELEASE),
> - the edk2 toolchain selection is GCC5,
> - (from the above two it follows that -Os and LTO are enabled,)
> - MpInitLib is built for X64,
> - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).
>
> (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
> PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
> in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)
>
> Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
> the correct assembly code generated by gcc for the assignment, and
> flip the build target to DEBUG for showing the broken assembly code.
>
> (3) Correct code for the assignment:
>
>> 5406: 48 8d 15 73 24 00 00 lea 0x2473(%rip),%rdx # 7880 <InitializeFloatingPointUnits>
>> 540d: 48 8b 45 f8 mov -0x8(%rbp),%rax
>> 5411: 48 89 90 84 00 00 00 mov %rdx,0x84(%rax)
>
> Here the absolute address of InitializeFloatingPointUnits() is loaded
> into RDX with RIP-relative addressing. Then RAX is pointed to the
> ExchangeInfo structure, and finally RDX is stored into
> ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
> 0x84 in the structure).
>
> Here the only relocation happens when CpuMpPei is linked. The distance
> between the call site and the callee is calculated and encoded in the
> LEA instruction. The same distance is valid when CpuMpPei runs, so the
> PEI core doesn't have to relocate the LEA instruction when it loads
> CpuMpPei.
>
> (4) Incorrect code for the assignment:
>
>> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx
>> ...
>> 2d7b: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp)
>> ...
>> 0000000000005860 <InitializeFloatingPointUnits>:
>> 5860: 9b db e3 finit
>
> This is not RIP-relative addressing. Therefore gcc generates a
> relocation record for the address encoded in the MOV instruction, at
> offset 0x2d69+0x3 -- initial value being 0x5860:
>
>> Sections:
>> Idx Name Size VMA LMA File off Algn
>> 0 .text 00006e66 0000000000000240 0000000000000240 000000c0 2**6
>> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> ...
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET TYPE VALUE
>> ...
>> 0000000000002b2c R_X86_64_32S InitializeFloatingPointUnits
>
> (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
> equals the .text section base plus the relocation offset, i.e.,
> 0x240+0x2b2c.)
>
> This is the only R_X86_64_32S relocation record in the linked-together
> CpuMpPei.debug file (still an ELF file). In the rest of the build
> process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
> (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
> relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).
>
> The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first
> when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi
> could theoretically execute from flash, and PEIFV has a nonzero base
> address. (However, CpuMpPei has a DEPEX on permanent PEI RAM
> discovery, so the PEI core will only launch it from permanent PEI RAM,
> in practice.)
>
> The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for
> the second time when the PEI core loads CpuMpPei into permanent PEI
> RAM, and performs the relocations. I confirmed (with debug messages)
> that the relocated address (within the instruction) is correct:
>
>> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0
>> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0
>> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi
>
> The "Fixup32" pointer points to the address constant that should be
> relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of
> (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that
> I described above (as "first relocation").
>
> After incrementing (*Fixup32) by "Adjust", the final value is
> 0xBFF28860. This is the absolute address that is patched into the MOV
> instruction that sets RDX for the assignment. This value is correct;
> the InitializeFloatingPointUnits() function really exists in CpuMpPei
> at this address, in permanent PEI RAM. It is at an offset of 0x5860
> from the load address 0x000BFF23000 (see third line).
>
> So where do things go wrong? Again, the MOV instruction itself is
> wrong. I modified the MpInitLib C code to print the value of the field
> right after the assignment, and I got:
>
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860
>
> The APs are ejected into outer space when they try to init their FPUs
> -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should
> store 0x00000000_BFF28860!
>
> (5) So open the Intel SDM, and decode the instruction by hand:
>
>> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx
>
> 0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).
> Looking up REX.W + 0xc7 (the next byte) under the MOV specification,
> we get:
> - opcode: REX.W + C7 /0
> - instruction: MOV r/m64, imm32
> - Description: Move imm32 sign extended to 64-bits to r/m64.
>
> "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860
> -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the
> sign bit is set. And MOV sign-extends 0xBFF28860 to
> 0xFFFFFFFF_BFF28860 in the assignment to RDX.
>
> So... why on earth generates gcc this MOV instruction, with sign
> extension?
>
> (6) It does because we told it to. From the gcc manual:
>
>> -mcmodel=small
>> Generate code for the small code model: the program and its symbols
>> must be linked in the lower 2 GB of the address space. Pointers are
>> 64 bits. Programs can be statically or dynamically linked. This is
>> the default code model.
>
> and please refer to the following commit (message slightly rewrapped
> here):
>
>> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0
>> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Date: Sat Jul 16 00:16:11 2016 +0200
>>
>> BaseTools/tools_def: switch GCC/X64 to the PIE small model
>>
>> The ordinary small code model for x86_64 cannot be used in UEFI,
>> since it assumes the executable is loaded in the first 2 GB of
>> memory. Therefore, we use the large model instead, which can execute
>> anywhere, but uses absolute 64-bit wide quantities for all symbol
>> references, which is costly in terms of code size.
>>
>> So switch to the PIE small code model, this uses 32-bit relative
>> references where possible, but does not make any assumptions about
>> the load address (i.e., all absolute symbol references are 64-bits
>> wide). Note that, due to the 'protected' visibility pragma
>> introduced in an earlier patch, there is no need for the EDK2 build
>> system to deal with GOT related ELF relocation types.
>>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>> Tested-by: Laszlo Ersek <lersek@redhat.com>
>> Tested-By: Liming Gao <liming.gao@intel.com>
>> Reviewed-by: Liming Gao <liming.gao@intel.com>
>
> Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make
> assumptions about the load address, despite the fact that we didn't
> just replace "-mcmodel=large" with "-mcmodel=small", but we also added
> "-fpie".
>
> (7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do
> not revert it whole-sale. Instead, undo it only for the LTO build env
> where it might cause problems, by appending "-fno-pie -mcmodel=large".
>
> The assignment is then compiled like this:
>
>> 33d7: 48 ba 60 64 00 00 00 movabs $0x6460,%rdx
>> 33de: 00 00 00
>> ...
>> 33e5: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp)
>> ...
>> 0000000000006460 <InitializeFloatingPointUnits>:
>
> And the matching relocation is (note the target offset
> 0x240+0x3199=0x33d9):
>
>> Sections:
>> Idx Name Size VMA LMA File off Algn
>> 0 .text 00007c5e 0000000000000240 0000000000000240 000000c0 2**6
>> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> ...
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET TYPE VALUE
>> ...
>> 0000000000003199 R_X86_64_64 InitializeFloatingPointUnits
>
> (8) Size changes by this patch:
>
> $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
> -t GCC5 -b DEBUG \
> -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE
>
> Before:
>
>> SECFV [10%Full] 212992 total, 21616 used, 191376 free
>> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free
>> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free
>> PEIFV [21%Full] 917504 total, 194480 used, 723024 free
>
> After:
>
>> SECFV [11%Full] 212992 total, 23728 used, 189264 free
>> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free
>> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free
>> PEIFV [24%Full] 917504 total, 224304 used, 693200 free
>
> $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
> -t GCC5 -b DEBUG \
> -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \
> -D SECURE_BOOT_ENABLE -D SMM_REQUIRE
>
> Before:
>
>> SECFV [10%Full] 212992 total, 21536 used, 191456 free
>> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free
>> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free
>> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
>
> After:
>
>> SECFV [10%Full] 212992 total, 21536 used, 191456 free
>> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free
>> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free
>> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Michael Kinney <michael.d.kinney@intel.com>
> Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
> BaseTools/Conf/tools_def.template | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
> index 1fa3ca3ceae9..2ef495540e5f 100755
> --- a/BaseTools/Conf/tools_def.template
> +++ b/BaseTools/Conf/tools_def.template
> @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
> *_GCC5_X64_OBJCOPY_FLAGS =
> *_GCC5_X64_NASM_FLAGS = -f elf64
>
> - DEBUG_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
> + DEBUG_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
> DEBUG_GCC5_X64_DLINK_FLAGS = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>
> -RELEASE_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
> +RELEASE_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
> RELEASE_GCC5_X64_DLINK_FLAGS = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>
> NOOPT_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -O0
> --
> 2.13.1.3.g8be5a757fa67
>
Thanks for the interesting read. One question: is all code potentially
miscompiled? Or does it only affect code executing in real mode?
next prev parent reply other threads:[~2017-08-11 10:00 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-11 0:34 [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO Laszlo Ersek
2017-08-11 0:34 ` [PATCH 1/1] " Laszlo Ersek
2017-08-11 5:28 ` Shi, Steven
2017-08-11 11:18 ` Laszlo Ersek
2017-08-12 3:05 ` Shi, Steven
2017-08-15 15:45 ` Laszlo Ersek
2017-08-22 8:00 ` Shi, Steven
2017-08-22 11:59 ` Laszlo Ersek
2017-08-22 12:23 ` Gao, Liming
2017-08-22 13:27 ` Paolo Bonzini
2017-08-22 14:03 ` Ard Biesheuvel
2017-08-22 14:15 ` Paolo Bonzini
2017-08-22 16:04 ` Laszlo Ersek
2017-08-22 16:06 ` Paolo Bonzini
2017-08-11 10:03 ` Ard Biesheuvel [this message]
2017-08-11 10:30 ` Laszlo Ersek
2017-08-11 22:21 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKv+Gu_sM7oueLzK-PYJO5Uah7px5+q1H5CEiByXN4kAYtgwUw@mail.gmail.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox