Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Laszlo Ersek <lersek@redhat.com>
Cc: edk2-devel-01 <edk2-devel@lists.01.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	 Jordan Justen <jordan.l.justen@intel.com>,
	Liming Gao <liming.gao@intel.com>,
	 Michael Kinney <michael.d.kinney@intel.com>,
	Yonghong Zhu <yonghong.zhu@intel.com>
Subject: Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
Date: Fri, 11 Aug 2017 11:03:10 +0100	[thread overview]
Message-ID: <CAKv+Gu_sM7oueLzK-PYJO5Uah7px5+q1H5CEiByXN4kAYtgwUw@mail.gmail.com> (raw)
In-Reply-To: <20170811003426.2332-2-lersek@redhat.com>

On 11 August 2017 at 01:34, Laszlo Ersek <lersek@redhat.com> wrote:
> Several users have recently reported boot failures with OVMF. The symptoms
> are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found
> the commit (via bisection) that exposes the issue. In this patch I'll
> attempt to analyze the symptoms and fix the root problem.
>
> (1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that
>     provides multiprocessing services. It is built into the CpuMpPei
>     module (for the PEI phase) and the CpuDxe module (for the DXE and BDS
>     phases). When either of these modules starts up during boot, it
>     briefly boots up all of the CPUs, perfoms some counting /
>     synchronization, and then all the APs (application processors) are
>     quiesced until further work arrives.
>
>     The APs boot in real mode, and need assembly language init code (a
>     "reset vector") that gradually takes them into 64-bit mode, before
>     they can call back into normal C code. The reset vector code is
>     assembled at build time, but it is copied as data under 1MB (for real
>     mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe
>     launches.
>
>     Part of the reset vector code is FPU initialization. The reset vector
>     in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly
>     language function InitializeFloatingPointUnits(), which is however
>     provided by an external library. Normally, when the object files were
>     linked together, this function call would result in an absolute
>     reference (and matching relocation, performed at module loading), and
>     the instance of the reset vector that was copied under 1MB during boot
>     would refer to the same, unchanged, absolute address of
>     InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.
>
>     This didn't work with X64 XCODE5 builds however. XCODE5 prefers
>     RIP-relative addressing for X64 (i.e., the assembly language function
>     call depended on the distance between the call site and the callee).
>     When the call site was copied under 1MB but
>     InitializeFloatingPointUnits() would not move, the call broke.
>
>     This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987
>     ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",
>     2017-05-17). The solution was to add another function pointer to the
>     MP_CPU_EXCHANGE_INFO communication area. (The function pointer would
>     have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would
>     assign the absolute address of InitializeFloatingPointUnits() to the
>     new field, and the APs would retrieve it -- matching the pointer size
>     correctly --, and call it. The C assignment can be seen in the
>     following hunk (from commit 3b2928b46987):
>
>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> index 407c44c95e62..735e099b32f2 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> @@ -751,6 +751,8 @@ FillExchangeInfoData (
>>
>>    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>>
>> +  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
>> +
>>    //
>>    // Get the BSP's data of GDT and IDT
>>    //
>
>     This moved the relocation of the InitializeFloatingPointUnits()
>     reference (or, with XCODE5, the RIP-relative
>     InitializeFloatingPointUnits() reference) from the assembly code
>     (which is copied around) to the C code (which is not copied around).
>
> (2) Unfortunately, the C-language assignment to
>     "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
>     miscompiled under the following circumstances:
>
>     - the edk2 build target is DEBUG (or RELEASE),
>     - the edk2 toolchain selection is GCC5,
>     - (from the above two it follows that -Os and LTO are enabled,)
>     - MpInitLib is built for X64,
>     - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).
>
>     (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
>     PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
>     in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)
>
>     Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
>     the correct assembly code generated by gcc for the assignment, and
>     flip the build target to DEBUG for showing the broken assembly code.
>
> (3) Correct code for the assignment:
>
>>     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 7880 <InitializeFloatingPointUnits>
>>     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax
>>     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)
>
>     Here the absolute address of InitializeFloatingPointUnits() is loaded
>     into RDX with RIP-relative addressing. Then RAX is pointed to the
>     ExchangeInfo structure, and finally RDX is stored into
>     ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
>     0x84 in the structure).
>
>     Here the only relocation happens when CpuMpPei is linked. The distance
>     between the call site and the callee is calculated and encoded in the
>     LEA instruction. The same distance is valid when CpuMpPei runs, so the
>     PEI core doesn't have to relocate the LEA instruction when it loads
>     CpuMpPei.
>
> (4) Incorrect code for the assignment:
>
>>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>>     ...
>>     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>>     ...
>> 0000000000005860 <InitializeFloatingPointUnits>:
>>     5860:       9b db e3                finit
>
>     This is not RIP-relative addressing. Therefore gcc generates a
>     relocation record for the address encoded in the MOV instruction, at
>     offset 0x2d69+0x3 -- initial value being 0x5860:
>
>> Sections:
>> Idx Name          Size      VMA               LMA               File off  Algn
>>   0 .text         00006e66  0000000000000240  0000000000000240  000000c0  2**6
>>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> ...
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET           TYPE              VALUE
>> ...
>> 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits
>
>     (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
>     equals the .text section base plus the relocation offset, i.e.,
>     0x240+0x2b2c.)
>
>     This is the only R_X86_64_32S relocation record in the linked-together
>     CpuMpPei.debug file (still an ELF file). In the rest of the build
>     process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
>     (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
>     relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).
>
>     The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first
>     when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi
>     could theoretically execute from flash, and PEIFV has a nonzero base
>     address. (However, CpuMpPei has a DEPEX on permanent PEI RAM
>     discovery, so the PEI core will only launch it from permanent PEI RAM,
>     in practice.)
>
>     The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for
>     the second time when the PEI core loads CpuMpPei into permanent PEI
>     RAM, and performs the relocations. I confirmed (with debug messages)
>     that the relocated address (within the instruction) is correct:
>
>> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0
>> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0
>> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi
>
>     The "Fixup32" pointer points to the address constant that should be
>     relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of
>     (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that
>     I described above (as "first relocation").
>
>     After incrementing (*Fixup32) by "Adjust", the final value is
>     0xBFF28860. This is the absolute address that is patched into the MOV
>     instruction that sets RDX for the assignment. This value is correct;
>     the InitializeFloatingPointUnits() function really exists in CpuMpPei
>     at this address, in permanent PEI RAM. It is at an offset of 0x5860
>     from the load address 0x000BFF23000 (see third line).
>
>     So where do things go wrong? Again, the MOV instruction itself is
>     wrong. I modified the MpInitLib C code to print the value of the field
>     right after the assignment, and I got:
>
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860
>
>     The APs are ejected into outer space when they try to init their FPUs
>     -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should
>     store 0x00000000_BFF28860!
>
> (5) So open the Intel SDM, and decode the instruction by hand:
>
>>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>
>     0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).
>     Looking up REX.W + 0xc7 (the next byte) under the MOV specification,
>     we get:
>     - opcode: REX.W + C7 /0
>     - instruction: MOV r/m64, imm32
>     - Description: Move imm32 sign extended to 64-bits to r/m64.
>
>     "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860
>     -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the
>     sign bit is set. And MOV sign-extends 0xBFF28860 to
>     0xFFFFFFFF_BFF28860 in the assignment to RDX.
>
>     So... why on earth generates gcc this MOV instruction, with sign
>     extension?
>
> (6) It does because we told it to. From the gcc manual:
>
>> -mcmodel=small
>>     Generate code for the small code model: the program and its symbols
>>     must be linked in the lower 2 GB of the address space.  Pointers are
>>     64 bits.  Programs can be statically or dynamically linked.  This is
>>     the default code model.
>
>     and please refer to the following commit (message slightly rewrapped
>     here):
>
>> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0
>> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Date:   Sat Jul 16 00:16:11 2016 +0200
>>
>>     BaseTools/tools_def: switch GCC/X64 to the PIE small model
>>
>>     The ordinary small code model for x86_64 cannot be used in UEFI,
>>     since it assumes the executable is loaded in the first 2 GB of
>>     memory. Therefore, we use the large model instead, which can execute
>>     anywhere, but uses absolute 64-bit wide quantities for all symbol
>>     references, which is costly in terms of code size.
>>
>>     So switch to the PIE small code model, this uses 32-bit relative
>>     references where possible, but does not make any assumptions about
>>     the load address (i.e., all absolute symbol references are 64-bits
>>     wide). Note that, due to the 'protected' visibility pragma
>>     introduced in an earlier patch, there is no need for the EDK2 build
>>     system to deal with GOT related ELF relocation types.
>>
>>     Contributed-under: TianoCore Contribution Agreement 1.0
>>     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>     Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>>     Tested-by: Laszlo Ersek <lersek@redhat.com>
>>     Tested-By: Liming Gao <liming.gao@intel.com>
>>     Reviewed-by: Liming Gao <liming.gao@intel.com>
>
>     Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make
>     assumptions about the load address, despite the fact that we didn't
>     just replace "-mcmodel=large" with "-mcmodel=small", but we also added
>     "-fpie".
>
> (7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do
>     not revert it whole-sale. Instead, undo it only for the LTO build env
>     where it might cause problems, by appending "-fno-pie -mcmodel=large".
>
>     The assignment is then compiled like this:
>
>>     33d7:       48 ba 60 64 00 00 00    movabs $0x6460,%rdx
>>     33de:       00 00 00
>>     ...
>>     33e5:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>>     ...
>> 0000000000006460 <InitializeFloatingPointUnits>:
>
>     And the matching relocation is (note the target offset
>     0x240+0x3199=0x33d9):
>
>> Sections:
>> Idx Name          Size      VMA               LMA               File off  Algn
>>   0 .text         00007c5e  0000000000000240  0000000000000240  000000c0  2**6
>>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> ...
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET           TYPE              VALUE
>> ...
>> 0000000000003199 R_X86_64_64       InitializeFloatingPointUnits
>
> (8) Size changes by this patch:
>
>     $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
>         -t GCC5 -b DEBUG \
>         -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE
>
>       Before:
>
>> SECFV [10%Full] 212992 total, 21616 used, 191376 free
>> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free
>> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free
>> PEIFV [21%Full] 917504 total, 194480 used, 723024 free
>
>       After:
>
>> SECFV [11%Full] 212992 total, 23728 used, 189264 free
>> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free
>> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free
>> PEIFV [24%Full] 917504 total, 224304 used, 693200 free
>
>     $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
>         -t GCC5 -b DEBUG \
>         -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \
>         -D SECURE_BOOT_ENABLE -D SMM_REQUIRE
>
>       Before:
>
>> SECFV [10%Full] 212992 total, 21536 used, 191456 free
>> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free
>> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free
>> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
>
>       After:
>
>> SECFV [10%Full] 212992 total, 21536 used, 191456 free
>> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free
>> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free
>> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Michael Kinney <michael.d.kinney@intel.com>
> Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  BaseTools/Conf/tools_def.template | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
> index 1fa3ca3ceae9..2ef495540e5f 100755
> --- a/BaseTools/Conf/tools_def.template
> +++ b/BaseTools/Conf/tools_def.template
> @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS    = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
>  *_GCC5_X64_OBJCOPY_FLAGS         =
>  *_GCC5_X64_NASM_FLAGS            = -f elf64
>
> -  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
> +  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
>    DEBUG_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>
> -RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
> +RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
>  RELEASE_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>
>    NOOPT_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -O0
> --
> 2.13.1.3.g8be5a757fa67
>


Thanks for the interesting read. One question: is all code potentially
miscompiled? Or does it only affect code executing in real mode?

next prev parent reply	other threads:[~2017-08-11 10:00 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-11  0:34 [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO Laszlo Ersek
2017-08-11  0:34 ` [PATCH 1/1] " Laszlo Ersek
2017-08-11  5:28   ` Shi, Steven
2017-08-11 11:18     ` Laszlo Ersek
2017-08-12  3:05       ` Shi, Steven
2017-08-15 15:45         ` Laszlo Ersek
2017-08-22  8:00           ` Shi, Steven
2017-08-22 11:59             ` Laszlo Ersek
2017-08-22 12:23               ` Gao, Liming
2017-08-22 13:27               ` Paolo Bonzini
2017-08-22 14:03                 ` Ard Biesheuvel
2017-08-22 14:15                   ` Paolo Bonzini
2017-08-22 16:04                     ` Laszlo Ersek
2017-08-22 16:06                       ` Paolo Bonzini
2017-08-11 10:03   ` Ard Biesheuvel [this message]
2017-08-11 10:30     ` Laszlo Ersek
2017-08-11 22:21   ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKv+Gu_sM7oueLzK-PYJO5Uah7px5+q1H5CEiByXN4kAYtgwUw@mail.gmail.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox