[PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

From: Laszlo Ersek <lersek@redhat.com>
To: edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Jordan Justen <jordan.l.justen@intel.com>,
	Liming Gao <liming.gao@intel.com>,
	Michael Kinney <michael.d.kinney@intel.com>,
	Yonghong Zhu <yonghong.zhu@intel.com>
Subject: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
Date: Fri, 11 Aug 2017 02:34:26 +0200	[thread overview]
Message-ID: <20170811003426.2332-2-lersek@redhat.com> (raw)
In-Reply-To: <20170811003426.2332-1-lersek@redhat.com>

Several users have recently reported boot failures with OVMF. The symptoms
are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found
the commit (via bisection) that exposes the issue. In this patch I'll
attempt to analyze the symptoms and fix the root problem.

(1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that
    provides multiprocessing services. It is built into the CpuMpPei
    module (for the PEI phase) and the CpuDxe module (for the DXE and BDS
    phases). When either of these modules starts up during boot, it
    briefly boots up all of the CPUs, perfoms some counting /
    synchronization, and then all the APs (application processors) are
    quiesced until further work arrives.

    The APs boot in real mode, and need assembly language init code (a
    "reset vector") that gradually takes them into 64-bit mode, before
    they can call back into normal C code. The reset vector code is
    assembled at build time, but it is copied as data under 1MB (for real
    mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe
    launches.

    Part of the reset vector code is FPU initialization. The reset vector
    in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly
    language function InitializeFloatingPointUnits(), which is however
    provided by an external library. Normally, when the object files were
    linked together, this function call would result in an absolute
    reference (and matching relocation, performed at module loading), and
    the instance of the reset vector that was copied under 1MB during boot
    would refer to the same, unchanged, absolute address of
    InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.

    This didn't work with X64 XCODE5 builds however. XCODE5 prefers
    RIP-relative addressing for X64 (i.e., the assembly language function
    call depended on the distance between the call site and the callee).
    When the call site was copied under 1MB but
    InitializeFloatingPointUnits() would not move, the call broke.

    This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987
    ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",
    2017-05-17). The solution was to add another function pointer to the
    MP_CPU_EXCHANGE_INFO communication area. (The function pointer would
    have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would
    assign the absolute address of InitializeFloatingPointUnits() to the
    new field, and the APs would retrieve it -- matching the pointer size
    correctly --, and call it. The C assignment can be seen in the
    following hunk (from commit 3b2928b46987):

> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 407c44c95e62..735e099b32f2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -751,6 +751,8 @@ FillExchangeInfoData (
>
>    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>
> +  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
> +
>    //
>    // Get the BSP's data of GDT and IDT
>    //

    This moved the relocation of the InitializeFloatingPointUnits()
    reference (or, with XCODE5, the RIP-relative
    InitializeFloatingPointUnits() reference) from the assembly code
    (which is copied around) to the C code (which is not copied around).

(2) Unfortunately, the C-language assignment to
    "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
    miscompiled under the following circumstances:

    - the edk2 build target is DEBUG (or RELEASE),
    - the edk2 toolchain selection is GCC5,
    - (from the above two it follows that -Os and LTO are enabled,)
    - MpInitLib is built for X64,
    - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).

    (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
    PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
    in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)

    Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
    the correct assembly code generated by gcc for the assignment, and
    flip the build target to DEBUG for showing the broken assembly code.

(3) Correct code for the assignment:

>     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 7880 <InitializeFloatingPointUnits>
>     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax
>     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)

    Here the absolute address of InitializeFloatingPointUnits() is loaded
    into RDX with RIP-relative addressing. Then RAX is pointed to the
    ExchangeInfo structure, and finally RDX is stored into
    ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
    0x84 in the structure).

    Here the only relocation happens when CpuMpPei is linked. The distance
    between the call site and the callee is calculated and encoded in the
    LEA instruction. The same distance is valid when CpuMpPei runs, so the
    PEI core doesn't have to relocate the LEA instruction when it loads
    CpuMpPei.

(4) Incorrect code for the assignment:

>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>     ...
>     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>     ...
> 0000000000005860 <InitializeFloatingPointUnits>:
>     5860:       9b db e3                finit

    This is not RIP-relative addressing. Therefore gcc generates a
    relocation record for the address encoded in the MOV instruction, at
    offset 0x2d69+0x3 -- initial value being 0x5860:

> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>   0 .text         00006e66  0000000000000240  0000000000000240  000000c0  2**6
>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> ...
> RELOCATION RECORDS FOR [.text]:
> OFFSET           TYPE              VALUE
> ...
> 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits

    (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
    equals the .text section base plus the relocation offset, i.e.,
    0x240+0x2b2c.)

    This is the only R_X86_64_32S relocation record in the linked-together
    CpuMpPei.debug file (still an ELF file). In the rest of the build
    process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
    (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
    relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).

    The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first
    when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi
    could theoretically execute from flash, and PEIFV has a nonzero base
    address. (However, CpuMpPei has a DEPEX on permanent PEI RAM
    discovery, so the PEI core will only launch it from permanent PEI RAM,
    in practice.)

    The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for
    the second time when the PEI core loads CpuMpPei into permanent PEI
    RAM, and performs the relocations. I confirmed (with debug messages)
    that the relocated address (within the instruction) is correct:

> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0
> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0
> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi

    The "Fixup32" pointer points to the address constant that should be
    relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of
    (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that
    I described above (as "first relocation").

    After incrementing (*Fixup32) by "Adjust", the final value is
    0xBFF28860. This is the absolute address that is patched into the MOV
    instruction that sets RDX for the assignment. This value is correct;
    the InitializeFloatingPointUnits() function really exists in CpuMpPei
    at this address, in permanent PEI RAM. It is at an offset of 0x5860
    from the load address 0x000BFF23000 (see third line).

    So where do things go wrong? Again, the MOV instruction itself is
    wrong. I modified the MpInitLib C code to print the value of the field
    right after the assignment, and I got:

> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860

    The APs are ejected into outer space when they try to init their FPUs
    -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should
    store 0x00000000_BFF28860!

(5) So open the Intel SDM, and decode the instruction by hand:

>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx

    0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).
    Looking up REX.W + 0xc7 (the next byte) under the MOV specification,
    we get:
    - opcode: REX.W + C7 /0
    - instruction: MOV r/m64, imm32
    - Description: Move imm32 sign extended to 64-bits to r/m64.

    "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860
    -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the
    sign bit is set. And MOV sign-extends 0xBFF28860 to
    0xFFFFFFFF_BFF28860 in the assignment to RDX.

    So... why on earth generates gcc this MOV instruction, with sign
    extension?

(6) It does because we told it to. From the gcc manual:

> -mcmodel=small
>     Generate code for the small code model: the program and its symbols
>     must be linked in the lower 2 GB of the address space.  Pointers are
>     64 bits.  Programs can be statically or dynamically linked.  This is
>     the default code model.

    and please refer to the following commit (message slightly rewrapped
    here):

> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0
> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Date:   Sat Jul 16 00:16:11 2016 +0200
>
>     BaseTools/tools_def: switch GCC/X64 to the PIE small model
>
>     The ordinary small code model for x86_64 cannot be used in UEFI,
>     since it assumes the executable is loaded in the first 2 GB of
>     memory. Therefore, we use the large model instead, which can execute
>     anywhere, but uses absolute 64-bit wide quantities for all symbol
>     references, which is costly in terms of code size.
>
>     So switch to the PIE small code model, this uses 32-bit relative
>     references where possible, but does not make any assumptions about
>     the load address (i.e., all absolute symbol references are 64-bits
>     wide). Note that, due to the 'protected' visibility pragma
>     introduced in an earlier patch, there is no need for the EDK2 build
>     system to deal with GOT related ELF relocation types.
>
>     Contributed-under: TianoCore Contribution Agreement 1.0
>     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>     Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>     Tested-by: Laszlo Ersek <lersek@redhat.com>
>     Tested-By: Liming Gao <liming.gao@intel.com>
>     Reviewed-by: Liming Gao <liming.gao@intel.com>

    Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make
    assumptions about the load address, despite the fact that we didn't
    just replace "-mcmodel=large" with "-mcmodel=small", but we also added
    "-fpie".

(7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do
    not revert it whole-sale. Instead, undo it only for the LTO build env
    where it might cause problems, by appending "-fno-pie -mcmodel=large".

    The assignment is then compiled like this:

>     33d7:       48 ba 60 64 00 00 00    movabs $0x6460,%rdx
>     33de:       00 00 00
>     ...
>     33e5:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>     ...
> 0000000000006460 <InitializeFloatingPointUnits>:

    And the matching relocation is (note the target offset
    0x240+0x3199=0x33d9):

> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>   0 .text         00007c5e  0000000000000240  0000000000000240  000000c0  2**6
>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> ...
> RELOCATION RECORDS FOR [.text]:
> OFFSET           TYPE              VALUE
> ...
> 0000000000003199 R_X86_64_64       InitializeFloatingPointUnits

(8) Size changes by this patch:

    $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
        -t GCC5 -b DEBUG \
        -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE

      Before:

> SECFV [10%Full] 212992 total, 21616 used, 191376 free
> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free
> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free
> PEIFV [21%Full] 917504 total, 194480 used, 723024 free

      After:

> SECFV [11%Full] 212992 total, 23728 used, 189264 free
> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free
> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free
> PEIFV [24%Full] 917504 total, 224304 used, 693200 free

    $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
        -t GCC5 -b DEBUG \
        -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \
        -D SECURE_BOOT_ENABLE -D SMM_REQUIRE

      Before:

> SECFV [10%Full] 212992 total, 21536 used, 191456 free
> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free
> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free
> PEIFV [22%Full] 917504 total, 202032 used, 715472 free

      After:

> SECFV [10%Full] 212992 total, 21536 used, 191456 free
> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free
> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free
> PEIFV [22%Full] 917504 total, 202032 used, 715472 free

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Yonghong Zhu <yonghong.zhu@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 BaseTools/Conf/tools_def.template | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 1fa3ca3ceae9..2ef495540e5f 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS    = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
 *_GCC5_X64_OBJCOPY_FLAGS         =
 *_GCC5_X64_NASM_FLAGS            = -f elf64
 
-  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
+  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
   DEBUG_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
 
-RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
+RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
 RELEASE_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
 
   NOOPT_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -O0
-- 
2.13.1.3.g8be5a757fa67

next prev parent reply	other threads:[~2017-08-11  0:32 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-11  0:34 [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO Laszlo Ersek
2017-08-11  0:34 ` Laszlo Ersek [this message]
2017-08-11  5:28   ` [PATCH 1/1] " Shi, Steven
2017-08-11 11:18     ` Laszlo Ersek
2017-08-12  3:05       ` Shi, Steven
2017-08-15 15:45         ` Laszlo Ersek
2017-08-22  8:00           ` Shi, Steven
2017-08-22 11:59             ` Laszlo Ersek
2017-08-22 12:23               ` Gao, Liming
2017-08-22 13:27               ` Paolo Bonzini
2017-08-22 14:03                 ` Ard Biesheuvel
2017-08-22 14:15                   ` Paolo Bonzini
2017-08-22 16:04                     ` Laszlo Ersek
2017-08-22 16:06                       ` Paolo Bonzini
2017-08-11 10:03   ` Ard Biesheuvel
2017-08-11 10:30     ` Laszlo Ersek
2017-08-11 22:21   ` Alex Williamson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:1fa3ca3ceae dfblob:2ef495540e5 )
 OR (
bs:"[PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170811003426.2332-2-lersek@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox