[PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

* [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
@ 2017-08-11  0:34 Laszlo Ersek
  2017-08-11  0:34 ` [PATCH 1/1] " Laszlo Ersek
  0 siblings, 1 reply; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-11  0:34 UTC (permalink / raw)
  To: edk2-devel-01
  Cc: Alex Williamson, Ard Biesheuvel, Jordan Justen, Liming Gao,
	Michael Kinney, Yonghong Zhu

Patch #1 explains it all (grab a coffee first). For testing, it's likely
most robust to run

  git clean -fdx
  exit

in your terminal, before fetching & checking out my branch, and
rebuilding OVMF.

Repo:   https://github.com/lersek/edk2.git
Branch: gcc7_lto

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Yonghong Zhu <yonghong.zhu@intel.com>

Thanks
Laszlo

Laszlo Ersek (1):
  BaseTools/tools_def.template: revert to large code model for
    X64/GCC5/LTO

 BaseTools/Conf/tools_def.template | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.13.1.3.g8be5a757fa67



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11  0:34 [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO Laszlo Ersek
@ 2017-08-11  0:34 ` Laszlo Ersek
  2017-08-11  5:28   ` Shi, Steven
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-11  0:34 UTC (permalink / raw)
  To: edk2-devel-01
  Cc: Alex Williamson, Ard Biesheuvel, Jordan Justen, Liming Gao,
	Michael Kinney, Yonghong Zhu

Several users have recently reported boot failures with OVMF. The symptoms
are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found
the commit (via bisection) that exposes the issue. In this patch I'll
attempt to analyze the symptoms and fix the root problem.

(1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that
    provides multiprocessing services. It is built into the CpuMpPei
    module (for the PEI phase) and the CpuDxe module (for the DXE and BDS
    phases). When either of these modules starts up during boot, it
    briefly boots up all of the CPUs, perfoms some counting /
    synchronization, and then all the APs (application processors) are
    quiesced until further work arrives.

    The APs boot in real mode, and need assembly language init code (a
    "reset vector") that gradually takes them into 64-bit mode, before
    they can call back into normal C code. The reset vector code is
    assembled at build time, but it is copied as data under 1MB (for real
    mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe
    launches.

    Part of the reset vector code is FPU initialization. The reset vector
    in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly
    language function InitializeFloatingPointUnits(), which is however
    provided by an external library. Normally, when the object files were
    linked together, this function call would result in an absolute
    reference (and matching relocation, performed at module loading), and
    the instance of the reset vector that was copied under 1MB during boot
    would refer to the same, unchanged, absolute address of
    InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.

    This didn't work with X64 XCODE5 builds however. XCODE5 prefers
    RIP-relative addressing for X64 (i.e., the assembly language function
    call depended on the distance between the call site and the callee).
    When the call site was copied under 1MB but
    InitializeFloatingPointUnits() would not move, the call broke.

    This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987
    ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",
    2017-05-17). The solution was to add another function pointer to the
    MP_CPU_EXCHANGE_INFO communication area. (The function pointer would
    have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would
    assign the absolute address of InitializeFloatingPointUnits() to the
    new field, and the APs would retrieve it -- matching the pointer size
    correctly --, and call it. The C assignment can be seen in the
    following hunk (from commit 3b2928b46987):

> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 407c44c95e62..735e099b32f2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -751,6 +751,8 @@ FillExchangeInfoData (
>
>    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>
> +  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
> +
>    //
>    // Get the BSP's data of GDT and IDT
>    //

    This moved the relocation of the InitializeFloatingPointUnits()
    reference (or, with XCODE5, the RIP-relative
    InitializeFloatingPointUnits() reference) from the assembly code
    (which is copied around) to the C code (which is not copied around).

(2) Unfortunately, the C-language assignment to
    "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
    miscompiled under the following circumstances:

    - the edk2 build target is DEBUG (or RELEASE),
    - the edk2 toolchain selection is GCC5,
    - (from the above two it follows that -Os and LTO are enabled,)
    - MpInitLib is built for X64,
    - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).

    (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
    PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
    in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)

    Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
    the correct assembly code generated by gcc for the assignment, and
    flip the build target to DEBUG for showing the broken assembly code.

(3) Correct code for the assignment:

>     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 7880 <InitializeFloatingPointUnits>
>     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax
>     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)

    Here the absolute address of InitializeFloatingPointUnits() is loaded
    into RDX with RIP-relative addressing. Then RAX is pointed to the
    ExchangeInfo structure, and finally RDX is stored into
    ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
    0x84 in the structure).

    Here the only relocation happens when CpuMpPei is linked. The distance
    between the call site and the callee is calculated and encoded in the
    LEA instruction. The same distance is valid when CpuMpPei runs, so the
    PEI core doesn't have to relocate the LEA instruction when it loads
    CpuMpPei.

(4) Incorrect code for the assignment:

>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>     ...
>     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>     ...
> 0000000000005860 <InitializeFloatingPointUnits>:
>     5860:       9b db e3                finit

    This is not RIP-relative addressing. Therefore gcc generates a
    relocation record for the address encoded in the MOV instruction, at
    offset 0x2d69+0x3 -- initial value being 0x5860:

> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>   0 .text         00006e66  0000000000000240  0000000000000240  000000c0  2**6
>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> ...
> RELOCATION RECORDS FOR [.text]:
> OFFSET           TYPE              VALUE
> ...
> 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits

    (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
    equals the .text section base plus the relocation offset, i.e.,
    0x240+0x2b2c.)

    This is the only R_X86_64_32S relocation record in the linked-together
    CpuMpPei.debug file (still an ELF file). In the rest of the build
    process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
    (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
    relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).

    The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first
    when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi
    could theoretically execute from flash, and PEIFV has a nonzero base
    address. (However, CpuMpPei has a DEPEX on permanent PEI RAM
    discovery, so the PEI core will only launch it from permanent PEI RAM,
    in practice.)

    The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for
    the second time when the PEI core loads CpuMpPei into permanent PEI
    RAM, and performs the relocations. I confirmed (with debug messages)
    that the relocated address (within the instruction) is correct:

> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0
> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0
> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi

    The "Fixup32" pointer points to the address constant that should be
    relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of
    (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that
    I described above (as "first relocation").

    After incrementing (*Fixup32) by "Adjust", the final value is
    0xBFF28860. This is the absolute address that is patched into the MOV
    instruction that sets RDX for the assignment. This value is correct;
    the InitializeFloatingPointUnits() function really exists in CpuMpPei
    at this address, in permanent PEI RAM. It is at an offset of 0x5860
    from the load address 0x000BFF23000 (see third line).

    So where do things go wrong? Again, the MOV instruction itself is
    wrong. I modified the MpInitLib C code to print the value of the field
    right after the assignment, and I got:

> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860

    The APs are ejected into outer space when they try to init their FPUs
    -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should
    store 0x00000000_BFF28860!

(5) So open the Intel SDM, and decode the instruction by hand:

>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx

    0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).
    Looking up REX.W + 0xc7 (the next byte) under the MOV specification,
    we get:
    - opcode: REX.W + C7 /0
    - instruction: MOV r/m64, imm32
    - Description: Move imm32 sign extended to 64-bits to r/m64.

    "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860
    -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the
    sign bit is set. And MOV sign-extends 0xBFF28860 to
    0xFFFFFFFF_BFF28860 in the assignment to RDX.

    So... why on earth generates gcc this MOV instruction, with sign
    extension?

(6) It does because we told it to. From the gcc manual:

> -mcmodel=small
>     Generate code for the small code model: the program and its symbols
>     must be linked in the lower 2 GB of the address space.  Pointers are
>     64 bits.  Programs can be statically or dynamically linked.  This is
>     the default code model.

    and please refer to the following commit (message slightly rewrapped
    here):

> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0
> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Date:   Sat Jul 16 00:16:11 2016 +0200
>
>     BaseTools/tools_def: switch GCC/X64 to the PIE small model
>
>     The ordinary small code model for x86_64 cannot be used in UEFI,
>     since it assumes the executable is loaded in the first 2 GB of
>     memory. Therefore, we use the large model instead, which can execute
>     anywhere, but uses absolute 64-bit wide quantities for all symbol
>     references, which is costly in terms of code size.
>
>     So switch to the PIE small code model, this uses 32-bit relative
>     references where possible, but does not make any assumptions about
>     the load address (i.e., all absolute symbol references are 64-bits
>     wide). Note that, due to the 'protected' visibility pragma
>     introduced in an earlier patch, there is no need for the EDK2 build
>     system to deal with GOT related ELF relocation types.
>
>     Contributed-under: TianoCore Contribution Agreement 1.0
>     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>     Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>     Tested-by: Laszlo Ersek <lersek@redhat.com>
>     Tested-By: Liming Gao <liming.gao@intel.com>
>     Reviewed-by: Liming Gao <liming.gao@intel.com>

    Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make
    assumptions about the load address, despite the fact that we didn't
    just replace "-mcmodel=large" with "-mcmodel=small", but we also added
    "-fpie".

(7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do
    not revert it whole-sale. Instead, undo it only for the LTO build env
    where it might cause problems, by appending "-fno-pie -mcmodel=large".

    The assignment is then compiled like this:

>     33d7:       48 ba 60 64 00 00 00    movabs $0x6460,%rdx
>     33de:       00 00 00
>     ...
>     33e5:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>     ...
> 0000000000006460 <InitializeFloatingPointUnits>:

    And the matching relocation is (note the target offset
    0x240+0x3199=0x33d9):

> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>   0 .text         00007c5e  0000000000000240  0000000000000240  000000c0  2**6
>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> ...
> RELOCATION RECORDS FOR [.text]:
> OFFSET           TYPE              VALUE
> ...
> 0000000000003199 R_X86_64_64       InitializeFloatingPointUnits

(8) Size changes by this patch:

    $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
        -t GCC5 -b DEBUG \
        -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE

      Before:

> SECFV [10%Full] 212992 total, 21616 used, 191376 free
> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free
> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free
> PEIFV [21%Full] 917504 total, 194480 used, 723024 free

      After:

> SECFV [11%Full] 212992 total, 23728 used, 189264 free
> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free
> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free
> PEIFV [24%Full] 917504 total, 224304 used, 693200 free

    $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
        -t GCC5 -b DEBUG \
        -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \
        -D SECURE_BOOT_ENABLE -D SMM_REQUIRE

      Before:

> SECFV [10%Full] 212992 total, 21536 used, 191456 free
> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free
> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free
> PEIFV [22%Full] 917504 total, 202032 used, 715472 free

      After:

> SECFV [10%Full] 212992 total, 21536 used, 191456 free
> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free
> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free
> PEIFV [22%Full] 917504 total, 202032 used, 715472 free

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Yonghong Zhu <yonghong.zhu@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 BaseTools/Conf/tools_def.template | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 1fa3ca3ceae9..2ef495540e5f 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS    = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
 *_GCC5_X64_OBJCOPY_FLAGS         =
 *_GCC5_X64_NASM_FLAGS            = -f elf64
 
-  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
+  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
   DEBUG_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
 
-RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
+RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
 RELEASE_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
 
   NOOPT_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -O0
-- 
2.13.1.3.g8be5a757fa67



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11  0:34 ` [PATCH 1/1] " Laszlo Ersek
@ 2017-08-11  5:28   ` Shi, Steven
  2017-08-11 11:18     ` Laszlo Ersek
  2017-08-11 10:03   ` Ard Biesheuvel
  2017-08-11 22:21   ` Alex Williamson
  2 siblings, 1 reply; 17+ messages in thread
From: Shi, Steven @ 2017-08-11  5:28 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel-01
  Cc: Ard Biesheuvel, Justen, Jordan L, Gao, Liming, Kinney, Michael D

Hi Laszlo,

I'm trying to reproduce your boot failure with OVMF in my Ubuntu system, but not succeed.  My GCC was built from GCC main trunk code in 20170601, and my ld linker is version 2.28. Could you try the ld 2.28 with your gcc-7.1 and check whether it works in your side?  Or, do you know where can I download the gcc-7.1 pre-built binaries?





jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ gcc --version

gcc (GCC) 8.0.0 20170601 (experimental)



jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ ld --version

GNU ld (GNU Binutils) 2.28



Below are my trying steps, and the Qemu can boot well into shell:

jshi19@jshi19-desktop:~/wksp_efi$ git clone https://github.com/lersek/edk2.git

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ git checkout gcc7_lto

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ git checkout 7ef0dae092afcfb6fab7e8372c78097672168c4a

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ rm Build -r

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ rm Conf/tools_def.txt

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ rm Conf/target.txt

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ rm Conf/build_rule.txt

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ source edksetup.sh

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ make -C BaseTools/Source/C/ clean

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ make -C BaseTools/Source/C/
jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ build -p OvmfPkg/OvmfPkgX64.dsc -t GCC5 -DSECURE_BOOT_ENABLE=TRUE -a X64 -b DEBUG  -DDEBUG_ON_SERIAL_PORT -n 5

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ build -p OvmfPkg/OvmfPkgIa32X64.dsc -t GCC5 -DSECURE_BOOT_ENABLE=TRUE -a IA32 -a X64 -b DEBUG  -DDEBUG_ON_SERIAL_PORT -n 5

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2/Build/OvmfX64/DEBUG_GCC5/FV$ qemu-system-x86_64  -bios OVMF.fd -serial file:serial.log -m 512 -hda fat:. -monitor stdio --enable-kvm

jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2/Build/Ovmf3264/DEBUG_GCC5/FV$ qemu-system-x86_64  -bios OVMF.fd -serial file:serial.log -m 512 -hda fat:. -monitor stdio --enable-kvm







Steven Shi

Intel\SSG\STO\UEFI Firmware



Tel: +86 021-61166522

iNet: 821-6522





> -----Original Message-----

> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of

> Laszlo Ersek

> Sent: Friday, August 11, 2017 8:34 AM

> To: edk2-devel-01 <edk2-devel@lists.01.org>

> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>; Justen, Jordan L

> <jordan.l.justen@intel.com>; Gao, Liming <liming.gao@intel.com>; Kinney,

> Michael D <michael.d.kinney@intel.com>

> Subject: [edk2] [PATCH 1/1] BaseTools/tools_def.template: revert to large

> code model for X64/GCC5/LTO

>

> Several users have recently reported boot failures with OVMF. The symptoms

> are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found

> the commit (via bisection) that exposes the issue. In this patch I'll

> attempt to analyze the symptoms and fix the root problem.

>

> (1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that

>     provides multiprocessing services. It is built into the CpuMpPei

>     module (for the PEI phase) and the CpuDxe module (for the DXE and BDS

>     phases). When either of these modules starts up during boot, it

>     briefly boots up all of the CPUs, perfoms some counting /

>     synchronization, and then all the APs (application processors) are

>     quiesced until further work arrives.

>

>     The APs boot in real mode, and need assembly language init code (a

>     "reset vector") that gradually takes them into 64-bit mode, before

>     they can call back into normal C code. The reset vector code is

>     assembled at build time, but it is copied as data under 1MB (for real

>     mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe

>     launches.

>

>     Part of the reset vector code is FPU initialization. The reset vector

>     in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly

>     language function InitializeFloatingPointUnits(), which is however

>     provided by an external library. Normally, when the object files were

>     linked together, this function call would result in an absolute

>     reference (and matching relocation, performed at module loading), and

>     the instance of the reset vector that was copied under 1MB during boot

>     would refer to the same, unchanged, absolute address of

>     InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.

>

>     This didn't work with X64 XCODE5 builds however. XCODE5 prefers

>     RIP-relative addressing for X64 (i.e., the assembly language function

>     call depended on the distance between the call site and the callee).

>     When the call site was copied under 1MB but

>     InitializeFloatingPointUnits() would not move, the call broke.

>

>     This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987

>     ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",

>     2017-05-17). The solution was to add another function pointer to the

>     MP_CPU_EXCHANGE_INFO communication area. (The function pointer

> would

>     have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would

>     assign the absolute address of InitializeFloatingPointUnits() to the

>     new field, and the APs would retrieve it -- matching the pointer size

>     correctly --, and call it. The C assignment can be seen in the

>     following hunk (from commit 3b2928b46987):

>

> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c

> b/UefiCpuPkg/Library/MpInitLib/MpLib.c

> > index 407c44c95e62..735e099b32f2 100644

> > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c

> > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c

> > @@ -751,6 +751,8 @@ FillExchangeInfoData (

> >

> >    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();

> >

> > +  ExchangeInfo->InitializeFloatingPointUnitsAddress =

> (UINTN)InitializeFloatingPointUnits;

> > +

> >    //

> >    // Get the BSP's data of GDT and IDT

> >    //

>

>     This moved the relocation of the InitializeFloatingPointUnits()

>     reference (or, with XCODE5, the RIP-relative

>     InitializeFloatingPointUnits() reference) from the assembly code

>     (which is copied around) to the C code (which is not copied around).

>

> (2) Unfortunately, the C-language assignment to

>     "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is

>     miscompiled under the following circumstances:

>

>     - the edk2 build target is DEBUG (or RELEASE),

>     - the edk2 toolchain selection is GCC5,

>     - (from the above two it follows that -Os and LTO are enabled,)

>     - MpInitLib is built for X64,

>     - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).

>

>     (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the

>     PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks

>     in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)

>

>     Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show

>     the correct assembly code generated by gcc for the assignment, and

>     flip the build target to DEBUG for showing the broken assembly code.

>

> (3) Correct code for the assignment:

>

> >     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 7880

> <InitializeFloatingPointUnits>

> >     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax

> >     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)

>

>     Here the absolute address of InitializeFloatingPointUnits() is loaded

>     into RDX with RIP-relative addressing. Then RAX is pointed to the

>     ExchangeInfo structure, and finally RDX is stored into

>     ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is

>     0x84 in the structure).

>

>     Here the only relocation happens when CpuMpPei is linked. The distance

>     between the call site and the callee is calculated and encoded in the

>     LEA instruction. The same distance is valid when CpuMpPei runs, so the

>     PEI core doesn't have to relocate the LEA instruction when it loads

>     CpuMpPei.

>

> (4) Incorrect code for the assignment:

>

> >     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx

> >     ...

> >     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)

> >     ...

> > 0000000000005860 <InitializeFloatingPointUnits>:

> >     5860:       9b db e3                finit

>

>     This is not RIP-relative addressing. Therefore gcc generates a

>     relocation record for the address encoded in the MOV instruction, at

>     offset 0x2d69+0x3 -- initial value being 0x5860:

>

> > Sections:

> > Idx Name          Size      VMA               LMA               File off  Algn

> >   0 .text         00006e66  0000000000000240  0000000000000240  000000c0

> 2**6

> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE

> > ...

> > RELOCATION RECORDS FOR [.text]:

> > OFFSET           TYPE              VALUE

> > ...

> > 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits

>

>     (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which

>     equals the .text section base plus the relocation offset, i.e.,

>     0x240+0x2b2c.)

>

>     This is the only R_X86_64_32S relocation record in the linked-together

>     CpuMpPei.debug file (still an ELF file). In the rest of the build

>     process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image

>     (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF

>     relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).

>

>     The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon

> first

>     when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi

>     could theoretically execute from flash, and PEIFV has a nonzero base

>     address. (However, CpuMpPei has a DEPEX on permanent PEI RAM

>     discovery, so the PEI core will only launch it from permanent PEI RAM,

>     in practice.)

>

>     The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for

>     the second time when the PEI core loads CpuMpPei into permanent PEI

>     RAM, and performs the relocations. I confirmed (with debug messages)

>     that the relocated address (within the instruction) is correct:

>

> > PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0

> > PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0

> Adjust=0xBF6E00C0

> > Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi

>

>     The "Fixup32" pointer points to the address constant that should be

>     relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of

>     (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that

>     I described above (as "first relocation").

>

>     After incrementing (*Fixup32) by "Adjust", the final value is

>     0xBFF28860. This is the absolute address that is patched into the MOV

>     instruction that sets RDX for the assignment. This value is correct;

>     the InitializeFloatingPointUnits() function really exists in CpuMpPei

>     at this address, in permanent PEI RAM. It is at an offset of 0x5860

>     from the load address 0x000BFF23000 (see third line).

>

>     So where do things go wrong? Again, the MOV instruction itself is

>     wrong. I modified the MpInitLib C code to print the value of the field

>     right after the assignment, and I got:

>

> > AP Loop Mode is 1

> > WakeupBufferStart = 9F000, WakeupBufferSize = 1000

> > CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @

> 0xFFFFFFFFBFF28860

>

>     The APs are ejected into outer space when they try to init their FPUs

>     -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should

>     store 0x00000000_BFF28860!

>

> (5) So open the Intel SDM, and decode the instruction by hand:

>

> >     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx

>

>     0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).

>     Looking up REX.W + 0xc7 (the next byte) under the MOV specification,

>     we get:

>     - opcode: REX.W + C7 /0

>     - instruction: MOV r/m64, imm32

>     - Description: Move imm32 sign extended to 64-bits to r/m64.

>

>     "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860

>     -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the

>     sign bit is set. And MOV sign-extends 0xBFF28860 to

>     0xFFFFFFFF_BFF28860 in the assignment to RDX.

>

>     So... why on earth generates gcc this MOV instruction, with sign

>     extension?

>

> (6) It does because we told it to. From the gcc manual:

>

> > -mcmodel=small

> >     Generate code for the small code model: the program and its symbols

> >     must be linked in the lower 2 GB of the address space.  Pointers are

> >     64 bits.  Programs can be statically or dynamically linked.  This is

> >     the default code model.

>

>     and please refer to the following commit (message slightly rewrapped

>     here):

>

> > commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0

> > Author: Ard Biesheuvel <ard.biesheuvel@linaro.org<mailto:ard.biesheuvel@linaro.org>>

> > Date:   Sat Jul 16 00:16:11 2016 +0200

> >

> >     BaseTools/tools_def: switch GCC/X64 to the PIE small model

> >

> >     The ordinary small code model for x86_64 cannot be used in UEFI,

> >     since it assumes the executable is loaded in the first 2 GB of

> >     memory. Therefore, we use the large model instead, which can execute

> >     anywhere, but uses absolute 64-bit wide quantities for all symbol

> >     references, which is costly in terms of code size.

> >

> >     So switch to the PIE small code model, this uses 32-bit relative

> >     references where possible, but does not make any assumptions about

> >     the load address (i.e., all absolute symbol references are 64-bits

> >     wide). Note that, due to the 'protected' visibility pragma

> >     introduced in an earlier patch, there is no need for the EDK2 build

> >     system to deal with GOT related ELF relocation types.

> >

> >     Contributed-under: TianoCore Contribution Agreement 1.0

> >     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org<mailto:ard.biesheuvel@linaro.org>>

> >     Reviewed-by: Jordan Justen <jordan.l.justen@intel.com<mailto:jordan.l.justen@intel.com>>

> >     Tested-by: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>

> >     Tested-By: Liming Gao <liming.gao@intel.com<mailto:liming.gao@intel.com>>

> >     Reviewed-by: Liming Gao <liming.gao@intel.com<mailto:liming.gao@intel.com>>

>

>     Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make

>     assumptions about the load address, despite the fact that we didn't

>     just replace "-mcmodel=large" with "-mcmodel=small", but we also added

>     "-fpie".

>

> (7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do

>     not revert it whole-sale. Instead, undo it only for the LTO build env

>     where it might cause problems, by appending "-fno-pie -mcmodel=large".

>

>     The assignment is then compiled like this:

>

> >     33d7:       48 ba 60 64 00 00 00    movabs $0x6460,%rdx

> >     33de:       00 00 00

> >     ...

> >     33e5:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)

> >     ...

> > 0000000000006460 <InitializeFloatingPointUnits>:

>

>     And the matching relocation is (note the target offset

>     0x240+0x3199=0x33d9):

>

> > Sections:

> > Idx Name          Size      VMA               LMA               File off  Algn

> >   0 .text         00007c5e  0000000000000240  0000000000000240  000000c0

> 2**6

> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE

> > ...

> > RELOCATION RECORDS FOR [.text]:

> > OFFSET           TYPE              VALUE

> > ...

> > 0000000000003199 R_X86_64_64       InitializeFloatingPointUnits

>

> (8) Size changes by this patch:

>

>     $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \

>         -t GCC5 -b DEBUG \

>         -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE

>

>       Before:

>

> > SECFV [10%Full] 212992 total, 21616 used, 191376 free

> > FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free

> > DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free

> > PEIFV [21%Full] 917504 total, 194480 used, 723024 free

>

>       After:

>

> > SECFV [11%Full] 212992 total, 23728 used, 189264 free

> > FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free

> > DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free

> > PEIFV [24%Full] 917504 total, 224304 used, 693200 free

>

>     $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \

>         -t GCC5 -b DEBUG \

>         -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \

>         -D SECURE_BOOT_ENABLE -D SMM_REQUIRE

>

>       Before:

>

> > SECFV [10%Full] 212992 total, 21536 used, 191456 free

> > FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free

> > DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free

> > PEIFV [22%Full] 917504 total, 202032 used, 715472 free

>

>       After:

>

> > SECFV [10%Full] 212992 total, 21536 used, 191456 free

> > FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free

> > DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free

> > PEIFV [22%Full] 917504 total, 202032 used, 715472 free

>

> Cc: Alex Williamson <alex.williamson@redhat.com<mailto:alex.williamson@redhat.com>>

> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org<mailto:ard.biesheuvel@linaro.org>>

> Cc: Jordan Justen <jordan.l.justen@intel.com<mailto:jordan.l.justen@intel.com>>

> Cc: Liming Gao <liming.gao@intel.com<mailto:liming.gao@intel.com>>

> Cc: Michael Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>

> Cc: Yonghong Zhu <yonghong.zhu@intel.com<mailto:yonghong.zhu@intel.com>>

> Contributed-under: TianoCore Contribution Agreement 1.1

> Signed-off-by: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>

> ---

>  BaseTools/Conf/tools_def.template | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

>

> diff --git a/BaseTools/Conf/tools_def.template

> b/BaseTools/Conf/tools_def.template

> index 1fa3ca3ceae9..2ef495540e5f 100755

> --- a/BaseTools/Conf/tools_def.template

> +++ b/BaseTools/Conf/tools_def.template

> @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS    =

> DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,

>  *_GCC5_X64_OBJCOPY_FLAGS         =

>  *_GCC5_X64_NASM_FLAGS            = -f elf64

>

> -  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -

> DUSING_LTO -Os

> +  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -

> mcmodel=large -flto -DUSING_LTO -Os

>    DEBUG_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os

>

> -RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -

> DUSING_LTO -Os -Wno-unused-but-set-variable

> +RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -

> mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable

>  RELEASE_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os

>

>    NOOPT_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -O0

> --

> 2.13.1.3.g8be5a757fa67

>

> _______________________________________________

> edk2-devel mailing list

> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>

> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11  5:28   ` Shi, Steven
@ 2017-08-11 11:18     ` Laszlo Ersek
  2017-08-12  3:05       ` Shi, Steven
  0 siblings, 1 reply; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-11 11:18 UTC (permalink / raw)
  To: Shi, Steven, edk2-devel-01
  Cc: Ard Biesheuvel, Justen, Jordan L, Gao, Liming, Kinney, Michael D

Hi Steven,

On 08/11/17 07:28, Shi, Steven wrote:
> Hi Laszlo,
> 
> I'm trying to reproduce your boot failure with OVMF in my Ubuntu
> system, but not succeed.  My GCC was built from GCC main trunk code
> in 20170601, and my ld linker is version 2.28. Could you try the ld
> 2.28 with your gcc-7.1 and check whether it works in your side?  Or,
> do you know where can I download the gcc-7.1 pre-built binaries?
This was reproduced on a stock Fedora 26 installation:

https://getfedora.org/

> jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ build -p OvmfPkg/OvmfPkgX64.dsc -t GCC5 -DSECURE_BOOT_ENABLE=TRUE -a X64 -b DEBUG  -DDEBUG_ON_SERIAL_PORT -n 5
> 
> jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ build -p OvmfPkg/OvmfPkgIa32X64.dsc -t GCC5 -DSECURE_BOOT_ENABLE=TRUE -a IA32 -a X64 -b DEBUG  -DDEBUG_ON_SERIAL_PORT -n 5

These build commands look good.

> jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2/Build/OvmfX64/DEBUG_GCC5/FV$ qemu-system-x86_64  -bios OVMF.fd -serial file:serial.log -m 512 -hda fat:. -monitor stdio --enable-kvm
> 
> jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2/Build/Ovmf3264/DEBUG_GCC5/FV$ qemu-system-x86_64  -bios OVMF.fd -serial file:serial.log -m 512 -hda fat:. -monitor stdio --enable-kvm

The QEMU command lines are missing two options:

* First, you have to make sure that the VM has enough memory under 4GB
so that the PEI RAM can grow above the 2GB limit. You are using the
i440fx board type, so that's OK (its 32-bit memory can go up to 3GB),
but the "-m" switch is too small. Instead, I recommend:

  -m 5120

This will give the VM 5GB of RAM, 3GB of which will be placed under the
4GB mark, and 2GB will be placed above.

* Second, there must be at least one AP (application processor) for
CpuMpPei to boot up. So please add something like:

  -smp 4

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11 11:18     ` Laszlo Ersek
@ 2017-08-12  3:05       ` Shi, Steven
  2017-08-15 15:45         ` Laszlo Ersek
  0 siblings, 1 reply; 17+ messages in thread
From: Shi, Steven @ 2017-08-12  3:05 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel-01, Alex Williamson
  Cc: Ard Biesheuvel, Justen, Jordan L, Gao, Liming, Kinney, Michael D

OK. I can reproduce the failure with -smp 4 and -m 5120 in my side. 
It looks a linker bug about assemble function support in PIC/PIE code.  You know, if we only have C code, the compiler/linker will generate all the machine code and guarantee all the address reference are position independent under PIC/PIE build. But if we mix manually written assemble code in the C code,  the linker cannot really control the address reference in the assemble code, and  might got confused. So, it is not seldom we could see the compiler/linker generate wrong code for mixed code,  especially with very high level optimization, e.g. LTO.

Globally change memory model from small to large will bring not trivial impact (+15%) to code size, espcial for the uncomperssed option rom dirver. Below is some data of OvmfPkgX64.dsc platform.
		Dxecore.efi 	CpuDxe.efi	CpuMpPei	PeiCore.efi
Small+PIE: 	139520		47360		30144		46720
Large:		165696		55360		34496		53504

A simpler workaround could be to add a C function wrapper around the assemble lib function as below. This simple workaround works in my side.  But it is necessary to find this issue's root cause and fix it in the compiler/linker. I will try to raise this issue to compiler/linker guys.

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
old mode 100644
new mode 100755
index a3eea29..7afe434
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -738,6 +738,15 @@ WaitApWakeup (
   }
 }

+VOID
+EFIAPI
+InitializeFloatingPointUnitsWrapper (
+  VOID
+  )
+{
+  InitializeFloatingPointUnits();
+}
+
 /**
   This function will fill the exchange info structure.

@@ -771,7 +780,7 @@ FillExchangeInfoData (

   ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();

-  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
+  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnitsWrapper;

   //
   // Get the BSP's data of GDT and IDT


Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Friday, August 11, 2017 7:18 PM
> To: Shi, Steven <steven.shi@intel.com>; edk2-devel-01 <edk2-
> devel@lists.01.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>; Justen, Jordan L
> <jordan.l.justen@intel.com>; Gao, Liming <liming.gao@intel.com>; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: Re: [edk2] [PATCH 1/1] BaseTools/tools_def.template: revert to large
> code model for X64/GCC5/LTO
> 
> Hi Steven,
> 
> On 08/11/17 07:28, Shi, Steven wrote:
> > Hi Laszlo,
> >
> > I'm trying to reproduce your boot failure with OVMF in my Ubuntu
> > system, but not succeed.  My GCC was built from GCC main trunk code
> > in 20170601, and my ld linker is version 2.28. Could you try the ld
> > 2.28 with your gcc-7.1 and check whether it works in your side?  Or,
> > do you know where can I download the gcc-7.1 pre-built binaries?
> This was reproduced on a stock Fedora 26 installation:
> 
> https://getfedora.org/
> 
> > jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ build -p
> OvmfPkg/OvmfPkgX64.dsc -t GCC5 -DSECURE_BOOT_ENABLE=TRUE -a X64 -b
> DEBUG  -DDEBUG_ON_SERIAL_PORT -n 5
> >
> > jshi19@jshi19-desktop:~/wksp_efi/Laszlo/edk2$ build -p
> OvmfPkg/OvmfPkgIa32X64.dsc -t GCC5 -DSECURE_BOOT_ENABLE=TRUE -a IA32
> -a X64 -b DEBUG  -DDEBUG_ON_SERIAL_PORT -n 5
> 
> These build commands look good.
> 
> > jshi19@jshi19-
> desktop:~/wksp_efi/Laszlo/edk2/Build/OvmfX64/DEBUG_GCC5/FV$ qemu-
> system-x86_64  -bios OVMF.fd -serial file:serial.log -m 512 -hda fat:. -monitor
> stdio --enable-kvm
> >
> > jshi19@jshi19-
> desktop:~/wksp_efi/Laszlo/edk2/Build/Ovmf3264/DEBUG_GCC5/FV$ qemu-
> system-x86_64  -bios OVMF.fd -serial file:serial.log -m 512 -hda fat:. -monitor
> stdio --enable-kvm
> 
> The QEMU command lines are missing two options:
> 
> * First, you have to make sure that the VM has enough memory under 4GB
> so that the PEI RAM can grow above the 2GB limit. You are using the
> i440fx board type, so that's OK (its 32-bit memory can go up to 3GB),
> but the "-m" switch is too small. Instead, I recommend:
> 
>   -m 5120
> 
> This will give the VM 5GB of RAM, 3GB of which will be placed under the
> 4GB mark, and 2GB will be placed above.
> 
> * Second, there must be at least one AP (application processor) for
> CpuMpPei to boot up. So please add something like:
> 
>   -smp 4
> 
> Thanks
> Laszlo


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-12  3:05       ` Shi, Steven
@ 2017-08-15 15:45         ` Laszlo Ersek
  2017-08-22  8:00           ` Shi, Steven
  0 siblings, 1 reply; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-15 15:45 UTC (permalink / raw)
  To: Shi, Steven
  Cc: edk2-devel-01, Alex Williamson, Ard Biesheuvel, Justen, Jordan L,
	Gao, Liming, Kinney, Michael D

On 08/12/17 05:05, Shi, Steven wrote:
> OK. I can reproduce the failure with -smp 4 and -m 5120 in my side.
>
> It looks a linker bug about assemble function support in PIC/PIE
> code.  You know, if we only have C code, the compiler/linker will
> generate all the machine code and guarantee all the address reference
> are position independent under PIC/PIE build. But if we mix manually
> written assemble code in the C code,  the linker cannot really control
> the address reference in the assemble code, and  might got confused.

This is an incorrect description of the situation.

The address reference is *not* in assembly code. (It used to be in
assembly code, but Mike changed that earlier, for XCODE5 compatibility.)

At this moment the address reference is in *C code*. The C code takes
the address of an external function, and assigns it to a field in a data
structure. The assembly code calls the function through this field. The
assembly code makes no reference to the called function by name. See
Mike's commit 3b2928b46987:

-    mov        rax, ASM_PFX(InitializeFloatingPointUnits)
+    mov        rax, qword [esi + InitializeFloatingPointUnitsAddress]
     sub        rsp, 20h
     call       rax               ; Call assembly function to initialize FPU per UEFI spec

And, indeed, it is *not* the assembly code that's being miscompiled. It
is the C-language assignment below that is miscompiled:

+  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;

> So, it is not seldom we could see the compiler/linker generate wrong
> code for mixed code,  especially with very high level optimization,
> e.g. LTO.
>
> Globally change memory model from small to large will bring not
> trivial impact (+15%) to code size, espcial for the uncomperssed
> option rom dirver. Below is some data of OvmfPkgX64.dsc platform.
>
>               Dxecore.efi     CpuDxe.efi      CpuMpPei        PeiCore.efi
> Small+PIE:    139520          47360           30144           46720
> Large:        165696          55360           34496           53504

My argument is that the current "-mcmodel=small" option actually *lies*
to the compiler about our binaries. According to the GCC documentation,
"-mcmodel=small" implies that a binary built like this will never be
executed from above 2GB in the address space. This is why gcc-7 believes
it is allowed to generate a MOV instruction that sign-extends a 32-bit
address to 64-bit -- because we promise GCC that the sign bit will
always be clear to begin with.

IOW, we make a promise, gcc-7 generates code accordingly, and then we
break the promise, by executing the binary (the assignment in the C code
of MpInitLib) from above 2GB.

In particular, an X64 UEFI_DRIVER module, shipped as an option ROM on a
physical PCI(E) card, could be loaded anywhere at all in the 64-bit
address space (given sufficient memory in the computer). Building such a
driver with "-mcmodel=small" is wrong therefore; we cannot guarantee
that the driver will be executed from under 2GB.

Perhaps we should use "-mcmodel=large", but *keep* "-fpie". ... I've now
tried that, but it doesn't work. With "-mcmodel=large -fpie", the
compiler emits R_X86_64_GOTOFF64 relocations (type 25 decimal), and I
get errors like:

GenFw: ERROR 3000: Invalid
  Build/OvmfX64/DEBUG_GCC5/X64/MdeModulePkg/Universal/ReportStatusCodeRouter/Pei/ReportStatusCodeRouterPei/DEBUG/ReportStatusCodeRouterPei.dll
  unsupported ELF EM_X86_64 relocation 0x19.

I believe Ard's commit 28ade7b802e0 ("MdePkg: move to 'hidden'
visibility for all symbols under GCC/X64", 2016-08-01) was meant to
prevent this, but apparently it's not enough with gcc-7.1.

> A simpler workaround could be to add a C function wrapper around the
> assemble lib function as below. This simple workaround works in my
> side.  But it is necessary to find this issue's root cause and fix it
> in the compiler/linker. I will try to raise this issue to
> compiler/linker guys.
>
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> old mode 100644
> new mode 100755
> index a3eea29..7afe434
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -738,6 +738,15 @@ WaitApWakeup (
>    }
>  }
>
> +VOID
> +EFIAPI
> +InitializeFloatingPointUnitsWrapper (
> +  VOID
> +  )
> +{
> +  InitializeFloatingPointUnits();
> +}
> +
>  /**
>    This function will fill the exchange info structure.
>
> @@ -771,7 +780,7 @@ FillExchangeInfoData (
>
>    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>
> -  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
> +  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnitsWrapper;
>
>    //
>    // Get the BSP's data of GDT and IDT

I'm not convinced that this is the right fix, until we know exactly why
and how it changes the behavior of gcc-7. So I've now filed
<https://bugzilla.tianocore.org/show_bug.cgi?id=671> to track this
problem. (I also captured your suggestion in the BZ.)

We do know the exact symptoms and consequences of the miscompilation,
and I want to suppress those symptoms at least, as soon as possible. I
will post a patch for OvmfPkg's "build.sh" to use the GCC49 toolchain
settings with gcc-7.* (no LTO). I'll also ask Gerd to update the
toolchain selection in his SPEC file accordingly.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-15 15:45         ` Laszlo Ersek
@ 2017-08-22  8:00           ` Shi, Steven
  2017-08-22 11:59             ` Laszlo Ersek
  0 siblings, 1 reply; 17+ messages in thread
From: Shi, Steven @ 2017-08-22  8:00 UTC (permalink / raw)
  To: Laszlo Ersek, Ard Biesheuvel
  Cc: edk2-devel-01, Alex Williamson, Justen, Jordan L, Gao, Liming,
	Kinney, Michael D

It is a link flag misuse in our GCC build toolchains, not compiler/linker’s problem. Below patch can fix the wrong assembly function relocation type in PIE binary. With below patch, all the GCC5, GCC49 and GCC48 can build correctly images of OvmfPkgIa32X64 and OvmfPkgX64 platforms without my previous simple work around in my side. Please try it in your side.

Since we are using GCC as linker command, we MUST pass -pie to ld with “-Wl,-pie”, not just “--pie”  or “-fpie”.

So, this means we never correctly build small+PIE 64bits code with GCC toolchains before (CLANG38 is correct). If you failed to enable PIE/LTO build before, it is worthy to revisit those failures with “-Wl,-pie”. FYI.






diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 1fa3ca3..9e46d65 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -4375,7 +4375,7 @@ DEFINE GCC44_IA32_X64_DLINK_COMMON   = -nostdlib -Wl,-n,-q,--gc-sections -z comm
DEFINE GCC44_IA32_X64_ASLDLINK_FLAGS = DEF(GCC44_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
DEFINE GCC44_IA32_X64_DLINK_FLAGS    = DEF(GCC44_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map
DEFINE GCC44_IA32_DLINK2_FLAGS       = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
-DEFINE GCC44_X64_DLINK_FLAGS         = DEF(GCC44_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
+DEFINE GCC44_X64_DLINK_FLAGS         = DEF(GCC44_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64 -Wl,-pie
DEFINE GCC44_X64_DLINK2_FLAGS        = -Wl,--defsym=PECOFF_HEADER_SIZE=0x228 DEF(GCC_DLINK2_FLAGS_COMMON)
DEFINE GCC44_ASM_FLAGS               = DEF(GCC_ASM_FLAGS)

@@ -4455,7 +4455,7 @@ DEFINE GCC49_IA32_X64_DLINK_COMMON   = -nostdlib -Wl,-n,-q,--gc-sections -z comm
DEFINE GCC49_IA32_X64_ASLDLINK_FLAGS = DEF(GCC49_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
DEFINE GCC49_IA32_X64_DLINK_FLAGS    = DEF(GCC49_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map
DEFINE GCC49_IA32_DLINK2_FLAGS       = DEF(GCC48_IA32_DLINK2_FLAGS)
-DEFINE GCC49_X64_DLINK_FLAGS         = DEF(GCC49_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
+DEFINE GCC49_X64_DLINK_FLAGS         = DEF(GCC49_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64 -Wl,-pie
DEFINE GCC49_X64_DLINK2_FLAGS        = DEF(GCC48_X64_DLINK2_FLAGS)
DEFINE GCC49_ASM_FLAGS               = DEF(GCC48_ASM_FLAGS)
DEFINE GCC49_ARM_ASM_FLAGS           = DEF(GCC48_ARM_ASM_FLAGS)





Steven Shi

Intel\SSG\STO\UEFI Firmware



Tel: +86 021-61166522

iNet: 821-6522



> -----Original Message-----

> From: Laszlo Ersek [mailto:lersek@redhat.com]

> Sent: Tuesday, August 15, 2017 11:46 PM

> To: Shi, Steven <steven.shi@intel.com>

> Cc: edk2-devel-01 <edk2-devel@lists.01.org>; Alex Williamson

> <alex.williamson@redhat.com>; Ard Biesheuvel

> <ard.biesheuvel@linaro.org>; Justen, Jordan L <jordan.l.justen@intel.com>;

> Gao, Liming <liming.gao@intel.com>; Kinney, Michael D

> <michael.d.kinney@intel.com>

> Subject: Re: [edk2] [PATCH 1/1] BaseTools/tools_def.template: revert to

> large code model for X64/GCC5/LTO

>

> On 08/12/17 05:05, Shi, Steven wrote:

> > OK. I can reproduce the failure with -smp 4 and -m 5120 in my side.

> >

> > It looks a linker bug about assemble function support in PIC/PIE

> > code.  You know, if we only have C code, the compiler/linker will

> > generate all the machine code and guarantee all the address reference

> > are position independent under PIC/PIE build. But if we mix manually

> > written assemble code in the C code,  the linker cannot really control

> > the address reference in the assemble code, and  might got confused.

>

> This is an incorrect description of the situation.

>

> The address reference is *not* in assembly code. (It used to be in

> assembly code, but Mike changed that earlier, for XCODE5 compatibility.)

>

> At this moment the address reference is in *C code*. The C code takes

> the address of an external function, and assigns it to a field in a data

> structure. The assembly code calls the function through this field. The

> assembly code makes no reference to the called function by name. See

> Mike's commit 3b2928b46987:

>

> -    mov        rax, ASM_PFX(InitializeFloatingPointUnits)

> +    mov        rax, qword [esi + InitializeFloatingPointUnitsAddress]

>      sub        rsp, 20h

>      call       rax               ; Call assembly function to initialize FPU per UEFI spec

>

> And, indeed, it is *not* the assembly code that's being miscompiled. It

> is the C-language assignment below that is miscompiled:

>

> +  ExchangeInfo->InitializeFloatingPointUnitsAddress =

> (UINTN)InitializeFloatingPointUnits;

>

> > So, it is not seldom we could see the compiler/linker generate wrong

> > code for mixed code,  especially with very high level optimization,

> > e.g. LTO.

> >

> > Globally change memory model from small to large will bring not

> > trivial impact (+15%) to code size, espcial for the uncomperssed

> > option rom dirver. Below is some data of OvmfPkgX64.dsc platform.

> >

> >               Dxecore.efi     CpuDxe.efi      CpuMpPei        PeiCore.efi

> > Small+PIE:    139520          47360           30144           46720

> > Large:        165696          55360           34496           53504

>

> My argument is that the current "-mcmodel=small" option actually *lies*

> to the compiler about our binaries. According to the GCC documentation,

> "-mcmodel=small" implies that a binary built like this will never be

> executed from above 2GB in the address space. This is why gcc-7 believes

> it is allowed to generate a MOV instruction that sign-extends a 32-bit

> address to 64-bit -- because we promise GCC that the sign bit will

> always be clear to begin with.

>

> IOW, we make a promise, gcc-7 generates code accordingly, and then we

> break the promise, by executing the binary (the assignment in the C code

> of MpInitLib) from above 2GB.

>

>

> In particular, an X64 UEFI_DRIVER module, shipped as an option ROM on a

> physical PCI(E) card, could be loaded anywhere at all in the 64-bit

> address space (given sufficient memory in the computer). Building such a

> driver with "-mcmodel=small" is wrong therefore; we cannot guarantee

> that the driver will be executed from under 2GB.

>

> Perhaps we should use "-mcmodel=large", but *keep* "-fpie". ... I've now

> tried that, but it doesn't work. With "-mcmodel=large -fpie", the

> compiler emits R_X86_64_GOTOFF64 relocations (type 25 decimal), and I

> get errors like:

>

> GenFw: ERROR 3000: Invalid

>

> Build/OvmfX64/DEBUG_GCC5/X64/MdeModulePkg/Universal/ReportStatus

> CodeRouter/Pei/ReportStatusCodeRouterPei/DEBUG/ReportStatusCodeRou

> terPei.dll

>   unsupported ELF EM_X86_64 relocation 0x19.

>

> I believe Ard's commit 28ade7b802e0 ("MdePkg: move to 'hidden'

> visibility for all symbols under GCC/X64", 2016-08-01) was meant to

> prevent this, but apparently it's not enough with gcc-7.1.

>

> > A simpler workaround could be to add a C function wrapper around the

> > assemble lib function as below. This simple workaround works in my

> > side.  But it is necessary to find this issue's root cause and fix it

> > in the compiler/linker. I will try to raise this issue to

> > compiler/linker guys.

> >

> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c

> b/UefiCpuPkg/Library/MpInitLib/MpLib.c

> > old mode 100644

> > new mode 100755

> > index a3eea29..7afe434

> > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c

> > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c

> > @@ -738,6 +738,15 @@ WaitApWakeup (

> >    }

> >  }

> >

> > +VOID

> > +EFIAPI

> > +InitializeFloatingPointUnitsWrapper (

> > +  VOID

> > +  )

> > +{

> > +  InitializeFloatingPointUnits();

> > +}

> > +

> >  /**

> >    This function will fill the exchange info structure.

> >

> > @@ -771,7 +780,7 @@ FillExchangeInfoData (

> >

> >    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();

> >

> > -  ExchangeInfo->InitializeFloatingPointUnitsAddress =

> (UINTN)InitializeFloatingPointUnits;

> > +  ExchangeInfo->InitializeFloatingPointUnitsAddress =

> (UINTN)InitializeFloatingPointUnitsWrapper;

> >

> >    //

> >    // Get the BSP's data of GDT and IDT

>

> I'm not convinced that this is the right fix, until we know exactly why

> and how it changes the behavior of gcc-7. So I've now filed

> <https://bugzilla.tianocore.org/show_bug.cgi?id=671> to track this

> problem. (I also captured your suggestion in the BZ.)

>

> We do know the exact symptoms and consequences of the miscompilation,

> and I want to suppress those symptoms at least, as soon as possible. I

> will post a patch for OvmfPkg's "build.sh" to use the GCC49 toolchain

> settings with gcc-7.* (no LTO). I'll also ask Gerd to update the

> toolchain selection in his SPEC file accordingly.

>

> Thanks

> Laszlo

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22  8:00           ` Shi, Steven
@ 2017-08-22 11:59             ` Laszlo Ersek
  2017-08-22 12:23               ` Gao, Liming
  2017-08-22 13:27               ` Paolo Bonzini
  0 siblings, 2 replies; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-22 11:59 UTC (permalink / raw)
  To: Shi, Steven, Ard Biesheuvel
  Cc: edk2-devel-01, Alex Williamson, Justen, Jordan L, Gao, Liming,
	Kinney, Michael D, Paolo Bonzini

On 08/22/17 10:00, Shi, Steven wrote:
> It is a link flag misuse in our GCC build toolchains, not
> compiler/linker's problem. Below patch can fix the wrong assembly
> function relocation type in PIE binary. With below patch, all the
> GCC5, GCC49 and GCC48 can build correctly images of OvmfPkgIa32X64 and
> OvmfPkgX64 platforms without my previous simple work around in my
> side. Please try it in your side.
>
> Since we are using GCC as linker command, we MUST pass -pie to ld with
> "-Wl,-pie", not just "--pie"  or "-fpie".
>
> So, this means we never correctly build small+PIE 64bits code with GCC
> toolchains before (CLANG38 is correct). If you failed to enable
> PIE/LTO build before, it is worthy to revisit those failures with
> "-Wl,-pie". FYI.

The first question of Paolo (CC'd) was, when he saw this issue in my
last status report, whether we added "-fpie" to the link command line as
well. And, I confirmed, we did. This is how I responded to him:

> - in "BaseTools/Conf/build_rule.template", there's
>
> > [Static-Library-File]
> >     <InputFile>
> >         *.lib
> >
> >     <ExtraDependency>
> >         $(MAKE_FILE)
> >
> >     <OutputFile>
> >         $(DEBUG_DIR)(+)$(MODULE_NAME).dll
> >
> >     [...]
> >
> >     <Command.GCC>
> >         "$(DLINK)" -o ${dst} $(DLINK_FLAGS) -Wl,--start-group,@$(STATIC_LIBRARY_FILES_LIST),--end-group $(CC_FLAGS) $(DLINK2_FLAGS)
> >         "$(OBJCOPY)" $(OBJCOPY_FLAGS) ${dst}
>
> (see "$(CC_FLAGS)")
>
> - and in the build log, I see
>
> > "gcc" \
> >   -o Build/OvmfX64/DEBUG_GCC5/X64/UefiCpuPkg/CpuMpPei/CpuMpPei/DEBUG/CpuMpPei.dll \
> >   -nostdlib \
> >   -Wl,-n,-q,--gc-sections \
> >   -z common-page-size=0x40 \
> >   -Wl,--entry,_ModuleEntryPoint \
> >   -u _ModuleEntryPoint \
> >   -Wl,-Map,Build/OvmfX64/DEBUG_GCC5/X64/UefiCpuPkg/CpuMpPei/CpuMpPei/DEBUG/CpuMpPei.map \
> >   -Wl,-melf_x86_64,--oformat=elf64-x86-64 \
> >   -flto \
> >   -Os \
> >   -Wl,--start-group,@Build/OvmfX64/DEBUG_GCC5/X64/UefiCpuPkg/CpuMpPei/CpuMpPei/OUTPUT/static_library_files.lst,--end-group \
> >   -g \
> >   -fshort-wchar \
> >   -fno-builtin \
> >   -fno-strict-aliasing \
> >   -Wall \
> >   -Werror \
> >   -Wno-array-bounds \
> >   -ffunction-sections \
> >   -fdata-sections \
> >   -include AutoGen.h \
> >   -fno-common \
> >   -DSTRING_ARRAY_NAME=CpuMpPeiStrings \
> >   -m64 \
> >   -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" \
> >   -maccumulate-outgoing-args \
> >   -mno-red-zone \
> >   -Wno-address \
> >   -mcmodel=small \
> >   -fpie \
> >   -fno-asynchronous-unwind-tables \
> >   -Wno-address \
> >   -flto \
> >   -DUSING_LTO \
> >   -Os \
> >   -mno-mmx \
> >   -mno-sse \
> >   -D DISABLE_NEW_DEPRECATED_INTERFACES \
> >   -Wl,--defsym=PECOFF_HEADER_SIZE=0x228 \
> >   -Wl,--script=BaseTools/Scripts/GccBase.lds \
> >   -Wno-error

I don't understand why we need "-Wl,-pie" separately. The "gcc" binary
should pass it through to "ld" as necessary, IMO. This is what the gcc
documentation says:

> 3.13 Options for Linking
> ========================
> '-pie'
>      Produce a position independent executable on targets that support
>      it.  For predictable results, you must also specify the same set
>      of options used for compilation ('-fpie', '-fPIE', or model
>      suboptions) when you specify this linker option.
>
> 3.18 Options for Code Generation Conventions
> ============================================
> '-fpie'
> '-fPIE'
>      These options are similar to '-fpic' and '-fPIC', but generated
>      position independent code can be only linked into executables.
>      Usually these options are used when '-pie' GCC option is used
>      during linking.
>
>      '-fpie' and '-fPIE' both define the macros '__pie__' and
>      '__PIE__'. The macros have the value 1 for '-fpie' and 2 for
>      '-fPIE'.

This seems to suggest that "-pie" is the *master* switch (used only when
linking), and "-fpie" is a *prerequisite* for it (to be used both when
linking and compiling). Is this right?

If so, then I think this is a gcc usability bug. We don't generally
start our thinking from the linker side. The above implies that the
simple (hosted) command line:

$ gcc -o example -fpie source1.c source2.c

could also result in miscompilation, because "-pie" is not given, only
"-fpie".

On 08/22/17 10:00, Shi, Steven wrote:

> diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
> index 1fa3ca3..9e46d65 100755
> --- a/BaseTools/Conf/tools_def.template
> +++ b/BaseTools/Conf/tools_def.template
> @@ -4375,7 +4375,7 @@ DEFINE GCC44_IA32_X64_DLINK_COMMON   = -nostdlib -Wl,-n,-q,--gc-sections -z comm
> DEFINE GCC44_IA32_X64_ASLDLINK_FLAGS = DEF(GCC44_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
> DEFINE GCC44_IA32_X64_DLINK_FLAGS    = DEF(GCC44_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map
> DEFINE GCC44_IA32_DLINK2_FLAGS       = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
> -DEFINE GCC44_X64_DLINK_FLAGS         = DEF(GCC44_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
> +DEFINE GCC44_X64_DLINK_FLAGS         = DEF(GCC44_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64 -Wl,-pie
> DEFINE GCC44_X64_DLINK2_FLAGS        = -Wl,--defsym=PECOFF_HEADER_SIZE=0x228 DEF(GCC_DLINK2_FLAGS_COMMON)
> DEFINE GCC44_ASM_FLAGS               = DEF(GCC_ASM_FLAGS)
>
> @@ -4455,7 +4455,7 @@ DEFINE GCC49_IA32_X64_DLINK_COMMON   = -nostdlib -Wl,-n,-q,--gc-sections -z comm
> DEFINE GCC49_IA32_X64_ASLDLINK_FLAGS = DEF(GCC49_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
> DEFINE GCC49_IA32_X64_DLINK_FLAGS    = DEF(GCC49_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map
> DEFINE GCC49_IA32_DLINK2_FLAGS       = DEF(GCC48_IA32_DLINK2_FLAGS)
> -DEFINE GCC49_X64_DLINK_FLAGS         = DEF(GCC49_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
> +DEFINE GCC49_X64_DLINK_FLAGS         = DEF(GCC49_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64 -Wl,-pie
> DEFINE GCC49_X64_DLINK2_FLAGS        = DEF(GCC48_X64_DLINK2_FLAGS)
> DEFINE GCC49_ASM_FLAGS               = DEF(GCC48_ASM_FLAGS)
> DEFINE GCC49_ARM_ASM_FLAGS           = DEF(GCC48_ARM_ASM_FLAGS)

Do we think that this was an omission in commit a1b8baccc30b ("BaseTools
GCC: use 'gcc' as the linker command for GCC44 and later", 2016-07-23)?

Honestly, I'm quite a bit annoyed by this parameter forwarding mess
between "gcc" and "ld". If someone can submit a formal patch to fix
this, please do so (I also suggest to reassign BZ#671 to BaseTools
then). If there's an actual patch I can fetch or apply with git, I'll be
happy to test it.

Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22 11:59             ` Laszlo Ersek
@ 2017-08-22 12:23               ` Gao, Liming
  2017-08-22 13:27               ` Paolo Bonzini
  1 sibling, 0 replies; 17+ messages in thread
From: Gao, Liming @ 2017-08-22 12:23 UTC (permalink / raw)
  To: Laszlo Ersek, Shi, Steven, Ard Biesheuvel
  Cc: edk2-devel-01, Alex Williamson, Justen, Jordan L,
	Kinney, Michael D, Paolo Bonzini

Laszlo:
  I will take BZ#671 and create the formal patch for review. 

Thanks
Liming
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, August 22, 2017 7:59 PM
> To: Shi, Steven <steven.shi@intel.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: edk2-devel-01 <edk2-devel@lists.01.org>; Alex Williamson <alex.williamson@redhat.com>; Justen, Jordan L
> <jordan.l.justen@intel.com>; Gao, Liming <liming.gao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>
> Subject: Re: [edk2] [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
> 
> On 08/22/17 10:00, Shi, Steven wrote:
> > It is a link flag misuse in our GCC build toolchains, not
> > compiler/linker's problem. Below patch can fix the wrong assembly
> > function relocation type in PIE binary. With below patch, all the
> > GCC5, GCC49 and GCC48 can build correctly images of OvmfPkgIa32X64 and
> > OvmfPkgX64 platforms without my previous simple work around in my
> > side. Please try it in your side.
> >
> > Since we are using GCC as linker command, we MUST pass -pie to ld with
> > "-Wl,-pie", not just "--pie"  or "-fpie".
> >
> > So, this means we never correctly build small+PIE 64bits code with GCC
> > toolchains before (CLANG38 is correct). If you failed to enable
> > PIE/LTO build before, it is worthy to revisit those failures with
> > "-Wl,-pie". FYI.
> 
> The first question of Paolo (CC'd) was, when he saw this issue in my
> last status report, whether we added "-fpie" to the link command line as
> well. And, I confirmed, we did. This is how I responded to him:
> 
> > - in "BaseTools/Conf/build_rule.template", there's
> >
> > > [Static-Library-File]
> > >     <InputFile>
> > >         *.lib
> > >
> > >     <ExtraDependency>
> > >         $(MAKE_FILE)
> > >
> > >     <OutputFile>
> > >         $(DEBUG_DIR)(+)$(MODULE_NAME).dll
> > >
> > >     [...]
> > >
> > >     <Command.GCC>
> > >         "$(DLINK)" -o ${dst} $(DLINK_FLAGS) -Wl,--start-group,@$(STATIC_LIBRARY_FILES_LIST),--end-group $(CC_FLAGS)
> $(DLINK2_FLAGS)
> > >         "$(OBJCOPY)" $(OBJCOPY_FLAGS) ${dst}
> >
> > (see "$(CC_FLAGS)")
> >
> > - and in the build log, I see
> >
> > > "gcc" \
> > >   -o Build/OvmfX64/DEBUG_GCC5/X64/UefiCpuPkg/CpuMpPei/CpuMpPei/DEBUG/CpuMpPei.dll \
> > >   -nostdlib \
> > >   -Wl,-n,-q,--gc-sections \
> > >   -z common-page-size=0x40 \
> > >   -Wl,--entry,_ModuleEntryPoint \
> > >   -u _ModuleEntryPoint \
> > >   -Wl,-Map,Build/OvmfX64/DEBUG_GCC5/X64/UefiCpuPkg/CpuMpPei/CpuMpPei/DEBUG/CpuMpPei.map \
> > >   -Wl,-melf_x86_64,--oformat=elf64-x86-64 \
> > >   -flto \
> > >   -Os \
> > >
> -Wl,--start-group,@Build/OvmfX64/DEBUG_GCC5/X64/UefiCpuPkg/CpuMpPei/CpuMpPei/OUTPUT/static_library_files.lst,--end-group
> \
> > >   -g \
> > >   -fshort-wchar \
> > >   -fno-builtin \
> > >   -fno-strict-aliasing \
> > >   -Wall \
> > >   -Werror \
> > >   -Wno-array-bounds \
> > >   -ffunction-sections \
> > >   -fdata-sections \
> > >   -include AutoGen.h \
> > >   -fno-common \
> > >   -DSTRING_ARRAY_NAME=CpuMpPeiStrings \
> > >   -m64 \
> > >   -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" \
> > >   -maccumulate-outgoing-args \
> > >   -mno-red-zone \
> > >   -Wno-address \
> > >   -mcmodel=small \
> > >   -fpie \
> > >   -fno-asynchronous-unwind-tables \
> > >   -Wno-address \
> > >   -flto \
> > >   -DUSING_LTO \
> > >   -Os \
> > >   -mno-mmx \
> > >   -mno-sse \
> > >   -D DISABLE_NEW_DEPRECATED_INTERFACES \
> > >   -Wl,--defsym=PECOFF_HEADER_SIZE=0x228 \
> > >   -Wl,--script=BaseTools/Scripts/GccBase.lds \
> > >   -Wno-error
> 
> I don't understand why we need "-Wl,-pie" separately. The "gcc" binary
> should pass it through to "ld" as necessary, IMO. This is what the gcc
> documentation says:
> 
> > 3.13 Options for Linking
> > ========================
> > '-pie'
> >      Produce a position independent executable on targets that support
> >      it.  For predictable results, you must also specify the same set
> >      of options used for compilation ('-fpie', '-fPIE', or model
> >      suboptions) when you specify this linker option.
> >
> > 3.18 Options for Code Generation Conventions
> > ============================================
> > '-fpie'
> > '-fPIE'
> >      These options are similar to '-fpic' and '-fPIC', but generated
> >      position independent code can be only linked into executables.
> >      Usually these options are used when '-pie' GCC option is used
> >      during linking.
> >
> >      '-fpie' and '-fPIE' both define the macros '__pie__' and
> >      '__PIE__'. The macros have the value 1 for '-fpie' and 2 for
> >      '-fPIE'.
> 
> This seems to suggest that "-pie" is the *master* switch (used only when
> linking), and "-fpie" is a *prerequisite* for it (to be used both when
> linking and compiling). Is this right?
> 
> If so, then I think this is a gcc usability bug. We don't generally
> start our thinking from the linker side. The above implies that the
> simple (hosted) command line:
> 
> $ gcc -o example -fpie source1.c source2.c
> 
> could also result in miscompilation, because "-pie" is not given, only
> "-fpie".
> 
> On 08/22/17 10:00, Shi, Steven wrote:
> 
> > diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
> > index 1fa3ca3..9e46d65 100755
> > --- a/BaseTools/Conf/tools_def.template
> > +++ b/BaseTools/Conf/tools_def.template
> > @@ -4375,7 +4375,7 @@ DEFINE GCC44_IA32_X64_DLINK_COMMON   = -nostdlib -Wl,-n,-q,--gc-sections -z comm
> > DEFINE GCC44_IA32_X64_ASLDLINK_FLAGS = DEF(GCC44_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u
> ReferenceAcpiTable
> > DEFINE GCC44_IA32_X64_DLINK_FLAGS    = DEF(GCC44_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u
> $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map
> > DEFINE GCC44_IA32_DLINK2_FLAGS       = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
> > -DEFINE GCC44_X64_DLINK_FLAGS         = DEF(GCC44_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
> > +DEFINE GCC44_X64_DLINK_FLAGS         = DEF(GCC44_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
> -Wl,-pie
> > DEFINE GCC44_X64_DLINK2_FLAGS        = -Wl,--defsym=PECOFF_HEADER_SIZE=0x228 DEF(GCC_DLINK2_FLAGS_COMMON)
> > DEFINE GCC44_ASM_FLAGS               = DEF(GCC_ASM_FLAGS)
> >
> > @@ -4455,7 +4455,7 @@ DEFINE GCC49_IA32_X64_DLINK_COMMON   = -nostdlib -Wl,-n,-q,--gc-sections -z comm
> > DEFINE GCC49_IA32_X64_ASLDLINK_FLAGS = DEF(GCC49_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u
> ReferenceAcpiTable
> > DEFINE GCC49_IA32_X64_DLINK_FLAGS    = DEF(GCC49_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u
> $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map
> > DEFINE GCC49_IA32_DLINK2_FLAGS       = DEF(GCC48_IA32_DLINK2_FLAGS)
> > -DEFINE GCC49_X64_DLINK_FLAGS         = DEF(GCC49_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
> > +DEFINE GCC49_X64_DLINK_FLAGS         = DEF(GCC49_IA32_X64_DLINK_FLAGS) -Wl,-melf_x86_64,--oformat=elf64-x86-64
> -Wl,-pie
> > DEFINE GCC49_X64_DLINK2_FLAGS        = DEF(GCC48_X64_DLINK2_FLAGS)
> > DEFINE GCC49_ASM_FLAGS               = DEF(GCC48_ASM_FLAGS)
> > DEFINE GCC49_ARM_ASM_FLAGS           = DEF(GCC48_ARM_ASM_FLAGS)
> 
> Do we think that this was an omission in commit a1b8baccc30b ("BaseTools
> GCC: use 'gcc' as the linker command for GCC44 and later", 2016-07-23)?
> 
> Honestly, I'm quite a bit annoyed by this parameter forwarding mess
> between "gcc" and "ld". If someone can submit a formal patch to fix
> this, please do so (I also suggest to reassign BZ#671 to BaseTools
> then). If there's an actual patch I can fetch or apply with git, I'll be
> happy to test it.
> 
> Thanks,
> Laszlo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22 11:59             ` Laszlo Ersek
  2017-08-22 12:23               ` Gao, Liming
@ 2017-08-22 13:27               ` Paolo Bonzini
  2017-08-22 14:03                 ` Ard Biesheuvel
  1 sibling, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2017-08-22 13:27 UTC (permalink / raw)
  To: Laszlo Ersek, Shi, Steven, Ard Biesheuvel
  Cc: edk2-devel-01, Alex Williamson, Justen, Jordan L, Gao, Liming,
	Kinney, Michael D

On 22/08/2017 13:59, Laszlo Ersek wrote:
> This seems to suggest that "-pie" is the *master* switch (used only when
> linking), and "-fpie" is a *prerequisite* for it (to be used both when
> linking and compiling). Is this right?
> 
> If so, then I think this is a gcc usability bug. We don't generally
> start our thinking from the linker side. The above implies that the
> simple (hosted) command line:
> 
> $ gcc -o example -fpie source1.c source2.c
> 
> could also result in miscompilation, because "-pie" is not given, only
> "-fpie".

No, GCC should add -pie on its own.

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22 13:27               ` Paolo Bonzini
@ 2017-08-22 14:03                 ` Ard Biesheuvel
  2017-08-22 14:15                   ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2017-08-22 14:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Laszlo Ersek, Shi, Steven, edk2-devel-01, Alex Williamson,
	Justen, Jordan L, Gao, Liming, Kinney, Michael D

On 22 August 2017 at 14:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 22/08/2017 13:59, Laszlo Ersek wrote:
>> This seems to suggest that "-pie" is the *master* switch (used only when
>> linking), and "-fpie" is a *prerequisite* for it (to be used both when
>> linking and compiling). Is this right?
>>
>> If so, then I think this is a gcc usability bug. We don't generally
>> start our thinking from the linker side. The above implies that the
>> simple (hosted) command line:
>>
>> $ gcc -o example -fpie source1.c source2.c
>>
>> could also result in miscompilation, because "-pie" is not given, only
>> "-fpie".
>
> No, GCC should add -pie on its own.
>

I disagree. PIE linking and PIE code generation are two completely
different things.

PIE linking (in the absence of LTO) only involves emitting the
sections containing the metadata required by the loader at runtime.
Because dynamic ELF relocations are more restricted than static ones,
and only operate on native pointer sized quantities, this imposes
constraints on the code generation, which is why we need the -fpic or
-fpie compiler switch. On ARM, this means you should not emit absolute
symbol references where the address is encoded in the immediate field
of a sequence of movw/movt/movz/movk instructions. I'm sure there are
similar restrictions on other architectures. Note that the arm64 KASLR
kernel does use PIE linking but omits -fpic/-fpie simply because the
default small code model never uses such instructions, but always uses
relative references (and statically initialized function pointers etc
are guaranteed to be dynamically relocatable)

IIRC, PIE linking predates the availability of the -fpie GCC flag, and
so -fpic objects were used to create PIE binaries, because they
happened to fulfil these requirements, given that they apply to shared
libraries as well. However, -fpic is geared towards ELF symbol
preemption and other restrictions that do apply to shared libraries
but not to PIE executables, and so the -fpie switch was introduced as
an alternative, which generates code that may not be suitable for a
shared library but can be used in a PIE or ordinary executable.

None of this really applies to bare metal binaries, which is why we
need the visibility tweaks when using -fpic/-fpie, in which case the
compiler will relative references over absolute ones. Such objects
could be combined in different ways, and PIE linking is only one of
them.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22 14:03                 ` Ard Biesheuvel
@ 2017-08-22 14:15                   ` Paolo Bonzini
  2017-08-22 16:04                     ` Laszlo Ersek
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2017-08-22 14:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Laszlo Ersek, Shi, Steven, edk2-devel-01, Alex Williamson,
	Justen, Jordan L, Gao, Liming, Kinney, Michael D

On 22/08/2017 16:03, Ard Biesheuvel wrote:
> On 22 August 2017 at 14:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 22/08/2017 13:59, Laszlo Ersek wrote:
>>> This seems to suggest that "-pie" is the *master* switch (used only when
>>> linking), and "-fpie" is a *prerequisite* for it (to be used both when
>>> linking and compiling). Is this right?
>>>
>>> If so, then I think this is a gcc usability bug. We don't generally
>>> start our thinking from the linker side. The above implies that the
>>> simple (hosted) command line:
>>>
>>> $ gcc -o example -fpie source1.c source2.c
>>>
>>> could also result in miscompilation, because "-pie" is not given, only
>>> "-fpie".
>>
>> No, GCC should add -pie on its own.
>>
> 
> I disagree. PIE linking and PIE code generation are two completely
> different things.

What I'm saying is that GCC should add -pie on its own if you add -fpie
to the linker command line.  But that would require changes to the
compiler driver.

That said, the extra "-Wl," in "-Wl,-pie" is not necessary; the compiler
driver knows "-pie" and swallows it when compiling (and passes it to the
linker).

Paolo

> PIE linking (in the absence of LTO) only involves emitting the
> sections containing the metadata required by the loader at runtime.
> Because dynamic ELF relocations are more restricted than static ones,
> and only operate on native pointer sized quantities, this imposes
> constraints on the code generation, which is why we need the -fpic or
> -fpie compiler switch. On ARM, this means you should not emit absolute
> symbol references where the address is encoded in the immediate field
> of a sequence of movw/movt/movz/movk instructions. I'm sure there are
> similar restrictions on other architectures. Note that the arm64 KASLR
> kernel does use PIE linking but omits -fpic/-fpie simply because the
> default small code model never uses such instructions, but always uses
> relative references (and statically initialized function pointers etc
> are guaranteed to be dynamically relocatable)
> 
> IIRC, PIE linking predates the availability of the -fpie GCC flag, and
> so -fpic objects were used to create PIE binaries, because they
> happened to fulfil these requirements, given that they apply to shared
> libraries as well. However, -fpic is geared towards ELF symbol
> preemption and other restrictions that do apply to shared libraries
> but not to PIE executables, and so the -fpie switch was introduced as
> an alternative, which generates code that may not be suitable for a
> shared library but can be used in a PIE or ordinary executable.
> 
> None of this really applies to bare metal binaries, which is why we
> need the visibility tweaks when using -fpic/-fpie, in which case the
> compiler will relative references over absolute ones. Such objects
> could be combined in different ways, and PIE linking is only one of
> them.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22 14:15                   ` Paolo Bonzini
@ 2017-08-22 16:04                     ` Laszlo Ersek
  2017-08-22 16:06                       ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-22 16:04 UTC (permalink / raw)
  To: Paolo Bonzini, Ard Biesheuvel
  Cc: Shi, Steven, edk2-devel-01, Alex Williamson, Justen, Jordan L,
	Gao, Liming, Kinney, Michael D

On 08/22/17 16:15, Paolo Bonzini wrote:
> On 22/08/2017 16:03, Ard Biesheuvel wrote:
>> On 22 August 2017 at 14:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 22/08/2017 13:59, Laszlo Ersek wrote:
>>>> This seems to suggest that "-pie" is the *master* switch (used only when
>>>> linking), and "-fpie" is a *prerequisite* for it (to be used both when
>>>> linking and compiling). Is this right?
>>>>
>>>> If so, then I think this is a gcc usability bug. We don't generally
>>>> start our thinking from the linker side. The above implies that the
>>>> simple (hosted) command line:
>>>>
>>>> $ gcc -o example -fpie source1.c source2.c
>>>>
>>>> could also result in miscompilation, because "-pie" is not given, only
>>>> "-fpie".
>>>
>>> No, GCC should add -pie on its own.
>>>
>>
>> I disagree. PIE linking and PIE code generation are two completely
>> different things.
> 
> What I'm saying is that GCC should add -pie on its own if you add -fpie
> to the linker command line.

Yes, that's my point, from a usability perspective.

While I hope to understand Ard's explanation (and it seems to confirm
that "-fpie" at compilation is a 'prerequisite' for "-pie" at linking),
again, this simply isn't how humans think about building binaries. If we
are required to call the compiler frontend for all purposes -- and we
seem to be required --, then we shouldn't have to express the same end
goal in different ways for different link-editing phases.

> But that would require changes to the
> compiler driver.
> 
> That said, the extra "-Wl," in "-Wl,-pie" is not necessary; the compiler
> driver knows "-pie" and swallows it when compiling (and passes it to the
> linker).

Now *that* I can get behind. If this works, then please let us do it --
replace "-fpie" with "-pie" in GCC44_X64_CC_FLAGS, and add no "-Wl,"
stuff to any DLINK defines.

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-22 16:04                     ` Laszlo Ersek
@ 2017-08-22 16:06                       ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2017-08-22 16:06 UTC (permalink / raw)
  To: Laszlo Ersek, Ard Biesheuvel
  Cc: Shi, Steven, edk2-devel-01, Alex Williamson, Justen, Jordan L,
	Gao, Liming, Kinney, Michael D

On 22/08/2017 18:04, Laszlo Ersek wrote:
>> That said, the extra "-Wl," in "-Wl,-pie" is not necessary; the compiler
>> driver knows "-pie" and swallows it when compiling (and passes it to the
>> linker).
> Now *that* I can get behind. If this works, then please let us do it --
> replace "-fpie" with "-pie" in GCC44_X64_CC_FLAGS, and add no "-Wl,"
> stuff to any DLINK defines.

Note that you still need -fpie in the CC flags (and it _should_ not need
"-pie" in CC flags, only in DLINK).

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11  0:34 ` [PATCH 1/1] " Laszlo Ersek
  2017-08-11  5:28   ` Shi, Steven
@ 2017-08-11 10:03   ` Ard Biesheuvel
  2017-08-11 10:30     ` Laszlo Ersek
  2017-08-11 22:21   ` Alex Williamson
  2 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2017-08-11 10:03 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: edk2-devel-01, Alex Williamson, Jordan Justen, Liming Gao,
	Michael Kinney, Yonghong Zhu

On 11 August 2017 at 01:34, Laszlo Ersek <lersek@redhat.com> wrote:
> Several users have recently reported boot failures with OVMF. The symptoms
> are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found
> the commit (via bisection) that exposes the issue. In this patch I'll
> attempt to analyze the symptoms and fix the root problem.
>
> (1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that
>     provides multiprocessing services. It is built into the CpuMpPei
>     module (for the PEI phase) and the CpuDxe module (for the DXE and BDS
>     phases). When either of these modules starts up during boot, it
>     briefly boots up all of the CPUs, perfoms some counting /
>     synchronization, and then all the APs (application processors) are
>     quiesced until further work arrives.
>
>     The APs boot in real mode, and need assembly language init code (a
>     "reset vector") that gradually takes them into 64-bit mode, before
>     they can call back into normal C code. The reset vector code is
>     assembled at build time, but it is copied as data under 1MB (for real
>     mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe
>     launches.
>
>     Part of the reset vector code is FPU initialization. The reset vector
>     in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly
>     language function InitializeFloatingPointUnits(), which is however
>     provided by an external library. Normally, when the object files were
>     linked together, this function call would result in an absolute
>     reference (and matching relocation, performed at module loading), and
>     the instance of the reset vector that was copied under 1MB during boot
>     would refer to the same, unchanged, absolute address of
>     InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe.
>
>     This didn't work with X64 XCODE5 builds however. XCODE5 prefers
>     RIP-relative addressing for X64 (i.e., the assembly language function
>     call depended on the distance between the call site and the callee).
>     When the call site was copied under 1MB but
>     InitializeFloatingPointUnits() would not move, the call broke.
>
>     This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987
>     ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues",
>     2017-05-17). The solution was to add another function pointer to the
>     MP_CPU_EXCHANGE_INFO communication area. (The function pointer would
>     have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would
>     assign the absolute address of InitializeFloatingPointUnits() to the
>     new field, and the APs would retrieve it -- matching the pointer size
>     correctly --, and call it. The C assignment can be seen in the
>     following hunk (from commit 3b2928b46987):
>
>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> index 407c44c95e62..735e099b32f2 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> @@ -751,6 +751,8 @@ FillExchangeInfoData (
>>
>>    ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled ();
>>
>> +  ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits;
>> +
>>    //
>>    // Get the BSP's data of GDT and IDT
>>    //
>
>     This moved the relocation of the InitializeFloatingPointUnits()
>     reference (or, with XCODE5, the RIP-relative
>     InitializeFloatingPointUnits() reference) from the assembly code
>     (which is copied around) to the C code (which is not copied around).
>
> (2) Unfortunately, the C-language assignment to
>     "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
>     miscompiled under the following circumstances:
>
>     - the edk2 build target is DEBUG (or RELEASE),
>     - the edk2 toolchain selection is GCC5,
>     - (from the above two it follows that -Os and LTO are enabled,)
>     - MpInitLib is built for X64,
>     - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).
>
>     (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
>     PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
>     in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)
>
>     Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
>     the correct assembly code generated by gcc for the assignment, and
>     flip the build target to DEBUG for showing the broken assembly code.
>
> (3) Correct code for the assignment:
>
>>     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 7880 <InitializeFloatingPointUnits>
>>     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax
>>     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)
>
>     Here the absolute address of InitializeFloatingPointUnits() is loaded
>     into RDX with RIP-relative addressing. Then RAX is pointed to the
>     ExchangeInfo structure, and finally RDX is stored into
>     ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
>     0x84 in the structure).
>
>     Here the only relocation happens when CpuMpPei is linked. The distance
>     between the call site and the callee is calculated and encoded in the
>     LEA instruction. The same distance is valid when CpuMpPei runs, so the
>     PEI core doesn't have to relocate the LEA instruction when it loads
>     CpuMpPei.
>
> (4) Incorrect code for the assignment:
>
>>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>>     ...
>>     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>>     ...
>> 0000000000005860 <InitializeFloatingPointUnits>:
>>     5860:       9b db e3                finit
>
>     This is not RIP-relative addressing. Therefore gcc generates a
>     relocation record for the address encoded in the MOV instruction, at
>     offset 0x2d69+0x3 -- initial value being 0x5860:
>
>> Sections:
>> Idx Name          Size      VMA               LMA               File off  Algn
>>   0 .text         00006e66  0000000000000240  0000000000000240  000000c0  2**6
>>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> ...
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET           TYPE              VALUE
>> ...
>> 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits
>
>     (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
>     equals the .text section base plus the relocation offset, i.e.,
>     0x240+0x2b2c.)
>
>     This is the only R_X86_64_32S relocation record in the linked-together
>     CpuMpPei.debug file (still an ELF file). In the rest of the build
>     process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
>     (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
>     relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).
>
>     The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first
>     when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi
>     could theoretically execute from flash, and PEIFV has a nonzero base
>     address. (However, CpuMpPei has a DEPEX on permanent PEI RAM
>     discovery, so the PEI core will only launch it from permanent PEI RAM,
>     in practice.)
>
>     The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for
>     the second time when the PEI core loads CpuMpPei into permanent PEI
>     RAM, and performs the relocations. I confirmed (with debug messages)
>     that the relocated address (within the instruction) is correct:
>
>> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0
>> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0
>> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi
>
>     The "Fixup32" pointer points to the address constant that should be
>     relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of
>     (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that
>     I described above (as "first relocation").
>
>     After incrementing (*Fixup32) by "Adjust", the final value is
>     0xBFF28860. This is the absolute address that is patched into the MOV
>     instruction that sets RDX for the assignment. This value is correct;
>     the InitializeFloatingPointUnits() function really exists in CpuMpPei
>     at this address, in permanent PEI RAM. It is at an offset of 0x5860
>     from the load address 0x000BFF23000 (see third line).
>
>     So where do things go wrong? Again, the MOV instruction itself is
>     wrong. I modified the MpInitLib C code to print the value of the field
>     right after the assignment, and I got:
>
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860
>
>     The APs are ejected into outer space when they try to init their FPUs
>     -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should
>     store 0x00000000_BFF28860!
>
> (5) So open the Intel SDM, and decode the instruction by hand:
>
>>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>
>     0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W).
>     Looking up REX.W + 0xc7 (the next byte) under the MOV specification,
>     we get:
>     - opcode: REX.W + C7 /0
>     - instruction: MOV r/m64, imm32
>     - Description: Move imm32 sign extended to 64-bits to r/m64.
>
>     "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860
>     -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the
>     sign bit is set. And MOV sign-extends 0xBFF28860 to
>     0xFFFFFFFF_BFF28860 in the assignment to RDX.
>
>     So... why on earth generates gcc this MOV instruction, with sign
>     extension?
>
> (6) It does because we told it to. From the gcc manual:
>
>> -mcmodel=small
>>     Generate code for the small code model: the program and its symbols
>>     must be linked in the lower 2 GB of the address space.  Pointers are
>>     64 bits.  Programs can be statically or dynamically linked.  This is
>>     the default code model.
>
>     and please refer to the following commit (message slightly rewrapped
>     here):
>
>> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0
>> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Date:   Sat Jul 16 00:16:11 2016 +0200
>>
>>     BaseTools/tools_def: switch GCC/X64 to the PIE small model
>>
>>     The ordinary small code model for x86_64 cannot be used in UEFI,
>>     since it assumes the executable is loaded in the first 2 GB of
>>     memory. Therefore, we use the large model instead, which can execute
>>     anywhere, but uses absolute 64-bit wide quantities for all symbol
>>     references, which is costly in terms of code size.
>>
>>     So switch to the PIE small code model, this uses 32-bit relative
>>     references where possible, but does not make any assumptions about
>>     the load address (i.e., all absolute symbol references are 64-bits
>>     wide). Note that, due to the 'protected' visibility pragma
>>     introduced in an earlier patch, there is no need for the EDK2 build
>>     system to deal with GOT related ELF relocation types.
>>
>>     Contributed-under: TianoCore Contribution Agreement 1.0
>>     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>     Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
>>     Tested-by: Laszlo Ersek <lersek@redhat.com>
>>     Tested-By: Liming Gao <liming.gao@intel.com>
>>     Reviewed-by: Liming Gao <liming.gao@intel.com>
>
>     Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make
>     assumptions about the load address, despite the fact that we didn't
>     just replace "-mcmodel=large" with "-mcmodel=small", but we also added
>     "-fpie".
>
> (7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do
>     not revert it whole-sale. Instead, undo it only for the LTO build env
>     where it might cause problems, by appending "-fno-pie -mcmodel=large".
>
>     The assignment is then compiled like this:
>
>>     33d7:       48 ba 60 64 00 00 00    movabs $0x6460,%rdx
>>     33de:       00 00 00
>>     ...
>>     33e5:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>>     ...
>> 0000000000006460 <InitializeFloatingPointUnits>:
>
>     And the matching relocation is (note the target offset
>     0x240+0x3199=0x33d9):
>
>> Sections:
>> Idx Name          Size      VMA               LMA               File off  Algn
>>   0 .text         00007c5e  0000000000000240  0000000000000240  000000c0  2**6
>>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> ...
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET           TYPE              VALUE
>> ...
>> 0000000000003199 R_X86_64_64       InitializeFloatingPointUnits
>
> (8) Size changes by this patch:
>
>     $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
>         -t GCC5 -b DEBUG \
>         -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE
>
>       Before:
>
>> SECFV [10%Full] 212992 total, 21616 used, 191376 free
>> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free
>> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free
>> PEIFV [21%Full] 917504 total, 194480 used, 723024 free
>
>       After:
>
>> SECFV [11%Full] 212992 total, 23728 used, 189264 free
>> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free
>> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free
>> PEIFV [24%Full] 917504 total, 224304 used, 693200 free
>
>     $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
>         -t GCC5 -b DEBUG \
>         -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \
>         -D SECURE_BOOT_ENABLE -D SMM_REQUIRE
>
>       Before:
>
>> SECFV [10%Full] 212992 total, 21536 used, 191456 free
>> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free
>> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free
>> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
>
>       After:
>
>> SECFV [10%Full] 212992 total, 21536 used, 191456 free
>> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free
>> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free
>> PEIFV [22%Full] 917504 total, 202032 used, 715472 free
>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Michael Kinney <michael.d.kinney@intel.com>
> Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  BaseTools/Conf/tools_def.template | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
> index 1fa3ca3ceae9..2ef495540e5f 100755
> --- a/BaseTools/Conf/tools_def.template
> +++ b/BaseTools/Conf/tools_def.template
> @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS    = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
>  *_GCC5_X64_OBJCOPY_FLAGS         =
>  *_GCC5_X64_NASM_FLAGS            = -f elf64
>
> -  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
> +  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
>    DEBUG_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>
> -RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
> +RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
>  RELEASE_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>
>    NOOPT_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -O0
> --
> 2.13.1.3.g8be5a757fa67
>


Thanks for the interesting read. One question: is all code potentially
miscompiled? Or does it only affect code executing in real mode?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11 10:03   ` Ard Biesheuvel
@ 2017-08-11 10:30     ` Laszlo Ersek
  0 siblings, 0 replies; 17+ messages in thread
From: Laszlo Ersek @ 2017-08-11 10:30 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: edk2-devel-01, Alex Williamson, Jordan Justen, Liming Gao,
	Michael Kinney, Yonghong Zhu

On 08/11/17 12:03, Ard Biesheuvel wrote:
> On 11 August 2017 at 01:34, Laszlo Ersek <lersek@redhat.com> wrote:

>> (2) Unfortunately, the C-language assignment to
>>     "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
>>     miscompiled under the following circumstances:
>>
>>     - the edk2 build target is DEBUG (or RELEASE),
>>     - the edk2 toolchain selection is GCC5,
>>     - (from the above two it follows that -Os and LTO are enabled,)
>>     - MpInitLib is built for X64,
>>     - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).
>>
>>     (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
>>     PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
>>     in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)
>>
>>     Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
>>     the correct assembly code generated by gcc for the assignment, and
>>     flip the build target to DEBUG for showing the broken assembly code.
>>
>> (3) Correct code for the assignment:
>>
>>>     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 7880 <InitializeFloatingPointUnits>
>>>     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax
>>>     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)
>>
>>     Here the absolute address of InitializeFloatingPointUnits() is loaded
>>     into RDX with RIP-relative addressing. Then RAX is pointed to the
>>     ExchangeInfo structure, and finally RDX is stored into
>>     ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
>>     0x84 in the structure).
>>
>>     Here the only relocation happens when CpuMpPei is linked. The distance
>>     between the call site and the callee is calculated and encoded in the
>>     LEA instruction. The same distance is valid when CpuMpPei runs, so the
>>     PEI core doesn't have to relocate the LEA instruction when it loads
>>     CpuMpPei.
>>
>> (4) Incorrect code for the assignment:
>>
>>>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>>>     ...
>>>     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>>>     ...
>>> 0000000000005860 <InitializeFloatingPointUnits>:
>>>     5860:       9b db e3                finit
>>
>>     This is not RIP-relative addressing. Therefore gcc generates a
>>     relocation record for the address encoded in the MOV instruction, at
>>     offset 0x2d69+0x3 -- initial value being 0x5860:
>>
>>> Sections:
>>> Idx Name          Size      VMA               LMA               File off  Algn
>>>   0 .text         00006e66  0000000000000240  0000000000000240  000000c0  2**6
>>>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>>> ...
>>> RELOCATION RECORDS FOR [.text]:
>>> OFFSET           TYPE              VALUE
>>> ...
>>> 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits
>>
>>     (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
>>     equals the .text section base plus the relocation offset, i.e.,
>>     0x240+0x2b2c.)
>>
>>     This is the only R_X86_64_32S relocation record in the linked-together
>>     CpuMpPei.debug file (still an ELF file). In the rest of the build
>>     process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
>>     (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
>>     relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).

> Thanks for the interesting read. One question: is all code potentially
> miscompiled? Or does it only affect code executing in real mode?

The code that is miscompiled is a normal assignment, in normal C code.
The structure type itself is packed [1], and the pointer to access it is
volatile-qualified [2], so those could be contributing factors.

[1] UefiCpuPkg/Library/MpInitLib/MpLib.h

#pragma pack(1)

//
// MP CPU exchange information for AP reset code
// This structure is required to be packed because fixed field offsets
// into this structure are used in assembly code in this module
//
typedef struct {
  UINTN                 Lock;
  UINTN                 StackStart;
  UINTN                 StackSize;
  UINTN                 CFunction;
  IA32_DESCRIPTOR       GdtrProfile;
  IA32_DESCRIPTOR       IdtrProfile;
  UINTN                 BufferStart;
  UINTN                 ModeOffset;
  UINTN                 NumApsExecuting;
  UINTN                 CodeSegment;
  UINTN                 DataSegment;
  UINTN                 EnableExecuteDisable;
  UINTN                 Cr3;
  UINTN                 InitFlag;
  CPU_INFO_IN_HOB       *CpuInfo;
  CPU_MP_DATA           *CpuMpData;
  UINTN                 InitializeFloatingPointUnitsAddress;
} MP_CPU_EXCHANGE_INFO;

#pragma pack()

[2] UefiCpuPkg/Library/MpInitLib/MpLib.c

VOID
FillExchangeInfoData (
  IN CPU_MP_DATA               *CpuMpData
  )
{
  volatile MP_CPU_EXCHANGE_INFO    *ExchangeInfo;
...
  ExchangeInfo->InitializeFloatingPointUnitsAddress =
(UINTN)InitializeFloatingPointUnits;


The only reason the miscompilation affects real mode is that the
assignment populates an "exchange area field" that is consumed by the
application processors from real mode. To be clear, the issue happens on
the population (assignment) side, in normal C code, not on the
consumption side -- the latter is only the victim.

I think *technically* speaking gcc is entitled to generate such an
instruction, because with -mcmodel=small, we promise it that we're never
going to run the binary above 2GB in the address space. And, we do break
that promise, when the PEI core loads the PEIM just below 3GB. What's
interesting is that adding -fpie on top of -mcmodel=small, safety could
be restored (as explained in your commit message) under all gcc's prior
to gcc-7. For some reason, "-fpie" is not enough with gcc-7 + LTO, and
it really takes us up on our "-mcmodel=small" word (about not loading
executables above 2GB).

So, this is a behavioral change in gcc-7, but I'm unsure if we should
call it "gcc bug" or "build flag misconfiguration".

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO
  2017-08-11  0:34 ` [PATCH 1/1] " Laszlo Ersek
  2017-08-11  5:28   ` Shi, Steven
  2017-08-11 10:03   ` Ard Biesheuvel
@ 2017-08-11 22:21   ` Alex Williamson
  2 siblings, 0 replies; 17+ messages in thread
From: Alex Williamson @ 2017-08-11 22:21 UTC (permalink / raw)
  To: Laszlo Ersek, Yonghong Zhu
  Cc: edk2-devel-01, Ard Biesheuvel, Jordan Justen, Liming Gao,
	Michael Kinney

On Fri, 11 Aug 2017 02:34:26 +0200
Laszlo Ersek <lersek@redhat.com> wrote:
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Michael Kinney <michael.d.kinney@intel.com>
> Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  BaseTools/Conf/tools_def.template | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
> index 1fa3ca3ceae9..2ef495540e5f 100755
> --- a/BaseTools/Conf/tools_def.template
> +++ b/BaseTools/Conf/tools_def.template
> @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS    = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl,
>  *_GCC5_X64_OBJCOPY_FLAGS         =
>  *_GCC5_X64_NASM_FLAGS            = -f elf64
>  
> -  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os
> +  DEBUG_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os
>    DEBUG_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>  
> -RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
> +RELEASE_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable
>  RELEASE_GCC5_X64_DLINK_FLAGS     = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os
>  
>    NOOPT_GCC5_X64_CC_FLAGS        = DEF(GCC5_X64_CC_FLAGS) -O0

VM boots again, Thanks Laszlo!

Tested-by: Alex Williamson <alex.williamson@redhat.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-08-22 16:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-11  0:34 [PATCH 0/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO Laszlo Ersek
2017-08-11  0:34 ` [PATCH 1/1] " Laszlo Ersek
2017-08-11  5:28   ` Shi, Steven
2017-08-11 11:18     ` Laszlo Ersek
2017-08-12  3:05       ` Shi, Steven
2017-08-15 15:45         ` Laszlo Ersek
2017-08-22  8:00           ` Shi, Steven
2017-08-22 11:59             ` Laszlo Ersek
2017-08-22 12:23               ` Gao, Liming
2017-08-22 13:27               ` Paolo Bonzini
2017-08-22 14:03                 ` Ard Biesheuvel
2017-08-22 14:15                   ` Paolo Bonzini
2017-08-22 16:04                     ` Laszlo Ersek
2017-08-22 16:06                       ` Paolo Bonzini
2017-08-11 10:03   ` Ard Biesheuvel
2017-08-11 10:30     ` Laszlo Ersek
2017-08-11 22:21   ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox