From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-x22c.google.com (mail-io0-x22c.google.com [IPv6:2607:f8b0:4001:c06::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 7840D21D490F5 for ; Fri, 11 Aug 2017 03:00:51 -0700 (PDT) Received: by mail-io0-x22c.google.com with SMTP id m88so18406260iod.2 for ; Fri, 11 Aug 2017 03:03:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=iKbGG/63lJVJzyGujgMzZsopnlOlkI4QFJ42N/swvK0=; b=RL/rJsujbtyrhvzct2Yyo7z86vGs/fbJoDfgHw6brAn7jDynVKhDz8iZEv6P86KKPq G44Q0s+JEEpXFN/dhakLWpAmVuK9BRlTjOhwJWgHW0eEky1D8S8kQKGDmOmb9vd2yVCh AgPVoyHz2RoxTA41btlLPYAEhc7D2AnoF1/Lw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=iKbGG/63lJVJzyGujgMzZsopnlOlkI4QFJ42N/swvK0=; b=V7Y+OaPOAp5Q1QH5Tgu6StfXc7EykP6a5D90cEk2AmDnEt4RiQxYQceAu5YxRFBs50 BRZ77ffBQw3ZLnnoqezApJm/fP3DSnB91fDwBoAkkgkaQGR/Fpi6JiO+Ggga3VacGjFG KVBzMmF/PwTVW568IKWmat/+nega2caRr4V1yeayEwbBDFJvtkHlY8TcPTUV8bZFi/oo vx77l/h8rikcZm5DN+UrJq3mGEElkLuel5+8RWics971908E2DN4Z0Hpaub6n8R7AdQe D/Xni6p7VlFCJ+fF/1VfSDReEd/DTkzwewGt10a8LpM91HLvvYFwtKy8IHJlQtifieC8 qlFw== X-Gm-Message-State: AIVw111L2Of0H9W6JSfzOh5R4NrrHUfHmK5WlxvCGkanGVEy57NBa1uo v27XdvmJ7XaHRn3F1zu/g9hQ7AIxuWTW X-Received: by 10.107.180.65 with SMTP id d62mr13617914iof.291.1502445790684; Fri, 11 Aug 2017 03:03:10 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.162.1 with HTTP; Fri, 11 Aug 2017 03:03:10 -0700 (PDT) In-Reply-To: <20170811003426.2332-2-lersek@redhat.com> References: <20170811003426.2332-1-lersek@redhat.com> <20170811003426.2332-2-lersek@redhat.com> From: Ard Biesheuvel Date: Fri, 11 Aug 2017 11:03:10 +0100 Message-ID: To: Laszlo Ersek Cc: edk2-devel-01 , Alex Williamson , Jordan Justen , Liming Gao , Michael Kinney , Yonghong Zhu Subject: Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Aug 2017 10:00:51 -0000 Content-Type: text/plain; charset="UTF-8" On 11 August 2017 at 01:34, Laszlo Ersek wrote: > Several users have recently reported boot failures with OVMF. The symptoms > are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found > the commit (via bisection) that exposes the issue. In this patch I'll > attempt to analyze the symptoms and fix the root problem. > > (1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that > provides multiprocessing services. It is built into the CpuMpPei > module (for the PEI phase) and the CpuDxe module (for the DXE and BDS > phases). When either of these modules starts up during boot, it > briefly boots up all of the CPUs, perfoms some counting / > synchronization, and then all the APs (application processors) are > quiesced until further work arrives. > > The APs boot in real mode, and need assembly language init code (a > "reset vector") that gradually takes them into 64-bit mode, before > they can call back into normal C code. The reset vector code is > assembled at build time, but it is copied as data under 1MB (for real > mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe > launches. > > Part of the reset vector code is FPU initialization. The reset vector > in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly > language function InitializeFloatingPointUnits(), which is however > provided by an external library. Normally, when the object files were > linked together, this function call would result in an absolute > reference (and matching relocation, performed at module loading), and > the instance of the reset vector that was copied under 1MB during boot > would refer to the same, unchanged, absolute address of > InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe. > > This didn't work with X64 XCODE5 builds however. XCODE5 prefers > RIP-relative addressing for X64 (i.e., the assembly language function > call depended on the distance between the call site and the callee). > When the call site was copied under 1MB but > InitializeFloatingPointUnits() would not move, the call broke. > > This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987 > ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues", > 2017-05-17). The solution was to add another function pointer to the > MP_CPU_EXCHANGE_INFO communication area. (The function pointer would > have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would > assign the absolute address of InitializeFloatingPointUnits() to the > new field, and the APs would retrieve it -- matching the pointer size > correctly --, and call it. The C assignment can be seen in the > following hunk (from commit 3b2928b46987): > >> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c >> index 407c44c95e62..735e099b32f2 100644 >> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c >> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c >> @@ -751,6 +751,8 @@ FillExchangeInfoData ( >> >> ExchangeInfo->EnableExecuteDisable = IsBspExecuteDisableEnabled (); >> >> + ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits; >> + >> // >> // Get the BSP's data of GDT and IDT >> // > > This moved the relocation of the InitializeFloatingPointUnits() > reference (or, with XCODE5, the RIP-relative > InitializeFloatingPointUnits() reference) from the assembly code > (which is copied around) to the C code (which is not copied around). > > (2) Unfortunately, the C-language assignment to > "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is > miscompiled under the following circumstances: > > - the edk2 build target is DEBUG (or RELEASE), > - the edk2 toolchain selection is GCC5, > - (from the above two it follows that -Os and LTO are enabled,) > - MpInitLib is built for X64, > - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.). > > (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the > PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks > in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.) > > Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show > the correct assembly code generated by gcc for the assignment, and > flip the build target to DEBUG for showing the broken assembly code. > > (3) Correct code for the assignment: > >> 5406: 48 8d 15 73 24 00 00 lea 0x2473(%rip),%rdx # 7880 >> 540d: 48 8b 45 f8 mov -0x8(%rbp),%rax >> 5411: 48 89 90 84 00 00 00 mov %rdx,0x84(%rax) > > Here the absolute address of InitializeFloatingPointUnits() is loaded > into RDX with RIP-relative addressing. Then RAX is pointed to the > ExchangeInfo structure, and finally RDX is stored into > ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is > 0x84 in the structure). > > Here the only relocation happens when CpuMpPei is linked. The distance > between the call site and the callee is calculated and encoded in the > LEA instruction. The same distance is valid when CpuMpPei runs, so the > PEI core doesn't have to relocate the LEA instruction when it loads > CpuMpPei. > > (4) Incorrect code for the assignment: > >> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx >> ... >> 2d7b: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp) >> ... >> 0000000000005860 : >> 5860: 9b db e3 finit > > This is not RIP-relative addressing. Therefore gcc generates a > relocation record for the address encoded in the MOV instruction, at > offset 0x2d69+0x3 -- initial value being 0x5860: > >> Sections: >> Idx Name Size VMA LMA File off Algn >> 0 .text 00006e66 0000000000000240 0000000000000240 000000c0 2**6 >> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE >> ... >> RELOCATION RECORDS FOR [.text]: >> OFFSET TYPE VALUE >> ... >> 0000000000002b2c R_X86_64_32S InitializeFloatingPointUnits > > (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which > equals the .text section base plus the relocation offset, i.e., > 0x240+0x2b2c.) > > This is the only R_X86_64_32S relocation record in the linked-together > CpuMpPei.debug file (still an ELF file). In the rest of the build > process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image > (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF > relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW). > > The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first > when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi > could theoretically execute from flash, and PEIFV has a nonzero base > address. (However, CpuMpPei has a DEPEX on permanent PEI RAM > discovery, so the PEI core will only launch it from permanent PEI RAM, > in practice.) > > The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for > the second time when the PEI core loads CpuMpPei into permanent PEI > RAM, and performs the relocations. I confirmed (with debug messages) > that the relocated address (within the instruction) is correct: > >> PeiCore:PeCoffLoaderRelocateImage: Fixup32=BFF25D6C FixupData=0 >> PeiCore:PeCoffLoaderRelocateImage: *Fixup32=0x8487A0 Adjust=0xBF6E00C0 >> Loading PEIM at 0x000BFF23000 EntryPoint=0x000BFF271BB CpuMpPei.efi > > The "Fixup32" pointer points to the address constant that should be > relocated (0xBFF25D6C - 0xBFF23000 == 0x2D6C). The initial value of > (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that > I described above (as "first relocation"). > > After incrementing (*Fixup32) by "Adjust", the final value is > 0xBFF28860. This is the absolute address that is patched into the MOV > instruction that sets RDX for the assignment. This value is correct; > the InitializeFloatingPointUnits() function really exists in CpuMpPei > at this address, in permanent PEI RAM. It is at an offset of 0x5860 > from the load address 0x000BFF23000 (see third line). > > So where do things go wrong? Again, the MOV instruction itself is > wrong. I modified the MpInitLib C code to print the value of the field > right after the assignment, and I got: > >> AP Loop Mode is 1 >> WakeupBufferStart = 9F000, WakeupBufferSize = 1000 >> CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFBFF28860 > > The APs are ejected into outer space when they try to init their FPUs > -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should > store 0x00000000_BFF28860! > > (5) So open the Intel SDM, and decode the instruction by hand: > >> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx > > 0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W). > Looking up REX.W + 0xc7 (the next byte) under the MOV specification, > we get: > - opcode: REX.W + C7 /0 > - instruction: MOV r/m64, imm32 > - Description: Move imm32 sign extended to 64-bits to r/m64. > > "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860 > -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the > sign bit is set. And MOV sign-extends 0xBFF28860 to > 0xFFFFFFFF_BFF28860 in the assignment to RDX. > > So... why on earth generates gcc this MOV instruction, with sign > extension? > > (6) It does because we told it to. From the gcc manual: > >> -mcmodel=small >> Generate code for the small code model: the program and its symbols >> must be linked in the lower 2 GB of the address space. Pointers are >> 64 bits. Programs can be statically or dynamically linked. This is >> the default code model. > > and please refer to the following commit (message slightly rewrapped > here): > >> commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0 >> Author: Ard Biesheuvel >> Date: Sat Jul 16 00:16:11 2016 +0200 >> >> BaseTools/tools_def: switch GCC/X64 to the PIE small model >> >> The ordinary small code model for x86_64 cannot be used in UEFI, >> since it assumes the executable is loaded in the first 2 GB of >> memory. Therefore, we use the large model instead, which can execute >> anywhere, but uses absolute 64-bit wide quantities for all symbol >> references, which is costly in terms of code size. >> >> So switch to the PIE small code model, this uses 32-bit relative >> references where possible, but does not make any assumptions about >> the load address (i.e., all absolute symbol references are 64-bits >> wide). Note that, due to the 'protected' visibility pragma >> introduced in an earlier patch, there is no need for the EDK2 build >> system to deal with GOT related ELF relocation types. >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Ard Biesheuvel >> Reviewed-by: Jordan Justen >> Tested-by: Laszlo Ersek >> Tested-By: Liming Gao >> Reviewed-by: Liming Gao > > Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make > assumptions about the load address, despite the fact that we didn't > just replace "-mcmodel=large" with "-mcmodel=small", but we also added > "-fpie". > > (7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do > not revert it whole-sale. Instead, undo it only for the LTO build env > where it might cause problems, by appending "-fno-pie -mcmodel=large". > > The assignment is then compiled like this: > >> 33d7: 48 ba 60 64 00 00 00 movabs $0x6460,%rdx >> 33de: 00 00 00 >> ... >> 33e5: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp) >> ... >> 0000000000006460 : > > And the matching relocation is (note the target offset > 0x240+0x3199=0x33d9): > >> Sections: >> Idx Name Size VMA LMA File off Algn >> 0 .text 00007c5e 0000000000000240 0000000000000240 000000c0 2**6 >> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE >> ... >> RELOCATION RECORDS FOR [.text]: >> OFFSET TYPE VALUE >> ... >> 0000000000003199 R_X86_64_64 InitializeFloatingPointUnits > > (8) Size changes by this patch: > > $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \ > -t GCC5 -b DEBUG \ > -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE > > Before: > >> SECFV [10%Full] 212992 total, 21616 used, 191376 free >> FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free >> DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free >> PEIFV [21%Full] 917504 total, 194480 used, 723024 free > > After: > >> SECFV [11%Full] 212992 total, 23728 used, 189264 free >> FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free >> DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free >> PEIFV [24%Full] 917504 total, 224304 used, 693200 free > > $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \ > -t GCC5 -b DEBUG \ > -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \ > -D SECURE_BOOT_ENABLE -D SMM_REQUIRE > > Before: > >> SECFV [10%Full] 212992 total, 21536 used, 191456 free >> FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free >> DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free >> PEIFV [22%Full] 917504 total, 202032 used, 715472 free > > After: > >> SECFV [10%Full] 212992 total, 21536 used, 191456 free >> FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free >> DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free >> PEIFV [22%Full] 917504 total, 202032 used, 715472 free > > Cc: Alex Williamson > Cc: Ard Biesheuvel > Cc: Jordan Justen > Cc: Liming Gao > Cc: Michael Kinney > Cc: Yonghong Zhu > Contributed-under: TianoCore Contribution Agreement 1.1 > Signed-off-by: Laszlo Ersek > --- > BaseTools/Conf/tools_def.template | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template > index 1fa3ca3ceae9..2ef495540e5f 100755 > --- a/BaseTools/Conf/tools_def.template > +++ b/BaseTools/Conf/tools_def.template > @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS = DEF(GCC5_IA32_X64_DLINK_FLAGS) -flto -Os -Wl, > *_GCC5_X64_OBJCOPY_FLAGS = > *_GCC5_X64_NASM_FLAGS = -f elf64 > > - DEBUG_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os > + DEBUG_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os > DEBUG_GCC5_X64_DLINK_FLAGS = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os > > -RELEASE_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_LTO -Os -Wno-unused-but-set-variable > +RELEASE_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmodel=large -flto -DUSING_LTO -Os -Wno-unused-but-set-variable > RELEASE_GCC5_X64_DLINK_FLAGS = DEF(GCC5_X64_DLINK_FLAGS) -flto -Os > > NOOPT_GCC5_X64_CC_FLAGS = DEF(GCC5_X64_CC_FLAGS) -O0 > -- > 2.13.1.3.g8be5a757fa67 > Thanks for the interesting read. One question: is all code potentially miscompiled? Or does it only affect code executing in real mode?