From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 009D821D490EA for ; Fri, 11 Aug 2017 03:28:05 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8BAC42FE71B; Fri, 11 Aug 2017 10:30:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 8BAC42FE71B Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lersek@redhat.com Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-90.phx2.redhat.com [10.3.116.90]) by smtp.corp.redhat.com (Postfix) with ESMTP id F07BC78DF3; Fri, 11 Aug 2017 10:30:21 +0000 (UTC) To: Ard Biesheuvel Cc: edk2-devel-01 , Alex Williamson , Jordan Justen , Liming Gao , Michael Kinney , Yonghong Zhu References: <20170811003426.2332-1-lersek@redhat.com> <20170811003426.2332-2-lersek@redhat.com> From: Laszlo Ersek Message-ID: <402a0745-29a7-2087-72cc-0d5e93c158c2@redhat.com> Date: Fri, 11 Aug 2017 12:30:20 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Fri, 11 Aug 2017 10:30:25 +0000 (UTC) Subject: Re: [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Aug 2017 10:28:06 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 08/11/17 12:03, Ard Biesheuvel wrote: > On 11 August 2017 at 01:34, Laszlo Ersek wrote: >> (2) Unfortunately, the C-language assignment to >> "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is >> miscompiled under the following circumstances: >> >> - the edk2 build target is DEBUG (or RELEASE), >> - the edk2 toolchain selection is GCC5, >> - (from the above two it follows that -Os and LTO are enabled,) >> - MpInitLib is built for X64, >> - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.). >> >> (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the >> PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks >> in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.) >> >> Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show >> the correct assembly code generated by gcc for the assignment, and >> flip the build target to DEBUG for showing the broken assembly code. >> >> (3) Correct code for the assignment: >> >>> 5406: 48 8d 15 73 24 00 00 lea 0x2473(%rip),%rdx # 7880 >>> 540d: 48 8b 45 f8 mov -0x8(%rbp),%rax >>> 5411: 48 89 90 84 00 00 00 mov %rdx,0x84(%rax) >> >> Here the absolute address of InitializeFloatingPointUnits() is loaded >> into RDX with RIP-relative addressing. Then RAX is pointed to the >> ExchangeInfo structure, and finally RDX is stored into >> ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is >> 0x84 in the structure). >> >> Here the only relocation happens when CpuMpPei is linked. The distance >> between the call site and the callee is calculated and encoded in the >> LEA instruction. The same distance is valid when CpuMpPei runs, so the >> PEI core doesn't have to relocate the LEA instruction when it loads >> CpuMpPei. >> >> (4) Incorrect code for the assignment: >> >>> 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx >>> ... >>> 2d7b: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp) >>> ... >>> 0000000000005860 : >>> 5860: 9b db e3 finit >> >> This is not RIP-relative addressing. Therefore gcc generates a >> relocation record for the address encoded in the MOV instruction, at >> offset 0x2d69+0x3 -- initial value being 0x5860: >> >>> Sections: >>> Idx Name Size VMA LMA File off Algn >>> 0 .text 00006e66 0000000000000240 0000000000000240 000000c0 2**6 >>> CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE >>> ... >>> RELOCATION RECORDS FOR [.text]: >>> OFFSET TYPE VALUE >>> ... >>> 0000000000002b2c R_X86_64_32S InitializeFloatingPointUnits >> >> (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which >> equals the .text section base plus the relocation offset, i.e., >> 0x240+0x2b2c.) >> >> This is the only R_X86_64_32S relocation record in the linked-together >> CpuMpPei.debug file (still an ELF file). In the rest of the build >> process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image >> (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF >> relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW). > Thanks for the interesting read. One question: is all code potentially > miscompiled? Or does it only affect code executing in real mode? The code that is miscompiled is a normal assignment, in normal C code. The structure type itself is packed [1], and the pointer to access it is volatile-qualified [2], so those could be contributing factors. [1] UefiCpuPkg/Library/MpInitLib/MpLib.h #pragma pack(1) // // MP CPU exchange information for AP reset code // This structure is required to be packed because fixed field offsets // into this structure are used in assembly code in this module // typedef struct { UINTN Lock; UINTN StackStart; UINTN StackSize; UINTN CFunction; IA32_DESCRIPTOR GdtrProfile; IA32_DESCRIPTOR IdtrProfile; UINTN BufferStart; UINTN ModeOffset; UINTN NumApsExecuting; UINTN CodeSegment; UINTN DataSegment; UINTN EnableExecuteDisable; UINTN Cr3; UINTN InitFlag; CPU_INFO_IN_HOB *CpuInfo; CPU_MP_DATA *CpuMpData; UINTN InitializeFloatingPointUnitsAddress; } MP_CPU_EXCHANGE_INFO; #pragma pack() [2] UefiCpuPkg/Library/MpInitLib/MpLib.c VOID FillExchangeInfoData ( IN CPU_MP_DATA *CpuMpData ) { volatile MP_CPU_EXCHANGE_INFO *ExchangeInfo; ... ExchangeInfo->InitializeFloatingPointUnitsAddress = (UINTN)InitializeFloatingPointUnits; The only reason the miscompilation affects real mode is that the assignment populates an "exchange area field" that is consumed by the application processors from real mode. To be clear, the issue happens on the population (assignment) side, in normal C code, not on the consumption side -- the latter is only the victim. I think *technically* speaking gcc is entitled to generate such an instruction, because with -mcmodel=small, we promise it that we're never going to run the binary above 2GB in the address space. And, we do break that promise, when the PEI core loads the PEIM just below 3GB. What's interesting is that adding -fpie on top of -mcmodel=small, safety could be restored (as explained in your commit message) under all gcc's prior to gcc-7. For some reason, "-fpie" is not enough with gcc-7 + LTO, and it really takes us up on our "-mcmodel=small" word (about not loading executables above 2GB). So, this is a behavioral change in gcc-7, but I'm unsure if we should call it "gcc bug" or "build flag misconfiguration". Thanks Laszlo