From: Ard Biesheuvel
Date: Tue, 12 Jun 2018 20:58:14 +0200
To: Laszlo Ersek
Cc: "edk2-devel@lists.01.org", Michael D Kinney, Liming Gao, Ruiyu Ni,
    Hao Wu, Leif Lindholm, Jordan Justen, Andrew Fish, Star Zeng,
    Eric Dong, Zenith432, "Shi, Steven"
Subject: Re: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
In-Reply-To: <7be25843-d115-2738-0970-93f05a172aff@redhat.com>
References: <20180612152306.25998-1-ard.biesheuvel@linaro.org>
    <7be25843-d115-2738-0970-93f05a172aff@redhat.com>

On 12 June 2018 at 20:33, Laszlo Ersek wrote:
> Some super-naive questions, which are supposed to educate me, and not to
> question the series:
>
> On 06/12/18 17:22, Ard Biesheuvel wrote:
>> The GCC toolchain uses PIE mode when building code for X64, because it
>> is the most efficient in size: it uses relative references where
>> possible, but still uses 64-bit quantities for absolute symbol
>> references,
>
> Absolute symbol references such as? References to fixed (constant)
> addresses?
>

I should have been clearer here: from the GCC man page

"""
-mcmodel=small
    Generate code for the small code model: the program and its symbols
    must be linked in the lower 2 GB of the address space. Pointers are
    64 bits. Programs can be statically or dynamically linked. This is
    the default code model.

-mcmodel=kernel
    Generate code for the kernel code model. The kernel runs in the
    negative 2 GB of the address space. This model has to be used for
    Linux kernel code.

-mcmodel=medium
    Generate code for the medium model: the program is linked in the
    lower 2 GB of the address space. Small symbols are also placed
    there. Symbols with sizes larger than -mlarge-data-threshold are put
    into large data or BSS sections and can be located above 2GB.
    Programs can be statically or dynamically linked.

-mcmodel=large
    Generate code for the large model. This model makes no assumptions
    about addresses and sizes of sections.
"""

Formerly, we used the large model because UEFI can load PE/COFF
executables anywhere in the lower address space, not only in the first
2 GB. The small PIE model is the best fit for UEFI because it does not
have this limitation, but [unlike the large model] only uses absolute
references when necessary, and will use relative references when it can.
(I.e., it assumes the program will fit in 4 GB of memory, which the
large model does not.)

Absolute symbol references are things like statically initialized
function pointer variables or other quantities whose value cannot be
obtained programmatically at runtime using a relative reference.

>> which is optimal for executables that need to be converted
>> to PE/COFF using GenFw.
>
> Why is that approach optimal? As few relocations records are required as
> possible?
>

Because GenFw translates ELF relocations into PE/COFF relocations, but
only for the subset that requires fixing up at runtime. Relative
references do not require such fixups, so a code model that minimizes
the number of absolute relocations is therefore optimal. Note that
absolute references typically require twice the space as well.

>> Enabling PIE mode has a couple of side effects though, primarily caused
>> by the fact that the primary application area of GCC is to build programs
>> for userland. GCC will assume that ELF symbols should be preemptible (which
>> makes sense for PIC but not for PIE,
>
> Why don't preemptible symbols make sense for PIE?
>
> For example, if a userspace program loads a plugin with dlopen(), and
> the plugin (.so) uses helper functions from the main executable, then
> the main executable has to be (well, had to be, earlier?) built with
> "-rdynamic". Wouldn't this mean the main executable could both be PIE
> and sensibly have preemptible symbols?
>
> (My apologies if I'm disturbingly ignorant about this and the question
> doesn't even make sense.)
>

I mean that the symbols defined by the PIE executable [i.e., not shared
library] can never be preempted. Only symbols in shared libraries can be
preempted by the symbols in the main executable, not the other way
around.

>> but this simply seems to be the result
>> of code being shared between the two modes), and it will attempt to keep
>> absolute references close to each other so that dynamic relocations that
>> trigger CoW for text pages have the smallest possible footprint.
>
> So... Given this behavior, why is it a problem for us? What are the bad
> symptoms? What is currently broken?
>

The bad symptoms are that PIC code will use GOT entries for all symbol
references, meaning that instead of a direct relative reference from the
code, it will emit a relative reference to the GOT entry containing the
absolute address of the symbol. This involves an additional memory
reference, and it requires the GOT entry (which by definition contains
an absolute address) to be fixed up at load time.

What is broken [as reported by Zenith432] is that GCC in LTO mode may in
some cases still emit GOT based relocations that GenFw currently cannot
handle. If the address of a symbol is used in a calculation, or when the
address of a symbol is taken but not dereferenced (but only passed to a
function, for instance), GCC in -Os mode will optimize this into a
GOTPCREL reference.
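
To make the two cases above a bit more concrete, here is a minimal sketch
of the source-level patterns involved. All names (DefaultCounter,
HiddenCounter, HiddenHelper, Invoke, RunExample) are made up for
illustration and do not come from the series. Only the visibility #pragma
itself is real GCC syntax. Whether a given GCC version actually emits a
GOTPCREL relocation for the default-visibility symbol depends on the
compiler version, the flags, and where the symbol is defined, so treat
this as something to inspect with objdump -dr rather than as a statement
of what GCC will do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global with default visibility. In PIC/PIE builds the
 * compiler may treat such a symbol as preemptible and reach it
 * indirectly through a GOT entry. */
int DefaultCounter = 1;

/* Everything between push and pop gets hidden visibility, which tells
 * the compiler the symbols are resolved within this executable, so a
 * direct relative reference is permitted. */
#pragma GCC visibility push(hidden)
int HiddenCounter = 2;

int HiddenHelper(int x)
{
    return x + HiddenCounter;
}
#pragma GCC visibility pop

typedef int (*Callback)(int);

/* Receives a function address and calls through it. */
static int Invoke(Callback cb, int arg)
{
    assert(cb != NULL);
    return cb(arg);
}

/* The address of HiddenHelper is taken here but not dereferenced at
 * this point: it is only passed on to another function. This is the
 * kind of use described above. */
int RunExample(void)
{
    return Invoke(HiddenHelper, DefaultCounter);
}
```

Compiling this with something like -Os -fpie -c and comparing the
relocations with and without the #pragma is one way to see which
references end up going through the GOT on a particular toolchain.
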

Quoting from a private email from Zenith432 (who has already proposed
GenFw changes to handle these relocations):

"""
I figured out what's going on with LTO build in GCC5 that is compiled
with -Os -flto -DUSING_LTO and does not use visibility #pragma.

When compiling with LTO enabled, what happens is that all C source files
are transformed during compilation stage to LTO intermediate bytecode
(gimple in GCC). Then when static link (ld) takes place, all LTO
intermediate bytecode is sent back to compiler code-generation backend
to have machine code generated for it as if all the source code is one
big C source file ("whole program optimization"). As a result of this,
all the extern symbols become local symbols ! like file-level static.
Because it's as if all the code is in one big source file. Since there
is no dynamic linking, there are no more "extern", and all symbols are
like file-level static and treated the same. This is why the LTO build
stops emitting GOT loads for size-optimization purposes. GCC doesn't
emit GOT loads for file-level static, and in LTO build they're all like
that - so no GOT loads.

But there is still something that fouls this up... If an extern symbol
is defined in assembly source file. Because assembly source files don't
participate in LTO. They are transformed by assembler into X64 machine
code. During ld, any extern symbol that is defined in an assembly source
file and declared and used by C source file is treated as before like
external symbol. Which means code generator can go back to its practice
of emitting GOT loads if they reduce code size.
"""

Instead of 'fixing' GenFw, I attempted to go back to the original
changes Steven and I did for LTO, to try and remember why we could not
use the GCC visibility #pragma when enabling LTO.

That is the issue this series aims to fix (but it is an RFC, so comments
welcome).

-- 
Ard.