* Re: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
From: Zenith432 @ 2018-06-13 8:04 UTC
To: StevenShi; +Cc: edk2-devel@lists.01.org

The size differences are negligible because the code generator only emits GOT loads in narrow circumstances that reduce code size:
- when using pointer arithmetic on the address of an external symbol.
- when loading the address of an external symbol as a function argument using a push instruction.

Emitting GOT loads in these scenarios slightly reduces code size, but it forces the emission of the referenced GOT entry into the executable as well. If no GOT entries are referenced by the code, they are discarded by the linker gc-sections feature. When GOT entries are referenced from the code, they also get emitted. So it's a tradeoff - if many GOT loads reference the same symbol - size is reduced. If only one or two GOT loads reference a symbol, size may grow. The number of such cases is also very small due to the narrow circumstances of the optimization opportunities.

In the GCC49 toolchain with LTO off, there are no GOT loads today because of the visibility pragma. If the visibility pragma is suppressed, I counted 6 cases in MdeModulePkg when building OvmfPkgX64.dsc. This is what originally broke the build and caused the visibility pragma to be included.

In GCC5, the circumstances of GOT-load emission are further narrowed by the LTO. It happens only when...
- An external symbol is defined in assembly (so it remains external to LTO).
- C code declares the external symbol and uses it in one of the narrow circumstances listed above where GOT loads reduce code size.
That is what is demonstrated in the sample. There are no such cases in the EDK2 code base so the GCC5 build doesn't break.

The size differences being negligible - the only reason this is an issue is that if a GOT load is emitted - it breaks the build since GenFw doesn't handle it.

So one option is to just ignore this since it doesn't happen in today's codebase, but since it can happen - document what the workarounds are:
- one workaround is to manually declare external symbols that cause GOT loads with __attribute__((visibility("hidden")))
- I've also found that using __attribute__((optimize("O2"))) on a function that emits GOT loads sometimes eliminates the GOT load. This is because the GOT load is only emitted to reduce code size, so if changing optimization to speed - the GOT load is no longer used.

Another option is what is suggested by Ard Biesheuvel: to arrange things so that all external symbols except module entry points are hidden. This resolves the problem for the GCC5 LTO build in the way closest to the resolution for the GCC49 non-LTO build.

Another option is to add functionality to GenFw for handling the various X64 GOTPCREL emissions for the small number of cases that are expected to occur. However, I cannot guarantee that future changes in the compiler will not start emitting thousands of GOT loads, and that this will not go unnoticed because GenFw is handling them silently. This is an undesirable scenario.

--------------------------------------------
On Wed, 6/13/18, Shi, Steven <steven.shi@intel.com> wrote:

...
Can the hidden visibility in LTO improve the LTO build code size? Is there any other benefit?

Steven Shi
Intel\SSG\STO\UEFI Firmware
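The second workaround listed in the message above, the per-function optimize attribute, can be sketched as follows. This is an illustration only, not code from the thread or from EDK2: `ExampleTable`, `example_entry_address` and the `EXAMPLE_*` macros are hypothetical names, and the symbol is defined in C here so that the sketch links on its own, whereas in the scenario discussed it would be defined in an assembly file. Whether the attribute actually removes a GOT load depends on the compiler version and flags, which is why the message says "sometimes".

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_ENTRY_SIZE  16
#define EXAMPLE_ENTRY_COUNT 4

/* Hypothetical external symbol. In the thread's scenario the definition
   would live in an assembly source file; it is defined here only so the
   sketch is self-contained. */
extern char ExampleTable[];
char ExampleTable[EXAMPLE_ENTRY_SIZE * EXAMPLE_ENTRY_COUNT];

/* The optimize attribute is GCC-specific, so the macro expands to nothing
   on other compilers (assumption: Clang does not implement it). */
#if defined(__GNUC__) && !defined(__clang__)
#define EXAMPLE_OPTIMIZE_FOR_SPEED __attribute__((optimize("O2")))
#else
#define EXAMPLE_OPTIMIZE_FOR_SPEED
#endif

/* Pointer arithmetic on the address of an external symbol: one of the two
   cases in which a size-optimizing build may choose a GOT load. Compiling
   just this function for speed is the workaround being illustrated. */
EXAMPLE_OPTIMIZE_FOR_SPEED
char *example_entry_address(size_t index)
{
    assert(index < EXAMPLE_ENTRY_COUNT);
    return ExampleTable + index * EXAMPLE_ENTRY_SIZE;
}
```

GCC's documentation describes the optimize attribute as intended for debugging rather than production use, so the visibility attribute is probably the more robust of the two workarounds.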
* Re: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
From: Zenith432 @ 2018-06-12 19:17 UTC
To: Laszlo Ersek; +Cc: edk2-devel

> Absolute symbol references such as?
> References to fixed (constant)
> addresses?

Pointers stored in the .data section. For example, if you have an array of const char*.

> Why is that approach optimal? As few
> relocation records are required as
> possible?

small pic model is optimal for AMD64 executables or shared libraries that are < 2GB in size, but need to be relocatable to any address in the 64-bit address space. It generates the most compact code due to use of PC-relative jumps, calls and effective address calculations. Technically, the small model is potentially more compact, but the sysv AMD64 ABI requires small model programs to fit in the lowest 2GB of the address space. EFI binaries load in the lower 4GB but not necessarily lower 2GB.

> Why don't preemptible symbols make
> sense for PIE?
> (My apologies if I'm disturbingly
> ignorant about this and the question
> doesn't even make sense.)

They do of course. The small pie model is a GCC extension not documented in sysv AMD64 ABI and it has a weird characteristic that it assumes all external symbols are reachable directly and not via the GOT (= are not subject to being dynamically linked to.)

small pic model - formalized in sysv AMD64 ABI and mandates access to extern symbols via the GOT or PLT.
small pie model - a GCC extension that permits the code generator to elide the GOT, but does not mandate that the code generator elide the GOT.

Contrary to conventional wisdom - using the GOT can reduce code size when doing pointer arithmetic on the address of an external symbol, or pushing the address of an external symbol on the stack to be passed as a function argument. See my response here to Andrew Fish.
https://lists.01.org/pipermail/edk2-devel/2018-June/025710.html

As a result, GCC sometimes emits GOT loads for external symbols in the small pie model on AMD64.

There is an attribute __attribute__((visibility("hidden"))) that can be attached to external symbol declarations and tell the code generator "do not assume this symbol has a GOT entry" - effectively eliminating GOT loads. The pragma mentioned by Ard Biesheuvel turns the attribute on wholesale for all symbols in the sections of source files affected by it.

> So... Given this behavior, why is it a
> problem for us? What are the bad
> symptoms? What is currently broken?

Ard Biesheuvel CCed a lot of people that didn't get the private communication about this. As a continuation to the message above, I sent out an email detailing what happens in the GCC5 toolchain with LTO enabled and a standalone Shell App that demonstrates how today the GCC5 toolchain on X64 can still emit GOT loads into the ELF executable that are not handled by GenFw. Below is my email. The standalone test case can be downloaded from here
http://www.mediafire.com/file/wkc6bcj17401f4c/GccGOTEmitter.zip/file

===== [quoted email]
> I figured out what's going on with LTO build in GCC5 that is compiled with -Os -flto -DUSING_LTO and does not use visibility #pragma.
>
> When compiling with LTO enabled, what happens is that all C source files are transformed during compilation stage to LTO intermediate bytecode (gimple in GCC).
>
> Then when static link (ld) takes place, all LTO intermediate bytecode is sent back to compiler code-generation backend to have machine code generated for it as if all the source code is one big C source file ("whole program optimization").
>
> As a result of this, all the extern symbols become local symbols ! like file-level static. Because it's as if all the code is in one big source file. Since there is no dynamic linking, there are no more "extern", and all symbols are like file-level static and treated the same.
>
> This is why the LTO build stops emitting GOT loads for size-optimization purposes. GCC doesn't emit GOT loads for file-level static, and in LTO build they're all like that - so no GOT loads.
>
> But there is still something that fouls this up...
>
> If an extern symbol is defined in assembly source file.
>
> Because assembly source files don't participate in LTO. They are transformed by assembler into X64 machine code. During ld, any extern symbol that is defined in an assembly source file and declared and used by C source file is treated as before like external symbol. Which means code generator can go back to its practice of emitting GOT loads if they reduce code size.
>
> I'm attaching a standalone example of this coded as a UEFI shell application.
>
> - Unpack it to edk2/GccGOTEmitter.
>
> - Add it to ShellPkg/ShellPkg.dsc so it can be built.
> diff --git a/ShellPkg/ShellPkg.dsc b/ShellPkg/ShellPkg.dsc
> --- a/ShellPkg/ShellPkg.dsc
> +++ b/ShellPkg/ShellPkg.dsc
> @@ -134,6 +134,7 @@
>  <LibraryClasses>
>  PerformanceLib|MdeModulePkg/Library/DxeSmmPerformanceLib/DxeSmmPerformanceLib.inf
>  }
> + GccGOTEmitter/GccGOTEmitter.inf
>
>  [BuildOptions]
>  *_*_*_CC_FLAGS = -D DISABLE_NEW_DEPRECATED_INTERFACES
>
> - Build with
> build -a X64 -b RELEASE -m GccGOTEmitter/GccGOTEmitter.inf -p ShellPkg/ShellPkg.dsc -t GCC5
>
> - Result:
> "GenFw" -e UEFI_APPLICATION -o /media/Dev/edk2/Build/Shell/RELEASE_GCC5/X64/GccGOTEmitter/GccGOTEmitter/DEBUG/GccGOTEmitter.efi /media/Dev/edk2/Build/Shell/RELEASE_GCC5/X64/GccGOTEmitter/GccGOTEmitter/DEBUG/GccGOTEmitter.dll
> make: *** [GNUmakefile:367: /media/Dev/edk2/Build/Shell/RELEASE_GCC5/X64/GccGOTEmitter/GccGOTEmitter/DEBUG/GccGOTEmitter.efi] Error 2
> GenFw: ERROR 3000: Invalid
> /media/Dev/edk2/Build/Shell/RELEASE_GCC5/X64/GccGOTEmitter/GccGOTEmitter/DEBUG/GccGOTEmitter.dll unsupported ELF EM_X86_64 relocation 0x2a.
> GenFw: ERROR 3000: Invalid
> /media/Dev/edk2/Build/Shell/RELEASE_GCC5/X64/GccGOTEmitter/GccGOTEmitter/DEBUG/GccGOTEmitter.dll unsupported ELF EM_X86_64 relocation 0x2a.
>
> relocation 0x2a is R_X86_64_REX_GOTPCRELX which is emitted as part of addq instruction into the GOT in order to implement the pointer arithmetic with slightly smaller code.
>
> There are 2 possible resolutions to this.
> - One is to add the X64 GOTPCREL support to GenFw.
> - The other is to document somewhere that if
> -- An external symbol is defined in assembly code.
> -- The symbol is declared and used in C code.
> -- The C code uses pointer arithmetic on the external symbol or passes it as a function argument.
> -- Then the external symbol should be declared as "__attribute__((visibility("hidden")))" in the C code.
>
> Note that the 2nd resolution also works in the sample - if the attribute is put on ThunksBase declaration.
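The pattern described in the message above, together with its second resolution, might look roughly like the sketch below. It is not the GccGOTEmitter sample (whose source is not reproduced in the thread): `ExampleThunksBase`, the helper functions and the `EXAMPLE_*` macros are hypothetical, and the symbol is defined in C so the sketch builds standalone, whereas the failing case requires the definition to live in an assembly file. The sketch shows the two uses that can trigger a GOT load under -Os and where the hidden attribute goes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_THUNK_SIZE  8
#define EXAMPLE_THUNK_COUNT 4

/* Visibility attributes are a GNU extension; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#else
#define EXAMPLE_HIDDEN
#endif

/* Declaration as C code would see it. Marking it hidden tells the code
   generator that the symbol is never resolved through the GOT. */
extern uint8_t ExampleThunksBase[] EXAMPLE_HIDDEN;

/* Stand-in definition. In the scenario from the thread this would be
   provided by an assembly source file instead. */
uint8_t ExampleThunksBase[EXAMPLE_THUNK_COUNT * EXAMPLE_THUNK_SIZE];

/* Use 1: pointer arithmetic on the address of the external symbol. */
uint8_t *example_thunk(size_t index)
{
    assert(index < EXAMPLE_THUNK_COUNT);
    return ExampleThunksBase + index * EXAMPLE_THUNK_SIZE;
}

/* Helper that merely receives addresses. */
ptrdiff_t example_offset_from(const uint8_t *base, const uint8_t *p)
{
    return p - base;
}

/* Use 2: the address of the external symbol passed as a function argument. */
ptrdiff_t example_thunk_offset(size_t index)
{
    return example_offset_from(ExampleThunksBase, example_thunk(index));
}
```

Disassembling the object file with and without the attribute (for instance with objdump -dr) is one way to check whether a GOTPCREL relocation is present for a given compiler and flag combination.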
* [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
From: Ard Biesheuvel @ 2018-06-12 15:22 UTC
To: edk2-devel
Cc: Ard Biesheuvel, Michael D Kinney, Liming Gao, Ruiyu Ni, Hao Wu, Leif Lindholm, Jordan Justen, Andrew Fish, Star Zeng, Eric Dong, Laszlo Ersek, Zenith432, Shi, Steven

The GCC toolchain uses PIE mode when building code for X64, because it
is the most efficient in size: it uses relative references where
possible, but still uses 64-bit quantities for absolute symbol
references, which is optimal for executables that need to be converted
to PE/COFF using GenFw.

Enabling PIE mode has a couple of side effects though, primarily caused
by the fact that the primary application area of GCC is to build programs
for userland. GCC will assume that ELF symbols should be preemptible (which
makes sense for PIC but not for PIE, but this simply seems to be the result
of code being shared between the two modes), and it will attempt to keep
absolute references close to each other so that dynamic relocations that
trigger CoW for text pages have the smallest possible footprint.

These side effects can be mitigated by overriding the visibility of all
symbol definitions *and* symbol references, using a special #pragma. This
will inform the compiler that symbol preemption and dynamic relocations
are not a concern, and that all symbol references can be emitted as direct
relative references rather than relative references to a GOT entry containing
the absolute address. Unsurprisingly, this leads to better and smaller code.

Unfortunately, we have not been able to set this override when LTO is in
effect, because the LTO code generator infers from the hidden visibility
of all symbols that none of the code is reachable, and discards it all,
leading to corrupt, empty binaries.

We can work around this by overriding the visibility for symbols that are
module entry points. So implement this for all occurrences of the symbol
'_ModuleEntryPoint', and enable 'hidden' visibility in LTO builds as well.

Note that all the changes in this series resolve to no-ops if USING_LTO
is not #defined.

Code can be found here:
https://github.com/ardbiesheuvel/edk2/tree/x64-lto-visibility

Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Andrew Fish <afish@apple.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Zenith432 <zenith432@users.sourceforge.net>
Cc: "Shi, Steven" <steven.shi@intel.com>

Ard Biesheuvel (11):
  MdePkg/ProcessorBind.h: define macro to decorate module entry points
  DuetPkg: annotate module entry points with EFI_ENTRYPOINT
  EdkCompatibilityPkg: annotate module entry points with EFI_ENTRYPOINT
  EmbeddedPkg: annotate module entry points with EFI_ENTRYPOINT
  EmulatorPkg: annotate module entry points with EFI_ENTRYPOINT
  IntelFrameWorkPkg: annotate module entry points with EFI_ENTRYPOINT
  MdeModulePkg: annotate module entry points with EFI_ENTRYPOINT
  MdePkg: annotate module entry points with EFI_ENTRYPOINT
  Nt32Pkg: annotate module entry points with EFI_ENTRYPOINT
  UefiCpuPkg: annotate module entry points with EFI_ENTRYPOINT
  MdePkg/ProcessorBind.h X64: drop non-LTO limitation on visiblity override

 DuetPkg/DxeIpl/DxeInit.c                         |  1 +
 DuetPkg/EfiLdr/EfiLoader.c                       |  1 +
 .../EntryPoints/EdkIIGlueDxeDriverEntryPoint.c   |  1 +
 .../EntryPoints/EdkIIGluePeimEntryPoint.c        |  1 +
 .../EntryPoints/EdkIIGlueSmmDriverEntryPoint.c   |  1 +
 .../Library/EdkIIGlueDxeSmmDriverEntryPoint.h    |  1 +
 .../Include/Library/EdkIIGluePeimEntryPoint.h    |  1 +
 .../Library/EdkIIGlueUefiDriverEntryPoint.h      |  1 +
 EmbeddedPkg/TemplateSec/TemplateSec.c            |  1 +
 EmulatorPkg/Sec/Sec.c                            |  1 +
 .../DxeSmmDriverEntryPoint/DriverEntryPoint.c    |  1 +
 MdeModulePkg/Universal/CapsulePei/X64/X64Entry.c |  1 +
 MdePkg/Include/Base.h                            |  7 +++++++
 MdePkg/Include/Library/DxeCoreEntryPoint.h       |  1 +
 MdePkg/Include/Library/PeiCoreEntryPoint.h       |  1 +
 MdePkg/Include/Library/PeimEntryPoint.h          |  1 +
 .../Include/Library/UefiApplicationEntryPoint.h  |  1 +
 MdePkg/Include/Library/UefiDriverEntryPoint.h    |  1 +
 MdePkg/Include/X64/ProcessorBind.h               | 16 +++++++++++-----
 .../DxeCoreEntryPoint/DxeCoreEntryPoint.c        |  1 +
 .../PeiCoreEntryPoint/PeiCoreEntryPoint.c        |  1 +
 MdePkg/Library/PeimEntryPoint/PeimEntryPoint.c   |  1 +
 .../ApplicationEntryPoint.c                      |  1 +
 .../UefiDriverEntryPoint/DriverEntryPoint.c      |  1 +
 Nt32Pkg/Sec/SecMain.c                            |  1 +
 .../PlatformSecLibNull/PlatformSecLibNull.c      |  1 +
 26 files changed, 42 insertions(+), 5 deletions(-)

--
2.17.1
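The approach in the cover letter above (hide everything via a #pragma, then exempt the module entry point) can be sketched in a few lines. This is not the code from the series: `EXAMPLE_ENTRYPOINT`, `example_helper` and `ExampleModuleEntryPoint` are hypothetical stand-ins, and the choice of "default" visibility for the entry point is an assumption, since the thread does not quote the actual definition of EFI_ENTRYPOINT.

```c
#include <assert.h>

#if defined(__GNUC__)
/* Every declaration and definition that follows is treated as hidden,
   unless it carries an explicit visibility attribute. */
#pragma GCC visibility push(hidden)
/* Hypothetical entry point decoration: an explicit attribute takes
   precedence over the pragma (assumption: "default" visibility). */
#define EXAMPLE_ENTRYPOINT __attribute__((visibility("default")))
#else
#define EXAMPLE_ENTRYPOINT
#endif

/* Ordinary function: hidden because of the pragma. */
int example_helper(int value);

/* Entry point: kept visible so that a whole-program code generator does
   not conclude that nothing in the module is reachable. */
EXAMPLE_ENTRYPOINT int ExampleModuleEntryPoint(int argument);

int example_helper(int value)
{
    return value * 2;
}

int ExampleModuleEntryPoint(int argument)
{
    return example_helper(argument) + 1;
}

#if defined(__GNUC__)
#pragma GCC visibility pop
#endif
```

In a real module the pragma would sit in a header included by every source file, so a single macro on the entry point declaration is the only per-module change, which matches the mostly one-line diffstat entries above.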
* Re: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
From: Laszlo Ersek @ 2018-06-12 18:33 UTC
To: Ard Biesheuvel, edk2-devel
Cc: Michael D Kinney, Liming Gao, Ruiyu Ni, Hao Wu, Leif Lindholm, Jordan Justen, Andrew Fish, Star Zeng, Eric Dong, Zenith432, Shi, Steven

Some super-naive questions, which are supposed to educate me, and not to
question the series:

On 06/12/18 17:22, Ard Biesheuvel wrote:
> The GCC toolchain uses PIE mode when building code for X64, because it
> is the most efficient in size: it uses relative references where
> possible, but still uses 64-bit quantities for absolute symbol
> references,

Absolute symbol references such as? References to fixed (constant)
addresses?

> which is optimal for executables that need to be converted
> to PE/COFF using GenFw.

Why is that approach optimal? As few relocation records are required as
possible?

> Enabling PIE mode has a couple of side effects though, primarily caused
> by the fact that the primary application area of GCC is to build programs
> for userland. GCC will assume that ELF symbols should be preemptible (which
> makes sense for PIC but not for PIE,

Why don't preemptible symbols make sense for PIE?

For example, if a userspace program loads a plugin with dlopen(), and
the plugin (.so) uses helper functions from the main executable, then
the main executable has to be (well, had to be, earlier?) built with
"-rdynamic". Wouldn't this mean the main executable could both be PIE
and sensibly have preemptible symbols?

(My apologies if I'm disturbingly ignorant about this and the question
doesn't even make sense.)

> but this simply seems to be the result
> of code being shared between the two modes), and it will attempt to keep
> absolute references close to each other so that dynamic relocations that
> trigger CoW for text pages have the smallest possible footprint.

So... Given this behavior, why is it a problem for us? What are the bad
symptoms? What is currently broken?

Sorry about my naivety here.

Thanks,
Laszlo

> These side effects can be mitigated by overriding the visibility of all
> symbol definitions *and* symbol references, using a special #pragma. This
> will inform the compiler that symbol preemption and dynamic relocations
> are not a concern, and that all symbol references can be emitted as direct
> relative references rather than relative references to a GOT entry containing
> the absolute address. Unsurprisingly, this leads to better and smaller code.
>
> Unfortunately, we have not been able to set this override when LTO is in
> effect, because the LTO code generator infers from the hidden visibility
> of all symbols that none of the code is reachable, and discards it all,
> leading to corrupt, empty binaries.
>
> We can work around this by overriding the visibility for symbols that are
> module entry points. So implement this for all occurrences of the symbol
> '_ModuleEntryPoint', and enable 'hidden' visibility in LTO builds as well.
>
> Note that all the changes in this series resolve to no-ops if USING_LTO
> is not #defined.
* Re: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
From: Ard Biesheuvel @ 2018-06-12 18:58 UTC
To: Laszlo Ersek
Cc: edk2-devel@lists.01.org, Michael D Kinney, Liming Gao, Ruiyu Ni, Hao Wu, Leif Lindholm, Jordan Justen, Andrew Fish, Star Zeng, Eric Dong, Zenith432, Shi, Steven

On 12 June 2018 at 20:33, Laszlo Ersek <lersek@redhat.com> wrote:
> Some super-naive questions, which are supposed to educate me, and not to
> question the series:
>
> On 06/12/18 17:22, Ard Biesheuvel wrote:
>> The GCC toolchain uses PIE mode when building code for X64, because it
>> is the most efficient in size: it uses relative references where
>> possible, but still uses 64-bit quantities for absolute symbol
>> references,
>
> Absolute symbol references such as? References to fixed (constant)
> addresses?
>

I should have been clearer here: from the GCC man page (apologies for
the whitespace soup)

"""
-mcmodel=small
    Generate code for the small code model: the program and its
    symbols must be linked in the lower 2 GB of the address space.
    Pointers are 64 bits. Programs can be statically or dynamically
    linked. This is the default code model.

-mcmodel=kernel
    Generate code for the kernel code model. The kernel runs in the
    negative 2 GB of the address space. This model has to be used for
    Linux kernel code.

-mcmodel=medium
    Generate code for the medium model: the program is linked in the
    lower 2 GB of the address space. Small symbols are also placed
    there. Symbols with sizes larger than -mlarge-data-threshold are
    put into large data or BSS sections and can be located above 2GB.
    Programs can be statically or dynamically linked.

-mcmodel=large
    Generate code for the large model. This model makes no assumptions
    about addresses and sizes of sections.
"""

Formerly, we used the large model because UEFI can load PE/COFF
executables anywhere in the lower address space, not only in the first
2 GB. The small PIE model is the best fit for UEFI because it does not
have this limitation, but [unlike the large model] only uses absolute
references when necessary, and will use relative references when it
can. (I.e., it assumes the program will fit in 4 GB of memory, which
the large model does not)

Absolute symbol references are things like statically initialized
function pointer variables or other quantities whose value cannot be
obtained programmatically at runtime using a relative reference.

>> which is optimal for executables that need to be converted
>> to PE/COFF using GenFw.
>
> Why is that approach optimal? As few relocation records are required as
> possible?
>

Because GenFw translates ELF relocations into PE/COFF relocations, but
only for the subset that requires fixing up at runtime. Relative
references do not require such fixups, so a code model that minimizes
the number of absolute relocations is therefore optimal. Note that
absolute references typically require twice the space as well.

>> Enabling PIE mode has a couple of side effects though, primarily caused
>> by the fact that the primary application area of GCC is to build programs
>> for userland. GCC will assume that ELF symbols should be preemptible (which
>> makes sense for PIC but not for PIE,
>
> Why don't preemptible symbols make sense for PIE?
>
> For example, if a userspace program loads a plugin with dlopen(), and
> the plugin (.so) uses helper functions from the main executable, then
> the main executable has to be (well, had to be, earlier?) built with
> "-rdynamic". Wouldn't this mean the main executable could both be PIE
> and sensibly have preemptible symbols?
>
> (My apologies if I'm disturbingly ignorant about this and the question
> doesn't even make sense.)
>

I mean that the symbols defined by the PIE executable [i.e., not
shared library] can never be preempted. Only symbols in shared
libraries can be preempted by the symbols in the main executable, not
the other way around.

>> but this simply seems to be the result
>> of code being shared between the two modes), and it will attempt to keep
>> absolute references close to each other so that dynamic relocations that
>> trigger CoW for text pages have the smallest possible footprint.
>
> So... Given this behavior, why is it a problem for us? What are the bad
> symptoms? What is currently broken?
>

The bad symptoms are that PIC code will use GOT entries for all symbol
references, meaning that instead of a direct relative reference from
the code, it will emit a relative reference to the GOT entry
containing the absolute address of the symbol. This involves an
additional memory reference, and it requires the GOT entry (which by
definition contains an absolute address) to be fixed up at load time.

What is broken [as reported by Zenith432] is that GCC in LTO mode may
in some cases still emit GOT based relocations that GenFw currently
cannot handle. If the address of a symbol is used in a calculation, or
when the address of a symbol is taken but not dereferenced (but only
passed to a function, for instance), GCC in -Os mode will optimize
this into a GOTPCREL reference.

Quoting from a private email from Zenith432 (who has already proposed
GenFw changes to handle these relocations)

"""
I figured out what's going on with LTO build in GCC5 that is compiled
with -Os -flto -DUSING_LTO and does not use visibility #pragma.

When compiling with LTO enabled, what happens is that all C source
files are transformed during compilation stage to LTO intermediate
bytecode (gimple in GCC).

Then when static link (ld) takes place, all LTO intermediate bytecode
is sent back to compiler code-generation backend to have machine code
generated for it as if all the source code is one big C source file
("whole program optimization").

As a result of this, all the extern symbols become local symbols !
like file-level static. Because it's as if all the code is in one big
source file. Since there is no dynamic linking, there are no more
"extern", and all symbols are like file-level static and treated the
same.

This is why the LTO build stops emitting GOT loads for
size-optimization purposes. GCC doesn't emit GOT loads for file-level
static, and in LTO build they're all like that - so no GOT loads.

But there is still something that fouls this up...

If an extern symbol is defined in assembly source file.

Because assembly source files don't participate in LTO. They are
transformed by assembler into X64 machine code. During ld, any extern
symbol that is defined in an assembly source file and declared and
used by C source file is treated as before like external symbol.
Which means code generator can go back to its practice of emitting GOT
loads if they reduce code size.
"""

Instead of 'fixing' GenFw, I attempted to go back to the original
changes Steven and I did for LTO, to try and remember why we could not
use the GCC visibility #pragma when enabling LTO. That is the issue
this series aims to fix (but it is an RFC, so comments welcome)

--
Ard.
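The distinction drawn in the reply above between absolute and relative references can be made concrete with a small sketch. All names are hypothetical and nothing here is taken from EDK2. A statically initialized function pointer table stores full 64-bit addresses in the image, so each slot needs a load-time fixup, whereas a direct call can normally be encoded relative to the instruction pointer and needs none. What a particular compiler emits depends on its version and flags, so this is an illustration rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

static int example_add(int a, int b) { return a + b; }
static int example_sub(int a, int b) { return a - b; }

typedef int (*example_op_fn)(int, int);

/* Absolute references: the table lives in a data section and each slot
   holds the address of a function, which only becomes known once the
   image has been placed in memory. These are the entries that end up as
   relocation records in the converted PE/COFF image. */
static const example_op_fn example_ops[] = { example_add, example_sub };

#define EXAMPLE_OP_COUNT (sizeof(example_ops) / sizeof(example_ops[0]))

/* Relative reference: a direct call, typically encoded as a displacement
   from the current instruction, so no fixup is needed wherever the image
   is loaded. */
int example_call_direct(int a, int b)
{
    return example_add(a, b);
}

/* Indirect call through the table: the code reaches the table with a
   relative reference, but the value it loads is an absolute address. */
int example_call_via_table(size_t index, int a, int b)
{
    assert(index < EXAMPLE_OP_COUNT);
    return example_ops[index](a, b);
}
```

Running readelf -r on the compiled object is one way to see which of these references produce relocation entries with a given toolchain.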
* Re: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
From: Shi, Steven @ 2018-06-13 2:08 UTC
To: Ard Biesheuvel, Zenith432
Cc: Kinney, Michael D, Gao, Liming, Ni, Ruiyu, Wu, Hao A, Leif Lindholm, Justen, Jordan L, Andrew Fish, Zeng, Star, Dong, Eric, Laszlo Ersek, edk2-devel@lists.01.org

Hi Ard, Zenith,

Thank you both for explaining the background on ELF GOT, LTO, PIC/PIE, code models and the GCC visibility #pragma. It is good to read it all in one place, and I believe copying these explanations to an edk2 wiki page on GitHub could be very useful for other edk2 developers.

From a code change impact view, I see that using hidden visibility for LTO, which in practice means removing the !defined(USING_LTO) check in X64/ProcessorBind.h, requires changing 20+ other files across edk2. The cost does not look small, so we might need more justification to accept such a change. Can the hidden visibility in LTO improve the LTO build code size? Is there any other benefit?

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522