* Managing GCC Assembly Code Size (AArch64)
From: Cohen, Eugene @ 2016-08-04 18:08 UTC
To: Ard Biesheuvel, Leif Lindholm; +Cc: edk2-devel@lists.01.org

Ard and Leif,

I've been too backlogged to provide a real patchset at this point but wanted to get your approval on this proposal...

As you know we have some code size sensitive uncompressed XIP stuff going on. For C code we get dead code stripping thanks to the "-ffunction-sections" switch, which places each function in its own section so the linker can strip unreferenced sections.

For assembly there is not a solution that's as easy. For RVCT we handled this with an assembler macro that combined the procedure label definition, export of global symbols and placement of the procedure in its own section. For GCC I haven't found a way to fully do this because we rely on the C preprocessor for assembly, which means you cannot expand to multi-line macros. (The label and assembler directives require their own lines but the preprocessor collapses stuff onto one line because in the C language newlines don't matter.)

So the solution I've settled on is to do this:

in MdePkg\Include\AArch64\ProcessorBind.h define:

  /// Macro to place a function in its own section for dead code elimination
  /// This must be placed directly before the corresponding code since the
  /// .section directive applies to the code that follows it.
  #define GCC_ASM_EXPORT_SECTION(func__)                          \
    .global  _CONCATENATE (__USER_LABEL_PREFIX__, func__)        ;\
    .section .text._CONCATENATE (__USER_LABEL_PREFIX__, func__)  ;\
    .type    ASM_PFX(func__), %function;

This has the effect of placing the function in a section called .text.<func__> so the linker can do its dead code stripping stuff. It also absorbs making the symbol globally visible, so the corresponding GCC_ASM_EXPORT statement can be removed.

then for every single assembly procedure change from this:

  [top of file]
  GCC_ASM_EXPORT (ArmInvalidateDataCacheEntryByMVA)

  [lower down]
  ASM_PFX(ArmInvalidateDataCacheEntryByMVA):
    dc      ivac, x0    // Invalidate single data cache line
    ret

to this:

  GCC_ASM_EXPORT_SECTION(ArmInvalidateDataCacheEntryByMVA)
  ASM_PFX(ArmInvalidateDataCacheEntryByMVA):
    dc      ivac, x0    // Invalidate single data cache line
    ret

Because the assembly label must appear in column 1 I couldn't find a way to use the C preprocessor to absorb it so hence the two lines. If you can find a way to improve on this it would be great.

I'm not sure what impacts this might have to other toolchains - can this be translated to CLANG and ARM Compiler?

I'd like to get your OK on this conceptually and then I could upstream some patches that modify the AArch64 *.S files to use this approach. Unfortunately it won't be complete because I only updated the libraries that we use. My hope is that long term all assembly (or at least assembly in libraries) adopt this approach so we are positioned for maximum dead code stripping.

Thanks,

Eugene
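For illustration, here is roughly what the proposed macro leaves behind once the C preprocessor has run. This is a sketch under two assumptions that the message does not state: the target is ELF with an empty `__USER_LABEL_PREFIX__`, and `ASM_PFX(x)` therefore expands to plain `x`. Exact whitespace in real preprocessor output may differ.

```asm
// Assumed expansion of
//   GCC_ASM_EXPORT_SECTION(ArmInvalidateDataCacheEntryByMVA)
// cpp emits all three directives on a single line; this still assembles
// because ';' is a statement separator in AArch64 GAS.
.global ArmInvalidateDataCacheEntryByMVA ; .section .text.ArmInvalidateDataCacheEntryByMVA ; .type ArmInvalidateDataCacheEntryByMVA, %function;

// The label cannot be folded into the line above by cpp, so it stays
// on its own line in the source file.
ArmInvalidateDataCacheEntryByMVA:
  dc      ivac, x0    // Invalidate single data cache line
  ret
```

Each routine ends up in a section named after it, which is the same layout "-ffunction-sections" gives C functions, so the linker can drop the section when nothing references the symbol.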
* Re: Managing GCC Assembly Code Size (AArch64)
From: Ard Biesheuvel @ 2016-08-04 19:18 UTC
To: Cohen, Eugene; +Cc: Leif Lindholm, edk2-devel@lists.01.org

On 4 August 2016 at 20:08, Cohen, Eugene <eugene@hp.com> wrote:
> Because the assembly label must appear in column 1 I couldn't find a way to use the C preprocessor to absorb it so hence the two lines. If you can find a way to improve on this it would be great.
>

What about GAS macros (.macro / .endm)? I prefer those over cpp macros in assembler anyway.

> I'm not sure what impacts this might have to other toolchains - can this be translated to CLANG and ARM Compiler?
>

The asm dialect is 99% aligned between CLANG and GNU as, so this shouldn't be a problem.

> I'd like to get your OK on this conceptually and then I could upstream some patches that modify the AArch64 *.S files to use this approach. Unfortunately it won't be complete because I only updated the libraries that we use. My hope is that long term all assembly (or at least assembly in libraries) adopt this approach so we are positioned for maximum dead code stripping.
>

I think this would be an improvement, so go for it. The only thing to be wary of is routines that fall through into the subsequent one. Those need to remain in the same section.
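A minimal sketch of the fall-through case, using hypothetical routine names that do not come from the thread. The first routine has no `ret` or branch at its end and relies on the second label being laid out immediately after it. Section stripping works on whole sections and follows relocations only, and a fall-through is not a relocation, so a `.section` directive between the two labels could let the tail be discarded or placed somewhere else.

```asm
// Hypothetical example: ExampleIncrementAndReturn falls through into
// ExampleReturn, so both entry points share one section.
  .section .text.ExampleIncrementAndReturn, "ax", %progbits
  .globl  ExampleIncrementAndReturn
  .type   ExampleIncrementAndReturn, %function
ExampleIncrementAndReturn:
  add     x0, x0, #1          // no ret here: execution continues below

  // Deliberately no .section directive at this point.
  .globl  ExampleReturn
  .type   ExampleReturn, %function
ExampleReturn:
  ret
```

The cost is granularity: the linker keeps both routines if either symbol is referenced.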
* Re: Managing GCC Assembly Code Size (AArch64)
From: Ard Biesheuvel @ 2016-08-04 19:47 UTC
To: Cohen, Eugene; +Cc: Leif Lindholm, edk2-devel@lists.01.org

On 4 August 2016 at 21:18, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 4 August 2016 at 20:08, Cohen, Eugene <eugene@hp.com> wrote:
>> Because the assembly label must appear in column 1 I couldn't find a way to use the C preprocessor to absorb it so hence the two lines. If you can find a way to improve on this it would be great.
>>
>
> What about GAS macros (.macro / .endm). I prefer those over cpp macros
> in assembler anyway.
>

FYI there is a null token \() for GAS which you can use to concatenate a string with a macro argument, e.g.,

  .macro func, x
    .globl \x
    .type \x, %function
    .section .text.\x
  \x\():
  .endm
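A sketch of how a call site might look with the macro above. The explicit `"ax", %progbits` section flags are an addition made here as a precaution and are not part of the suggestion as posted. The routine body is the one from the original proposal.

```asm
// The macro as suggested, plus explicit section flags (an assumption).
.macro func, x
  .globl   \x
  .type    \x, %function
  .section .text.\x, "ax", %progbits
\x\():                        // \() ends the argument name before the ':'
.endm

// One line now exports the symbol, opens its section and defines the label.
func ArmInvalidateDataCacheEntryByMVA
  dc      ivac, x0    // Invalidate single data cache line
  ret
```

Because GAS expands the macro itself, the expansion can span several lines, which is what the cpp-based version could not do for the column-1 label.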
* Re: Managing GCC Assembly Code Size (AArch64)
From: Cohen, Eugene @ 2016-08-04 21:55 UTC
To: Ard Biesheuvel; +Cc: Leif Lindholm, edk2-devel@lists.01.org

Ard, as usual you rock...

> FYI there is a null token \() for GAS which you can use to concatenate
> a string with a macro argument, e.g.,
>
>   .macro func, x
>     .globl \x
>     .type \x, %function
>     .section .text.\x
>   \x\():
>   .endm
>

Using the GAS .macro syntax this all collapses nicely. I tested it with one assembly function and all the right stuff happens.

So the request becomes: can we modify all of the assembly (at least AArch64 please) to use this? How would you like to phase this in?

> > I think this would be an improvement, so go for it. The only thing to
> > be wary of is routines that fall through into the subsequent one.
> > Those need to remain in the same section.

Yes, I've accidentally modified these with disastrous results. I now know to stay away from them (ExceptionSupport.S in particular). :)

Thanks,

Eugene