* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
From: Gao, Liming @ 2016-08-01 14:09 UTC
To: Ard Biesheuvel, edk2-devel-01, Zhu, Yonghong; +Cc: Leif Lindholm, Cohen, Eugene
Ard:
Your change looks OK to me. I have no comments.
Thanks
Liming
From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
Sent: Monday, August 1, 2016 7:53 PM
To: edk2-devel-01 <edk2-devel@lists.01.org>; Gao, Liming <liming.gao@intel.com>; Zhu, Yonghong <yonghong.zhu@intel.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>; Cohen, Eugene <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
On 27 July 2016 at 13:26, Ard Biesheuvel wrote:
> The ADRP instruction in the AArch64 ISA requires the link time and load
> time offsets of a binary to be equal modulo 4 KB. The reason is that this
> instruction always produces a multiple of 4 KB, and relies on a subsequent
> ADD or LDR instruction to set the offset into the page. The resulting
> symbol reference only produces the correct value if the symbol in question
> resides at that exact offset into the page, and so loading the binary at
> arbitrary offsets is not possible.
>
> Due to the various levels of padding when packing FVs into FVs into FDs,
> this alignment is very costly for XIP code, and so we would like to relax
> this alignment requirement if possible.
>
> Given that symbols that are sufficiently close to the reference (within
> 1 MB) can also be reached using an ADR instruction, which does not
> suffer from this alignment issue, let's replace ADRP instructions with ADR
> after linking if the offset can be encoded in this instruction's immediate
> field. Note that this only makes sense if the section alignment is < 4 KB.
> Otherwise, replacing the ADRP has no benefit, considering that the
> subsequent ADD or LDR instruction is retained, and that micro-architectures
> are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
> fusing) than for ADR/ADD pairs, which are non-typical.
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Ard Biesheuvel
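
To illustrate the description above, here is a minimal sketch in Python that models the address arithmetic only. It is not the GenFw code; the function names (adrp_add, adr_add, link, check) and the example addresses are hypothetical. It assumes that the converted ADR targets the link-time page base of the symbol, so that the retained ADD still supplies the low 12 bits.

```python
PAGE = 0x1000  # ADRP operates on 4 KB pages

def adrp_add(pc, page_delta, lo12):
    """Model of: ADRP Xd, sym ; ADD Xd, Xd, :lo12:sym, executed at pc."""
    return (pc & ~(PAGE - 1)) + page_delta * PAGE + lo12

def adr_add(pc, byte_delta, lo12):
    """Model of: ADR Xd, <link-time page base of sym> ; ADD Xd, Xd, :lo12:sym."""
    return pc + byte_delta + lo12

def link(pc, sym):
    """Compute the immediates fixed at link time for a reference at pc to sym."""
    page_delta = (sym >> 12) - (pc >> 12)              # ADRP immediate, in pages
    lo12 = sym & (PAGE - 1)                            # ADD immediate
    byte_delta = page_delta * PAGE - (pc & (PAGE - 1)) # ADR immediate, in bytes
    # ADR has a signed 21-bit byte offset, i.e. a range of +/- 1 MB
    if not -(1 << 20) <= byte_delta < (1 << 20):
        raise ValueError("target out of ADR range")
    return page_delta, lo12, byte_delta

def check(shift, pc=0x1234, sym=0x5678):
    """Load the image 'shift' bytes away from its link address and report
    whether each sequence still yields the symbol's run-time address."""
    page_delta, lo12, byte_delta = link(pc, sym)
    run_pc, run_sym = pc + shift, sym + shift
    return (adrp_add(run_pc, page_delta, lo12) == run_sym,
            adr_add(run_pc, byte_delta, lo12) == run_sym)

print(check(0x2000))  # shift is a multiple of 4 KB
print(check(0x20))    # shift is not a multiple of 4 KB
```

With a load offset that is a multiple of 4 KB both sequences resolve the symbol; with a 32-byte offset only the ADR form does, which is the property the patch relies on.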
@Liming, @Leif:
are there any objections to these patches? I know it is unfortunate
that we need to modify instructions as part of the ELF to PE/COFF
conversion, but it is very effective:
ArmVirtQemu-AARCH64 built with CLANG35:
Before:
FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free
After:
FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free
For comparison, GCC49:
FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free
FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free
and GCC5 (with LTO):
FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free
FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free
In other words, it turns CLANG35 from a pathetic outlier into
something usable :-)
Regards,
Ard.
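
As a quick check on the FVMAIN_COMPACT figures quoted above, the following sketch computes the saving for CLANG35 and the remaining gap to the GCC builds. The numbers are copied from the message; the variable and function names are hypothetical.

```python
# FVMAIN_COMPACT "used" sizes in bytes, as reported in the message
used = {
    "CLANG35 before": 868416,
    "CLANG35 after": 768064,
    "GCC49": 749960,
    "GCC5 (LTO)": 732400,
}

def saving(before, after):
    """Return (bytes saved, percentage saved relative to 'before')."""
    saved = before - after
    return saved, 100.0 * saved / before

saved, pct = saving(used["CLANG35 before"], used["CLANG35 after"])
print(f"CLANG35 saving: {saved} bytes ({saved // 1024} KiB, {pct:.1f}%)")

# Remaining gap between the patched CLANG35 build and the GCC builds
for name in ("GCC49", "GCC5 (LTO)"):
    print(f"gap to {name}: {used['CLANG35 after'] - used[name]} bytes")
```

This works out to 100352 bytes (98 KiB), or roughly 11.6% of the compressed volume, leaving the CLANG35 build within about 18 KB of GCC49.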
* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
From: Leif Lindholm @ 2016-08-01 14:19 UTC
To: Ard Biesheuvel; +Cc: edk2-devel-01, Gao, Liming, Zhu, Yonghong, Cohen, Eugene
Apologies, lost track of this one.
On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:
> On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > The ADRP instruction in the AArch64 ISA requires the link time and load
> > time offsets of a binary to be equal modulo 4 KB. The reason is that this
> > instruction always produces a multiple of 4 KB, and relies on a subsequent
> > ADD or LDR instruction to set the offset into the page. The resulting
> > symbol reference only produces the correct value if the symbol in question
> > resides at that exact offset into the page, and so loading the binary at
> > arbitrary offsets is not possible.
> >
> > Due to the various levels of padding when packing FVs into FVs into FDs,
> > this alignment is very costly for XIP code, and so we would like to relax
> > this alignment requirement if possible.
> >
> > Given that symbols that are sufficiently close to the reference (within
> > 1 MB) can also be reached using an ADR instruction, which does not
> > suffer from this alignment issue, let's replace ADRP instructions with ADR
> > after linking if the offset can be encoded in this instruction's immediate
> > field. Note that this only makes sense if the section alignment is < 4 KB.
> > Otherwise, replacing the ADRP has no benefit, considering that the
> > subsequent ADD or LDR instruction is retained, and that micro-architectures
> > are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
> > fusing) than for ADR/ADD pairs, which are non-typical.
> >
> > Contributed-under: TianoCore Contribution Agreement 1.0
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> @Liming, @Leif:
>
> are there any objections to these patches? I know it is unfortunate
> that we need to modify instructions as part of the ELF to PE/COFF
> conversion, but it is very effective
It's absolutely horrid, but extremely useful.
For the series:
Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>
> ArmVirtQemu-AARCH64 built with CLANG35:
>
> Before:
>
> FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free
> FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free
>
> After:
>
> FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free
> FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free
>
> For comparison, GCC49
>
> FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free
> FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free
>
> and GCC5 (with LTO)
>
> FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free
> FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free
>
> In other words, it turns CLANG35 from a pathetic outlier into
> something usable :-)
>
> Regards,
> Ard.
* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
From: Ard Biesheuvel @ 2016-08-02 9:03 UTC
To: Leif Lindholm; +Cc: edk2-devel-01, Gao, Liming, Zhu, Yonghong, Cohen, Eugene
On 1 August 2016 at 16:19, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> Apologies, lost track of this one.
>
> On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:
>> On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > The ADRP instruction in the AArch64 ISA requires the link time and load
>> > time offsets of a binary to be equal modulo 4 KB. The reason is that this
>> > instruction always produces a multiple of 4 KB, and relies on a subsequent
>> > ADD or LDR instruction to set the offset into the page. The resulting
>> > symbol reference only produces the correct value if the symbol in question
>> > resides at that exact offset into the page, and so loading the binary at
>> > arbitrary offsets is not possible.
>> >
>> > Due to the various levels of padding when packing FVs into FVs into FDs,
>> > this alignment is very costly for XIP code, and so we would like to relax
>> > this alignment requirement if possible.
>> >
>> > Given that symbols that are sufficiently close to the reference (within
>> > 1 MB) can also be reached using an ADR instruction, which does not
>> > suffer from this alignment issue, let's replace ADRP instructions with ADR
>> > after linking if the offset can be encoded in this instruction's immediate
>> > field. Note that this only makes sense if the section alignment is < 4 KB.
>> > Otherwise, replacing the ADRP has no benefit, considering that the
>> > subsequent ADD or LDR instruction is retained, and that micro-architectures
>> > are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
>> > fusing) than for ADR/ADD pairs, which are non-typical.
>> >
>> > Contributed-under: TianoCore Contribution Agreement 1.0
>> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> @Liming, @Leif:
>>
>> are there any objections to these patches? I know it is unfortunate
>> that we need to modify instructions as part of the ELF to PE/COFF
>> conversion, but it is very effective
>
> It's absolutely horrid, but extremely useful.
> For the series:
> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>
>
Thanks
Committed as
026a82abf0bd BaseTools/GenFw AARCH64: convert ADRP to ADR instructions if binary size allows it
b89919ee8f8c BaseTools AARCH64: override XIP module linker alignment to 32 bytes