public inbox for devel@edk2.groups.io
* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
       [not found] <1469618762-7648-1-git-send-email-ard.biesheuvel@linaro.org>
@ 2016-08-01 11:53 ` Ard Biesheuvel
  2016-08-01 14:09   ` Gao, Liming
  2016-08-01 14:19   ` Leif Lindholm
  0 siblings, 2 replies; 4+ messages in thread
From: Ard Biesheuvel @ 2016-08-01 11:53 UTC (permalink / raw)
  To: edk2-devel-01, Gao, Liming, Zhu, Yonghong
  Cc: Leif Lindholm, Cohen, Eugene, Ard Biesheuvel

On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> The ADRP instruction in the AArch64 ISA requires the link time and load
> time offsets of a binary to be equal modulo 4 KB. The reason is that this
> instruction always produces a multiple of 4 KB, and relies on a subsequent
> ADD or LDR instruction to set the offset into the page. The resulting
> symbol reference only produces the correct value if the symbol in question
> resides at that exact offset into the page, and so loading the binary at
> arbitrary offsets is not possible.
>
> Due to the various levels of padding when packing FVs into FVs into FDs,
> this alignment is very costly for XIP code, and so we would like to relax
> this alignment requirement if possible.
>
> Given that symbols that are sufficiently close (within 1 MB) of the
> reference can also be reached using an ADR instruction which does not
> suffer from this alignment issue, let's replace ADRP instructions with ADR
> after linking if the offset can be encoded in this instruction's immediate
> field. Note that this only makes sense if the section alignment is < 4 KB.
> Otherwise, replacing the ADRP has no benefit, considering that the
> subsequent ADD or LDR instruction is retained, and that micro-architectures
> are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
> fusing) than for ADR/ADD pairs, which are non-typical.
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

@Liming, @Leif:

Are there any objections to these patches? I know it is unfortunate
that we need to modify instructions as part of the ELF to PE/COFF
conversion, but it is very effective.

ArmVirtQemu-AARCH64 built with CLANG35:

Before:

FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

After:

FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

For comparison, GCC49:

FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free
FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free

and GCC5 (with LTO)

FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free
FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free

In other words, it turns CLANG35 from a pathetic outlier into
something usable :-)

Regards,
Ard.



* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
  2016-08-01 11:53 ` [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it Ard Biesheuvel
@ 2016-08-01 14:09   ` Gao, Liming
  2016-08-01 14:19   ` Leif Lindholm
  1 sibling, 0 replies; 4+ messages in thread
From: Gao, Liming @ 2016-08-01 14:09 UTC (permalink / raw)
  To: Ard Biesheuvel, edk2-devel-01, Zhu, Yonghong; +Cc: Leif Lindholm, Cohen, Eugene

Ard:
  Your change looks OK to me. I have no comments.

Thanks
Liming


* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
  2016-08-01 11:53 ` [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it Ard Biesheuvel
  2016-08-01 14:09   ` Gao, Liming
@ 2016-08-01 14:19   ` Leif Lindholm
  2016-08-02  9:03     ` Ard Biesheuvel
  1 sibling, 1 reply; 4+ messages in thread
From: Leif Lindholm @ 2016-08-01 14:19 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: edk2-devel-01, Gao, Liming, Zhu, Yonghong, Cohen, Eugene

Apologies, lost track of this one.

On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:
> On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > The ADRP instruction in the AArch64 ISA requires the link time and load
> > time offsets of a binary to be equal modulo 4 KB. The reason is that this
> > instruction always produces a multiple of 4 KB, and relies on a subsequent
> > ADD or LDR instruction to set the offset into the page. The resulting
> > symbol reference only produces the correct value if the symbol in question
> > resides at that exact offset into the page, and so loading the binary at
> > arbitrary offsets is not possible.
> >
> > Due to the various levels of padding when packing FVs into FVs into FDs,
> > this alignment is very costly for XIP code, and so we would like to relax
> > this alignment requirement if possible.
> >
> > Given that symbols that are sufficiently close (within 1 MB) of the
> > reference can also be reached using an ADR instruction which does not
> > suffer from this alignment issue, let's replace ADRP instructions with ADR
> > after linking if the offset can be encoded in this instruction's immediate
> > field. Note that this only makes sense if the section alignment is < 4 KB.
> > Otherwise, replacing the ADRP has no benefit, considering that the
> > subsequent ADD or LDR instruction is retained, and that micro-architectures
> > are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
> > fusing) than for ADR/ADD pairs, which are non-typical.
> >
> > Contributed-under: TianoCore Contribution Agreement 1.0
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> @Liming, @Leif:
> 
> are there any objections to these patches? I know it is unfortunate
> that we need to modify instructions as part of the ELF to PE/COFF
> conversion, but it is very effective

It's absolutely horrid, but extremely useful.
For the series:
Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

> ArmVirtQemu-AARCH64 built with CLANG35:
> 
> Before:
> 
> FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free
> FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free
> 
> After:
> 
> FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free
> FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free
> 
> For comparison, GCC49
> 
> FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free
> FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free
> 
> and GCC5 (with LTO)
> 
> FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free
> FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free
> 
> In other words, it turns CLANG35 from a pathetic outlier into
> something usable :-)
> 
> Regards,
> Ard.



* Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it
  2016-08-01 14:19   ` Leif Lindholm
@ 2016-08-02  9:03     ` Ard Biesheuvel
  0 siblings, 0 replies; 4+ messages in thread
From: Ard Biesheuvel @ 2016-08-02  9:03 UTC (permalink / raw)
  To: Leif Lindholm; +Cc: edk2-devel-01, Gao, Liming, Zhu, Yonghong, Cohen, Eugene

On 1 August 2016 at 16:19, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> Apologies, lost track of this one.
>
> On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:
>> On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > The ADRP instruction in the AArch64 ISA requires the link time and load
>> > time offsets of a binary to be equal modulo 4 KB. The reason is that this
>> > instruction always produces a multiple of 4 KB, and relies on a subsequent
>> > ADD or LDR instruction to set the offset into the page. The resulting
>> > symbol reference only produces the correct value if the symbol in question
>> > resides at that exact offset into the page, and so loading the binary at
>> > arbitrary offsets is not possible.
>> >
>> > Due to the various levels of padding when packing FVs into FVs into FDs,
>> > this alignment is very costly for XIP code, and so we would like to relax
>> > this alignment requirement if possible.
>> >
>> > Given that symbols that are sufficiently close (within 1 MB) of the
>> > reference can also be reached using an ADR instruction which does not
>> > suffer from this alignment issue, let's replace ADRP instructions with ADR
>> > after linking if the offset can be encoded in this instruction's immediate
>> > field. Note that this only makes sense if the section alignment is < 4 KB.
>> > Otherwise, replacing the ADRP has no benefit, considering that the
>> > subsequent ADD or LDR instruction is retained, and that micro-architectures
>> > are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
>> > fusing) than for ADR/ADD pairs, which are non-typical.
>> >
>> > Contributed-under: TianoCore Contribution Agreement 1.0
>> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> @Liming, @Leif:
>>
>> are there any objections to these patches? I know it is unfortunate
>> that we need to modify instructions as part of the ELF to PE/COFF
>> conversion, but it is very effective
>
> It's absolutely horrid, but extremely useful.
> For the series:
> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>
>

Thanks

Committed as

026a82abf0bd BaseTools/GenFw AARCH64: convert ADRP to ADR instructions if binary size allows it
b89919ee8f8c BaseTools AARCH64: override XIP module linker alignment to 32 bytes

