From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Leif Lindholm <leif.lindholm@linaro.org>
Cc: edk2-devel-01 <edk2-devel@lists.01.org>,
Jeff Brasen <jbrasen@codeaurora.org>,
"Tian, Feng" <feng.tian@intel.com>,
"Zeng, Star" <star.zeng@intel.com>,
Daniil Egranov <daniil.egranov@arm.com>
Subject: Re: [PATCH v3 3/4] MdeModulePkg/EbxDxe AARCH64: use tail call for EBC to native thunk
Date: Fri, 26 Aug 2016 19:41:41 +0100 [thread overview]
Message-ID: <240A60BE-4A88-4ACA-BAF4-680502EF8EFD@linaro.org> (raw)
In-Reply-To: <20160826181045.GL4715@bivouac.eciton.net>
> On 26 aug. 2016, at 19:10, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>
>> On Fri, Aug 26, 2016 at 06:14:10PM +0100, Ard Biesheuvel wrote:
>>> On 26 August 2016 at 13:54, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>>>> On Wed, Aug 17, 2016 at 04:59:04PM +0200, Ard Biesheuvel wrote:
>>>> Instead of pessimistically copying at least 64 bytes from the VM stack
>>>> to the native stack, and popping off the register arguments again
>>>> before doing the native call, try to avoid touching the stack completely
>>>> if the VM stack frame is < 64 bytes. Also, if the stack frame does exceed
>>>
>>> Should this say "does not"?
>>
>> No, if it exceeds 64 bytes, we may have arguments that should be
>> passed on the stack (but we're not sure, unfortunately, since the VM
>> stack frame is not annotated, i.e., it could simply be local vars in
>> the caller) In this case, there is no need to push all arguments onto
>> the native stack, and pop off the first 8 into registers again, which
>> is what the original code was doing.
>
> OK, thanks.
> Should that not be <= 64 bytes above then?
>
Correct
>>>> 64 bytes, there is no need to copy the first 64 bytes, since we are passing
>>>> those in registers anyway.
>>>>
>>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>> MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S | 73 +++++++++++++++-----
>>>> 1 file changed, 55 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
>>>> index cb7a70b5a4f8..d95713e82b0f 100644
>>>> --- a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
>>>> +++ b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
>>>> @@ -35,30 +35,67 @@ ASM_GLOBAL ASM_PFX(mEbcInstructionBufferTemplate)
>>>> //****************************************************************************
>>>> // UINTN EbcLLCALLEXNative(UINTN FuncAddr, UINTN NewStackPointer, VOID *FramePtr)
>>>> ASM_PFX(EbcLLCALLEXNative):
>>>> - stp x19, x20, [sp, #-16]!
>>>> - stp x29, x30, [sp, #-16]!
>>>> + mov x8, x0 // Preserve x0
>>>> + mov x9, x1 // Preserve x1
>>>>
>>>> - mov x19, x0
>>>> - mov x20, sp
>>>> - sub x2, x2, x1 // Length = NewStackPointer-FramePtr
>>>> - sub sp, sp, x2
>>>> - sub sp, sp, #64 // Make sure there is room for at least 8 args in the new stack
>>>> - mov x0, sp
>>>> + //
>>>> + // If the EBC stack frame is smaller than or equal to 64 bytes, we know there
>>>> + // are no stacked arguments #9 and beyond that we need to copy to the native
>>>> + // stack. In this case, we can perform a tail call which is much more
>>>> + // efficient, since there is no need to touch the native stack at all.
>>>> + //
>>>> + sub x3, x2, x1 // Length = NewStackPointer - FramePtr
>>>> + cmp x3, #64
>>>> + b.gt 1f
>>>>
>>>> - bl CopyMem // Sp, NewStackPointer, Length
>>>> + adr x0, 0f
>>>> + sub x0, x0, x3, lsr #1
>>>> + br x0
>>>>
>>>> - ldp x0, x1, [sp], #16
>>>> - ldp x2, x3, [sp], #16
>>>> - ldp x4, x5, [sp], #16
>>>> - ldp x6, x7, [sp], #16
>>>> + ldr x7, [x9, #56]
>>>> + ldr x6, [x9, #48]
>>>> + ldr x5, [x9, #40]
>>>> + ldr x4, [x9, #32]
>>>> + ldr x3, [x9, #24]
>>>> + ldr x2, [x9, #16]
>>>> + ldr x1, [x9, #8]
>>>> + ldr x0, [x9]
>>>
>>> Why not keep using ldp, but with x9?
>>
>> This is a computed jump, depending on the size of the stack frame,
>> hence the inverted order, and the LDRs. While probably harmless in
>> most cases, it is arguably incorrect to copy random data from memory
>> into these registers (and could even be a security issue)
>
> OK, that's way too clever for someone like me to realise from the
> existing code (which arguably, you have simply corrected). Yes, even
> with the block comment above.
> Could we add some kind of for-dummies comment by this code to indicate
> what is going on at the actual instructions.
>
> Say:
>
> + ldr x7, [x9, #56] // Branch target for 8 arguments
> + ldr x6, [x9, #48] // |
> + ldr x5, [x9, #40] // |
> + ldr x4, [x9, #32] // |
> + ldr x3, [x9, #24] // |
> + ldr x2, [x9, #16] // |
> + ldr x1, [x9, #8] // V
> + ldr x0, [x9] // Branch target for 1 argument
>
> - blr x19
> +0: br x8 // Branch target for 0 arguments
>
> (Unless I'm still misunderstanding.)
>
No, that is accurate, and a worthwhile clarification
> /
> Leif
>
>>
>>>>
>>>> - blr x19
>>>> +0: br x8
>>>>
>>>> - mov sp, x20
>>>> - ldp x29, x30, [sp], #16
>>>> - ldp x19, x20, [sp], #16
>>>> + //
>>>> + // More than 64 bytes: we need to build the full native stack frame and copy
>>>> + // the part of the VM stack exceeding 64 bytes (which may contain stacked
>>>> + // arguments) to the native stack
>>>> + //
>>>> +1: stp x29, x30, [sp, #-16]!
>>>> + mov x29, sp
>>>>
>>>> - ret
>>>> + //
>>>> + // Ensure that the stack pointer remains 16 byte aligned,
>>>> + // even if the size of the VM stack frame is not a multiple of 16
>>>> + //
>>>> + add x1, x1, #64 // Skip over [potential] reg params
>>>> + tbz x3, #3, 2f // Multiple of 16?
>>>> + ldr x4, [x2, #-8]! // No? Then push one word
>>>> + str x4, [sp, #-16]! // ... but use two slots
>>>> + b 3f
>>>> +
>>>> +2: ldp x4, x5, [x2, #-16]!
>>>> + stp x4, x5, [sp, #-16]!
>>>> +3: cmp x2, x1
>>>> + b.gt 2b
>>>> +
>>>> + ldp x0, x1, [x9]
>>>> + ldp x2, x3, [x9, #16]
>>>> + ldp x4, x5, [x9, #32]
>>>> + ldp x6, x7, [x9, #48]
>>>> +
>>>> + blr x8
>>>> +
>>>> + mov sp, x29
>>>> + ldp x29, x30, [sp], #16
>>>> + ret
>>>>
>>>> //****************************************************************************
>>>> // EbcLLEbcInterpret
>>>> --
>>>> 2.7.4
>>>>
next prev parent reply other threads:[~2016-08-26 18:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-17 14:59 [PATCH v3 0/4] MdeModulePkg/EbcDxe: AARCH64 improvements Ard Biesheuvel
2016-08-17 14:59 ` [PATCH v3 1/4] MdeModulePkg/EbcDxe AARCH64: clean up comment style in ASM file Ard Biesheuvel
2016-08-26 11:06 ` Leif Lindholm
2016-08-17 14:59 ` [PATCH v3 2/4] MdeModulePkg/EbcDxe AARCH64: use a fixed size thunk structure Ard Biesheuvel
2016-08-26 12:28 ` Leif Lindholm
2016-08-17 14:59 ` [PATCH v3 3/4] MdeModulePkg/EbxDxe AARCH64: use tail call for EBC to native thunk Ard Biesheuvel
2016-08-26 12:54 ` Leif Lindholm
2016-08-26 17:14 ` Ard Biesheuvel
2016-08-26 18:10 ` Leif Lindholm
2016-08-26 18:41 ` Ard Biesheuvel [this message]
2016-08-17 14:59 ` [PATCH v3 4/4] MdeModulePkg/EbcDxe AARCH64: simplify interpreter entry point thunks Ard Biesheuvel
2016-08-26 12:56 ` Leif Lindholm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=240A60BE-4A88-4ACA-BAF4-680502EF8EFD@linaro.org \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox