From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ard.biesheuvel@linaro.org>
Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com
 [IPv6:2a00:1450:400c:c09::233])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id E000C1A1E59
 for <edk2-devel@lists.01.org>; Fri, 26 Aug 2016 11:42:21 -0700 (PDT)
Received: by mail-wm0-x233.google.com with SMTP id i5so4519700wmg.0
 for <edk2-devel@lists.01.org>; Fri, 26 Aug 2016 11:42:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;
 h=references:mime-version:in-reply-to:content-transfer-encoding
 :message-id:cc:from:subject:date:to;
 bh=gIQkGRsvdeZ4RCz2rbAFt9KR1rMPLPKcWHkW1cioy+w=;
 b=e/PxuVNY+V8LQE7Tgm7wHd5aDDto4mZj6RAfJv58SsqVUUtf+bOOPHDVk2ffMT+sOg
 eqlQ01fWH0w6vNjkrxSqHd3FMAX+uamnPxTcS20XRBv5EhHZJ/c/LbEt1WukzV03P2+Q
 e3kbPHqJ3a16o67s2UFH9MV38urCWtM0+yNVg=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:references:mime-version:in-reply-to
 :content-transfer-encoding:message-id:cc:from:subject:date:to;
 bh=gIQkGRsvdeZ4RCz2rbAFt9KR1rMPLPKcWHkW1cioy+w=;
 b=jkq8wP6MXhiboIIVeiR+7mquc2q4IDlpnk5R3YeY+BPTd5TP/xM+YnGncglqQK1MXi
 EzMaAMiUJ/57cxvTbo6VGPJXfj/YoPd2muy9kmG6SqEb/hepJsxNfWG0gQ7MdIxzH8mp
 vT2qEx4iLM9fvFsDseDzyQvnp3plIgoeHRY+TQhgM7NRlJffLS8US7vF3SULSsc6fDXp
 KL5cnBpxtWgyz6+B7TmgopmxGwmFHvcTWMbIAzAbYdKYrZzofU3jNbTsM0E1nineA2EG
 IE/MIIq/N35EA/lJdJpGJxuMQ5Mx7FryotYtBw4cccmvyROrssWWOoqnkIX7tWkzbInY
 vLyw==
X-Gm-Message-State: AE9vXwPxAtCb+5H9xgtTkkO9aLAGIZzZr1IOnPBqieeu2cLDSLmApXjkgUM+jHbaCv5bsy2c
X-Received: by 10.194.105.40 with SMTP id gj8mr6137975wjb.71.1472236940098;
 Fri, 26 Aug 2016 11:42:20 -0700 (PDT)
Received: from [10.20.158.133] (dynrak234g-133-134-67-105.inwitelecom.net.
 [105.67.134.133])
 by smtp.gmail.com with ESMTPSA id x203sm294242wmg.0.2016.08.26.11.42.08
 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Fri, 26 Aug 2016 11:42:19 -0700 (PDT)
References: <1471445945-19239-1-git-send-email-ard.biesheuvel@linaro.org>
 <1471445945-19239-4-git-send-email-ard.biesheuvel@linaro.org>
 <20160826125439.GF4715@bivouac.eciton.net>
 <CAKv+Gu_rgPcwh66_Le-4fhhx7FFtEr6YHtYfzYP1999yGYQSJw@mail.gmail.com>
 <20160826181045.GL4715@bivouac.eciton.net>
Mime-Version: 1.0 (1.0)
In-Reply-To: <20160826181045.GL4715@bivouac.eciton.net>
Message-Id: <240A60BE-4A88-4ACA-BAF4-680502EF8EFD@linaro.org>
Cc: edk2-devel-01 <edk2-devel@lists.01.org>,
 Jeff Brasen <jbrasen@codeaurora.org>, "Tian, Feng" <feng.tian@intel.com>,
 "Zeng, Star" <star.zeng@intel.com>, Daniil Egranov <daniil.egranov@arm.com>
X-Mailer: iPhone Mail (13G35)
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Date: Fri, 26 Aug 2016 19:41:41 +0100
To: Leif Lindholm <leif.lindholm@linaro.org>
Subject: Re: [PATCH v3 3/4] MdeModulePkg/EbxDxe AARCH64: use tail call for EBC to native thunk
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: EDK II Development  <edk2-devel.lists.01.org>
List-Unsubscribe: <https://lists.01.org/mailman/options/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/edk2-devel/>
List-Post: <mailto:edk2-devel@lists.01.org>
List-Help: <mailto:edk2-devel-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=subscribe>
X-List-Received-Date: Fri, 26 Aug 2016 18:42:22 -0000
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable


> On 26 aug. 2016, at 19:10, Leif Lindholm <leif.lindholm@linaro.org> wrote:=

>=20
>> On Fri, Aug 26, 2016 at 06:14:10PM +0100, Ard Biesheuvel wrote:
>>> On 26 August 2016 at 13:54, Leif Lindholm <leif.lindholm@linaro.org> wro=
te:
>>>> On Wed, Aug 17, 2016 at 04:59:04PM +0200, Ard Biesheuvel wrote:
>>>> Instead of pessimistically copying at least 64 bytes from the VM stack
>>>> to the native stack, and popping off the register arguments again
>>>> before doing the native call, try to avoid touching the stack completel=
y
>>>> if the VM stack frame is < 64 bytes. Also, if the stack frame does exce=
ed
>>>=20
>>> Should this say "does not"?
>>=20
>> No, if it exceeds 64 bytes, we may have arguments that should be
>> passed on the stack (but we're not sure, unfortunately, since the VM
>> stack frame is not annotated, i.e., it could simply be local vars in
>> the caller) In this case, there is no need to push all arguments onto
>> the native stack, and pop off the first 8 into registers again, which
>> is what the original code was doing.
>=20
> OK, thanks.
> Should that not be <=3D 64 bytes above then?
>=20

Correct

>>>> 64 bytes, there is no need to copy the first 64 bytes, since we are pas=
sing
>>>> those in registers anyway.
>>>>=20
>>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>> MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S | 73 ++++++++++++++=
+-----
>>>> 1 file changed, 55 insertions(+), 18 deletions(-)
>>>>=20
>>>> diff --git a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S b/MdeM=
odulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
>>>> index cb7a70b5a4f8..d95713e82b0f 100644
>>>> --- a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
>>>> +++ b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
>>>> @@ -35,30 +35,67 @@ ASM_GLOBAL ASM_PFX(mEbcInstructionBufferTemplate)
>>>> //*********************************************************************=
*******
>>>> // UINTN EbcLLCALLEXNative(UINTN FuncAddr, UINTN NewStackPointer, VOID *=
FramePtr)
>>>> ASM_PFX(EbcLLCALLEXNative):
>>>> -      stp  x19, x20, [sp, #-16]!
>>>> -      stp  x29, x30, [sp, #-16]!
>>>> +    mov     x8, x0                 // Preserve x0
>>>> +    mov     x9, x1                 // Preserve x1
>>>>=20
>>>> -      mov  x19, x0
>>>> -      mov  x20, sp
>>>> -      sub  x2, x2, x1   // Length =3D NewStackPointer-FramePtr
>>>> -      sub  sp, sp, x2
>>>> -      sub  sp, sp, #64  // Make sure there is room for at least 8 args=
 in the new stack
>>>> -      mov  x0, sp
>>>> +    //
>>>> +    // If the EBC stack frame is smaller than or equal to 64 bytes, we=
 know there
>>>> +    // are no stacked arguments #9 and beyond that we need to copy to t=
he native
>>>> +    // stack. In this case, we can perform a tail call which is much m=
ore
>>>> +    // efficient, since there is no need to touch the native stack at a=
ll.
>>>> +    //
>>>> +    sub     x3, x2, x1              // Length =3D NewStackPointer - Fra=
mePtr
>>>> +    cmp     x3, #64
>>>> +    b.gt    1f
>>>>=20
>>>> -      bl   CopyMem      // Sp, NewStackPointer, Length
>>>> +    adr     x0, 0f
>>>> +    sub     x0, x0, x3, lsr #1
>>>> +    br      x0
>>>>=20
>>>> -      ldp  x0, x1, [sp], #16
>>>> -      ldp  x2, x3, [sp], #16
>>>> -      ldp  x4, x5, [sp], #16
>>>> -      ldp  x6, x7, [sp], #16
>>>> +    ldr     x7, [x9, #56]
>>>> +    ldr     x6, [x9, #48]
>>>> +    ldr     x5, [x9, #40]
>>>> +    ldr     x4, [x9, #32]
>>>> +    ldr     x3, [x9, #24]
>>>> +    ldr     x2, [x9, #16]
>>>> +    ldr     x1, [x9, #8]
>>>> +    ldr     x0, [x9]
>>>=20
>>> Why not keep using ldp, but with x9?
>>=20
>> This is a computed jump, depending on the size of the stack frame,
>> hence the inverted order, and the LDRs. While probably harmless in
>> most cases, it is arguably incorrect to copy random data from memory
>> into these registers (and could even be a security issue)
>=20
> OK, that's way too clever for someone like me to realise from the
> existing code (which arguably, you have simply corrected). Yes, even
> with the block comment above.
> Could we add some kind of for-dummies comment by this code to indicate
> what is going on at the actual instructions.
>=20
> Say:
>=20
> +    ldr     x7, [x9, #56]  // Branch target for 8 arguments
> +    ldr     x6, [x9, #48]  // |
> +    ldr     x5, [x9, #40]  // |
> +    ldr     x4, [x9, #32]  // |
> +    ldr     x3, [x9, #24]  // |
> +    ldr     x2, [x9, #16]  // |
> +    ldr     x1, [x9, #8]   // V
> +    ldr     x0, [x9]       // Branch target for 1 argument
>=20
> -      blr  x19
> +0:  br      x8             // Branch target for 0 arguments
>=20
> (Unless I'm still misunderstanding.)
>=20

No, that is accurate, and a worthwhile clarification

> /
>    Leif
>=20
>>=20
>>>>=20
>>>> -      blr  x19
>>>> +0:  br      x8
>>>>=20
>>>> -      mov  sp,  x20
>>>> -      ldp  x29, x30, [sp], #16
>>>> -      ldp  x19, x20, [sp], #16
>>>> +    //
>>>> +    // More than 64 bytes: we need to build the full native stack fram=
e and copy
>>>> +    // the part of the VM stack exceeding 64 bytes (which may contain s=
tacked
>>>> +    // arguments) to the native stack
>>>> +    //
>>>> +1:  stp     x29, x30, [sp, #-16]!
>>>> +    mov     x29, sp
>>>>=20
>>>> -      ret
>>>> +    //
>>>> +    // Ensure that the stack pointer remains 16 byte aligned,
>>>> +    // even if the size of the VM stack frame is not a multiple of 16
>>>> +    //
>>>> +    add     x1, x1, #64             // Skip over [potential] reg param=
s
>>>> +    tbz     x3, #3, 2f              // Multiple of 16?
>>>> +    ldr     x4, [x2, #-8]!          // No? Then push one word
>>>> +    str     x4, [sp, #-16]!         // ... but use two slots
>>>> +    b       3f
>>>> +
>>>> +2:  ldp     x4, x5, [x2, #-16]!
>>>> +    stp     x4, x5, [sp, #-16]!
>>>> +3:  cmp     x2, x1
>>>> +    b.gt    2b
>>>> +
>>>> +    ldp     x0, x1, [x9]
>>>> +    ldp     x2, x3, [x9, #16]
>>>> +    ldp     x4, x5, [x9, #32]
>>>> +    ldp     x6, x7, [x9, #48]
>>>> +
>>>> +    blr     x8
>>>> +
>>>> +    mov     sp, x29
>>>> +    ldp     x29, x30, [sp], #16
>>>> +    ret
>>>>=20
>>>> //*********************************************************************=
*******
>>>> // EbcLLEbcInterpret
>>>> --
>>>> 2.7.4
>>>>=20