From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id E000C1A1E59 for ; Fri, 26 Aug 2016 11:42:21 -0700 (PDT) Received: by mail-wm0-x233.google.com with SMTP id i5so4519700wmg.0 for ; Fri, 26 Aug 2016 11:42:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=references:mime-version:in-reply-to:content-transfer-encoding :message-id:cc:from:subject:date:to; bh=gIQkGRsvdeZ4RCz2rbAFt9KR1rMPLPKcWHkW1cioy+w=; b=e/PxuVNY+V8LQE7Tgm7wHd5aDDto4mZj6RAfJv58SsqVUUtf+bOOPHDVk2ffMT+sOg eqlQ01fWH0w6vNjkrxSqHd3FMAX+uamnPxTcS20XRBv5EhHZJ/c/LbEt1WukzV03P2+Q e3kbPHqJ3a16o67s2UFH9MV38urCWtM0+yNVg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:mime-version:in-reply-to :content-transfer-encoding:message-id:cc:from:subject:date:to; bh=gIQkGRsvdeZ4RCz2rbAFt9KR1rMPLPKcWHkW1cioy+w=; b=jkq8wP6MXhiboIIVeiR+7mquc2q4IDlpnk5R3YeY+BPTd5TP/xM+YnGncglqQK1MXi EzMaAMiUJ/57cxvTbo6VGPJXfj/YoPd2muy9kmG6SqEb/hepJsxNfWG0gQ7MdIxzH8mp vT2qEx4iLM9fvFsDseDzyQvnp3plIgoeHRY+TQhgM7NRlJffLS8US7vF3SULSsc6fDXp KL5cnBpxtWgyz6+B7TmgopmxGwmFHvcTWMbIAzAbYdKYrZzofU3jNbTsM0E1nineA2EG IE/MIIq/N35EA/lJdJpGJxuMQ5Mx7FryotYtBw4cccmvyROrssWWOoqnkIX7tWkzbInY vLyw== X-Gm-Message-State: AE9vXwPxAtCb+5H9xgtTkkO9aLAGIZzZr1IOnPBqieeu2cLDSLmApXjkgUM+jHbaCv5bsy2c X-Received: by 10.194.105.40 with SMTP id gj8mr6137975wjb.71.1472236940098; Fri, 26 Aug 2016 11:42:20 -0700 (PDT) Received: from [10.20.158.133] (dynrak234g-133-134-67-105.inwitelecom.net. [105.67.134.133]) by smtp.gmail.com with ESMTPSA id x203sm294242wmg.0.2016.08.26.11.42.08 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 26 Aug 2016 11:42:19 -0700 (PDT) References: <1471445945-19239-1-git-send-email-ard.biesheuvel@linaro.org> <1471445945-19239-4-git-send-email-ard.biesheuvel@linaro.org> <20160826125439.GF4715@bivouac.eciton.net> <20160826181045.GL4715@bivouac.eciton.net> Mime-Version: 1.0 (1.0) In-Reply-To: <20160826181045.GL4715@bivouac.eciton.net> Message-Id: <240A60BE-4A88-4ACA-BAF4-680502EF8EFD@linaro.org> Cc: edk2-devel-01 , Jeff Brasen , "Tian, Feng" , "Zeng, Star" , Daniil Egranov X-Mailer: iPhone Mail (13G35) From: Ard Biesheuvel Date: Fri, 26 Aug 2016 19:41:41 +0100 To: Leif Lindholm Subject: Re: [PATCH v3 3/4] MdeModulePkg/EbxDxe AARCH64: use tail call for EBC to native thunk X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Aug 2016 18:42:22 -0000 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable > On 26 aug. 2016, at 19:10, Leif Lindholm wrote:= >=20 >> On Fri, Aug 26, 2016 at 06:14:10PM +0100, Ard Biesheuvel wrote: >>> On 26 August 2016 at 13:54, Leif Lindholm wro= te: >>>> On Wed, Aug 17, 2016 at 04:59:04PM +0200, Ard Biesheuvel wrote: >>>> Instead of pessimistically copying at least 64 bytes from the VM stack >>>> to the native stack, and popping off the register arguments again >>>> before doing the native call, try to avoid touching the stack completel= y >>>> if the VM stack frame is < 64 bytes. Also, if the stack frame does exce= ed >>>=20 >>> Should this say "does not"? >>=20 >> No, if it exceeds 64 bytes, we may have arguments that should be >> passed on the stack (but we're not sure, unfortunately, since the VM >> stack frame is not annotated, i.e., it could simply be local vars in >> the caller) In this case, there is no need to push all arguments onto >> the native stack, and pop off the first 8 into registers again, which >> is what the original code was doing. >=20 > OK, thanks. > Should that not be <=3D 64 bytes above then? >=20 Correct >>>> 64 bytes, there is no need to copy the first 64 bytes, since we are pas= sing >>>> those in registers anyway. >>>>=20 >>>> Contributed-under: TianoCore Contribution Agreement 1.0 >>>> Signed-off-by: Ard Biesheuvel >>>> --- >>>> MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S | 73 ++++++++++++++= +----- >>>> 1 file changed, 55 insertions(+), 18 deletions(-) >>>>=20 >>>> diff --git a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S b/MdeM= odulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S >>>> index cb7a70b5a4f8..d95713e82b0f 100644 >>>> --- a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S >>>> +++ b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S >>>> @@ -35,30 +35,67 @@ ASM_GLOBAL ASM_PFX(mEbcInstructionBufferTemplate) >>>> //*********************************************************************= ******* >>>> // UINTN EbcLLCALLEXNative(UINTN FuncAddr, UINTN NewStackPointer, VOID *= FramePtr) >>>> ASM_PFX(EbcLLCALLEXNative): >>>> - stp x19, x20, [sp, #-16]! >>>> - stp x29, x30, [sp, #-16]! >>>> + mov x8, x0 // Preserve x0 >>>> + mov x9, x1 // Preserve x1 >>>>=20 >>>> - mov x19, x0 >>>> - mov x20, sp >>>> - sub x2, x2, x1 // Length =3D NewStackPointer-FramePtr >>>> - sub sp, sp, x2 >>>> - sub sp, sp, #64 // Make sure there is room for at least 8 args= in the new stack >>>> - mov x0, sp >>>> + // >>>> + // If the EBC stack frame is smaller than or equal to 64 bytes, we= know there >>>> + // are no stacked arguments #9 and beyond that we need to copy to t= he native >>>> + // stack. In this case, we can perform a tail call which is much m= ore >>>> + // efficient, since there is no need to touch the native stack at a= ll. >>>> + // >>>> + sub x3, x2, x1 // Length =3D NewStackPointer - Fra= mePtr >>>> + cmp x3, #64 >>>> + b.gt 1f >>>>=20 >>>> - bl CopyMem // Sp, NewStackPointer, Length >>>> + adr x0, 0f >>>> + sub x0, x0, x3, lsr #1 >>>> + br x0 >>>>=20 >>>> - ldp x0, x1, [sp], #16 >>>> - ldp x2, x3, [sp], #16 >>>> - ldp x4, x5, [sp], #16 >>>> - ldp x6, x7, [sp], #16 >>>> + ldr x7, [x9, #56] >>>> + ldr x6, [x9, #48] >>>> + ldr x5, [x9, #40] >>>> + ldr x4, [x9, #32] >>>> + ldr x3, [x9, #24] >>>> + ldr x2, [x9, #16] >>>> + ldr x1, [x9, #8] >>>> + ldr x0, [x9] >>>=20 >>> Why not keep using ldp, but with x9? >>=20 >> This is a computed jump, depending on the size of the stack frame, >> hence the inverted order, and the LDRs. While probably harmless in >> most cases, it is arguably incorrect to copy random data from memory >> into these registers (and could even be a security issue) >=20 > OK, that's way too clever for someone like me to realise from the > existing code (which arguably, you have simply corrected). Yes, even > with the block comment above. > Could we add some kind of for-dummies comment by this code to indicate > what is going on at the actual instructions. >=20 > Say: >=20 > + ldr x7, [x9, #56] // Branch target for 8 arguments > + ldr x6, [x9, #48] // | > + ldr x5, [x9, #40] // | > + ldr x4, [x9, #32] // | > + ldr x3, [x9, #24] // | > + ldr x2, [x9, #16] // | > + ldr x1, [x9, #8] // V > + ldr x0, [x9] // Branch target for 1 argument >=20 > - blr x19 > +0: br x8 // Branch target for 0 arguments >=20 > (Unless I'm still misunderstanding.) >=20 No, that is accurate, and a worthwhile clarification > / > Leif >=20 >>=20 >>>>=20 >>>> - blr x19 >>>> +0: br x8 >>>>=20 >>>> - mov sp, x20 >>>> - ldp x29, x30, [sp], #16 >>>> - ldp x19, x20, [sp], #16 >>>> + // >>>> + // More than 64 bytes: we need to build the full native stack fram= e and copy >>>> + // the part of the VM stack exceeding 64 bytes (which may contain s= tacked >>>> + // arguments) to the native stack >>>> + // >>>> +1: stp x29, x30, [sp, #-16]! >>>> + mov x29, sp >>>>=20 >>>> - ret >>>> + // >>>> + // Ensure that the stack pointer remains 16 byte aligned, >>>> + // even if the size of the VM stack frame is not a multiple of 16 >>>> + // >>>> + add x1, x1, #64 // Skip over [potential] reg param= s >>>> + tbz x3, #3, 2f // Multiple of 16? >>>> + ldr x4, [x2, #-8]! // No? Then push one word >>>> + str x4, [sp, #-16]! // ... but use two slots >>>> + b 3f >>>> + >>>> +2: ldp x4, x5, [x2, #-16]! >>>> + stp x4, x5, [sp, #-16]! >>>> +3: cmp x2, x1 >>>> + b.gt 2b >>>> + >>>> + ldp x0, x1, [x9] >>>> + ldp x2, x3, [x9, #16] >>>> + ldp x4, x5, [x9, #32] >>>> + ldp x6, x7, [x9, #48] >>>> + >>>> + blr x8 >>>> + >>>> + mov sp, x29 >>>> + ldp x29, x30, [sp], #16 >>>> + ret >>>>=20 >>>> //*********************************************************************= ******* >>>> // EbcLLEbcInterpret >>>> -- >>>> 2.7.4 >>>>=20