From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-x236.google.com (mail-wm0-x236.google.com [IPv6:2a00:1450:400c:c09::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 38C561A1E59 for ; Fri, 26 Aug 2016 11:10:50 -0700 (PDT) Received: by mail-wm0-x236.google.com with SMTP id i5so3451245wmg.0 for ; Fri, 26 Aug 2016 11:10:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=jKpx2TDISPZv8+eawwHAKhWG0rCJdFMP6VLNggoUMck=; b=W5EqKwEP04vX7sAWcvoBc0E2pTZ5S2uVKtNxlemhwyDCkUbkMFYifAZIn6CcFseEe4 sEskyUm2BFU53UZo11FL/4A6SEngG7MVTUprE90DXyQZw0IUv5esRgkbDta5Wm/vXmTe koAYFzxzOIvltfYdSAvUr4JrvQn9EPPFUK/Y8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=jKpx2TDISPZv8+eawwHAKhWG0rCJdFMP6VLNggoUMck=; b=AbVrMBkslL6AcTZRDKC1kONEVFHDROFmh8NS42rUbRQlXXaQkLY7l8bqbhk0XDGIfY OS1fwvQiLjAzugLoAaAUPGwNRGRw6c1PJl1zTBkQHWvSgl7LfWX7KvQ6Iig4TVRs93Uv D0yPZZsDH+E81Taa5UkXB4QrHZTpx77Ng1Y4txvX1GHoDzv0QbnJ7rerZ4lOowmsCX4S tlDWZ1+pJkd18+oE0+vGGJICvchhqCkPNw0s4tmq98jj5cy6v8Q+u/hlpywaQ1nFezdb BEw6iHHH75qwrbQ46hDmqNHiGxS8q0gIm44qtjWs0qa2bdJP6uwY+IszU0RSl9oYlOu8 oh7Q== X-Gm-Message-State: AE9vXwM4N2nVmqsFkQITpVDRgyWgKkETuvb69adW5bI71p266Zp/6r5Q5mx8jo3ghvlKb/Pk X-Received: by 10.28.51.210 with SMTP id z201mr158420wmz.98.1472235048297; Fri, 26 Aug 2016 11:10:48 -0700 (PDT) Received: from bivouac.eciton.net (bivouac.eciton.net. [2a00:1098:0:86:1000:23:0:2]) by smtp.gmail.com with ESMTPSA id n2sm21068846wjd.1.2016.08.26.11.10.47 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 26 Aug 2016 11:10:47 -0700 (PDT) Date: Fri, 26 Aug 2016 19:10:45 +0100 From: Leif Lindholm To: Ard Biesheuvel Cc: edk2-devel-01 , Jeff Brasen , "Tian, Feng" , "Zeng, Star" , Daniil Egranov Message-ID: <20160826181045.GL4715@bivouac.eciton.net> References: <1471445945-19239-1-git-send-email-ard.biesheuvel@linaro.org> <1471445945-19239-4-git-send-email-ard.biesheuvel@linaro.org> <20160826125439.GF4715@bivouac.eciton.net> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [PATCH v3 3/4] MdeModulePkg/EbxDxe AARCH64: use tail call for EBC to native thunk X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Aug 2016 18:10:50 -0000 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Fri, Aug 26, 2016 at 06:14:10PM +0100, Ard Biesheuvel wrote: > On 26 August 2016 at 13:54, Leif Lindholm wrote: > > On Wed, Aug 17, 2016 at 04:59:04PM +0200, Ard Biesheuvel wrote: > >> Instead of pessimistically copying at least 64 bytes from the VM stack > >> to the native stack, and popping off the register arguments again > >> before doing the native call, try to avoid touching the stack completely > >> if the VM stack frame is < 64 bytes. Also, if the stack frame does exceed > > > > Should this say "does not"? > > No, if it exceeds 64 bytes, we may have arguments that should be > passed on the stack (but we're not sure, unfortunately, since the VM > stack frame is not annotated, i.e., it could simply be local vars in > the caller) In this case, there is no need to push all arguments onto > the native stack, and pop off the first 8 into registers again, which > is what the original code was doing. OK, thanks. Should that not be <= 64 bytes above then? > >> 64 bytes, there is no need to copy the first 64 bytes, since we are passing > >> those in registers anyway. > >> > >> Contributed-under: TianoCore Contribution Agreement 1.0 > >> Signed-off-by: Ard Biesheuvel > >> --- > >> MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S | 73 +++++++++++++++----- > >> 1 file changed, 55 insertions(+), 18 deletions(-) > >> > >> diff --git a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S > >> index cb7a70b5a4f8..d95713e82b0f 100644 > >> --- a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S > >> +++ b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S > >> @@ -35,30 +35,67 @@ ASM_GLOBAL ASM_PFX(mEbcInstructionBufferTemplate) > >> //**************************************************************************** > >> // UINTN EbcLLCALLEXNative(UINTN FuncAddr, UINTN NewStackPointer, VOID *FramePtr) > >> ASM_PFX(EbcLLCALLEXNative): > >> - stp x19, x20, [sp, #-16]! > >> - stp x29, x30, [sp, #-16]! > >> + mov x8, x0 // Preserve x0 > >> + mov x9, x1 // Preserve x1 > >> > >> - mov x19, x0 > >> - mov x20, sp > >> - sub x2, x2, x1 // Length = NewStackPointer-FramePtr > >> - sub sp, sp, x2 > >> - sub sp, sp, #64 // Make sure there is room for at least 8 args in the new stack > >> - mov x0, sp > >> + // > >> + // If the EBC stack frame is smaller than or equal to 64 bytes, we know there > >> + // are no stacked arguments #9 and beyond that we need to copy to the native > >> + // stack. In this case, we can perform a tail call which is much more > >> + // efficient, since there is no need to touch the native stack at all. > >> + // > >> + sub x3, x2, x1 // Length = NewStackPointer - FramePtr > >> + cmp x3, #64 > >> + b.gt 1f > >> > >> - bl CopyMem // Sp, NewStackPointer, Length > >> + adr x0, 0f > >> + sub x0, x0, x3, lsr #1 > >> + br x0 > >> > >> - ldp x0, x1, [sp], #16 > >> - ldp x2, x3, [sp], #16 > >> - ldp x4, x5, [sp], #16 > >> - ldp x6, x7, [sp], #16 > >> + ldr x7, [x9, #56] > >> + ldr x6, [x9, #48] > >> + ldr x5, [x9, #40] > >> + ldr x4, [x9, #32] > >> + ldr x3, [x9, #24] > >> + ldr x2, [x9, #16] > >> + ldr x1, [x9, #8] > >> + ldr x0, [x9] > > > > Why not keep using ldp, but with x9? > > This is a computed jump, depending on the size of the stack frame, > hence the inverted order, and the LDRs. While probably harmless in > most cases, it is arguably incorrect to copy random data from memory > into these registers (and could even be a security issue) OK, that's way too clever for someone like me to realise from the existing code (which arguably, you have simply corrected). Yes, even with the block comment above. Could we add some kind of for-dummies comment by this code to indicate what is going on at the actual instructions. Say: + ldr x7, [x9, #56] // Branch target for 8 arguments + ldr x6, [x9, #48] // | + ldr x5, [x9, #40] // | + ldr x4, [x9, #32] // | + ldr x3, [x9, #24] // | + ldr x2, [x9, #16] // | + ldr x1, [x9, #8] // V + ldr x0, [x9] // Branch target for 1 argument - blr x19 +0: br x8 // Branch target for 0 arguments (Unless I'm still misunderstanding.) / Leif > > >> > >> - blr x19 > >> +0: br x8 > >> > >> - mov sp, x20 > >> - ldp x29, x30, [sp], #16 > >> - ldp x19, x20, [sp], #16 > >> + // > >> + // More than 64 bytes: we need to build the full native stack frame and copy > >> + // the part of the VM stack exceeding 64 bytes (which may contain stacked > >> + // arguments) to the native stack > >> + // > >> +1: stp x29, x30, [sp, #-16]! > >> + mov x29, sp > >> > >> - ret > >> + // > >> + // Ensure that the stack pointer remains 16 byte aligned, > >> + // even if the size of the VM stack frame is not a multiple of 16 > >> + // > >> + add x1, x1, #64 // Skip over [potential] reg params > >> + tbz x3, #3, 2f // Multiple of 16? > >> + ldr x4, [x2, #-8]! // No? Then push one word > >> + str x4, [sp, #-16]! // ... but use two slots > >> + b 3f > >> + > >> +2: ldp x4, x5, [x2, #-16]! > >> + stp x4, x5, [sp, #-16]! > >> +3: cmp x2, x1 > >> + b.gt 2b > >> + > >> + ldp x0, x1, [x9] > >> + ldp x2, x3, [x9, #16] > >> + ldp x4, x5, [x9, #32] > >> + ldp x6, x7, [x9, #48] > >> + > >> + blr x8 > >> + > >> + mov sp, x29 > >> + ldp x29, x30, [sp], #16 > >> + ret > >> > >> //**************************************************************************** > >> // EbcLLEbcInterpret > >> -- > >> 2.7.4 > >>