From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 30 Aug 2016 19:52:34 +0100
From: Leif Lindholm
To: Ard Biesheuvel
Cc: edk2-devel@lists.01.org
Message-ID: <20160830185234.GQ4715@bivouac.eciton.net>
References: <1472567244-32031-1-git-send-email-ard.biesheuvel@linaro.org>
 <1472567244-32031-4-git-send-email-ard.biesheuvel@linaro.org>
In-Reply-To: <1472567244-32031-4-git-send-email-ard.biesheuvel@linaro.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Subject: Re: [PATCH v4 3/4] MdeModulePkg/EbxDxe AARCH64: use tail call for EBC to native thunk
List-Id: EDK II Development

On Tue, Aug 30, 2016 at 03:27:23PM +0100, Ard Biesheuvel wrote:
> Instead of pessimistically copying at least 64 bytes from the VM stack
> to the native stack, and popping off the register arguments again
> before doing the native call, try to avoid touching the stack completely
> if the VM stack frame is <= 64 bytes. Also, if the stack frame does exceed
> 64 bytes, there is no need to copy the first 64 bytes, since we are passing
> those in registers anyway.
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Ard Biesheuvel

Reviewed-by: Leif Lindholm

> ---
>  MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S | 85 +++++++++++++++-----
>  1 file changed, 65 insertions(+), 20 deletions(-)
>
> diff --git a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
> index b4b8531f1a01..34794c06a644 100644
> --- a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
> +++ b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
> @@ -35,30 +35,75 @@ ASM_GLOBAL ASM_PFX(mEbcInstructionBufferTemplate)
>  //****************************************************************************
>  // UINTN EbcLLCALLEXNative(UINTN FuncAddr, UINTN NewStackPointer, VOID *FramePtr)
>  ASM_PFX(EbcLLCALLEXNative):
> -    stp     x19, x20, [sp, #-16]!
> -    stp     x29, x30, [sp, #-16]!
> +    mov     x8, x0                 // Preserve x0
> +    mov     x9, x1                 // Preserve x1
>
> -    mov     x19, x0
> -    mov     x20, sp
> -    sub     x2, x2, x1             // Length = NewStackPointer-FramePtr
> -    sub     sp, sp, x2
> -    sub     sp, sp, #64            // Make sure there is room for at least 8 args in the new stack
> -    mov     x0, sp
> -
> -    bl      CopyMem                // Sp, NewStackPointer, Length
> -
> -    ldp     x0, x1, [sp], #16
> -    ldp     x2, x3, [sp], #16
> -    ldp     x4, x5, [sp], #16
> -    ldp     x6, x7, [sp], #16
> +    //
> +    // If the EBC stack frame is smaller than or equal to 64 bytes, we know there
> +    // are no stacked arguments #9 and beyond that we need to copy to the native
> +    // stack. In this case, we can perform a tail call which is much more
> +    // efficient, since there is no need to touch the native stack at all.
> +    //
> +    sub     x3, x2, x1             // Length = NewStackPointer - FramePtr
> +    cmp     x3, #64
> +    b.gt    1f
>
> -    blr     x19
> +    //
> +    // While probably harmless in practice, we should not access the VM stack
> +    // outside of the interval [NewStackPointer, FramePtr), which means we
> +    // should not blindly fill all 8 argument registers with VM stack data.
> +    // So instead, calculate how many argument registers we can fill based on
> +    // the size of the VM stack frame, and skip the remaining ones.
> +    //
> +    adr     x0, 0f                 // Take address of 'br' instruction below
> +    bic     x3, x3, #7             // Ensure correct alignment
> +    sub     x0, x0, x3, lsr #1     // Subtract 4 bytes for each arg to unstack
> +    br      x0                     // Skip remaining argument registers
> +
> +    ldr     x7, [x9, #56]          // Call with 8 arguments
> +    ldr     x6, [x9, #48]          //  |
> +    ldr     x5, [x9, #40]          //  |
> +    ldr     x4, [x9, #32]          //  |
> +    ldr     x3, [x9, #24]          //  |
> +    ldr     x2, [x9, #16]          //  |
> +    ldr     x1, [x9, #8]           //  V
> +    ldr     x0, [x9]               // Call with 1 argument
> +
> +0:  br      x8                     // Call with no arguments
>
> -    mov     sp, x20
> -    ldp     x29, x30, [sp], #16
> -    ldp     x19, x20, [sp], #16
> +    //
> +    // More than 64 bytes: we need to build the full native stack frame and copy
> +    // the part of the VM stack exceeding 64 bytes (which may contain stacked
> +    // arguments) to the native stack
> +    //
> +1:  stp     x29, x30, [sp, #-16]!
> +    mov     x29, sp
>
> -    ret
> +    //
> +    // Ensure that the stack pointer remains 16 byte aligned,
> +    // even if the size of the VM stack frame is not a multiple of 16
> +    //
> +    add     x1, x1, #64            // Skip over [potential] reg params
> +    tbz     x3, #3, 2f             // Multiple of 16?
> +    ldr     x4, [x2, #-8]!         // No? Then push one word
> +    str     x4, [sp, #-16]!        // ... but use two slots
> +    b       3f
> +
> +2:  ldp     x4, x5, [x2, #-16]!
> +    stp     x4, x5, [sp, #-16]!
> +3:  cmp     x2, x1
> +    b.gt    2b
> +
> +    ldp     x0, x1, [x9]
> +    ldp     x2, x3, [x9, #16]
> +    ldp     x4, x5, [x9, #32]
> +    ldp     x6, x7, [x9, #48]
> +
> +    blr     x8
> +
> +    mov     sp, x29
> +    ldp     x29, x30, [sp], #16
> +    ret
>
>  //****************************************************************************
>  // EbcLLEbcInterpret
> --
> 2.7.4
>
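
For anyone following the arithmetic in the thunk, here is a rough C model
of the decision the assembly makes. It is only a sketch and not part of
the patch: the names thunk_plan and plan_thunk are made up for
illustration, the frame length is taken as x2 - x1 (FramePtr minus
NewStackPointer, going by the prototype comment), and frames larger than
64 bytes are assumed to be a multiple of 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of what the thunk does for a given VM stack frame. */
typedef struct {
  int      tail_call;     /* 1 if the frame fits entirely in x0-x7 */
  unsigned reg_args;      /* number of argument registers loaded */
  uint64_t branch_back;   /* bytes subtracted from the address of label 0 */
  uint64_t native_stack;  /* bytes pushed on the native stack for args 9+ */
} thunk_plan;

/* Illustrative model only; mirrors the arithmetic, not the register use. */
static thunk_plan plan_thunk(uint64_t new_sp, uint64_t frame_ptr)
{
  thunk_plan p = { 0, 0, 0, 0 };
  uint64_t len;

  assert(frame_ptr >= new_sp);          /* assumption: frame is not negative */
  len = frame_ptr - new_sp;             /* sub x3, x2, x1 */

  if (len <= 64) {
    /* Tail-call path: one 4-byte ldr is executed per 8 bytes of frame. */
    uint64_t aligned = len & ~(uint64_t)7;   /* bic x3, x3, #7 */
    p.tail_call   = 1;
    p.reg_args    = (unsigned)(aligned / 8);
    p.branch_back = aligned >> 1;            /* sub x0, x0, x3, lsr #1 */
  } else {
    /* Full-frame path: the first 64 bytes go in registers, the rest is
       pushed in 16-byte slots so that sp stays 16-byte aligned. */
    uint64_t stacked = len - 64;
    p.reg_args     = 8;
    p.native_stack = (stacked + 15) & ~(uint64_t)15;
  }
  return p;
}
```

With this model, a 24-byte frame gives three register arguments and a
12-byte step back from label 0, while a 72-byte frame takes the
full-frame path and uses one 16-byte slot for the single stacked word.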