From: Ard Biesheuvel
To: edk2-devel@lists.01.org, feng.tian@intel.com, star.zeng@intel.com
Cc: leif.lindholm@linaro.org, Ard Biesheuvel
Date: Wed, 31 Aug 2016 09:45:48 +0100
Message-Id: <1472633149-13817-4-git-send-email-ard.biesheuvel@linaro.org>
In-Reply-To: <1472633149-13817-1-git-send-email-ard.biesheuvel@linaro.org>
Subject: [PATCH v5 3/4] MdeModulePkg/EbcDxe AARCH64: use tail call for EBC to native thunk

Instead of pessimistically copying at least 64 bytes from the VM stack to
the native stack, and popping off the register arguments again before
doing the native call, try to avoid touching the native stack at all if
the VM stack frame is <= 64 bytes. Also, if the stack frame does exceed
64 bytes, there is no need to copy the first 64 bytes, since we are
passing those in registers anyway.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Ard Biesheuvel
Reviewed-by: Leif Lindholm
---
 MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S | 85 +++++++++++++++-----
 1 file changed, 65 insertions(+), 20 deletions(-)

diff --git a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
index b4b8531f1a01..34794c06a644 100644
--- a/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
+++ b/MdeModulePkg/Universal/EbcDxe/AArch64/EbcLowLevel.S
@@ -35,30 +35,75 @@ ASM_GLOBAL ASM_PFX(mEbcInstructionBufferTemplate)
 //****************************************************************************
 // UINTN EbcLLCALLEXNative(UINTN FuncAddr, UINTN NewStackPointer, VOID *FramePtr)
 ASM_PFX(EbcLLCALLEXNative):
-  stp   x19, x20, [sp, #-16]!
-  stp   x29, x30, [sp, #-16]!
+  mov     x8, x0                  // Preserve x0
+  mov     x9, x1                  // Preserve x1
 
-  mov   x19, x0
-  mov   x20, sp
-  sub   x2, x2, x1   // Length = NewStackPointer-FramePtr
-  sub   sp, sp, x2
-  sub   sp, sp, #64  // Make sure there is room for at least 8 args in the new stack
-  mov   x0, sp
-
-  bl    CopyMem      // Sp, NewStackPointer, Length
-
-  ldp   x0, x1, [sp], #16
-  ldp   x2, x3, [sp], #16
-  ldp   x4, x5, [sp], #16
-  ldp   x6, x7, [sp], #16
+  //
+  // If the EBC stack frame is smaller than or equal to 64 bytes, we know there
+  // are no stacked arguments #9 and beyond that we need to copy to the native
+  // stack. In this case, we can perform a tail call which is much more
+  // efficient, since there is no need to touch the native stack at all.
+  //
+  sub     x3, x2, x1              // Length = FramePtr - NewStackPointer
+  cmp     x3, #64
+  b.gt    1f
 
-  blr   x19
+  //
+  // While probably harmless in practice, we should not access the VM stack
+  // outside of the interval [NewStackPointer, FramePtr), which means we
+  // should not blindly fill all 8 argument registers with VM stack data.
+  // So instead, calculate how many argument registers we can fill based on
+  // the size of the VM stack frame, and skip the remaining ones.
+  //
+  adr     x0, 0f                  // Take address of 'br' instruction below
+  bic     x3, x3, #7              // Ensure correct alignment
+  sub     x0, x0, x3, lsr #1      // Subtract 4 bytes for each arg to unstack
+  br      x0                      // Skip remaining argument registers
+
+  ldr     x7, [x9, #56]           // Call with 8 arguments
+  ldr     x6, [x9, #48]           // |
+  ldr     x5, [x9, #40]           // |
+  ldr     x4, [x9, #32]           // |
+  ldr     x3, [x9, #24]           // |
+  ldr     x2, [x9, #16]           // |
+  ldr     x1, [x9, #8]            // V
+  ldr     x0, [x9]                // Call with 1 argument
+
+0:br      x8                      // Call with no arguments
 
-  mov   sp, x20
-  ldp   x29, x30, [sp], #16
-  ldp   x19, x20, [sp], #16
+  //
+  // More than 64 bytes: we need to build the full native stack frame and copy
+  // the part of the VM stack exceeding 64 bytes (which may contain stacked
+  // arguments) to the native stack.
+  //
+1:stp     x29, x30, [sp, #-16]!
+  mov     x29, sp
 
-  ret
+  //
+  // Ensure that the stack pointer remains 16 byte aligned,
+  // even if the size of the VM stack frame is not a multiple of 16
+  //
+  add     x1, x1, #64             // Skip over [potential] reg params
+  tbz     x3, #3, 2f              // Multiple of 16?
+  ldr     x4, [x2, #-8]!          // No? Then push one word
+  str     x4, [sp, #-16]!         // ... but use two slots
+  b       3f
+
+2:ldp     x4, x5, [x2, #-16]!
+  stp     x4, x5, [sp, #-16]!
+3:cmp     x2, x1
+  b.gt    2b
+
+  ldp     x0, x1, [x9]
+  ldp     x2, x3, [x9, #16]
+  ldp     x4, x5, [x9, #32]
+  ldp     x6, x7, [x9, #48]
+
+  blr     x8
+
+  mov     sp, x29
+  ldp     x29, x30, [sp], #16
+  ret
 
 //****************************************************************************
 // EbcLLEbcInterpret
-- 
2.7.4
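For anyone reviewing the computed branch in the fast path (Length <= 64), here is a rough C model of it. It is only a sketch and not part of the patch. The helpers fast_path_branch_back() and fast_path_load() are hypothetical names, regs[] merely stands in for x0-x7, and the VM stack frame is assumed to be an array of 8-byte words starting at NewStackPointer, which is what the ldr offsets above imply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical C model of the fast path of EbcLLCALLEXNative (not EDK2 code).
 * Instruction i (0..7) of the ldr sequence loads x(7 - i) from
 * [x9 + 8 * (7 - i)], and label 0 (the tail call "br x8") is instruction 8.
 * Every A64 instruction is 4 bytes, so branching back Length / 2 bytes
 * (Length rounded down to a multiple of 8) executes Length / 8 ldr's.
 */

/* Bytes subtracted from the address of label 0: "bic" then "sub ..., lsr #1". */
static uint64_t fast_path_branch_back(uint64_t length)
{
  assert(length <= 64);                    /* larger frames take the slow path */
  return (length & ~(uint64_t)7) >> 1;
}

/*
 * Simulate executing the ldr sequence from the computed entry point.
 * vm[] models the VM stack words at NewStackPointer, regs[] models x0..x7.
 * Skipped registers keep their previous contents. Returns how many were loaded.
 */
static unsigned fast_path_load(const uint64_t *vm, uint64_t length,
                               uint64_t regs[8])
{
  unsigned entry = 8 - (unsigned)(fast_path_branch_back(length) / 4);
  unsigned loaded = 0;

  for (unsigned i = entry; i < 8; i++) {
    unsigned reg = 7 - i;                  /* ldr x(7-i), [x9, #8*(7-i)] */
    regs[reg] = vm[reg];                   /* only reads inside the VM frame */
    loaded++;
  }
  return loaded;
}
```

For example, a 20-byte frame rounds down to 16 bytes, so the branch lands 8 bytes (two instructions) before label 0 and only x1 and x0 are loaded. Counting backwards from the tail call lets one fixed eight-instruction sequence serve every frame size up to 64 bytes, with no loop and no per-register compare.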
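The slow path's copy loop (Length > 64) can be modelled in the same way. This checks that the stacked arguments end up contiguous above the native sp even when there is an odd number of them. Again this is only a sketch: slow_path_copy() and NATIVE_WORDS are hypothetical, the VM frame is assumed to consist of whole 8-byte words, and stack positions are tracked as word indices rather than addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NATIVE_WORDS 64   /* size of the simulated native stack (hypothetical) */

/*
 * Hypothetical model of the slow path (not EDK2 code).
 * vm[0 .. nwords-1] is the VM frame starting at NewStackPointer, so index
 * nwords corresponds to FramePtr. native[] is the simulated native stack and
 * *sp a word index that is even on entry (16-byte aligned) and grows down.
 * Returns the number of native words consumed, which is always even.
 */
static size_t slow_path_copy(const uint64_t *vm, size_t nwords,
                             uint64_t *native, size_t *sp)
{
  size_t src = nwords;      /* x2 = FramePtr */
  size_t lim = 8;           /* x1 + 64: the first 8 words go in x0..x7 */
  size_t start = *sp;

  assert(nwords > 8);       /* only reached when Length > 64 */
  assert(*sp % 2 == 0);     /* sp must stay 16-byte aligned */

  if (nwords & 1) {         /* tbz x3, #3: bit 3 set => odd word count */
    src -= 1;               /* ldr x4, [x2, #-8]! */
    *sp -= 2;               /* str x4, [sp, #-16]! uses two slots */
    native[*sp] = vm[src];  /* word sits in the lower slot, padding above */
  }

  /* 2: ... 3: cmp x2, x1; b.gt 2b. For an even count this runs at least once. */
  while (src > lim) {
    src -= 2;               /* ldp x4, x5, [x2, #-16]! */
    *sp -= 2;               /* stp x4, x5, [sp, #-16]! */
    native[*sp] = vm[src];
    native[*sp + 1] = vm[src + 1];
  }
  return start - *sp;
}
```

With 11 VM words (88 bytes), three words are stacked: the topmost one is pushed alone into a 16-byte slot and the remaining pair below it, so native[sp + k] still holds argument #9 + k and the unused padding word ends up above the copied region.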