From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ard Biesheuvel
Date: Wed, 7 Sep 2016 17:19:55 +0100
To: edk2-devel-01, Leif Lindholm
Cc: "Gao, Liming", Ryan Harkin, Ard Biesheuvel
Subject: Re: [PATCH v4 2/3] MdePkg/BaseMemoryLibOptDxe: add accelerated ARM routines
In-Reply-To: <1473259968-7221-3-git-send-email-ard.biesheuvel@linaro.org>
References: <1473259968-7221-1-git-send-email-ard.biesheuvel@linaro.org>
 <1473259968-7221-3-git-send-email-ard.biesheuvel@linaro.org>
List-Id: EDK II Development

On 7 September 2016 at 15:52, Ard Biesheuvel wrote:
> This adds ARM support to BaseMemoryLibOptDxe, partially based on the
> cortex-strings library (ScanMem) and the existing CopyMem() implementation
> from BaseMemoryLibStm in ArmPkg.
>
> All string routines are accelerated except ScanMem16, ScanMem32,
> ScanMem64 and IsZeroBuffer, which can wait for another day. (Very few
> occurrences exist in the codebase)
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Ard Biesheuvel
> Reviewed-by: Liming Gao
> ---
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CompareMem.S        | 138 ++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CompareMem.asm      | 140 ++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CopyMem.S           | 172 ++++++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CopyMem.asm         | 147 +++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/ScanMem.S           | 146 +++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/ScanMem.asm         | 147 +++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/ScanMemGeneric.c    | 142 ++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S            |  75 +++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm          |  81 +++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf |  30 ++--
>  10 files changed, 1204 insertions(+), 14 deletions(-)
> [..]
> diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S
> new file mode 100644
> index 000000000000..914fdd60ea52
> --- /dev/null
> +++ b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S
> @@ -0,0 +1,75 @@
> +#------------------------------------------------------------------------------
> +#
> +# Copyright (c) 2016, Linaro Ltd. All rights reserved.
> +#
> +# This program and the accompanying materials are licensed and made available
> +# under the terms and conditions of the BSD License which accompanies this
> +# distribution. The full text of the license may be found at
> +# http://opensource.org/licenses/bsd-license.php
> +#
> +# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +# WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +#
> +#------------------------------------------------------------------------------
> +
> +    .text
> +    .thumb
> +    .syntax unified
> +
> +ASM_GLOBAL ASM_PFX(InternalMemZeroMem)
> +ASM_PFX(InternalMemZeroMem):
> +    movs    r2, #0
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem)
> +ASM_PFX(InternalMemSetMem):
> +    uxtb    r2, r2
> +    orr     r2, r2, r2, lsl #8
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem16)
> +ASM_PFX(InternalMemSetMem16):
> +    uxth    r2, r2
> +    orr     r2, r2, r2, lsl #16
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem32)
> +ASM_PFX(InternalMemSetMem32):
> +    mov     r3, r2
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem64)
> +ASM_PFX(InternalMemSetMem64):
> +    push    {r0, lr}
> +    add     ip, r0, r1              // ip := dst + length
> +    adds    r0, r0, #16             // advance the output pointer by 16 bytes
> +    cmp     r1, #16                 // fewer than 16 bytes of input?
> +    blt     2f
> +
> +    str     r2, [r0, #-16]          // potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-12]          // potentially unaligned store of 4 bytes
> +    str     r2, [r0, #-8]           // potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-4]           // potentially unaligned store of 4 bytes
> +    bic     r0, r0, #15             // align output pointer
> +    beq     1f
> +
> +0:  adds    r0, r0, #16             // advance the output pointer by 16 bytes
> +    subs    r1, ip, r0              // past the output?
> +    blt     2f                      // break out of the loop
> +    strd    r2, r3, [r0, #-16]      // aligned store of 16 bytes
> +    strd    r2, r3, [r0, #-8]
> +    bne     0b                      // goto beginning of loop
> +1:  pop     {r0, pc}
> +
> +2:  and     r1, r1, #0xf
> +    cmp     r1, #0x4                // between 4 and 15 bytes?
> +    blt     3f
> +    cmp     r1, #0x8                // between 8 and 15 bytes?
> +    str     r2, [r0, #-16]          // overlapping store of 4 + (4 + 4) + 4 bytes
> +    itt     ge
> +    strge   r3, [r0, #-12]
> +    strge   r2, [ip, #-8]
> +    str     r3, [ip, #-4]
> +    pop     {r0, pc}
> +
> +3:  cmp     r1, #2                  // 2 or 3 bytes?
> +    strb    r2, [r0, #-16]          // store 1 byte
> +    it      ge
> +    strhge  r2, [ip, #-2]           // store 2 bytes
> +    pop     {r0, pc}
> diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm
> new file mode 100644
> index 000000000000..14fecd93a96c
> --- /dev/null
> +++ b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm
> @@ -0,0 +1,81 @@
> +;------------------------------------------------------------------------------
> +;
> +; Copyright (c) 2016, Linaro Ltd. All rights reserved.
> +;
> +; This program and the accompanying materials are licensed and made available
> +; under the terms and conditions of the BSD License which accompanies this
> +; distribution. The full text of the license may be found at
> +; http://opensource.org/licenses/bsd-license.php
> +;
> +; THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +; WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +;
> +;------------------------------------------------------------------------------
> +
> +    EXPORT  InternalMemZeroMem
> +    EXPORT  InternalMemSetMem
> +    EXPORT  InternalMemSetMem16
> +    EXPORT  InternalMemSetMem32
> +    EXPORT  InternalMemSetMem64
> +
> +    AREA    SetMem, CODE, READONLY
> +    THUMB
> +
> +InternalMemZeroMem
> +    movs    r2, #0
> +
> +InternalMemSetMem
> +    uxtb    r2, r2
> +    orr     r2, r2, r2, lsl #8
> +
> +InternalMemSetMem16
> +    uxth    r2, r2
> +    orr     r2, r2, r2, lsr #16
> +
> +InternalMemSetMem32
> +    mov     r3, r2
> +
> +InternalMemSetMem64
> +    push    {r0, lr}
> +    add     ip, r0, r1              ; ip := dst + length
> +    adds    r0, r0, #16             ; advance the output pointer by 16 bytes
> +    cmp     r1, #16                 ; fewer than 16 bytes of input?
> +    blt     L2
> +
> +    str     r2, [r0, #-16]          ; potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-12]          ; potentially unaligned store of 4 bytes
> +    str     r2, [r0, #-8]           ; potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-4]           ; potentially unaligned store of 4 bytes
> +    bic     r0, r0, #15             ; align output pointer
> +    beq     L1
> +
> +L0
> +    adds    r0, r0, #16             ; advance the output pointer by 16 bytes
> +    subs    r1, ip, r0              ; past the output?
> +    blt     L2                      ; break out of the loop
> +    strd    r2, r3, [r0, #-16]      ; aligned store of 16 bytes
> +    strd    r2, r3, [r0, #-8]
> +    bne     L0                      ; goto beginning of loop
> +L1
> +    pop     {r0, pc}
> +
> +L2
> +    and     r1, r1, #0xf
> +    cmp     r1, #0x4                ; between 4 and 15 bytes?
> +    blt     L3
> +    cmp     r1, #0x8                ; between 8 and 15 bytes?
> +    str     r2, [r0, #-16]          ; overlapping store of 4 + (4 + 4) + 4 bytes
> +    itt     ge
> +    strge   r3, [r0, #-12]
> +    strge   r2, [ip, #-8]

This could be changed to 'gt' in all three instructions above, while
keeping the same functionality. I can change that before committing.

--
Ard.
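
For readers who want to check the 'ge' versus 'gt' remark, here is a rough C model of the 4 to 15 byte tail path only. It is a sketch, not the patch's code: tail_store() and ge_and_gt_agree() are made-up names, the use_ge flag stands in for the condition code, and memcpy() stands in for the (possibly unaligned) 32-bit stores. The two cases differ only at a length of exactly 8, where the two conditional stores would write the same values to the same addresses as the two unconditional ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the 4..15 byte tail path (label 2 / L2).
 * dst plays the role of [r0, #-16], end the role of ip, and
 * lo/hi the roles of r2/r3. use_ge = 1 models 'ge', 0 models 'gt'.
 */
static void tail_store(uint8_t *dst, size_t len, uint32_t lo, uint32_t hi,
                       int use_ge)
{
    uint8_t *end = dst + len;
    int cond = use_ge ? (len >= 8) : (len > 8);

    memcpy(dst, &lo, 4);             /* str   r2, [r0, #-16] */
    if (cond) {
        memcpy(dst + 4, &hi, 4);     /* strXX r3, [r0, #-12] */
        memcpy(end - 8, &lo, 4);     /* strXX r2, [ip, #-8]  */
    }
    memcpy(end - 4, &hi, 4);         /* str   r3, [ip, #-4]  */
}

/* Returns 1 if 'ge' and 'gt' leave identical buffers for every len in 4..15. */
int ge_and_gt_agree(void)
{
    for (size_t len = 4; len <= 15; len++) {
        uint8_t a[16], b[16];

        memset(a, 0xEE, sizeof a);   /* sentinel to catch stray writes */
        memset(b, 0xEE, sizeof b);
        tail_store(a, len, 0x11111111u, 0x22222222u, 1);
        tail_store(b, len, 0x11111111u, 0x22222222u, 0);
        if (memcmp(a, b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

Distinct lo and hi values are used so that a misplaced store would show up in the comparison; in the SetMem, SetMem16 and SetMem32 entry points r2 and r3 hold the same value anyway.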