public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: edk2-devel-01 <edk2-devel@lists.01.org>,
	Leif Lindholm <leif.lindholm@linaro.org>
Cc: "Gao, Liming" <liming.gao@intel.com>,
	Ryan Harkin <ryan.harkin@linaro.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [PATCH v4 2/3] MdePkg/BaseMemoryLibOptDxe: add accelerated ARM routines
Date: Wed, 7 Sep 2016 17:19:55 +0100	[thread overview]
Message-ID: <CAKv+Gu8Cm4J60PsZeJ-DBGrJ7t4jKUs_A0s2b-aYMOVN1ApTVg@mail.gmail.com> (raw)
In-Reply-To: <1473259968-7221-3-git-send-email-ard.biesheuvel@linaro.org>

On 7 September 2016 at 15:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> This adds ARM support to BaseMemoryLibOptDxe, partially based on the
> cortex-strings library (ScanMem) and the existing CopyMem() implementation
> from BaseMemoryLibStm in ArmPkg.
>
> All string routines are accelerated except ScanMem16, ScanMem32,
> ScanMem64 and IsZeroBuffer, which can wait for another day. (Very few
> occurrences exist in the codebase)
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reviewed-by: Liming Gao <liming.gao@intel.com>
> ---
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CompareMem.S        | 138 ++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CompareMem.asm      | 140 ++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CopyMem.S           | 172 ++++++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/CopyMem.asm         | 147 +++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/ScanMem.S           | 146 +++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/ScanMem.asm         | 147 +++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/ScanMemGeneric.c    | 142 ++++++++++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S            |  75 +++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm          |  81 +++++++++
>  MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf |  30 ++--
>  10 files changed, 1204 insertions(+), 14 deletions(-)
>
[..]
> diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S
> new file mode 100644
> index 000000000000..914fdd60ea52
> --- /dev/null
> +++ b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.S
> @@ -0,0 +1,75 @@
> +#------------------------------------------------------------------------------
> +#
> +# Copyright (c) 2016, Linaro Ltd. All rights reserved.<BR>
> +#
> +# This program and the accompanying materials are licensed and made available
> +# under the terms and conditions of the BSD License which accompanies this
> +# distribution.  The full text of the license may be found at
> +# http://opensource.org/licenses/bsd-license.php
> +#
> +# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +# WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +#
> +#------------------------------------------------------------------------------
> +
> +    .text
> +    .thumb
> +    .syntax unified
> +
> +ASM_GLOBAL ASM_PFX(InternalMemZeroMem)
> +ASM_PFX(InternalMemZeroMem):
> +    movs    r2, #0
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem)
> +ASM_PFX(InternalMemSetMem):
> +    uxtb    r2, r2
> +    orr     r2, r2, r2, lsl #8
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem16)
> +ASM_PFX(InternalMemSetMem16):
> +    uxth    r2, r2
> +    orr     r2, r2, r2, lsl #16
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem32)
> +ASM_PFX(InternalMemSetMem32):
> +    mov     r3, r2
> +
> +ASM_GLOBAL ASM_PFX(InternalMemSetMem64)
> +ASM_PFX(InternalMemSetMem64):
> +    push    {r0, lr}
> +    add     ip, r0, r1              // ip := dst + length
> +    adds    r0, r0, #16             // advance the output pointer by 16 bytes
> +    cmp     r1, #16                 // fewer than 16 bytes of input?
> +    blt     2f
> +
> +    str     r2, [r0, #-16]          // potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-12]          // potentially unaligned store of 4 bytes
> +    str     r2, [r0, #-8]           // potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-4]           // potentially unaligned store of 4 bytes
> +    bic     r0, r0, #15             // align output pointer
> +    beq     1f
> +
> +0:  adds    r0, r0, #16             // advance the output pointer by 16 bytes
> +    subs    r1, ip, r0              // past the output?
> +    blt     2f                      // break out of the loop
> +    strd    r2, r3, [r0, #-16]      // aligned store of 16 bytes
> +    strd    r2, r3, [r0, #-8]
> +    bne     0b                      // goto beginning of loop
> +1:  pop     {r0, pc}
> +
> +2:  and     r1, r1, #0xf
> +    cmp     r1, #0x4                // between 4 and 15 bytes?
> +    blt     3f
> +    cmp     r1, #0x8                // between 8 and 15 bytes?
> +    str     r2, [r0, #-16]          // overlapping store of 4 + (4 + 4) + 4 bytes
> +    itt     ge
> +    strge   r3, [r0, #-12]
> +    strge   r2, [ip, #-8]
> +    str     r3, [ip, #-4]
> +    pop     {r0, pc}
> +
> +3:  cmp     r1, #2                  // 2 or 3 bytes?
> +    strb    r2, [r0, #-16]          // store 1 byte
> +    it      ge
> +    strhge  r2, [ip, #-2]           // store 2 bytes
> +    pop     {r0, pc}
> diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm
> new file mode 100644
> index 000000000000..14fecd93a96c
> --- /dev/null
> +++ b/MdePkg/Library/BaseMemoryLibOptDxe/Arm/SetMem.asm
> @@ -0,0 +1,81 @@
> +;------------------------------------------------------------------------------
> +;
> +; Copyright (c) 2016, Linaro Ltd. All rights reserved.<BR>
> +;
> +; This program and the accompanying materials are licensed and made available
> +; under the terms and conditions of the BSD License which accompanies this
> +; distribution.  The full text of the license may be found at
> +; http://opensource.org/licenses/bsd-license.php
> +;
> +; THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +; WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +;
> +;------------------------------------------------------------------------------
> +
> +    EXPORT  InternalMemZeroMem
> +    EXPORT  InternalMemSetMem
> +    EXPORT  InternalMemSetMem16
> +    EXPORT  InternalMemSetMem32
> +    EXPORT  InternalMemSetMem64
> +
> +    AREA    SetMem, CODE, READONLY
> +    THUMB
> +
> +InternalMemZeroMem
> +    movs    r2, #0
> +
> +InternalMemSetMem
> +    uxtb    r2, r2
> +    orr     r2, r2, r2, lsl #8
> +
> +InternalMemSetMem16
> +    uxth    r2, r2
> +    orr     r2, r2, r2, lsr #16
> +
> +InternalMemSetMem32
> +    mov     r3, r2
> +
> +InternalMemSetMem64
> +    push    {r0, lr}
> +    add     ip, r0, r1              ; ip := dst + length
> +    adds    r0, r0, #16             ; advance the output pointer by 16 bytes
> +    cmp     r1, #16                 ; fewer than 16 bytes of input?
> +    blt     L2
> +
> +    str     r2, [r0, #-16]          ; potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-12]          ; potentially unaligned store of 4 bytes
> +    str     r2, [r0, #-8]           ; potentially unaligned store of 4 bytes
> +    str     r3, [r0, #-4]           ; potentially unaligned store of 4 bytes
> +    bic     r0, r0, #15             ; align output pointer
> +    beq     L1
> +
> +L0
> +    adds    r0, r0, #16             ; advance the output pointer by 16 bytes
> +    subs    r1, ip, r0              ; past the output?
> +    blt     L2                      ; break out of the loop
> +    strd    r2, r3, [r0, #-16]      ; aligned store of 16 bytes
> +    strd    r2, r3, [r0, #-8]
> +    bne     L0                      ; goto beginning of loop
> +L1
> +    pop     {r0, pc}
> +
> +L2
> +    and     r1, r1, #0xf
> +    cmp     r1, #0x4                ; between 4 and 15 bytes?
> +    blt     L3
> +    cmp     r1, #0x8                ; between 8 and 15 bytes?
> +    str     r2, [r0, #-16]          ; overlapping store of 4 + (4 + 4) + 4 bytes
> +    itt     ge
> +    strge   r3, [r0, #-12]
> +    strge   r2, [ip, #-8]

This could be changed to 'gt' in all three instructions above, while
keeping the same functionality. I can change that before committing

-- 
Ard.


  reply	other threads:[~2016-09-07 16:19 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-07 14:52 [PATCH v4 0/3] MdePkg: add ARM/AARCH64 support to BaseMemoryLib Ard Biesheuvel
2016-09-07 14:52 ` [PATCH v4 1/3] MdePkg/BaseMemoryLib: widen aligned accesses to 32 or 64 bits Ard Biesheuvel
2016-09-07 14:52 ` [PATCH v4 2/3] MdePkg/BaseMemoryLibOptDxe: add accelerated ARM routines Ard Biesheuvel
2016-09-07 16:19   ` Ard Biesheuvel [this message]
2016-09-07 14:52 ` [PATCH v4 3/3] MdePkg/BaseMemoryLibOptDxe: add accelerated AARCH64 routines Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKv+Gu8Cm4J60PsZeJ-DBGrJ7t4jKUs_A0s2b-aYMOVN1ApTVg@mail.gmail.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox