From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.120]) by mx.groups.io with SMTP id smtpd.web11.10640.1590080384697648172 for ; Thu, 21 May 2020 09:59:44 -0700 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=VcgShW6Y; spf=pass (domain: redhat.com, ip: 207.211.31.120, mailfrom: philmd@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590080381; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HHd7N+tdmKq+LKGOaT2QBCUOaRCgzv79kxG0UoUmB6M=; b=VcgShW6Y64K+YUgx9trc9k52w/cROeAjcd2426Y6I+v9jQfNllv72vGZofOVwsjVvvbL8T /GfuTWL5oyqheWxpAUxAzO/gLyTkEHpVL6VAEfD2n1DRCge8uLKIGGtLRU+NU+E2zpfdxt kvuBKOtjooTMhfhbM/YcKRZ4bzu68Y8= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-415-K7HzEYugM9OvIf6i776OqA-1; Thu, 21 May 2020 12:59:36 -0400 X-MC-Unique: K7HzEYugM9OvIf6i776OqA-1 Received: by mail-wr1-f72.google.com with SMTP id z5so3139434wrt.17 for ; Thu, 21 May 2020 09:59:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=HHd7N+tdmKq+LKGOaT2QBCUOaRCgzv79kxG0UoUmB6M=; b=D9Rh69bhFpgGLxTYwM8BF6zlsWe1P5SMCGqlt+prwp6Xjg4COkYcIboqVaaGFmXWps T6JRFPrajuAO2asXhxmiYnQQAPQqOrEdBRsCwJoX/+LB0MV9I/Qp8cfWRy1xZ/s9T2ai heuvkJhUAji43XzRNdIVLvnHPkzUmILeN0gpCF/AsbQsch2gR/VaPpIvXmsk/Hrq0ii5 0jweLjVJKRGJDPw1xg5ujDfPG41ZIBMJN0jkzjtqcHbuHVK2KG3E/A7CZvpDJdSddguo TEMh0gZFrKVQBiIRnz3LUbPFPmdhzWYkb71HBOTxvdX1mhLWS6N0Amht7Cmo1VemMK9s p+zw== X-Gm-Message-State: AOAM533z9sQ25LEJI4pIr4bXnWUKx5L29V0kJT7LfkseVg5eBHOkILMm dGmSTIeuAj3fksd5o3WwM6O9+ISh2+fifqfX8XY/zbqbwzBZnxwlAqnDYw9lXxvqcDpS6bvGyp7 VJwMOuIdZCcs7Bw== X-Received: by 2002:a1c:1f0d:: with SMTP id f13mr9059517wmf.184.1590080374666; Thu, 21 May 2020 09:59:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwSztYtORneTvNd8V9oxJS4p1sdjGQxN+CRa9rTMqFKyv0Lhgj3ZzRhXOCxjUiy+qtHgvODjQ== X-Received: by 2002:a1c:1f0d:: with SMTP id f13mr9059497wmf.184.1590080374357; Thu, 21 May 2020 09:59:34 -0700 (PDT) Return-Path: Received: from [192.168.1.40] (17.red-88-21-202.staticip.rima-tde.net. [88.21.202.17]) by smtp.gmail.com with ESMTPSA id h133sm7409286wmf.25.2020.05.21.09.59.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 21 May 2020 09:59:33 -0700 (PDT) Subject: Re: [edk2-devel] [PATCH v2] ArmPkg/CompilerIntrinsicsLib: provide atomics intrinsics To: Ard Biesheuvel , devel@edk2.groups.io Cc: glin@suse.com, leif@nuviainc.com, lersek@redhat.com, liming.gao@intel.com References: <20200520114448.26104-1-ard.biesheuvel@arm.com> <47f54425-df5d-17a3-e134-fe9e01fb08bd@redhat.com> <6c9ec3a3-8aa1-c4d4-c7ba-1b9e28fd0866@redhat.com> From: =?UTF-8?B?UGhpbGlwcGUgTWF0aGlldS1EYXVkw6k=?= Message-ID: <0e981fa9-c365-dcf8-1660-f472075e35d9@redhat.com> Date: Thu, 21 May 2020 18:59:33 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0 MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 5/21/20 6:45 PM, Ard Biesheuvel wrote: > On 5/21/20 6:40 PM, Philippe Mathieu-Daudé via groups.io wrote: >> On 5/20/20 2:37 PM, Philippe Mathieu-Daudé wrote: >>> Hi Ard, >>> >>> On 5/20/20 1:44 PM, Ard Biesheuvel wrote: >>>> Gary reports the GCC 10 will emit calls to atomics intrinsics routines >>>> unless -mno-outline-atomics is specified. This means GCC-10 introduces >>>> new intrinsics, and even though it would be possible to work around >>>> this >>>> by specifying the command line option, this would require a new GCC10 >>>> toolchain profile to be created, which we prefer to avoid. >>>> >>>> So instead, add the new intrinsics to our library so they are provided >>>> when necessary. >>>> >>>> Signed-off-by: Ard Biesheuvel >>>> --- >>>> v2: >>>> - add missing .globl to export the functions from the object file >>>> - add function end markers so the size of each is visible in the ELF >>>> metadata >>>> - add some comments to describe what is going on >>> >>> Thanks, head hurts a bit less... >>> >>>> >>>>   ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf | >>>> 3 + >>>>   ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S         | >>>> 142 ++++++++++++++++++++ >>>>   2 files changed, 145 insertions(+) >>>> >>>> diff --git >>>> a/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf >>>> b/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf >>>> index d5bad9467758..fcf48c678119 100644 >>>> --- a/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf >>>> +++ b/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf >>>> @@ -79,6 +79,9 @@ [Sources.ARM] >>>>     Arm/ldivmod.asm      | MSFT >>>>     Arm/llsr.asm         | MSFT >>>> +[Sources.AARCH64] >>>> +  AArch64/Atomics.S    | GCC >>>> + >>>>   [Packages] >>>>     MdePkg/MdePkg.dec >>>>     ArmPkg/ArmPkg.dec >>>> diff --git a/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S >>>> b/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S >>>> new file mode 100644 >>>> index 000000000000..dc61d6bb8e52 >>>> --- /dev/null >>>> +++ b/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S >>>> @@ -0,0 +1,142 @@ >>>> +#------------------------------------------------------------------------------ >>>> >>>> +# >>>> +# Copyright (c) 2020, Arm, Limited. All rights reserved.
>>>> +# >>>> +# SPDX-License-Identifier: BSD-2-Clause-Patent >>>> +# >>>> +#------------------------------------------------------------------------------ >>>> >>>> + >>>> +    /* >>>> +     * Provide the GCC intrinsics that are required when using GCC >>>> 9 or >>>> +     * later with the -moutline-atomics options (which became the >>>> default >>>> +     * in GCC 10) >>>> +     */ >>>> +    .arch armv8-a >>>> + >>>> +    .macro        reg_alias, pfx, sz >>>> +    r0_\sz        .req    \pfx\()0 >>>> +    r1_\sz        .req    \pfx\()1 >>>> +    tmp0_\sz    .req    \pfx\()16 >>>> +    tmp1_\sz    .req    \pfx\()17 >>>> +    .endm >>>> + >>>> +    /* >>>> +     * Define register aliases of the right type for each size >>>> +     * (xN for 8 bytes, wN for everything smaller) >>>> +     */ >>>> +    reg_alias    w, 1 >>>> +    reg_alias    w, 2 >>>> +    reg_alias    w, 4 >>>> +    reg_alias    x, 8 >>>> + >>>> +    .macro        fn_start, name:req >>>> +    .section    .text.\name >>>> +    .globl        \name >>>> +    .type        \name, %function >>>> +\name\(): >>>> +    .endm >>>> + >>>> +    .macro        fn_end, name:req >>>> +    .size        \name, . - \name >>>> +    .endm >>>> + >>>> +    /* >>>> +     * Emit an atomic helper for \model with operands of size \sz, >>>> using >>>> +     * the operation specified by \insn (which is the LSE name), >>>> and which >>>> +     * can be implemented using the generic >>>> load-locked/store-conditional >>>> +     * (LL/SC) sequence below, using the arithmetic operation given by >>>> +     * \opc. >>>> +     */ >>>> +    .macro         emit_ld_sz, sz:req, insn:req, opc:req, >>>> model:req, s, a, l >>>> +    fn_start    __aarch64_\insn\()\sz\()\model >>>> +    mov        tmp0_\sz, r0_\sz >>>> +0:    ld\a\()xr\s    r0_\sz, [x1] >>>> +    .ifnc        \insn, swp >>>> +    \opc        tmp1_\sz, r0_\sz, tmp0_\sz >>>> +    .else >>>> +    \opc        tmp1_\sz, tmp0_\sz >>>> +    .endif >>>> +    st\l\()xr\s    w15, tmp1_\sz, [x1] >>>> +    cbnz        w15, 0b >>> >>> I see at the end \s is in {,b,h} range. >>> >>> Don't you need to use x15 on 64-bit? >> >> Ard, I expanded all macros and reviewed this patch, but I am still >> having hard time to figure why w15 temp is OK instead of x15. Any hint? >> > > Why do you think it should be x15? I.e.: https://developer.arm.com/docs/100076/0100/instruction-set-reference/a64-data-transfer-instructions/ldaxr Syntax LDAXR Wt, [Xn|SP{,#0}] ; 32-bit LDAXR Xt, [Xn|SP{,#0}] ; 64-bit > > >>> >>>> +    ret >>>> +    fn_end        __aarch64_\insn\()\sz\()\model >>>> +    .endm >>>> + >>>> +    /* >>>> +     * Emit atomic helpers for \model for operand sizes in the >>>> +     * set {1, 2, 4, 8}, for the instruction pattern given by >>>> +     * \insn. (This is the LSE name, but this implementation uses >>>> +     * the generic LL/SC sequence using \opc as the arithmetic >>>> +     * operation on the target.) >>>> +     */ >>>> +    .macro        emit_ld, insn:req, opc:req, model:req, a, l >>>> +    emit_ld_sz    1, \insn, \opc, \model, b, \a, \l >>>> +    emit_ld_sz    2, \insn, \opc, \model, h, \a, \l >>>> +    emit_ld_sz    4, \insn, \opc, \model,  , \a, \l >>>> +    emit_ld_sz    8, \insn, \opc, \model,  , \a, \l >>>> +    .endm >>>> + >>>> +    /* >>>> +     * Emit the compare and swap helper for \model and size \sz >>>> +     * using LL/SC instructions. >>>> +     */ >>>> +    .macro         emit_cas_sz, sz:req, model:req, uxt:req, s, a, l >>>> +    fn_start    __aarch64_cas\sz\()\model >>>> +    \uxt        tmp0_\sz, r0_\sz >>>> +0:    ld\a\()xr\s    r0_\sz, [x2] >>>> +    cmp        r0_\sz, tmp0_\sz >>>> +    bne        1f >>>> +    st\l\()xr\s    w15, r1_\sz, [x2] >>>> +    cbnz        w15, 0b >>>> +1:    ret >>>> +    fn_end        __aarch64_cas\sz\()\model >>>> +    .endm >>>> + >>>> +    /* >>>> +     * Emit compare-and-swap helpers for \model for operand sizes >>>> in the >>>> +     * set {1, 2, 4, 8, 16}. >>>> +     */ >>>> +    .macro        emit_cas, model:req, a, l >>>> +    emit_cas_sz    1, \model, uxtb, b, \a, \l >>>> +    emit_cas_sz    2, \model, uxth, h, \a, \l >>>> +    emit_cas_sz    4, \model, mov ,  , \a, \l >>>> +    emit_cas_sz    8, \model, mov ,  , \a, \l >>>> + >>>> +    /* >>>> +     * We cannot use the parameterized sequence for 16 byte CAS, so we >>>> +     * need to define it explicitly. >>>> +     */ >>>> +    fn_start    __aarch64_cas16\model >>>> +    mov        x16, x0 >>>> +    mov        x17, x1 >>>> +0:    ld\a\()xp    x0, x1, [x4] >>>> +    cmp        x0, x16 >>>> +    ccmp        x1, x17, #0, eq >>>> +    bne        1f >>>> +    st\l\()xp    w15, x16, x17, [x4] >>>> +    cbnz        w15, 0b >>>> +1:    ret >>>> +    fn_end        __aarch64_cas16\model >>>> +    .endm >>>> + >>>> +    /* >>>> +     * Emit the set of GCC outline atomic helper functions for >>>> +     * the memory ordering model given by \model: >>>> +     * - relax    unordered loads and stores >>>> +     * - acq    load-acquire, unordered store >>>> +     * - rel    unordered load, store-release >>>> +     * - acq_rel    load-acquire, store-release >>>> +     */ >>>> +    .macro        emit_model, model:req, a, l >>>> +    emit_ld        ldadd, add, \model, \a, \l >>>> +    emit_ld        ldclr, bic, \model, \a, \l >>>> +    emit_ld        ldeor, eor, \model, \a, \l >>>> +    emit_ld        ldset, orr, \model, \a, \l >>>> +    emit_ld        swp,   mov, \model, \a, \l >>>> +    emit_cas    \model, \a, \l >>>> +    .endm >>>> + >>>> +    emit_model    _relax >>>> +    emit_model    _acq, a >>>> +    emit_model    _rel,, l >>>> +    emit_model    _acq_rel, a, l >>>> >> >> >> >> >