Subject: Re: [edk2-devel] [PATCH v2] ArmPkg/CompilerIntrinsicsLib: provide atomics intrinsics
To: devel@edk2.groups.io, philmd@redhat.com
Cc: glin@suse.com, leif@nuviainc.com, lersek@redhat.com, liming.gao@intel.com
References: <20200520114448.26104-1-ard.biesheuvel@arm.com> <47f54425-df5d-17a3-e134-fe9e01fb08bd@redhat.com> <6c9ec3a3-8aa1-c4d4-c7ba-1b9e28fd0866@redhat.com>
From: "Ard Biesheuvel"
Date: Thu, 21 May 2020 18:45:39 +0200
In-Reply-To: <6c9ec3a3-8aa1-c4d4-c7ba-1b9e28fd0866@redhat.com>

On 5/21/20 6:40 PM, Philippe Mathieu-Daudé via groups.io wrote:
> On 5/20/20 2:37 PM, Philippe Mathieu-Daudé wrote:
>> Hi Ard,
>>
>> On 5/20/20 1:44 PM, Ard Biesheuvel wrote:
>>> Gary reports that GCC 10 will emit calls to atomics intrinsics routines
>>> unless -mno-outline-atomics is specified. This means GCC-10 introduces
>>> new intrinsics, and even though it would be possible to work around this
>>> by specifying the command line option, this would require a new GCC10
>>> toolchain profile to be created, which we prefer to avoid.
>>>
>>> So instead, add the new intrinsics to our library so they are provided
>>> when necessary.
>>>
>>> Signed-off-by: Ard Biesheuvel
>>> ---
>>> v2:
>>> - add missing .globl to export the functions from the object file
>>> - add function end markers so the size of each is visible in the ELF
>>>   metadata
>>> - add some comments to describe what is going on
>>
>> Thanks, head hurts a bit less...
>>
>>>
>>>   ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf |   3 +
>>>   ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S         | 142 ++++++++++++++++++++
>>>   2 files changed, 145 insertions(+)
>>>
>>> diff --git a/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf b/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf
>>> index d5bad9467758..fcf48c678119 100644
>>> --- a/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf
>>> +++ b/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf
>>> @@ -79,6 +79,9 @@ [Sources.ARM]
>>>     Arm/ldivmod.asm      | MSFT
>>>     Arm/llsr.asm         | MSFT
>>> +[Sources.AARCH64]
>>> +  AArch64/Atomics.S    | GCC
>>> +
>>>   [Packages]
>>>     MdePkg/MdePkg.dec
>>>     ArmPkg/ArmPkg.dec
>>> diff --git a/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S b/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S
>>> new file mode 100644
>>> index 000000000000..dc61d6bb8e52
>>> --- /dev/null
>>> +++ b/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S
>>> @@ -0,0 +1,142 @@
>>> +#------------------------------------------------------------------------------
>>> +#
>>> +# Copyright (c) 2020, Arm, Limited. All rights reserved.
>>> +#
>>> +# SPDX-License-Identifier: BSD-2-Clause-Patent
>>> +#
>>> +#------------------------------------------------------------------------------
>>> +
>>> +    /*
>>> +     * Provide the GCC intrinsics that are required when using GCC 9 or
>>> +     * later with the -moutline-atomics options (which became the default
>>> +     * in GCC 10)
>>> +     */
>>> +    .arch armv8-a
>>> +
>>> +    .macro        reg_alias, pfx, sz
>>> +    r0_\sz        .req    \pfx\()0
>>> +    r1_\sz        .req    \pfx\()1
>>> +    tmp0_\sz    .req    \pfx\()16
>>> +    tmp1_\sz    .req    \pfx\()17
>>> +    .endm
>>> +
>>> +    /*
>>> +     * Define register aliases of the right type for each size
>>> +     * (xN for 8 bytes, wN for everything smaller)
>>> +     */
>>> +    reg_alias    w, 1
>>> +    reg_alias    w, 2
>>> +    reg_alias    w, 4
>>> +    reg_alias    x, 8
>>> +
>>> +    .macro        fn_start, name:req
>>> +    .section    .text.\name
>>> +    .globl        \name
>>> +    .type        \name, %function
>>> +\name\():
>>> +    .endm
>>> +
>>> +    .macro        fn_end, name:req
>>> +    .size        \name, . - \name
>>> +    .endm
>>> +
>>> +    /*
>>> +     * Emit an atomic helper for \model with operands of size \sz, using
>>> +     * the operation specified by \insn (which is the LSE name), and which
>>> +     * can be implemented using the generic load-locked/store-conditional
>>> +     * (LL/SC) sequence below, using the arithmetic operation given by
>>> +     * \opc.
>>> +     */
>>> +    .macro         emit_ld_sz, sz:req, insn:req, opc:req, model:req, s, a, l
>>> +    fn_start    __aarch64_\insn\()\sz\()\model
>>> +    mov        tmp0_\sz, r0_\sz
>>> +0:    ld\a\()xr\s    r0_\sz, [x1]
>>> +    .ifnc        \insn, swp
>>> +    \opc        tmp1_\sz, r0_\sz, tmp0_\sz
>>> +    .else
>>> +    \opc        tmp1_\sz, tmp0_\sz
>>> +    .endif
>>> +    st\l\()xr\s    w15, tmp1_\sz, [x1]
>>> +    cbnz        w15, 0b
>>
>> I see at the end \s is in {,b,h} range.
>>
>> Don't you need to use x15 on 64-bit?
>
> Ard, I expanded all macros and reviewed this patch, but I am still
> having a hard time figuring out why the w15 temp is OK instead of x15.
> Any hint?
>

Why do you think it should be x15?
>>
>>> +    ret
>>> +    fn_end        __aarch64_\insn\()\sz\()\model
>>> +    .endm
>>> +
>>> +    /*
>>> +     * Emit atomic helpers for \model for operand sizes in the
>>> +     * set {1, 2, 4, 8}, for the instruction pattern given by
>>> +     * \insn. (This is the LSE name, but this implementation uses
>>> +     * the generic LL/SC sequence using \opc as the arithmetic
>>> +     * operation on the target.)
>>> +     */
>>> +    .macro        emit_ld, insn:req, opc:req, model:req, a, l
>>> +    emit_ld_sz    1, \insn, \opc, \model, b, \a, \l
>>> +    emit_ld_sz    2, \insn, \opc, \model, h, \a, \l
>>> +    emit_ld_sz    4, \insn, \opc, \model,  , \a, \l
>>> +    emit_ld_sz    8, \insn, \opc, \model,  , \a, \l
>>> +    .endm
>>> +
>>> +    /*
>>> +     * Emit the compare and swap helper for \model and size \sz
>>> +     * using LL/SC instructions.
>>> +     */
>>> +    .macro         emit_cas_sz, sz:req, model:req, uxt:req, s, a, l
>>> +    fn_start    __aarch64_cas\sz\()\model
>>> +    \uxt        tmp0_\sz, r0_\sz
>>> +0:    ld\a\()xr\s    r0_\sz, [x2]
>>> +    cmp        r0_\sz, tmp0_\sz
>>> +    bne        1f
>>> +    st\l\()xr\s    w15, r1_\sz, [x2]
>>> +    cbnz        w15, 0b
>>> +1:    ret
>>> +    fn_end        __aarch64_cas\sz\()\model
>>> +    .endm
>>> +
>>> +    /*
>>> +     * Emit compare-and-swap helpers for \model for operand sizes in the
>>> +     * set {1, 2, 4, 8, 16}.
>>> +     */
>>> +    .macro        emit_cas, model:req, a, l
>>> +    emit_cas_sz    1, \model, uxtb, b, \a, \l
>>> +    emit_cas_sz    2, \model, uxth, h, \a, \l
>>> +    emit_cas_sz    4, \model, mov ,  , \a, \l
>>> +    emit_cas_sz    8, \model, mov ,  , \a, \l
>>> +
>>> +    /*
>>> +     * We cannot use the parameterized sequence for 16 byte CAS, so we
>>> +     * need to define it explicitly.
>>> +     */
>>> +    fn_start    __aarch64_cas16\model
>>> +    mov        x16, x0
>>> +    mov        x17, x1
>>> +0:    ld\a\()xp    x0, x1, [x4]
>>> +    cmp        x0, x16
>>> +    ccmp        x1, x17, #0, eq
>>> +    bne        1f
>>> +    st\l\()xp    w15, x16, x17, [x4]
>>> +    cbnz        w15, 0b
>>> +1:    ret
>>> +    fn_end        __aarch64_cas16\model
>>> +    .endm
>>> +
>>> +    /*
>>> +     * Emit the set of GCC outline atomic helper functions for
>>> +     * the memory ordering model given by \model:
>>> +     * - relax    unordered loads and stores
>>> +     * - acq    load-acquire, unordered store
>>> +     * - rel    unordered load, store-release
>>> +     * - acq_rel    load-acquire, store-release
>>> +     */
>>> +    .macro        emit_model, model:req, a, l
>>> +    emit_ld        ldadd, add, \model, \a, \l
>>> +    emit_ld        ldclr, bic, \model, \a, \l
>>> +    emit_ld        ldeor, eor, \model, \a, \l
>>> +    emit_ld        ldset, orr, \model, \a, \l
>>> +    emit_ld        swp,   mov, \model, \a, \l
>>> +    emit_cas    \model, \a, \l
>>> +    .endm
>>> +
>>> +    emit_model    _relax
>>> +    emit_model    _acq, a
>>> +    emit_model    _rel,, l
>>> +    emit_model    _acq_rel, a, l
>>>
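
For readers following the w15 question: in the exclusive-store instructions the patch uses (stxr, stlxr, stxp and their byte/halfword forms), the first operand is the status result, and the A64 instruction set defines it as a 32-bit W register whatever the size of the data being stored. The sketch below is not part of the patch. It is a rough C model, assuming C11 <stdatomic.h>, of what the 4-byte helpers generated by emit_ld_sz and emit_cas_sz compute. The names model_fetch_op4, model_cas4 and op_* are hypothetical, a compare-exchange retry loop stands in for the LL/SC loop, and only the relaxed ordering is modelled.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the \opc step: new value from old value and operand. */
typedef uint32_t (*model_op)(uint32_t old, uint32_t operand);

uint32_t op_add(uint32_t old, uint32_t operand) { return old + operand; }   /* ldadd -> add */
uint32_t op_bic(uint32_t old, uint32_t operand) { return old & ~operand; }  /* ldclr -> bic */
uint32_t op_eor(uint32_t old, uint32_t operand) { return old ^ operand; }   /* ldeor -> eor */
uint32_t op_orr(uint32_t old, uint32_t operand) { return old | operand; }   /* ldset -> orr */
uint32_t op_swp(uint32_t old, uint32_t operand) { (void)old; return operand; } /* swp -> mov */

/*
 * Models a 4-byte fetch-and-op helper: operand first, pointer second
 * (w0 and x1 in the patch), previous memory value returned.
 */
uint32_t model_fetch_op4(uint32_t operand, _Atomic uint32_t *ptr, model_op op)
{
    uint32_t old = atomic_load_explicit(ptr, memory_order_relaxed);

    /* Retry until the store succeeds, like the "cbnz w15, 0b" loop. */
    while (!atomic_compare_exchange_weak_explicit(ptr, &old, op(old, operand),
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* 'old' now holds the current value; the new value is recomputed. */
    }
    return old;
}

/*
 * Models a 4-byte compare-and-swap helper: expected value, desired value,
 * pointer (w0, w1 and x2 in the patch). Returns the value seen in memory.
 */
uint32_t model_cas4(uint32_t expected, uint32_t desired, _Atomic uint32_t *ptr)
{
    atomic_compare_exchange_strong_explicit(ptr, &expected, desired,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    /* On failure 'expected' was overwritten with the observed value. */
    return expected;
}
```

In both models the caller gets the previous memory contents back, which corresponds to the helpers in the patch loading into r0_\sz before returning.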