Subject: Re: [edk2-devel] [PATCH v2] ArmPkg/CompilerIntrinsicsLib: provide atomics intrinsics
From: "Ard Biesheuvel"
To: Philippe Mathieu-Daudé, devel@edk2.groups.io
Cc: glin@suse.com, leif@nuviainc.com, lersek@redhat.com, liming.gao@intel.com
Date: Thu, 21 May 2020 19:02:13 +0200
Message-ID: <0f76773c-76d9-d7c7-f6ee-4d34d663e114@arm.com>
In-Reply-To: <0e981fa9-c365-dcf8-1660-f472075e35d9@redhat.com>

On 5/21/20 6:59 PM, Philippe Mathieu-Daudé wrote:
> On 5/21/20 6:45 PM, Ard Biesheuvel wrote:
>> On 5/21/20 6:40 PM, Philippe Mathieu-Daudé via groups.io wrote:
>>> On 5/20/20 2:37 PM, Philippe Mathieu-Daudé wrote:
>>>> Hi Ard,
>>>>
>>>> On 5/20/20 1:44 PM, Ard Biesheuvel wrote:
>>>>> Gary reports the GCC 10 will emit calls to atomics intrinsics routines
>>>>> unless -mno-outline-atomics is specified.
>>>>> This means GCC-10 introduces
>>>>> new intrinsics, and even though it would be possible to work around this
>>>>> by specifying the command line option, this would require a new GCC10
>>>>> toolchain profile to be created, which we prefer to avoid.
>>>>>
>>>>> So instead, add the new intrinsics to our library so they are provided
>>>>> when necessary.
>>>>>
>>>>> Signed-off-by: Ard Biesheuvel
>>>>> ---
>>>>> v2:
>>>>> - add missing .globl to export the functions from the object file
>>>>> - add function end markers so the size of each is visible in the ELF metadata
>>>>> - add some comments to describe what is going on
>>>>
>>>> Thanks, head hurts a bit less...
>>>>
>>>>>
>>>>>  ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf |   3 +
>>>>>  ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S         | 142 ++++++++++++++++++++
>>>>>  2 files changed, 145 insertions(+)
>>>>>
>>>>> diff --git a/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf b/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf
>>>>> index d5bad9467758..fcf48c678119 100644
>>>>> --- a/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf
>>>>> +++ b/ArmPkg/Library/CompilerIntrinsicsLib/CompilerIntrinsicsLib.inf
>>>>> @@ -79,6 +79,9 @@ [Sources.ARM]
>>>>>    Arm/ldivmod.asm      | MSFT
>>>>>    Arm/llsr.asm         | MSFT
>>>>> +[Sources.AARCH64]
>>>>> +  AArch64/Atomics.S    | GCC
>>>>> +
>>>>>  [Packages]
>>>>>    MdePkg/MdePkg.dec
>>>>>    ArmPkg/ArmPkg.dec
>>>>> diff --git a/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S b/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S
>>>>> new file mode 100644
>>>>> index 000000000000..dc61d6bb8e52
>>>>> --- /dev/null
>>>>> +++ b/ArmPkg/Library/CompilerIntrinsicsLib/AArch64/Atomics.S
>>>>> @@ -0,0 +1,142 @@
>>>>> +#------------------------------------------------------------------------------
>>>>> +#
>>>>> +# Copyright (c) 2020, Arm, Limited. All rights reserved.
>>>>> +#
>>>>> +# SPDX-License-Identifier: BSD-2-Clause-Patent
>>>>> +#
>>>>> +#------------------------------------------------------------------------------
>>>>> +
>>>>> +    /*
>>>>> +     * Provide the GCC intrinsics that are required when using GCC 9 or
>>>>> +     * later with the -moutline-atomics options (which became the default
>>>>> +     * in GCC 10)
>>>>> +     */
>>>>> +    .arch armv8-a
>>>>> +
>>>>> +    .macro      reg_alias, pfx, sz
>>>>> +    r0_\sz      .req    \pfx\()0
>>>>> +    r1_\sz      .req    \pfx\()1
>>>>> +    tmp0_\sz    .req    \pfx\()16
>>>>> +    tmp1_\sz    .req    \pfx\()17
>>>>> +    .endm
>>>>> +
>>>>> +    /*
>>>>> +     * Define register aliases of the right type for each size
>>>>> +     * (xN for 8 bytes, wN for everything smaller)
>>>>> +     */
>>>>> +    reg_alias   w, 1
>>>>> +    reg_alias   w, 2
>>>>> +    reg_alias   w, 4
>>>>> +    reg_alias   x, 8
>>>>> +
>>>>> +    .macro      fn_start, name:req
>>>>> +    .section    .text.\name
>>>>> +    .globl      \name
>>>>> +    .type       \name, %function
>>>>> +\name\():
>>>>> +    .endm
>>>>> +
>>>>> +    .macro      fn_end, name:req
>>>>> +    .size       \name, . - \name
>>>>> +    .endm
>>>>> +
>>>>> +    /*
>>>>> +     * Emit an atomic helper for \model with operands of size \sz, using
>>>>> +     * the operation specified by \insn (which is the LSE name), and which
>>>>> +     * can be implemented using the generic load-locked/store-conditional
>>>>> +     * (LL/SC) sequence below, using the arithmetic operation given by
>>>>> +     * \opc.
>>>>> +     */
>>>>> +    .macro      emit_ld_sz, sz:req, insn:req, opc:req, model:req, s, a, l
>>>>> +    fn_start    __aarch64_\insn\()\sz\()\model
>>>>> +    mov         tmp0_\sz, r0_\sz
>>>>> +0:  ld\a\()xr\s r0_\sz, [x1]
>>>>> +    .ifnc       \insn, swp
>>>>> +    \opc        tmp1_\sz, r0_\sz, tmp0_\sz
>>>>> +    .else
>>>>> +    \opc        tmp1_\sz, tmp0_\sz
>>>>> +    .endif
>>>>> +    st\l\()xr\s w15, tmp1_\sz, [x1]
>>>>> +    cbnz        w15, 0b
>>>>
>>>> I see at the end \s is in {,b,h} range.
>>>>
>>>> Don't you need to use x15 on 64-bit?
>>>
>>> Ard, I expanded all macros and reviewed this patch, but I am still
>>> having hard time to figure why w15 temp is OK instead of x15. Any hint?
>>>
>>
>> Why do you think it should be x15?
>
> I.e.:
>
> https://developer.arm.com/docs/100076/0100/instruction-set-reference/a64-data-transfer-instructions/ldaxr
>
> Syntax
>
> LDAXR Wt, [Xn|SP{,#0}] ; 32-bit
> LDAXR Xt, [Xn|SP{,#0}] ; 64-bit
>

That is the load part, where Wt and Xt map onto r0 in the code above
(r0_\sz is already an alias for x0 when the size is 8 bytes).

https://developer.arm.com/docs/100076/0100/instruction-set-reference/a64-data-transfer-instructions/stlxr

gives the description for the store part, which is where the w15 register
is actually used. There, w15 is not a data operand: it only receives the
store-exclusive status result, and STLXR takes that status register as a
32-bit Ws for both the 32-bit and the 64-bit form.
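
To make that concrete, here is roughly what the quoted LL/SC sequence looks like when expanded by hand for an 8-byte add with acquire/release ordering. This is only a sketch, not output taken from the patch: the symbol name assumes \model is passed as "_acq_rel" with a=a and l=l, and the trailing ret and .size are assumptions, because the end of the macro is not visible in the hunk quoted above.

```asm
    .arch    armv8-a
    .section .text.__aarch64_ldadd8_acq_rel
    .globl   __aarch64_ldadd8_acq_rel
    .type    __aarch64_ldadd8_acq_rel, %function
__aarch64_ldadd8_acq_rel:
    mov     x16, x0             // tmp0_8 = operand; 8-byte data uses xN
0:  ldaxr   x0, [x1]            // r0_8 = old value (the Xt form of LDAXR)
    add     x17, x0, x16        // tmp1_8 = old value + operand
    stlxr   w15, x17, [x1]      // data in x17; the status result is always Ws
    cbnz    w15, 0b             // non-zero status: the store failed, retry
    ret                         // assumed tail; old value is returned in x0
    .size   __aarch64_ldadd8_acq_rel, . - __aarch64_ldadd8_acq_rel
```

The 1-, 2- and 4-byte variants would differ only in using wN for the data registers and the b/h suffixes on the load and store; the status register stays w15 in every case.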