From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=17.151.62.68; helo=nwk-aaemail-lapp03.apple.com; envelope-from=afish@apple.com; receiver=edk2-devel@lists.01.org Received: from nwk-aaemail-lapp03.apple.com (nwk-aaemail-lapp03.apple.com [17.151.62.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 53F4B211D59DF for ; Wed, 6 Mar 2019 17:31:34 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp03.apple.com [127.0.0.1]) by nwk-aaemail-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id x271QjTJ013298; Wed, 6 Mar 2019 17:31:33 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=mime-version : content-type : sender : from : message-id : subject : date : in-reply-to : cc : to : references; s=20180706; bh=ByffYjt2vhbQsP5nWQNmg3GA4zn8yoEYIVEGvXI1UFM=; b=HpvFlmQUr0liUm+j/vgh3afPOLBxcE9CNxlldIsnSVUhR4jTywNrY6R1ZCPjyKLL4DM5 Db3h+DO0A+E3Bkql7tmsjjhLh7cqRTD1PfunSSG0ZYxoN1//a+X4QPbfxx+1pmLRCkjW Nr9w44otxDGOs5Dy/rOhAMJ79DMKBaNzoon8B7Duy9xnOeA5NfhBizA8+3yRyJNkqkW6 W48dZPIWaCGiajVehfmJYI7qQNnTcMdZe2lxmeZhHdM4exTzwTHkNFzL+P/dSfZ8TJEW vJiHczUuEfHkf8m6sY4PLz+xi8QBhfW1TuHnNXVrHi3ACpMa5H/KNyJovPVluRNfKEj2 cw== Received: from mr2-mtap-s02.rno.apple.com (mr2-mtap-s02.rno.apple.com [17.179.226.134]) by nwk-aaemail-lapp03.apple.com with ESMTP id 2r0ac3qw0e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Wed, 06 Mar 2019 17:31:33 -0800 MIME-version: 1.0 Received: from ma1-mmpp-sz10.apple.com (ma1-mmpp-sz10.apple.com [17.171.128.150]) by mr2-mtap-s02.rno.apple.com (Oracle Communications Messaging Server 8.0.2.3.20181024 64bit (built Oct 24 2018)) with ESMTPS id <0PNZ0030B2WKRR20@mr2-mtap-s02.rno.apple.com>; Wed, 06 Mar 2019 17:31:33 -0800 (PST) Received: from process_milters-daemon.ma1-mmpp-sz10.apple.com by ma1-mmpp-sz10.apple.com (Oracle Communications Messaging Server 8.0.2.3.20181024 64bit (built Oct 24 2018)) id <0PNZ007001L63P00@ma1-mmpp-sz10.apple.com>; Wed, 06 Mar 2019 17:31:32 -0800 (PST) X-Va-A: X-Va-T-CD: da3a4df698400084da27c6ab403bcb35 X-Va-E-CD: 7d4c618f85f76966dfbaabb394b9e4f0 X-Va-R-CD: a66fd94a21bd5e42e01791934a036d45 X-Va-CD: 0 X-Va-ID: 4508cb5f-4f8d-4485-a131-01c54e351842 X-V-A: X-V-T-CD: da3a4df698400084da27c6ab403bcb35 X-V-E-CD: 7d4c618f85f76966dfbaabb394b9e4f0 X-V-R-CD: a66fd94a21bd5e42e01791934a036d45 X-V-CD: 0 X-V-ID: c1b77307-0569-4642-841b-569646681076 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-03-07_01:,, signatures=0 Received: from [17.234.172.123] (unknown [17.234.172.123]) by ma1-mmpp-sz10.apple.com (Oracle Communications Messaging Server 8.0.2.3.20181024 64bit (built Oct 24 2018)) with ESMTPSA id <0PNZ003SM2WFJS60@ma1-mmpp-sz10.apple.com>; Wed, 06 Mar 2019 17:31:32 -0800 (PST) Sender: afish@apple.com From: Andrew Fish Message-id: <47E8FE4A-4BDC-48A3-B7E6-02F1682C0990@apple.com> Date: Wed, 06 Mar 2019 17:31:25 -0800 In-reply-to: Cc: "Gao, Liming" , "Zhang, Shenglei" , edk2-devel-01 To: Mike Kinney References: <20190305014059.17988-1-shenglei.zhang@intel.com> <20190305014059.17988-4-shenglei.zhang@intel.com> <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FBBC2@SHSMSX104.ccr.corp.intel.com> <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FCC47@SHSMSX104.ccr.corp.intel.com> X-Mailer: Apple Mail (2.3445.6.18) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-03-07_01:, , signatures=0 X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2019 01:31:34 -0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable > On Mar 6, 2019, at 4:41 PM, Kinney, Michael D = wrote: >=20 > Liming, > =20 > That does not work either because inline assembly uses compiler = specific syntax. > =20 > Please update with the specific list of functions that you think the = inline should be removed to improve maintenance with no impacts in = size/perf. > =20 Mike, It is easy to find the gcc/clang inline assembler, just `git grep = "__asm__ __volatile__"` The main files are: = https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLibIntr= insic/IoLibGcc.c = https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X64/G= ccInline.c = https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/Ia32/= GccInline.c = https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchroni= zationLib/Ia32/GccInline.c = https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchroni= zationLib/X64/GccInline.c This is really just compiler optimizer control.=20 Library/BaseSynchronizationLib/SynchronizationGcc.c:21:#define = _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } = while(0) Looks like this is working around the alignment ASSERT in = BaseIoLibIntrinsic.=20 OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43: __asm__ __volatile__ ( "outl = %0, %1" : : "a" (Value), "d" ((UINT16)Port) ); OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67: __asm__ __volatile__ ( "inl = %1, %0" : "=3Da" (Data) : "d" ((UINT16)Port) ); It seems like we have a reasonable set and should not use the inline = assembly for any more complex code.=20 Thanks, Andrew Fish > The issue you pointed to was around SetJump()/LongJump(). Can we = limit this BZ to only those 2 functions? > =20 > Mike > =C2=A0 <> > From: Gao, Liming=20 > Sent: Wednesday, March 6, 2019 4:28 PM > To: afish@apple.com > Cc: Zhang, Shenglei ; edk2-devel-01 = ; Kinney, Michael D = > Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove = inline X86 assembly code > =20 > Andrew: > I want to keep only one implementation. If inline assembly c source = is preferred, I suggest to remove its nasm implementation. > =20 > Thanks > Liming > <>From: afish@apple.com = [mailto:afish@apple.com ]=20 > Sent: Tuesday, March 5, 2019 2:44 PM > To: Gao, Liming > > Cc: Zhang, Shenglei >; edk2-devel-01 = >; Kinney, = Michael D > > Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove = inline X86 assembly code > =20 > =20 > =20 >=20 > On Mar 5, 2019, at 1:32 PM, Gao, Liming > wrote: > =20 > Andrew: > BZ 1163 is to remove inline X86 assembly code in C source file. But, = this patch is wrong. I have gave my comments to update this patch. >=20 > =20 > Why do we want to remove inline X86 assembly. As I mentioned it will = lead to larger binaries, slower binaries, and less optimized code. > =20 > For example take this simple inline assembly function: > =20 > VOID > EFIAPI > CpuBreakpoint ( > VOID > ) > { > __asm__ __volatile__ ("int $3"); > } > =20 >=20 > Today with clang LTO turned on calling CpuBreakpoint() looks like this = from the callers point of view.=20 > =20 > a.out[0x1fa5] <+6>: cc int3 =20 > =20 >=20 > But if we move that to NASM: > =20 >=20 > a.out[0x1fa6] <+7>: e8 07 00 00 00 calll 0x1fb2 = ; CpuBreakpoint > =20 >=20 > plus: > a.out`CpuBreakpoint: > a.out[0x1fb2] <+0>: cc int3 =20 > a.out[0x1fb3] <+1>: c3 retl =20 > =20 >=20 > And there is also an extra push and pop on the stack. The other issue = is the call to the function that is unknown to the compiler will act = like a _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in = VC++ 2017). Is the depreciation of some of these intrinsics in VC++ = driving the removal of inline assembly? For GCC inline assembly works = great for local compile, and for clang LTO it works in entire program = optimization. > =20 > The LTO bitcode includes inline assembly and constraints so that the = optimizer knows what to do so it can get optimized just like the = abstract bitcode during the Link Time Optimization.=20 > =20 > This is CpuBreakpoint(): > ; Function Attrs: noinline nounwind optnone ssp uwtable > define void @CpuBreakpoint() #0 { > call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() = #2, !srcloc !3 > ret void > } > =20 >=20 > This is AsmReadMsr64(): > ; Function Attrs: noinline nounwind optnone ssp uwtable > define i64 @AsmReadMsr64(i32) #0 { > %2 =3D alloca i32, align 4 > %3 =3D alloca i64, align 8 > store i32 %0, i32* %2, align 4 > %4 =3D load i32, i32* %2, align 4 > %5 =3D call i64 asm sideeffect "rdmsr", = "=3DA,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4 > store i64 %5, i64* %3, align 8 > %6 =3D load i64, i64* %3, align 8 > ret i64 %6 > } > =20 >=20 > =20 > I agree it make sense to remove .S and .asm files and only have .nasm = files.=20 > =20 > Thanks, > =20 > Andrew Fish > =20 > PS For the Xcode clang since it emits frame pointers by default we = need to add the extra 4 bytes to save the assembly functions so the = stack can get unwound.=20 > =20 > a.out`CpuBreakpoint: > a.out[0x1f99] <+0>: 55 pushl %ebp > a.out[0x1f9a] <+1>: 89 e5 movl %esp, %ebp > a.out[0x1f9c] <+3>: cc int3 =20 > a.out[0x1f9d] <+4>: 5d popl %ebp > a.out[0x1f9e] <+5>: c3 retl =20 > =20 >=20 > So breakpoint goes from 1 byte to 11 bytes if we get rid of the = intrinsics.=20 > =20 >=20 > The change is to reduce the duplicated implementation. It will be = good on the code maintain. Recently, one patch has to update .c and = .nasm both. Please see = https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html = . = Beside this change, I propose to remove all GAS assembly code for IA32 = and X64 arch. After those change, the patch owner will collect the = impact of the image size.=20 >=20 > Thanks > Liming