From: "Gao, Liming"
To: "afish@apple.com"
CC: "Zhang, Shenglei", edk2-devel-01, "Kinney, Michael D"
Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
Date: Thu, 7 Mar 2019 00:28:11 +0000
Message-ID: <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FCC47@SHSMSX104.ccr.corp.intel.com>
References: <20190305014059.17988-1-shenglei.zhang@intel.com> <20190305014059.17988-4-shenglei.zhang@intel.com> <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FBBC2@SHSMSX104.ccr.corp.intel.com>
List-Id: EDK II Development

Andrew:
I want to keep only one implementation. If the inline assembly C source is preferred, I suggest removing its NASM implementation.

Thanks
Liming

From: afish@apple.com [mailto:afish@apple.com]
Sent: Tuesday, March 5, 2019 2:44 PM
To: Gao, Liming
Cc: Zhang, Shenglei; edk2-devel-01; Kinney, Michael D
Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code

On Mar 5, 2019, at 1:32 PM, Gao, Liming wrote:

Andrew:
BZ 1163 is to remove inline X86 assembly code in C source files. But this patch is wrong. I have given my comments to update this patch.

Why do we want to remove inline X86 assembly? As I mentioned, it will lead to larger binaries, slower binaries, and less optimized code.

For example, take this simple inline assembly function:

  VOID
  EFIAPI
  CpuBreakpoint (
    VOID
    )
  {
    __asm__ __volatile__ ("int $3");
  }

Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from the caller's point of view:

  a.out[0x1fa5] <+6>: cc              int3

But if we move that to NASM:

  a.out[0x1fa6] <+7>: e8 07 00 00 00  calll 0x1fb2 ; CpuBreakpoint

plus:

  a.out`CpuBreakpoint:
  a.out[0x1fb2] <+0>: cc  int3
  a.out[0x1fb3] <+1>: c3  retl

And there is also an extra push and pop on the stack. The other issue is that a call to a function that is unknown to the compiler will act like a _ReadWriteBarrier (yikes, I see _ReadWriteBarrier is deprecated in VC++ 2017). Is the deprecation of some of these intrinsics in VC++ driving the removal of inline assembly? For GCC, inline assembly works great for local compiles, and for clang LTO it works in entire program optimization.

The LTO bitcode includes the inline assembly and its constraints, so the optimizer knows what to do and the code can get optimized just like the abstract bitcode during Link Time Optimization.

This is CpuBreakpoint():

  ; Function Attrs: noinline nounwind optnone ssp uwtable
  define void @CpuBreakpoint() #0 {
    call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
    ret void
  }

This is AsmReadMsr64():

  ; Function Attrs: noinline nounwind optnone ssp uwtable
  define i64 @AsmReadMsr64(i32) #0 {
    %2 = alloca i32, align 4
    %3 = alloca i64, align 8
    store i32 %0, i32* %2, align 4
    %4 = load i32, i32* %2, align 4
    %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
    store i64 %5, i64* %3, align 8
    %6 = load i64, i64* %3, align 8
    ret i64 %6
  }

I agree it makes sense to remove .S and .asm files and only have .nasm files.

Thanks,

Andrew Fish

PS For the Xcode clang, since it emits frame pointers by default, we need to add the extra 4 bytes to save the assembly functions so the stack can get unwound.

  a.out`CpuBreakpoint:
  a.out[0x1f99] <+0>: 55     pushl %ebp
  a.out[0x1f9a] <+1>: 89 e5  movl  %esp, %ebp
  a.out[0x1f9c] <+3>: cc     int3
  a.out[0x1f9d] <+4>: 5d     popl  %ebp
  a.out[0x1f9e] <+5>: c3     retl

So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics.

The change is to reduce the duplicated implementation. It will be good for code maintenance. Recently, one patch had to update both the .c and the .nasm file. Please see https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html. Besides this change, I propose to remove all GAS assembly code for the IA32 and X64 archs. After those changes, the patch owner will collect the impact on the image size.

Thanks
Liming
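
For readers checking the size figures in this thread, here is a minimal sketch of one way the "1 byte to 11 bytes" number can be reached. It assumes the count is one call site plus one out-of-line function body, with instruction lengths read off the disassembly listings quoted in the message. The helper names (InlinedBytes, OutOfLineBytes, CheckThreadFigures) are hypothetical and exist only for this illustration; they are not part of BaseLib.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded instruction lengths in bytes, taken from the quoted disassembly. */
enum {
  LEN_INT3       = 1,  /* cc              int3            */
  LEN_CALL_REL32 = 5,  /* e8 07 00 00 00  calll           */
  LEN_PUSH_EBP   = 1,  /* 55              pushl %ebp      */
  LEN_MOV_EBP    = 2,  /* 89 e5           movl %esp, %ebp */
  LEN_POP_EBP    = 1,  /* 5d              popl %ebp       */
  LEN_RET        = 1   /* c3              retl            */
};

/* Hypothetical helper: bytes emitted when int3 is inlined at each call site. */
static size_t
InlinedBytes (size_t CallSites)
{
  return CallSites * LEN_INT3;
}

/* Hypothetical helper: one out-of-line body plus a 5-byte call per call site. */
static size_t
OutOfLineBytes (size_t CallSites, int FramePointer)
{
  size_t Body = LEN_INT3 + LEN_RET;

  if (FramePointer) {
    /* The extra 4 bytes of frame setup and teardown mentioned in the PS. */
    Body += LEN_PUSH_EBP + LEN_MOV_EBP + LEN_POP_EBP;
  }
  return Body + CallSites * LEN_CALL_REL32;
}

/* Self-check for a single call site. */
static void
CheckThreadFigures (void)
{
  assert (InlinedBytes (1) == 1);        /* inlined: just int3              */
  assert (OutOfLineBytes (1, 0) == 7);   /* call + int3 + ret, no frame     */
  assert (OutOfLineBytes (1, 1) == 11);  /* with frame pointer: 5 + 6 bytes */
}
```

Under this assumption, every additional call site costs 5 bytes out of line versus 1 byte inlined, while the function body is paid for only once.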