From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=17.171.2.68; helo=ma1-aaemail-dr-lapp02.apple.com; envelope-from=afish@apple.com; receiver=edk2-devel@lists.01.org Received: from ma1-aaemail-dr-lapp02.apple.com (ma1-aaemail-dr-lapp02.apple.com [17.171.2.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 026F5211CFBF2 for ; Tue, 5 Mar 2019 14:43:49 -0800 (PST) Received: from pps.filterd (ma1-aaemail-dr-lapp02.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id x25MWK2t035447; Tue, 5 Mar 2019 14:43:47 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=mime-version : content-type : sender : from : message-id : subject : date : in-reply-to : cc : to : references; s=20180706; bh=W+LSf03Y0NC8npzjTOJolyerKAMOUWdVAFYl9wLF12A=; b=rUkhuQCU6K49V9BAnX2OR6hRmeUS+s42IIIKh6sg9eN4dFdctv49auLbS4yi2o1nQZJi fV1aXkcY7OFpcixZVhrcpKeeRtxOnv5KC+9YNeR5SqRaYaTEyYTqzVf/9OAAvsA651Kw 4eFq4s7VM+R36OAHkjgc5ELZ3MCUF5JX6ntbeDqedICYtF1Hqxxk0M5FZMrqOjFh5n3l yxYin7lvXnmkUajXB6Fb1NWjgZseauOQT2whYwKT9pBxchAKwzcqGTWI5oT47mFid9vy PaClE9Qm7XhqO+iPKQa1pjCGHRdr3ZFub821ZZrU26ptR6NCVh5QQ4E8oRCbkgxUdod5 iQ== Received: from mr2-mtap-s02.rno.apple.com (mr2-mtap-s02.rno.apple.com [17.179.226.134]) by ma1-aaemail-dr-lapp02.apple.com with ESMTP id 2qyqd2swwk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 05 Mar 2019 14:43:47 -0800 MIME-version: 1.0 Received: from ma1-mmpp-sz10.apple.com (ma1-mmpp-sz10.apple.com [17.171.128.150]) by mr2-mtap-s02.rno.apple.com (Oracle Communications Messaging Server 8.0.2.3.20181024 64bit (built Oct 24 2018)) with ESMTPS id <0PNX00EKK0GY0V90@mr2-mtap-s02.rno.apple.com>; Tue, 05 Mar 2019 14:43:46 -0800 (PST) Received: from process_milters-daemon.ma1-mmpp-sz10.apple.com by ma1-mmpp-sz10.apple.com (Oracle Communications Messaging Server 8.0.2.3.20181024 64bit (built Oct 24 2018)) id <0PNX00I0006XOO00@ma1-mmpp-sz10.apple.com>; Tue, 05 Mar 2019 14:43:46 -0800 (PST) X-Va-A: X-Va-T-CD: 81ca60fce39c2560b6c4a7e5841f9b8f X-Va-E-CD: 7d4c618f85f76966dfbaabb394b9e4f0 X-Va-R-CD: a66fd94a21bd5e42e01791934a036d45 X-Va-CD: 0 X-Va-ID: 94f46a04-604b-4a93-aec9-18a0a7d22089 X-V-A: X-V-T-CD: 81ca60fce39c2560b6c4a7e5841f9b8f X-V-E-CD: 7d4c618f85f76966dfbaabb394b9e4f0 X-V-R-CD: a66fd94a21bd5e42e01791934a036d45 X-V-CD: 0 X-V-ID: cf9c4311-73b7-46b0-b084-251f1948e5ea X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-03-05_11:,, signatures=0 Received: from [17.234.230.246] (unknown [17.234.230.246]) by ma1-mmpp-sz10.apple.com (Oracle Communications Messaging Server 8.0.2.3.20181024 64bit (built Oct 24 2018)) with ESMTPSA id <0PNX009AS0GWRH50@ma1-mmpp-sz10.apple.com>; Tue, 05 Mar 2019 14:43:46 -0800 (PST) Sender: afish@apple.com From: Andrew Fish Message-id: Date: Tue, 05 Mar 2019 14:43:42 -0800 In-reply-to: <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FBBC2@SHSMSX104.ccr.corp.intel.com> Cc: "Zhang, Shenglei" , edk2-devel-01 , Mike Kinney To: "Gao, Liming" References: <20190305014059.17988-1-shenglei.zhang@intel.com> <20190305014059.17988-4-shenglei.zhang@intel.com> <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FBBC2@SHSMSX104.ccr.corp.intel.com> X-Mailer: Apple Mail (2.3445.6.18) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-03-05_11:, , signatures=0 X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 22:43:50 -0000 Content-Type: text/plain; CHARSET=US-ASCII Content-Transfer-Encoding: 7BIT > On Mar 5, 2019, at 1:32 PM, Gao, Liming wrote: > > Andrew: > BZ 1163 is to remove inline X86 assembly code in C source file. But, this patch is wrong. I have gave my comments to update this patch. > Why do we want to remove inline X86 assembly. As I mentioned it will lead to larger binaries, slower binaries, and less optimized code. For example take this simple inline assembly function: VOID EFIAPI CpuBreakpoint ( VOID ) { __asm__ __volatile__ ("int $3"); } Today with clang LTO turned on calling CpuBreakpoint() looks like this from the callers point of view. a.out[0x1fa5] <+6>: cc int3 But if we move that to NASM: a.out[0x1fa6] <+7>: e8 07 00 00 00 calll 0x1fb2 ; CpuBreakpoint plus: a.out`CpuBreakpoint: a.out[0x1fb2] <+0>: cc int3 a.out[0x1fb3] <+1>: c3 retl And there is also an extra push and pop on the stack. The other issue is the call to the function that is unknown to the compiler will act like a _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in VC++ 2017). Is the depreciation of some of these intrinsics in VC++ driving the removal of inline assembly? For GCC inline assembly works great for local compile, and for clang LTO it works in entire program optimization. The LTO bitcode includes inline assembly and constraints so that the optimizer knows what to do so it can get optimized just like the abstract bitcode during the Link Time Optimization. This is CpuBreakpoint(): ; Function Attrs: noinline nounwind optnone ssp uwtable define void @CpuBreakpoint() #0 { call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3 ret void } This is AsmReadMsr64(): ; Function Attrs: noinline nounwind optnone ssp uwtable define i64 @AsmReadMsr64(i32) #0 { %2 = alloca i32, align 4 %3 = alloca i64, align 8 store i32 %0, i32* %2, align 4 %4 = load i32, i32* %2, align 4 %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4 store i64 %5, i64* %3, align 8 %6 = load i64, i64* %3, align 8 ret i64 %6 } I agree it make sense to remove .S and .asm files and only have .nasm files. Thanks, Andrew Fish PS For the Xcode clang since it emits frame pointers by default we need to add the extra 4 bytes to save the assembly functions so the stack can get unwound. a.out`CpuBreakpoint: a.out[0x1f99] <+0>: 55 pushl %ebp a.out[0x1f9a] <+1>: 89 e5 movl %esp, %ebp a.out[0x1f9c] <+3>: cc int3 a.out[0x1f9d] <+4>: 5d popl %ebp a.out[0x1f9e] <+5>: c3 retl So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics. > The change is to reduce the duplicated implementation. It will be good on the code maintain. Recently, one patch has to update .c and .nasm both. Please see https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html . Beside this change, I propose to remove all GAS assembly code for IA32 and X64 arch. After those change, the patch owner will collect the impact of the image size. > > Thanks > Liming