From: Andrew Fish <afish@apple.com>
Date: Fri, 08 Mar 2019 16:15:06 -0800
To: Mike Kinney
Cc: "Yao, Jiewen", "Gao, Liming", edk2-devel-01
Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code

> On Mar 8, 2019, at 4:07 PM, Kinney, Michael D wrote:
>
> Jiewen,
>
> There are not many of those functions and they can be on speed paths.
>
> I do not recommend converting them to NASM only.
>
> I do recommend we add comments to the NASM files and to the C files with inline assembly that state that updates to one require updates to the other.
>

Mike,

It looks like the NASM files only exist to support the INTEL compiler. Could that be an outdated assumption? Over time, has the INTEL compiler added more compatibility with GCC inline assembly, or with VC++ intrinsics? I assume people are still using the INTEL compiler?

Thanks,

Andrew Fish

> Mike
>
> From: Yao, Jiewen
> Sent: Wednesday, March 6, 2019 11:25 PM
> To: Gao, Liming; afish@apple.com
> Cc: Kinney, Michael D; edk2-devel-01
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> Thanks.
>
> I also recommend taking a look at MdePkg\Library\BaseSynchronizationLib.
>
> That is also complicated and not so readable for me.
>
> And we had 8 patches to *fix* the real problem in 2018.
>
> Thank you
> Yao Jiewen
>
> From: Gao, Liming
> Sent: Wednesday, March 6, 2019 11:15 PM
> To: afish@apple.com; Yao, Jiewen
> Cc: Kinney, Michael D; edk2-devel-01
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> Thanks for your clarification. Now, we will focus on SetJump/LongJump() first.
>
> Thanks
> Liming
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Wednesday, March 6, 2019 10:45 PM
> To: Yao, Jiewen
> Cc: Kinney, Michael D; Gao, Liming; edk2-devel-01
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 6, 2019, at 10:09 PM, Yao, Jiewen wrote:
>
> Thanks Andrew.
> Now I 100% agree with you - I think it is fine to restrict new C inline assembler, or at least have a very high bar to add anything new. Any new inline assembler *should also be simple and just be a function abstracting a CPU op-code* that is not available to C. This is how we prevent the maintenance issues you are worrying about.
>
> And I also agree with your case.
>
> Let's discuss another case. The two functions below, SetJump/LongJump, were updated recently - by me, unfortunately.
>
> It is unclear to me what size optimization we get by inlining SetJump/LongJump.
> But obviously, it brings a maintenance issue for me.
> And I don't believe it meets the criteria you mentioned above.
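
For contrast with SetJump/LongJump, the kind of wrapper that does meet that bar is a single-opcode abstraction along the lines of the sketch below. This is in GCC/Clang inline form and is illustrative only; the exact function name and the file it lives in inside MdePkg may differ.

  #include <Base.h>

  //
  // PAUSE has no C-level equivalent, so the whole function is one
  // instruction; there is essentially nothing here that ever needs
  // ongoing maintenance.
  //
  VOID
  EFIAPI
  CpuPause (
    VOID
    )
  {
    __asm__ __volatile__ ("pause");
  }

A wrapper like this is trivially verifiable against its NASM or MS-inline counterparts, which is the opposite of the SetJump/LongJump situation discussed next.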
>
> Jiewen,
>
> I agree: it seems like, given the rules we agree on, this code should be written in NASM.
>
> Thanks,
>
> Andrew Fish
>
> __declspec (naked)
> RETURNS_TWICE
> UINTN
> EFIAPI
> SetJump (
>   OUT     BASE_LIBRARY_JUMP_BUFFER  *JumpBuffer
>   )
> {
>   _asm {
>     push    [esp + 4]
>     call    InternalAssertJumpBuffer
>     pop     ecx
>     pop     ecx
>     mov     edx, [esp]
>
>     xor     eax, eax
>     mov     [edx + 24], eax        ; save 0 to SSP
>
>     mov     eax, [PcdGet32 (PcdControlFlowEnforcementPropertyMask)]
>     test    eax, eax
>     jz      CetDone
>     _emit   0x0F
>     _emit   0x20
>     _emit   0xE0                   ; mov     eax, cr4
>     bt      eax, 23                ; check if CET is enabled
>     jnc     CetDone
>
>     mov     eax, 1
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0xAE
>     _emit   0xE8                   ; INCSSP EAX to read original SSP
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0x1E
>     _emit   0xC8                   ; READSSP EAX
>     mov     [edx + 0x24], eax      ; save SSP
>
> CetDone:
>
>     mov     [edx], ebx
>     mov     [edx + 4], esi
>     mov     [edx + 8], edi
>     mov     [edx + 12], ebp
>     mov     [edx + 16], esp
>     mov     [edx + 20], ecx
>     xor     eax, eax
>     jmp     ecx
>   }
> }
>
> __declspec (naked)
> VOID
> EFIAPI
> InternalLongJump (
>   IN      BASE_LIBRARY_JUMP_BUFFER  *JumpBuffer,
>   IN      UINTN                     Value
>   )
> {
>   _asm {
>     mov     eax, [PcdGet32 (PcdControlFlowEnforcementPropertyMask)]
>     test    eax, eax
>     jz      CetDone
>     _emit   0x0F
>     _emit   0x20
>     _emit   0xE0                   ; mov     eax, cr4
>     bt      eax, 23                ; check if CET is enabled
>     jnc     CetDone
>
>     mov     edx, [esp + 4]         ; edx = JumpBuffer
>     mov     edx, [edx + 24]        ; edx = target SSP
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0x1E
>     _emit   0xC8                   ; READSSP EAX
>     sub     edx, eax               ; edx = delta
>     mov     eax, edx               ; eax = delta
>
>     shr     eax, 2                 ; eax = delta / sizeof (UINT32)
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0xAE
>     _emit   0xE8                   ; INCSSP EAX
>
> CetDone:
>
>     pop     eax                    ; skip return address
>     pop     edx                    ; edx <- JumpBuffer
>     pop     eax                    ; eax <- Value
>     mov     ebx, [edx]
>     mov     esi, [edx + 4]
>     mov     edi, [edx + 8]
>     mov     ebp, [edx + 12]
>     mov     esp, [edx + 16]
>     jmp     dword ptr [edx + 20]
>   }
> }
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Wednesday, March 6, 2019 9:56 PM
> To: Yao, Jiewen
> Cc: Kinney, Michael D; Gao, Liming; edk2-devel-01
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 6, 2019, at 9:06 PM, Yao, Jiewen wrote:
>
> Hi Mike and Andrew,
> The problem is maintainability.
>
> If we have multiple copies of an implementation, a developer needs to verify every copy when making an update. Before that, the developer has to be aware that multiple copies exist at all - that increases the complexity.
>
> If we have everything, there MAY be 5 copies - ASM, NASM, S, GCC inline, MS inline - theoretically.
> Now we are removing ASM. That is a good first step.
> But we may still have 4 copies. I suggest we consider doing more.
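
To make the "multiple copies" concern concrete, here is a sketch of the same one-instruction primitive in its GCC/Clang-inline and MS-inline C forms. In the tree these live in separate source files selected per toolchain by the .inf (and the MSVC build may use the __debugbreak() intrinsic or a .nasm file instead), so treat this single-file #if arrangement as illustration only:

#include <Base.h>

#if defined (__GNUC__) || defined (__clang__)

//
// GCC/Clang copy, GccInline.c style
//
VOID
EFIAPI
CpuBreakpoint (
  VOID
  )
{
  __asm__ __volatile__ ("int $3");
}

#elif defined (_MSC_VER)

//
// Microsoft VC++ copy, MS inline (_asm) style
//
VOID
EFIAPI
CpuBreakpoint (
  VOID
  )
{
  _asm {
    int  3
  }
}

#endif

Any behavioral change has to be applied to every such copy, which is the burden described above; the reply below weighs that against the size and performance cost of consolidating on NASM.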
>
> Jiewen,
>
> I think you are trying to do the right thing, but are optimizing the wrong thing.
>
> Most of the GCC/Clang inline assembler code is in GccInline.c, and since that code is mostly just abstracting an x86 instruction and the functions are very simple, it is NOT code that needs ongoing maintenance.
>
> Let's look at the history:
> https://github.com/tianocore/edk2/commits/master/MdePkg/Library/BaseLib/X64/GccInline.c
> The last logic fix was Jun 1, 2010.
>
> https://github.com/tianocore/edk2/commits/master/MdePkg/Library/BaseLib/Ia32/GccInline.c
> OK, so Mike had to make a fix in this file in 2015 to make something optional, due to an embedded CPU defeaturing an instruction. But prior to that it was 2010.
>
> The set of C inline assembler functions we have should be static and should not require much maintenance. More complex code should be written in C and use the C inline assembler functions we already have. If any more complex assembly code is required we should just write it in NASM. So I think it is fine to restrict new C inline assembler, or at least have a very high bar to add anything new. Any new inline assembler should also be simple and just be a function abstracting a CPU op-code that is not available to C. This is how we prevent the maintenance issues you are worrying about.
>
> I gave an example in this mail thread of how a breakpoint goes from being 1 byte to 11 bytes if you remove the C inline assembler. For clang with LTO enabled, CpuBreakpoint() will always get inlined into the caller's code and it will only take the one byte for the int 3 instruction. If that code moves to NASM then it gets replaced with a 5-byte call instruction plus an actual C ABI function for the breakpoint.
>
> VOID
> EFIAPI
> CpuBreakpoint (
>   VOID
>   )
> {
>   __asm__ __volatile__ ("int $3");
> }
>
> Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from the caller's point of view:
>
> a.out[0x1fa5] <+6>:  cc  int3
>
> But if we move that to NASM:
>
> a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2  ; CpuBreakpoint
>
> plus:
> a.out`CpuBreakpoint:
> a.out[0x1f99] <+0>: 55     pushl  %ebp
> a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> a.out[0x1f9c] <+3>: cc     int3
> a.out[0x1f9d] <+4>: 5d     popl   %ebp
> a.out[0x1f9e] <+5>: c3     retl
>
> For any compiler that emits the frame pointer, if you move the INT 3 to assembly you need the frame pointer setup, or the exception lib is not going to be able to print out the stack backtrace of the code when you hit a breakpoint. So this is how I get to 11 vs. 1 bytes.
>
> Thanks,
>
> Andrew Fish
>
> PS For clang LTO the compiler compiles to bitcode, and that is an abstracted assembly language. At link time all of that code gets optimized together and then passed through the CPU-specific code gen. For C inline assembler the assembly instructions end up in the bitcode with lots of information about the constraints.
> That is why these GccInline.c functions almost always get inlined with clang and LTO.
>
> The CpuBreakpoint() function looks like this in bitcode, but the function call gets optimized away by LTO in the code gen:
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define void @CpuBreakpoint() #0 {
>   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
>   ret void
> }
>
> Even something more complex like AsmReadMsr64() can get inlined, but it contains a lot more info about the constraints:
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define i64 @AsmReadMsr64(i32) #0 {
>   %2 = alloca i32, align 4
>   %3 = alloca i64, align 8
>   store i32 %0, i32* %2, align 4
>   %4 = load i32, i32* %2, align 4
>   %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
>   store i64 %5, i64* %3, align 8
>   %6 = load i64, i64* %3, align 8
>   ret i64 %6
> }
>
> The alloca, store, load, etc. are the same bitcode instructions you would see with C code.
>
> A recent case was SetJump/LongJump: I updated the NASM files for both IA32 and X64.
>
> Later, to my surprise, only the X64 NASM version worked; the IA32 NASM version did not.
> Then I noticed that there is a different copy in the IA32 Jump.c MS inline code that I also needed to update. That is frustrating.
>
> I think there should be a balance between optimization and code readability/maintainability.
>
> Do we have data on what size benefit we can get from these inline functions, at the whole-image level?
> We can do the evaluation at the function level, case by case.
> If we see a huge size benefit, I agree to keep that function.
> If we see no size benefit, I suggest we do the cleanup for that function.
>
> Thank you
> Yao Jiewen
>
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Andrew Fish via edk2-devel
> Sent: Wednesday, March 6, 2019 5:31 PM
> To: Kinney, Michael D
> Cc: edk2-devel-01; Gao, Liming
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 6, 2019, at 4:41 PM, Kinney, Michael D wrote:
>
> Liming,
>
> That does not work either, because inline assembly uses compiler-specific syntax.
>
> Please update with the specific list of functions for which you think the inline assembly should be removed to improve maintenance with no impact on size/perf.
>
> Mike,
>
> It is easy to find the gcc/clang inline assembler, just `git grep "__asm__ __volatile__"`.
>
> The main files are:
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLibIntrinsic/IoLibGcc.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X64/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/Ia32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c
>
> This is really just compiler optimizer control:
> Library/BaseSynchronizationLib/SynchronizationGcc.c:21: #define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
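
As an aside on that define: the empty asm with a "memory" clobber emits no instructions at all; it only constrains the optimizer. A minimal usage sketch of what such a compiler barrier buys you (hypothetical names, not code from the tree):

#include <Base.h>

//
// Same empty-asm idiom as SynchronizationGcc.c, under a hypothetical name.
// No instruction is emitted; the "memory" clobber just tells the compiler
// it may not cache or re-order memory accesses across this point.
//
#define COMPILER_BARRIER()  do { __asm__ __volatile__ ("": : : "memory"); } while (0)

VOID
PublishValue (
  UINT32           *Data,    // ordinary memory; without the barrier this store could sink
  volatile UINT32  *Ready,   // flag polled by another agent
  UINT32           Value
  )
{
  *Data  = Value;            // must be in memory before the flag is set
  COMPILER_BARRIER ();       // the compiler may not move the store above past this line
  *Ready = 1;
}

Note this is compiler ordering only, not a CPU fence, which is why a call to an out-of-line NASM function (opaque to the compiler) has a similar implicit effect - a point raised again below.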
> Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic:
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43:  __asm__ __volatile__ ( "outl %0, %1" : : "a" (Value), "d" ((UINT16)Port) );
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67:  __asm__ __volatile__ ( "inl %1, %0" : "=a" (Data) : "d" ((UINT16)Port) );
>
> It seems like we have a reasonable set and should not use inline assembly for any more complex code.
>
> Thanks,
>
> Andrew Fish
>
> The issue you pointed to was around SetJump()/LongJump(). Can we limit this BZ to only those 2 functions?
>
> Mike
>
> From: Gao, Liming
> Sent: Wednesday, March 6, 2019 4:28 PM
> To: afish@apple.com
> Cc: Zhang, Shenglei; edk2-devel-01; Kinney, Michael D
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> Andrew:
>   I want to keep only one implementation. If the inline assembly C source is preferred, I suggest removing its NASM implementation.
>
> Thanks
> Liming
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Tuesday, March 5, 2019 2:44 PM
> To: Gao, Liming
> Cc: Zhang, Shenglei; edk2-devel-01; Kinney, Michael D
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 5, 2019, at 1:32 PM, Gao, Liming wrote:
>
> Andrew:
>   BZ 1163 is to remove the inline X86 assembly code in C source files. But this patch is wrong. I have given my comments on how to update this patch.
>
> Why do we want to remove inline X86 assembly? As I mentioned, it will lead to larger binaries, slower binaries, and less optimized code.
>
> For example, take this simple inline assembly function:
>
> VOID
> EFIAPI
> CpuBreakpoint (
>   VOID
>   )
> {
>   __asm__ __volatile__ ("int $3");
> }
>
> Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from the caller's point of view:
>
> a.out[0x1fa5] <+6>:  cc  int3
>
> But if we move that to NASM:
>
> a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2  ; CpuBreakpoint
>
> plus:
> a.out`CpuBreakpoint:
> a.out[0x1fb2] <+0>: cc  int3
> a.out[0x1fb3] <+1>: c3  retl
>
> And there is also an extra push and pop on the stack. The other issue is that a call to a function that is unknown to the compiler will act like a _ReadWriteBarrier (yikes, I see _ReadWriteBarrier is deprecated in VC++ 2017). Is the deprecation of some of these intrinsics in VC++ driving the removal of inline assembly? For GCC, inline assembly works great for a local compile, and for clang LTO it works in whole-program optimization.
>
> The LTO bitcode includes the inline assembly and its constraints, so the optimizer knows what to do and the code gets optimized together with the abstract bitcode during Link Time Optimization.
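
For reference, the C source that produces the AsmReadMsr64() bitcode shown below looks roughly like the sketch that follows. It is written from memory in the style of the IA32 GccInline.c, not copied verbatim; the "=A" and "c" constraints are what surface as "=A,{cx},..." in the bitcode.

#include <Base.h>

UINT64
EFIAPI
AsmReadMsr64 (
  IN UINT32  Index
  )
{
  UINT64  Data;

  __asm__ __volatile__ (
    "rdmsr"                // RDMSR reads the MSR selected by ECX into EDX:EAX
    : "=A" (Data)          // "=A" binds the 64-bit result to the EDX:EAX pair (IA32)
    : "c"  (Index)         // the MSR index is passed in ECX
    );

  return Data;
}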
>
> This is CpuBreakpoint():
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define void @CpuBreakpoint() #0 {
>   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
>   ret void
> }
>
> This is AsmReadMsr64():
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define i64 @AsmReadMsr64(i32) #0 {
>   %2 = alloca i32, align 4
>   %3 = alloca i64, align 8
>   store i32 %0, i32* %2, align 4
>   %4 = load i32, i32* %2, align 4
>   %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
>   store i64 %5, i64* %3, align 8
>   %6 = load i64, i64* %3, align 8
>   ret i64 %6
> }
>
> I agree it makes sense to remove the .S and .asm files and only have .nasm files.
>
> Thanks,
>
> Andrew Fish
>
> PS For the Xcode clang, since it emits frame pointers by default, we need to add the extra 4 bytes of frame-pointer save/restore to the assembly functions so the stack can get unwound.
>
> a.out`CpuBreakpoint:
> a.out[0x1f99] <+0>: 55     pushl  %ebp
> a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> a.out[0x1f9c] <+3>: cc     int3
> a.out[0x1f9d] <+4>: 5d     popl   %ebp
> a.out[0x1f9e] <+5>: c3     retl
>
> So a breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics.
>
> The change is to reduce the duplicated implementations. It will be good for code maintenance. Recently, one patch had to update both the .c and the .nasm file. Please see
> https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html.
> Besides this change, I propose to remove all GAS assembly code for the IA32 and X64 arch. After those changes, the patch owner will collect the impact on image size.
>
> Thanks
> Liming
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel