From: Andrew Fish <afish@apple.com>
Date: Fri, 08 Mar 2019 16:15:06 -0800
To: Mike Kinney
Cc: "Yao, Jiewen", "Gao, Liming", edk2-devel-01
Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code

> On Mar 8, 2019, at 4:07 PM, Kinney, Michael D wrote:
>
> Jiewen,
>
> There are not many of those functions and they can be on speed paths.
>
> I do not recommend converting them to NASM only.
>
> I do recommend we add comments to the NASM files and to the C files with inline assembly that state that updates to one require updates to the other.
>

Mike,

It looks like the NASM files only exist to support the INTEL compiler. Could that be an outdated assumption? Over time, has the INTEL compiler added more compatibility with GCC inline assembly, or with VC++ intrinsics? I assume people are still using the INTEL compiler?

Thanks,

Andrew Fish

> Mike
>
> From: Yao, Jiewen
> Sent: Wednesday, March 6, 2019 11:25 PM
> To: Gao, Liming; afish@apple.com
> Cc: Kinney, Michael D; edk2-devel-01
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> Thanks.
>
> I also recommend taking a look at MdePkg\Library\BaseSynchronizationLib.
>
> That is also complicated and not so readable for me.
>
> And we had 8 patches to *fix* the real problem in 2018.
>
> Thank you
> Yao Jiewen
>
> From: Gao, Liming
> Sent: Wednesday, March 6, 2019 11:15 PM
> To: afish@apple.com; Yao, Jiewen
> Cc: Kinney, Michael D; edk2-devel-01
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> Thanks for your clarification. Now, we will focus on SetJump/LongJump() first.
>
> Thanks
> Liming
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Wednesday, March 6, 2019 10:45 PM
> To: Yao, Jiewen
> Cc: Kinney, Michael D; Gao, Liming; edk2-devel-01
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 6, 2019, at 10:09 PM, Yao, Jiewen wrote:
>
> Thanks Andrew.
> Now I 100% agree with you - I think it is fine to restrict new C inline assembler, or at least have a very high bar to add anything new. Any new inline assembler *should also be simple and just be a function abstracting a CPU op-code* that is not available to C. This is how we prevent the maintenance issues you are worrying about.
>
> And I also agree with your case.
>
> Let's discuss another case. The two functions below, SetJump/LongJump, were updated recently - by me, unfortunately.
>
> It is unclear to me what size optimization we get by inlining SetJump/LongJump.
> But obviously, it brings a maintenance issue for me.
> And I don't believe it meets the criteria you mentioned above.
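
For contrast with SetJump/LongJump, the kind of wrapper that does meet that bar is a single-opcode abstraction along the lines of the sketch below. This is in GCC/Clang inline form and is illustrative only; the exact function name and the file it lives in inside MdePkg may differ.

  #include <Base.h>

  //
  // PAUSE has no C-level equivalent, so the whole function is one
  // instruction; there is essentially nothing here that ever needs
  // ongoing maintenance.
  //
  VOID
  EFIAPI
  CpuPause (
    VOID
    )
  {
    __asm__ __volatile__ ("pause");
  }

A wrapper like this is trivially verifiable against its NASM or MS-inline counterparts, which is the opposite of the SetJump/LongJump situation discussed next.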
>
> Jiewen,
>
> I agree: it seems like, given the rules we agree on, this code should be written in NASM.
>
> Thanks,
>
> Andrew Fish
>
> __declspec (naked)
> RETURNS_TWICE
> UINTN
> EFIAPI
> SetJump (
>   OUT     BASE_LIBRARY_JUMP_BUFFER  *JumpBuffer
>   )
> {
>   _asm {
>     push    [esp + 4]
>     call    InternalAssertJumpBuffer
>     pop     ecx
>     pop     ecx
>     mov     edx, [esp]
>
>     xor     eax, eax
>     mov     [edx + 24], eax        ; save 0 to SSP
>
>     mov     eax, [PcdGet32 (PcdControlFlowEnforcementPropertyMask)]
>     test    eax, eax
>     jz      CetDone
>     _emit   0x0F
>     _emit   0x20
>     _emit   0xE0                   ; mov     eax, cr4
>     bt      eax, 23                ; check if CET is enabled
>     jnc     CetDone
>
>     mov     eax, 1
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0xAE
>     _emit   0xE8                   ; INCSSP EAX to read original SSP
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0x1E
>     _emit   0xC8                   ; READSSP EAX
>     mov     [edx + 0x24], eax      ; save SSP
>
> CetDone:
>
>     mov     [edx], ebx
>     mov     [edx + 4], esi
>     mov     [edx + 8], edi
>     mov     [edx + 12], ebp
>     mov     [edx + 16], esp
>     mov     [edx + 20], ecx
>     xor     eax, eax
>     jmp     ecx
>   }
> }
>
> __declspec (naked)
> VOID
> EFIAPI
> InternalLongJump (
>   IN      BASE_LIBRARY_JUMP_BUFFER  *JumpBuffer,
>   IN      UINTN                     Value
>   )
> {
>   _asm {
>     mov     eax, [PcdGet32 (PcdControlFlowEnforcementPropertyMask)]
>     test    eax, eax
>     jz      CetDone
>     _emit   0x0F
>     _emit   0x20
>     _emit   0xE0                   ; mov     eax, cr4
>     bt      eax, 23                ; check if CET is enabled
>     jnc     CetDone
>
>     mov     edx, [esp + 4]         ; edx = JumpBuffer
>     mov     edx, [edx + 24]        ; edx = target SSP
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0x1E
>     _emit   0xC8                   ; READSSP EAX
>     sub     edx, eax               ; edx = delta
>     mov     eax, edx               ; eax = delta
>
>     shr     eax, 2                 ; eax = delta / sizeof (UINT32)
>     _emit   0xF3
>     _emit   0x0F
>     _emit   0xAE
>     _emit   0xE8                   ; INCSSP EAX
>
> CetDone:
>
>     pop     eax                    ; skip return address
>     pop     edx                    ; edx <- JumpBuffer
>     pop     eax                    ; eax <- Value
>     mov     ebx, [edx]
>     mov     esi, [edx + 4]
>     mov     edi, [edx + 8]
>     mov     ebp, [edx + 12]
>     mov     esp, [edx + 16]
>     jmp     dword ptr [edx + 20]
>   }
> }
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Wednesday, March 6, 2019 9:56 PM
> To: Yao, Jiewen
> Cc: Kinney, Michael D; Gao, Liming; edk2-devel-01
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 6, 2019, at 9:06 PM, Yao, Jiewen wrote:
>
> Hi Mike and Andrew,
> The problem is maintainability.
>
> If we have multiple copies of an implementation, a developer needs to verify every copy when making an update. Before that, the developer has to be aware that multiple copies exist at all - that increases the complexity.
>
> If we have everything, there MAY be 5 copies - ASM, NASM, S, GCC inline, MS inline - theoretically.
> Now we are removing ASM. That is a good first step.
> But we may still have 4 copies. I suggest we consider doing more.
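
To make the "multiple copies" concern concrete, here is a sketch of the same one-instruction primitive in its GCC/Clang-inline and MS-inline C forms. In the tree these live in separate source files selected per toolchain by the .inf (and the MSVC build may use the __debugbreak() intrinsic or a .nasm file instead), so treat this single-file #if arrangement as illustration only:

#include <Base.h>

#if defined (__GNUC__) || defined (__clang__)

//
// GCC/Clang copy, GccInline.c style
//
VOID
EFIAPI
CpuBreakpoint (
  VOID
  )
{
  __asm__ __volatile__ ("int $3");
}

#elif defined (_MSC_VER)

//
// Microsoft VC++ copy, MS inline (_asm) style
//
VOID
EFIAPI
CpuBreakpoint (
  VOID
  )
{
  _asm {
    int  3
  }
}

#endif

Any behavioral change has to be applied to every such copy, which is the burden described above; the reply below weighs that against the size and performance cost of consolidating on NASM.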
>
> Jiewen,
>
> I think you are trying to do the right thing, but are optimizing the wrong thing.
>
> Most of the GCC/Clang inline assembler code is in GccInline.c, and since that code is mostly just abstracting an x86 instruction and the functions are very simple, it is NOT code that needs ongoing maintenance.
>
> Let's look at the history:
> https://github.com/tianocore/edk2/commits/master/MdePkg/Library/BaseLib/X64/GccInline.c
> The last logic fix was Jun 1, 2010.
>
> https://github.com/tianocore/edk2/commits/master/MdePkg/Library/BaseLib/Ia32/GccInline.c
> OK, so Mike had to make a fix in this file in 2015 to make something optional, due to an embedded CPU defeaturing an instruction. But prior to that it was 2010.
>
> The set of C inline assembler functions we have should be static and should not require much maintenance. More complex code should be written in C and use the C inline assembler functions we already have. If any more complex assembly code is required we should just write it in NASM. So I think it is fine to restrict new C inline assembler, or at least have a very high bar to add anything new. Any new inline assembler should also be simple and just be a function abstracting a CPU op-code that is not available to C. This is how we prevent the maintenance issues you are worrying about.
>
> I gave an example in this mail thread of how a breakpoint goes from being 1 byte to 11 bytes if you remove the C inline assembler. For clang with LTO enabled, CpuBreakpoint() will always get inlined into the caller's code and it will only take the one byte for the int 3 instruction. If that code moves to NASM then it gets replaced with a 5-byte call instruction plus an actual C ABI function for the breakpoint.
>
> VOID
> EFIAPI
> CpuBreakpoint (
>   VOID
>   )
> {
>   __asm__ __volatile__ ("int $3");
> }
>
> Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from the caller's point of view:
>
> a.out[0x1fa5] <+6>:  cc  int3
>
> But if we move that to NASM:
>
> a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2  ; CpuBreakpoint
>
> plus:
> a.out`CpuBreakpoint:
> a.out[0x1f99] <+0>: 55     pushl  %ebp
> a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> a.out[0x1f9c] <+3>: cc     int3
> a.out[0x1f9d] <+4>: 5d     popl   %ebp
> a.out[0x1f9e] <+5>: c3     retl
>
> For any compiler that emits the frame pointer, if you move the INT 3 to assembly you need the frame pointer setup, or the exception lib is not going to be able to print out the stack backtrace of the code when you hit a breakpoint. So this is how I get to 11 vs. 1 bytes.
>
> Thanks,
>
> Andrew Fish
>
> PS For clang LTO the compiler compiles to bitcode, and that is an abstracted assembly language. At link time all of that code gets optimized together and then passed through the CPU-specific code gen. For C inline assembler the assembly instructions end up in the bitcode with lots of information about the constraints.
> That is why these GccInline.c functions almost always get inlined with clang and LTO.
>
> The CpuBreakpoint() function looks like this in bitcode, but the function call gets optimized away by LTO in the code gen:
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define void @CpuBreakpoint() #0 {
>   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
>   ret void
> }
>
> Even something more complex like AsmReadMsr64() can get inlined, but it contains a lot more info about the constraints:
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define i64 @AsmReadMsr64(i32) #0 {
>   %2 = alloca i32, align 4
>   %3 = alloca i64, align 8
>   store i32 %0, i32* %2, align 4
>   %4 = load i32, i32* %2, align 4
>   %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
>   store i64 %5, i64* %3, align 8
>   %6 = load i64, i64* %3, align 8
>   ret i64 %6
> }
>
> The alloca, store, load, etc. are the same bitcode instructions you would see with C code.
>
> A recent case was SetJump/LongJump: I updated the NASM files for both IA32 and X64.
>
> Later, to my surprise, only the X64 NASM version worked; the IA32 NASM version did not.
> Then I noticed that there is a different copy in the IA32 Jump.c MS inline code that I also needed to update. That is frustrating.
>
> I think there should be a balance between optimization and code readability/maintainability.
>
> Do we have data on what size benefit we can get from these inline functions, at the whole-image level?
> We can do the evaluation at the function level, case by case.
> If we see a huge size benefit, I agree to keep that function.
> If we see no size benefit, I suggest we do the cleanup for that function.
>
> Thank you
> Yao Jiewen
>
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Andrew Fish via edk2-devel
> Sent: Wednesday, March 6, 2019 5:31 PM
> To: Kinney, Michael D
> Cc: edk2-devel-01; Gao, Liming
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 6, 2019, at 4:41 PM, Kinney, Michael D wrote:
>
> Liming,
>
> That does not work either, because inline assembly uses compiler-specific syntax.
>
> Please update with the specific list of functions for which you think the inline assembly should be removed to improve maintenance with no impact on size/perf.
>
> Mike,
>
> It is easy to find the gcc/clang inline assembler, just `git grep "__asm__ __volatile__"`.
>
> The main files are:
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLibIntrinsic/IoLibGcc.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X64/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/Ia32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c
>
> This is really just compiler optimizer control:
> Library/BaseSynchronizationLib/SynchronizationGcc.c:21: #define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
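
As an aside on that define: the empty asm with a "memory" clobber emits no instructions at all; it only constrains the optimizer. A minimal usage sketch of what such a compiler barrier buys you (hypothetical names, not code from the tree):

#include <Base.h>

//
// Same empty-asm idiom as SynchronizationGcc.c, under a hypothetical name.
// No instruction is emitted; the "memory" clobber just tells the compiler
// it may not cache or re-order memory accesses across this point.
//
#define COMPILER_BARRIER()  do { __asm__ __volatile__ ("": : : "memory"); } while (0)

VOID
PublishValue (
  UINT32           *Data,    // ordinary memory; without the barrier this store could sink
  volatile UINT32  *Ready,   // flag polled by another agent
  UINT32           Value
  )
{
  *Data  = Value;            // must be in memory before the flag is set
  COMPILER_BARRIER ();       // the compiler may not move the store above past this line
  *Ready = 1;
}

Note this is compiler ordering only, not a CPU fence, which is why a call to an out-of-line NASM function (opaque to the compiler) has a similar implicit effect - a point raised again below.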
> Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic:
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43:  __asm__ __volatile__ ( "outl %0, %1" : : "a" (Value), "d" ((UINT16)Port) );
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67:  __asm__ __volatile__ ( "inl %1, %0" : "=a" (Data) : "d" ((UINT16)Port) );
>
> It seems like we have a reasonable set and should not use inline assembly for any more complex code.
>
> Thanks,
>
> Andrew Fish
>
> The issue you pointed to was around SetJump()/LongJump(). Can we limit this BZ to only those 2 functions?
>
> Mike
>
> From: Gao, Liming
> Sent: Wednesday, March 6, 2019 4:28 PM
> To: afish@apple.com
> Cc: Zhang, Shenglei; edk2-devel-01; Kinney, Michael D
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> Andrew:
>   I want to keep only one implementation. If the inline assembly C source is preferred, I suggest removing its NASM implementation.
>
> Thanks
> Liming
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Tuesday, March 5, 2019 2:44 PM
> To: Gao, Liming
> Cc: Zhang, Shenglei; edk2-devel-01; Kinney, Michael D
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
>
> On Mar 5, 2019, at 1:32 PM, Gao, Liming wrote:
>
> Andrew:
>   BZ 1163 is to remove the inline X86 assembly code in C source files. But this patch is wrong. I have given my comments on how to update this patch.
>
> Why do we want to remove inline X86 assembly? As I mentioned, it will lead to larger binaries, slower binaries, and less optimized code.
>
> For example, take this simple inline assembly function:
>
> VOID
> EFIAPI
> CpuBreakpoint (
>   VOID
>   )
> {
>   __asm__ __volatile__ ("int $3");
> }
>
> Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from the caller's point of view:
>
> a.out[0x1fa5] <+6>:  cc  int3
>
> But if we move that to NASM:
>
> a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2  ; CpuBreakpoint
>
> plus:
> a.out`CpuBreakpoint:
> a.out[0x1fb2] <+0>: cc  int3
> a.out[0x1fb3] <+1>: c3  retl
>
> And there is also an extra push and pop on the stack. The other issue is that a call to a function that is unknown to the compiler will act like a _ReadWriteBarrier (yikes, I see _ReadWriteBarrier is deprecated in VC++ 2017). Is the deprecation of some of these intrinsics in VC++ driving the removal of inline assembly? For GCC, inline assembly works great for a local compile, and for clang LTO it works in whole-program optimization.
>
> The LTO bitcode includes the inline assembly and its constraints, so the optimizer knows what to do and the code gets optimized together with the abstract bitcode during Link Time Optimization.
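
For reference, the C source that produces the AsmReadMsr64() bitcode shown below looks roughly like the sketch that follows. It is written from memory in the style of the IA32 GccInline.c, not copied verbatim; the "=A" and "c" constraints are what surface as "=A,{cx},..." in the bitcode.

#include <Base.h>

UINT64
EFIAPI
AsmReadMsr64 (
  IN UINT32  Index
  )
{
  UINT64  Data;

  __asm__ __volatile__ (
    "rdmsr"                // RDMSR reads the MSR selected by ECX into EDX:EAX
    : "=A" (Data)          // "=A" binds the 64-bit result to the EDX:EAX pair (IA32)
    : "c"  (Index)         // the MSR index is passed in ECX
    );

  return Data;
}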
>
> This is CpuBreakpoint():
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define void @CpuBreakpoint() #0 {
>   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
>   ret void
> }
>
> This is AsmReadMsr64():
>
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define i64 @AsmReadMsr64(i32) #0 {
>   %2 = alloca i32, align 4
>   %3 = alloca i64, align 8
>   store i32 %0, i32* %2, align 4
>   %4 = load i32, i32* %2, align 4
>   %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
>   store i64 %5, i64* %3, align 8
>   %6 = load i64, i64* %3, align 8
>   ret i64 %6
> }
>
> I agree it makes sense to remove the .S and .asm files and only have .nasm files.
>
> Thanks,
>
> Andrew Fish
>
> PS For the Xcode clang, since it emits frame pointers by default, we need to add the extra 4 bytes of frame-pointer save/restore to the assembly functions so the stack can get unwound.
>
> a.out`CpuBreakpoint:
> a.out[0x1f99] <+0>: 55     pushl  %ebp
> a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> a.out[0x1f9c] <+3>: cc     int3
> a.out[0x1f9d] <+4>: 5d     popl   %ebp
> a.out[0x1f9e] <+5>: c3     retl
>
> So a breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics.
>
> The change is to reduce the duplicated implementations. It will be good for code maintenance. Recently, one patch had to update both the .c and the .nasm file. Please see
> https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html.
> Besides this change, I propose to remove all GAS assembly code for the IA32 and X64 arch. After those changes, the patch owner will collect the impact on image size.
>
> Thanks
> Liming
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel