public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Yao, Jiewen" <jiewen.yao@intel.com>
To: Andrew Fish <afish@apple.com>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>
Cc: "Gao, Liming" <liming.gao@intel.com>,
	edk2-devel-01 <edk2-devel@lists.01.org>
Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
Date: Thu, 7 Mar 2019 05:06:50 +0000	[thread overview]
Message-ID: <74D8A39837DF1E4DA445A8C0B3885C503F556F4E@shsmsx102.ccr.corp.intel.com> (raw)
In-Reply-To: <47E8FE4A-4BDC-48A3-B7E6-02F1682C0990@apple.com>

HI Mike and Andrew
The problem is maintainability.

If we have multiple copy of implementation, a developer need verify multiple copy of implementation, if we make update. Before that, a developer has to be aware that there is multiple copy of implementation. - That increases the complexity.

If we have everything, there MAY be 5 copy - ASM, NASM, S, GCC inline, MS inline, theoretically.
Now, we remove ASM. It is good first step.
But we may still have 4 copies. I suggest we consider do more.

A recently case that SetJump/LongJump, I updated the NASM file for both IA32 and X64.

Later, to my surprise, only X64 version NASM works, and IA32 version NASM does not work.
Then I notice that there is a different copy if IA32 Jump.c MS inline - I also need update. That is frustrated.

I think there should be a balance between optimization and code readability/maintainability.

Do we have data on what size benefit we can get with these inline function, from whole image level?
We can do evaluation at function level, case by case.
If we see a huge size benefit, I agree to keep this function.
If we see no size benefit, I suggest we do the cleanup for this function.


Thank you
Yao Jiewen



> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Andrew Fish via edk2-devel
> Sent: Wednesday, March 6, 2019 5:31 PM
> To: Kinney, Michael D <michael.d.kinney@intel.com>
> Cc: edk2-devel-01 <edk2-devel@lists.01.org>; Gao, Liming
> <liming.gao@intel.com>
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
> inline X86 assembly code
> 
> 
> 
> > On Mar 6, 2019, at 4:41 PM, Kinney, Michael D
> <michael.d.kinney@intel.com> wrote:
> >
> > Liming,
> >
> > That does not work either because inline assembly uses compiler specific
> syntax.
> >
> > Please update with the specific list of functions that you think the inline
> should be removed to improve maintenance with no impacts in size/perf.
> >
> 
> Mike,
> 
> It is easy to find the gcc/clang inline assembler, just `git grep "__asm__
> __volatile__"`
> 
> The main files are:
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLib
> Intrinsic/IoLibGcc.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X
> 64/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/I
> a32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSync
> hronizationLib/Ia32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSync
> hronizationLib/X64/GccInline.c
> 
> This is really just compiler optimizer control.
> Library/BaseSynchronizationLib/SynchronizationGcc.c:21:#define
> _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
> 
> Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic.
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43:  __asm__ __volatile__
> ( "outl %0, %1" : : "a" (Value), "d" ((UINT16)Port) );
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67:  __asm__ __volatile__
> ( "inl %1, %0" : "=a" (Data) : "d" ((UINT16)Port) );
> 
> It seems like we have a reasonable set and should not use the inline
> assembly for any more complex code.
> 
> Thanks,
> 
> Andrew Fish
> 
> > The issue you pointed to was around SetJump()/LongJump().  Can we limit
> this BZ to only those 2 functions?
> >
> > Mike
> >   <>
> > From: Gao, Liming
> > Sent: Wednesday, March 6, 2019 4:28 PM
> > To: afish@apple.com
> > Cc: Zhang, Shenglei <shenglei.zhang@intel.com>; edk2-devel-01
> <edk2-devel@lists.01.org>; Kinney, Michael D
> <michael.d.kinney@intel.com>
> > Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
> inline X86 assembly code
> >
> > Andrew:
> >   I want to keep only one implementation. If inline assembly c source is
> preferred, I suggest to remove its nasm implementation.
> >
> > Thanks
> > Liming
> >  <>From: afish@apple.com <mailto:afish@apple.com>
> [mailto:afish@apple.com <mailto:afish@apple.com>]
> > Sent: Tuesday, March 5, 2019 2:44 PM
> > To: Gao, Liming <liming.gao@intel.com <mailto:liming.gao@intel.com>>
> > Cc: Zhang, Shenglei <shenglei.zhang@intel.com
> <mailto:shenglei.zhang@intel.com>>; edk2-devel-01
> <edk2-devel@lists.01.org <mailto:edk2-devel@lists.01.org>>; Kinney,
> Michael D <michael.d.kinney@intel.com
> <mailto:michael.d.kinney@intel.com>>
> > Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
> inline X86 assembly code
> >
> >
> >
> >
> > On Mar 5, 2019, at 1:32 PM, Gao, Liming <liming.gao@intel.com
> <mailto:liming.gao@intel.com>> wrote:
> >
> > Andrew:
> >  BZ 1163 is to remove inline X86 assembly code in C source file. But, this
> patch is wrong. I have gave my comments to update this patch.
> >
> >
> > Why do we want to remove inline X86 assembly. As I mentioned it will lead
> to larger binaries, slower binaries, and less optimized code.
> >
> > For example take this simple inline assembly function:
> >
> > VOID
> > EFIAPI
> > CpuBreakpoint (
> >   VOID
> >   )
> > {
> >   __asm__ __volatile__ ("int $3");
> > }
> >
> >
> > Today with clang LTO turned on calling CpuBreakpoint() looks like this from
> the callers point of view.
> >
> > a.out[0x1fa5] <+6>:  cc              int3
> >
> >
> > But if we move that to NASM:
> >
> >
> > a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll
> 0x1fb2                    ; CpuBreakpoint
> >
> >
> > plus:
> > a.out`CpuBreakpoint:
> > a.out[0x1fb2] <+0>: cc     int3
> > a.out[0x1fb3] <+1>: c3     retl
> >
> >
> > And there is also an extra push and pop on the stack. The other issue is the
> call to the function that is unknown to the compiler will act like a
> _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in VC++
> 2017). Is the depreciation of some of these intrinsics in VC++ driving the
> removal of inline assembly? For GCC inline assembly works great for local
> compile, and for clang LTO it works in entire program optimization.
> >
> > The LTO bitcode includes inline assembly and constraints so that the
> optimizer knows what to do so it can get optimized just like the abstract
> bitcode during the Link Time Optimization.
> >
> > This is CpuBreakpoint():
> > ; Function Attrs: noinline nounwind optnone ssp uwtable
> > define void @CpuBreakpoint() #0 {
> >   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"()
> #2, !srcloc !3
> >   ret void
> > }
> >
> >
> > This is AsmReadMsr64():
> > ; Function Attrs: noinline nounwind optnone ssp uwtable
> > define i64 @AsmReadMsr64(i32) #0 {
> >   %2 = alloca i32, align 4
> >   %3 = alloca i64, align 8
> >   store i32 %0, i32* %2, align 4
> >   %4 = load i32, i32* %2, align 4
> >   %5 = call i64 asm sideeffect "rdmsr",
> "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
> >   store i64 %5, i64* %3, align 8
> >   %6 = load i64, i64* %3, align 8
> >   ret i64 %6
> > }
> >
> >
> >
> > I agree it make sense to remove .S and .asm files and only have .nasm files.
> >
> > Thanks,
> >
> > Andrew Fish
> >
> > PS For the Xcode clang since it emits frame pointers by default we need to
> add the extra 4 bytes to save the assembly functions so the stack can get
> unwound.
> >
> > a.out`CpuBreakpoint:
> > a.out[0x1f99] <+0>: 55     pushl  %ebp
> > a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> > a.out[0x1f9c] <+3>: cc     int3
> > a.out[0x1f9d] <+4>: 5d     popl   %ebp
> > a.out[0x1f9e] <+5>: c3     retl
> >
> >
> > So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics.
> >
> >
> >  The change is to reduce the duplicated implementation. It will be good
> on the code maintain. Recently, one patch has to update .c and .nasm both.
> Please see
> https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html
> <https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html>.
> Beside this change, I propose to remove all GAS assembly code for IA32 and
> X64 arch. After those change, the patch owner will collect the impact of the
> image size.
> >
> > Thanks
> > Liming
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel

  reply	other threads:[~2019-03-07  5:07 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-05  1:40 [PATCH 0/3] MdePkg: Remove inline X86 assembly code Shenglei Zhang
2019-03-05  1:40 ` [PATCH 1/3] MdePkg/BaseCpuLib: " Shenglei Zhang
2019-03-05  1:40 ` [PATCH 2/3] MdePkg/BaseLib: " Shenglei Zhang
2019-03-05  1:40 ` [PATCH 3/3] MdePkg/BaseSynchronizationLib: " Shenglei Zhang
2019-03-05 20:57   ` Andrew Fish
2019-03-05 21:32     ` Gao, Liming
2019-03-05 21:45       ` Kinney, Michael D
2019-03-05 22:43       ` Andrew Fish
2019-03-07  0:28         ` Gao, Liming
2019-03-07  0:41           ` Kinney, Michael D
2019-03-07  1:31             ` Andrew Fish
2019-03-07  5:06               ` Yao, Jiewen [this message]
2019-03-07  5:56                 ` Andrew Fish
2019-03-07  6:09                   ` Yao, Jiewen
2019-03-07  6:45                     ` Andrew Fish
2019-03-07  7:15                       ` Gao, Liming
2019-03-07  7:25                         ` Yao, Jiewen
2019-03-09  0:07                           ` Kinney, Michael D
2019-03-09  0:15                             ` Andrew Fish
2019-03-14 22:27                               ` Gao, Liming
2019-03-05  2:10 ` [PATCH 0/3] MdePkg: " Gao, Liming

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74D8A39837DF1E4DA445A8C0B3885C503F556F4E@shsmsx102.ccr.corp.intel.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox