From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Gao, Liming" <liming.gao@intel.com>
To: "afish@apple.com" <afish@apple.com>
CC: "Zhang, Shenglei" <shenglei.zhang@intel.com>, edk2-devel-01
 <edk2-devel@lists.01.org>, "Kinney, Michael D" <michael.d.kinney@intel.com>
Thread-Topic: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
 inline X86 assembly code
Thread-Index: AQHU0vSE94gh21G/NUi0Z1yYhGmykqX9ACSAgACMrPD//5DuAIACKyZQ
Date: Thu, 7 Mar 2019 00:28:11 +0000
Message-ID: <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FCC47@SHSMSX104.ccr.corp.intel.com>
References: <20190305014059.17988-1-shenglei.zhang@intel.com>
 <20190305014059.17988-4-shenglei.zhang@intel.com>
 <C7542524-066B-4DC6-A2D8-B02EF3338042@apple.com>
 <4A89E2EF3DFEDB4C8BFDE51014F606A14E3FBBC2@SHSMSX104.ccr.corp.intel.com>
 <D5E70882-6E2F-4230-8E3D-63D26B75F2DB@apple.com>
In-Reply-To: <D5E70882-6E2F-4230-8E3D-63D26B75F2DB@apple.com>
MIME-Version: 1.0
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Andrew:
  I want to keep only one implementation. If the inline assembly C source is preferred, I suggest removing its NASM implementation.

Thanks
Liming
From: afish@apple.com [mailto:afish@apple.com]
Sent: Tuesday, March 5, 2019 2:44 PM
To: Gao, Liming <liming.gao@intel.com>
Cc: Zhang, Shenglei <shenglei.zhang@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>; Kinney, Michael D <michael.d.kinney@intel.com>
Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code


On Mar 5, 2019, at 1:32 PM, Gao, Liming <liming.gao@intel.com> wrote:

Andrew:
 BZ 1163 is to remove inline X86 assembly code from C source files. But this patch is wrong. I have given my comments on how to update this patch.


Why do we want to remove inline X86 assembly? As I mentioned, it will lead to larger binaries, slower binaries, and less optimized code.

For example take this simple inline assembly function:

VOID
EFIAPI
CpuBreakpoint (
  VOID
  )
{
  __asm__ __volatile__ ("int $3");
}


Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from the caller's point of view.

a.out[0x1fa5] <+6>:  cc              int3


But if we move that to NASM:


a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2                    ; CpuBreakpoint


plus:
a.out`CpuBreakpoint:
a.out[0x1fb2] <+0>: cc     int3
a.out[0x1fb3] <+1>: c3     retl


And there is also an extra push and pop on the stack. The other issue is that a call to a function unknown to the compiler acts like a _ReadWriteBarrier (yikes, I see _ReadWriteBarrier is deprecated in VC++ 2017). Is the deprecation of some of these intrinsics in VC++ driving the removal of inline assembly? For GCC, inline assembly works great for local compiles, and for clang LTO it works in whole-program optimization.

The LTO bitcode includes the inline assembly and its constraints, so the optimizer knows what to do with it and it can be optimized just like the abstract bitcode during Link Time Optimization.

This is CpuBreakpoint():
; Function Attrs: noinline nounwind optnone ssp uwtable
define void @CpuBreakpoint() #0 {
  call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
  ret void
}


This is AsmReadMsr64():
; Function Attrs: noinline nounwind optnone ssp uwtable
define i64 @AsmReadMsr64(i32) #0 {
  %2 = alloca i32, align 4
  %3 = alloca i64, align 8
  store i32 %0, i32* %2, align 4
  %4 = load i32, i32* %2, align 4
  %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
  store i64 %5, i64* %3, align 8
  %6 = load i64, i64* %3, align 8
  ret i64 %6
}
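
For readers less familiar with the constraint strings in the bitcode above, the following is a rough C-level sketch of the same idea, assuming GCC or clang. It uses rdtsc as a stand-in because rdmsr is privileged and cannot run in an ordinary user-mode test; like rdmsr, rdtsc returns its result in EDX:EAX. The names read_tsc() and combine_edx_eax() are hypothetical and are not edk2 functions:

```c
#include <stdint.h>

/* Hypothetical helper: combine two 32-bit halves the way the "=A"
   (EDX:EAX) output constraint does for a 64-bit value on IA32. */
static inline uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
  return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical stand-in for AsmReadMsr64(): reads the time stamp counter.
   The output constraints tell the optimizer which registers the
   instruction writes, so the wrapper can be inlined into callers. */
static inline uint64_t read_tsc(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
  uint32_t eax, edx;
  /* "=a" and "=d" pin the outputs to EAX and EDX. Explicit halves are used
     because "=A" only denotes the EDX:EAX pair on 32-bit targets. */
  __asm__ __volatile__ ("rdtsc" : "=a" (eax), "=d" (edx));
  return combine_edx_eax(edx, eax);
#else
  return 0;  /* Fallback so the sketch still builds on other targets. */
#endif
}
```

An MSR read would add an input constraint such as "c" (Index) to place the MSR index in ECX, which corresponds to the {cx} entry in the bitcode.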


I agree it makes sense to remove the .S and .asm files and only have .nasm files.

Thanks,

Andrew Fish

PS: Since the Xcode clang emits frame pointers by default, we need to add an extra 4 bytes of frame pointer setup to the assembly functions so the stack can be unwound.

a.out`CpuBreakpoint:
a.out[0x1f99] <+0>: 55     pushl  %ebp
a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
a.out[0x1f9c] <+3>: cc     int3
a.out[0x1f9d] <+4>: 5d     popl   %ebp
a.out[0x1f9e] <+5>: c3     retl


So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics.


 The change is to reduce duplicated implementations, which will make the code easier to maintain. Recently, one patch had to update both the .c and the .nasm file. Please see https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html. Besides this change, I propose removing all GAS assembly code for the IA32 and X64 architectures. After those changes, the patch owner will collect the impact on image size.

Thanks
Liming