From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from ma-mailsvcp-mx-lapp01.apple.com (ma-mailsvcp-mx-lapp01.apple.com [17.32.222.22])
 by mx.groups.io with SMTP id smtpd.web10.8538.1684456968236164910
 for ; Thu, 18 May 2023 17:42:48 -0700
Authentication-Results: mx.groups.io;
 dkim=pass header.i=@apple.com header.s=20180706 header.b=Pbg/Px1O;
 spf=pass (domain: apple.com, ip: 17.32.222.22, mailfrom: afish@apple.com)
Received: from rn-mailsvcp-mta-lapp02.rno.apple.com (rn-mailsvcp-mta-lapp02.rno.apple.com [10.225.203.150])
 by ma-mailsvcp-mx-lapp01.apple.com (Oracle Communications Messaging Server 8.1.0.22.20230228 64bit (built Feb 28 2023))
 with ESMTPS id <0RUV00UTQRAEKH10@ma-mailsvcp-mx-lapp01.apple.com>
 for devel@edk2.groups.io; Thu, 18 May 2023 17:42:47 -0700 (PDT)
X-Proofpoint-ORIG-GUID: k30KgLPcJSAr_RhyYy7hJgYLVZtStTkv
X-Proofpoint-GUID: k30KgLPcJSAr_RhyYy7hJgYLVZtStTkv
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.573,18.0.957 definitions=2023-05-18_17:2023-05-17,2023-05-18 signatures=0
X-Proofpoint-Spam-Details: rule=interactive_user_notspam policy=interactive_user score=0 adultscore=0 phishscore=0 suspectscore=0 bulkscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2304280000 definitions=main-2305190003
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com;
 h=from : message-id : content-type : mime-version : subject : date : in-reply-to : cc : to : references;
 s=20180706; bh=z+cLnuEKwJyz0fgxrGlbmFQWYWr20A87KdtFEnO0swQ=;
 b=Pbg/Px1O865rKlaWOefXJ80yAclN7zGKWdcWxyPL/sYgBvMMC6vsN5EL9oFcZLixt5jS
 fYXWIOkWB3WVeqxZ+m17kDuB3cKxgGrrKEALRARx8C0gmRF/0WKxC108Au2lMmU4gOCN
 8VilcSxFvIbGnwwD5B53xLMfvOexUip/dndUqqfQ/Bil4XV2c0FPsrIpxq3Q6hr03saw
 d5KcTZUlwTcaTKKRO+uQKfKIiXjH4igJcyjY6LQwkWTGWtQoZABijNgbudnBz5tilt0x
 bRk0Zm4xsVUX81VyaWLxdxfq0KSSX+X3PqChwCPKoL817sZf44tuZIKaghUlAQdHiRyH
 MA==
Received: from rn-mailsvcp-mmp-lapp02.rno.apple.com (rn-mailsvcp-mmp-lapp02.rno.apple.com [17.179.253.15])
 by rn-mailsvcp-mta-lapp02.rno.apple.com (Oracle Communications Messaging Server 8.1.0.22.20230228 64bit (built Feb 28 2023))
 with ESMTPS id <0RUV003DSRB2MH80@rn-mailsvcp-mta-lapp02.rno.apple.com>; Thu, 18 May 2023 17:42:38 -0700 (PDT)
Received: from process_milters-daemon.rn-mailsvcp-mmp-lapp02.rno.apple.com
 by rn-mailsvcp-mmp-lapp02.rno.apple.com (Oracle Communications Messaging Server 8.1.0.22.20230228 64bit (built Feb 28 2023))
 id <0RUV00X00QZ1D600@rn-mailsvcp-mmp-lapp02.rno.apple.com>; Thu, 18 May 2023 17:42:38 -0700 (PDT)
X-Va-A:
X-Va-T-CD: ea8ecdd7c7fec670404234df68c44261
X-Va-E-CD: 156d5ae2ad5b70872c2d8b4f187426ab
X-Va-R-CD: faed37d9f0280288bcb510df700d8485
X-Va-ID: e0b6a881-12b9-4019-8496-9891be7cebf4
X-Va-CD: 0
X-V-A:
X-V-T-CD: ea8ecdd7c7fec670404234df68c44261
X-V-E-CD: 156d5ae2ad5b70872c2d8b4f187426ab
X-V-R-CD: faed37d9f0280288bcb510df700d8485
X-V-ID: a70a2021-343f-4b3e-979e-3432977ac6f4
X-V-CD: 0
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.573,18.0.957 definitions=2023-05-18_17:2023-05-17,2023-05-18 signatures=0
Received: from smtpclient.apple (unknown [17.11.45.59])
 by rn-mailsvcp-mmp-lapp02.rno.apple.com (Oracle Communications Messaging Server 8.1.0.22.20230228 64bit (built Feb 28 2023))
 with ESMTPSA id <0RUV00JG9RAV4800@rn-mailsvcp-mmp-lapp02.rno.apple.com>; Thu, 18 May 2023 17:42:38 -0700 (PDT)
From: "Andrew Fish"
Message-id: <69A704CE-16A4-4C8D-9A32-9BBD22C00475@apple.com>
MIME-version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\))
Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
Date: Thu, 18 May 2023 17:42:20 -0700
In-reply-to:
Cc: "Ni, Ray", Rebecca Cran
To: devel@edk2.groups.io, Mike Kinney
References: <7C9FD4BA-328C-4CFE-AF5A-3A795BB147E4@apple.com> <0EECE39F-0B65-4BFE-8668-CB59448C578D@apple.com> <17605136DCF3E084.26337@groups.io>
X-Mailer: Apple Mail (2.3731.600.7)
Content-type: multipart/alternative; boundary="Apple-Mail=_BC2F7F6B-3067-4EB0-94C9-A9D7A1FA8C5C"

--Apple-Mail=_BC2F7F6B-3067-4EB0-94C9-A9D7A1FA8C5C
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

Mike,

Sorry static was just to scope the name to the file since it is a lib, not to make it work.

That is a cool site. I learned about it complaining about stuff to the compiler team on our internal clang Slack channel as they use it to answer my questions.

Thanks,

Andrew Fish

> On May 18, 2023, at 2:42 PM, Michael D Kinney wrote:
>
> Using that tool, the following fragment seems to generate the right code. Volatile is required. Static is optional.
>
> static volatile int mDeadLoopCount = 0;
>
> void
> CpuDeadLoop(
>   void
>   )
> {
>   while (mDeadLoopCount == 0);
> }
>
>
> GCC
> ===
> CpuDeadLoop():
> .L2:
>         mov     eax, DWORD PTR mDeadLoopCount[rip]
>         test    eax, eax
>         je      .L2
>         ret
>
>
> CLANG
> =====
> CpuDeadLoop():                          # @CpuDeadLoop()
> .LBB0_1:                                # =>This Inner Loop Header: Depth=1
>         cmp     dword ptr [rip + _ZL14mDeadLoopCount], 0
>         je      .LBB0_1
>         ret
>
>
> Mike
>
>
> From: Andrew (EFI) Fish
> Sent: Thursday, May 18, 2023 1:45 PM
> To: edk2-devel-groups-io; Andrew Fish
> Cc: Kinney, Michael D; Ni, Ray; Rebecca Cran
> Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
>
> Whoops wrong compiler. Here is an update. I added the flags so this one reproduces the issue.
>
> Compiler Explorer
> godbolt.org
>
> Thanks,
>
> Andrew Fish
>
>
> On May 18, 2023, at 11:45 AM, Andrew Fish via groups.io wrote:
>
> Mike,
>
> This is a good way to play around with fixes, and to report bugs. You can see the assembler for different compilers with different flag.
>
> Compiler Explorer
> godbolt.org
>
> Sorry I’m traveling and in Cupertino with lots of meetings so I did not have time to adjust the compiler flags….
>
> Thanks,
>
> Andrew Fish
>
>
> On May 18, 2023, at 10:24 AM, Andrew (EFI) Fish wrote:
>
> Mike,
>
> I guess my other question… If this turns out to be a compiler bug should we scope the change to the broken toolchain. I’m not sure what the right answer is for that, but I want to ask the question?
>
> Thanks,
>
> Andrew Fish
>
>
> On May 18, 2023, at 10:19 AM, Michael D Kinney wrote:
>
> Andrew,
>
> This might work for XIP.  Set non const global to initial value that is expected value to stay in dead loop.
>
> UINTN  mDeadLoopCount = 0;
>
> VOID
> CpuDeadLoop(
>   VOID
>   )
> {
>   while (mDeadLoopCount == 0) {
>     CpuPause();
>   }
> }
>
> When deadloop is entered, developer can not change value of mDeadLoopCount, but they can use debugger to force exit loop and return from function.
>
> Mike
>
>
> From: Andrew (EFI) Fish
> Sent: Thursday, May 18, 2023 10:09 AM
> To: Kinney, Michael D
> Cc: edk2-devel-groups-io; Ni, Ray; Rebecca Cran
> Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
>
> Mike,
>
> Good point, that is why we are using the stack ….
>
> The only other thing I can think of is to pass the address of Index to some inline assembler, or an asm no op function, to give it a side effect the compiler can’t resolve.
>
> Thanks,
>
> Andrew Fish
>
>
>
> On May 18, 2023, at 10:05 AM, Kinney, Michael D wrote:
>
> Static global will not work for XIP
>
> Mike
>
> From: Andrew (EFI) Fish
> Sent: Thursday, May 18, 2023 9:49 AM
> To: edk2-devel-groups-io; Kinney, Michael D
> Cc: Ni, Ray; Rebecca Cran
> Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
>
> Mike,
>
> I pinged some compiler experts to see if our code is correct, or if the compiler has an issue.
> Seems to be trending compiler issue right now, but I’ve NOT gotten feedback from anyone on the spec committee yet.
>
> If we move Index to a static global that would likely work around the compiler issue.
>
> Thanks,
>
> Andrew Fish
>
>
>
>
> On May 18, 2023, at 8:36 AM, Michael D Kinney wrote:
>
> Hi Ray,
>
> So the code generated does deadloop, but is just not easy to resume from as we have been able to do in the past.
>
> We use CpuDeadloop() for 2 purposes.  One is a terminal condition with no reason to ever continue.
>
> The 2nd is a debug aide for developers to halt the system at a specific location and then continue from that point, usually with a debugger, to step through code to an area to evaluate unexpected behavior.
>
> We may have to do a NASM implementation of CpuDeadloop() to make sure it meets both use cases.
>
> Mike
>
> From: Ni, Ray
> Sent: Thursday, May 18, 2023 3:00 AM
> To: devel@edk2.groups.io
> Cc: Kinney, Michael D; Rebecca Cran; Ni, Ray
> Subject: CpuDeadLoop() is optimized by compiler
>
> Hi,
> Starting from certain version of Visual Studio C compiler (I don’t have the exact version. I am using VS2019), CpuDeadLoop is now optimized quite well by compiler.
>
> The optimization is so “good” that it becomes harder for developers to break out of the deadloop.
>
> I copied the assembly instructions as below for your reference.
> The compiler does not generate instructions that jump out of the loop when the Index is not zero.
> So in order to break out of the loop, developers need to:
>   1. Manually adjust rsp by increasing 40
>   2. Manually “ret”
>
> I am not sure if anyone has interest to re-write this function so that compiler can be “fooled” again.
> Thanks,
> Ray
>
> =======================
> ; Function compile flags: /Ogspy
> ; File e:\work\edk2\MdePkg\Library\BaseLib\CpuDeadLoop.c
> ;              COMDAT CpuDeadLoop
> _TEXT    SEGMENT
> Index$ = 48
> CpuDeadLoop PROC                                       ; COMDAT
>
> ; 26   : {
>
> $LN12:
>   00000  48 83 ec 28         sub     rsp, 40           ; 00000028H
>
> ; 27   :   volatile UINTN  Index;
> ; 28   :
> ; 29   :   for (Index = 0; Index == 0;) {
>
>   00004  48 c7 44 24 30
>          00 00 00 00         mov     QWORD PTR Index$[rsp], 0
> $LN10@CpuDeadLoo:
>
> ; 30   :     CpuPause ();
>
>   0000d  48 8b 44 24 30      mov     rax, QWORD PTR Index$[rsp]
>   00012  e8 00 00 00 00      call    CpuPause
>   00017  eb f4               jmp     SHORT $LN10@CpuDeadLoo
> CpuDeadLoop ENDP
> _TEXT    ENDS
> END
>
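
For reference, a minimal sketch of the inline-assembler idea raised in the thread (passing the address of Index to an empty asm statement so the compiler sees a side effect it cannot resolve). This assumes GCC/Clang extended asm syntax, so it would not apply as-is to the VS2019 build from the original report. OpaqueTouch, DeadLoopSketch, and the Initial parameter are hypothetical names used only so the sketch can be exercised; this is not the BaseLib implementation.

```c
#include <stdint.h>

/* Hypothetical helper: an empty asm statement that receives the address of
   the loop variable and declares a memory clobber, so the compiler has to
   assume the variable may have been changed behind its back.
   GCC/Clang extended asm only (assumption; MSVC x64 has no inline asm). */
static inline void OpaqueTouch(volatile uintptr_t *Address)
{
  __asm__ __volatile__ ("" : : "r" (Address) : "memory");
}

/* Sketch of a resumable dead loop. Initial exists only so the sketch can be
   tested; a real dead loop would always start from 0 and spin until a
   debugger writes a non-zero value into Index. Returns the spin count. */
uintptr_t DeadLoopSketch(uintptr_t Initial)
{
  volatile uintptr_t Index = Initial;
  uintptr_t          Spins = 0;

  while (Index == 0) {
    OpaqueTouch(&Index);  /* address escapes into the asm statement */
    Spins++;
  }

  return Spins;
}
```

Calling DeadLoopSketch(0) spins until something outside the program's view (for example a debugger) sets Index to a non-zero value, at which point the loop is expected to fall through and return normally. Whether a given toolchain keeps the exit path should still be checked in the generated assembly, as was done with Compiler Explorer earlier in the thread.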
--Apple-Mail=_BC2F7F6B-3067-4EB0-94C9-A9D7A1FA8C5C--