Re: [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix InternalSync[De|In]crement

From: "Ni, Ruiyu" <ruiyu.ni@intel.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Laszlo Ersek <lersek@redhat.com>
Cc: "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>,
	"Gao, Liming" <liming.gao@intel.com>
Subject: Re: [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix InternalSync[De|In]crement
Date: Wed, 7 Nov 2018 15:27:43 +0000	[thread overview]
Message-ID: <734D49CCEBEEF84792F5B80ED585239D5BEF5A7F@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <CAKv+Gu8F7rNLgToE16f9kQmgXd89dtPkHF0KFL8sH834G1kH6Q@mail.gmail.com>

> -----Original Message-----
> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> Sent: Wednesday, November 7, 2018 10:46 PM
> To: Laszlo Ersek <lersek@redhat.com>
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>; edk2-devel@lists.01.org; Kinney, Michael D
> <michael.d.kinney@intel.com>; Gao, Liming <liming.gao@intel.com>
> Subject: Re: [edk2] [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix
> InternalSync[De|In]crement
> 
> On 7 November 2018 at 15:31, Laszlo Ersek <lersek@redhat.com> wrote:
> > (+Andrew)
> >
> > Hi Ray,
> >
> > On 11/07/18 05:03, Ruiyu Ni wrote:
> >> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1303
> >>
> >> XCODE disassembly code of InternalSyncDecrement with today's code is:
> >>
> >>   __asm__ __volatile__ (
> >>     "movl    $1, %%eax  \n\t"
> >>     "lock               \n\t"
> >>     "xadd    %%eax, %1  \n\t"
> >>     "inc     %%eax      \n\t"
> >>     : "=a" (Result),          // %0
> >>       "+m" (*Value)           // %1
> >>     :                         // no inputs that aren't also outputs
> >>     : "memory",
> >>       "cc"
> >>     );
> >>
> >>  0:       55      pushl   %ebp
> >>  1:       89 e5   movl    %esp, %ebp
> >>  3:       8b 45 08        movl    8(%ebp), %eax
> >>  6:       b8 01 00 00 00  movl    $1, %eax
> >>  b:       f0      lock
> >>  c:       0f c1 00        xaddl   %eax, _InternalSyncIncrement(%eax)
> >>  f:       40      incl    %eax
> >> 10:       5d      popl    %ebp
> >> 11:       c3      retl
> >>
> >> %EAX value retrieved in line #3 is overwritten in line #6.
> >
> > (a) This looks like an XCODE bug to me. The "=a" constraint on operand
> > %0 means that Result should be set from eax/rax, and that this operand
> > is "write only". Here's the gcc documentation:
> >
> > "The ordinary output operands must be write-only; GCC assumes that the
> > values in these operands before the instruction are dead and need not
> > be generated."
> >
> 
> Do we need to use early clobber here? "=&a"?
> 
> It seems to me that without that, the compiler is free to use eax as input, since it
> only assumes that eax will be set by the last instruction.

Ard,
Thanks for pointing that.
I found below wordings from GCC documents:
"Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not
 overlap an input. Otherwise, GCC may allocate the output operand in the same
 register as an unrelated input operand, on the assumption that the assembler code
 consumes its inputs before producing outputs. This assumption may be false if the
 assembler code actually consists of more than one instruction."

I think '&' is just for this case.

@Laszlo,
Do you agree that a simple patch to add '&' to both Ia32 and X64 version of
InternalSync[De|In]crement is good enough?

I actually tried in XCODE5 toolchain and it works good.
movl    8(%ebp), %eax
becomes
movl    8(%ebp), %ecx

> 
> 
> > Furthermore, if a register is named in any operand constraint (such as
> > input, output, input-output), then it cannot, by definition, be listed
> > in the "clobber list" -- it is *obvious* that the inline assembly will
> > work with that register. In case of an output-only operand, it's
> > obvious that the inline assembly will *overwrite* that register, and
> > the compiler cannot assume *when* that overwrite will happen. Here's
> > the gcc
> > documentation:
> >
> > "You may not write a clobber description in a way that overlaps with
> > an input or output operand.  For example, you may not have an operand
> > describing a register class with one member if you mention that
> > register in the clobber list.  Variables declared to live in specific
> > registers [...] and used as asm input or output operands must have no
> > part mentioned in the clobber description." [1]
> >
> > However, XCODE generates such code that depends on EAX as an *input*
> > operand. That's clearly wrong and violates the constraints in the
> > operand list. This is an XCODE bug.
> >
> > In fact: the situation is worse than that. In the commit message you
> > spell out that the MOV instruction at offset 6 overwrites the value in
> > EAX that was just loaded at offset 3. But this is just the small
> > problem; the *large* problem is the generated XADD instruction itself,
> > at offset 0xb and 0xc. The binary encoding is, from your commit message:
> >
> >   f0 0f c1 00
> >
> > and this is *exactly* the problem that my commit 8a94eb9283fa fixed
> > for gcc! From my commit message:
> >
> >     >     439c:       f0 0f c1 00             lock xadd %eax,(%rax)
> >
> > Because, it makes *no sense* for XADD to use the AX register for
> > *both* pointer-to-memory (i.e. address of the destination location
> > that receives the sum) *and* as the other addend!
> >
> > In other words, regardless of how we fill the AX register up-front, an
> > XADD instruction generated like this *cannot* be right.
> >
> >> The patch uses the clobber list to tell GCC that EAX is used in ASM.
> >>
> >> Contributed-under: TianoCore Contribution Agreement 1.1
> >> Signed-off-by: Ruiyu Ni <ruiyu.ni@intel.com>
> >> Cc: Liming Gao <liming.gao@intel.com>
> >> Cc: Michael Kinney <michael.d.kinney@intel.com>
> >> Cc: Laszlo Ersek <lersek@redhat.com>
> >> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> >> ---
> >>  MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c | 12
> >> ++++++++----
> >>  1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> >> b/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> >> index af39bdeb51..0a985529fd 100644
> >> --- a/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> >> +++ b/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> >> @@ -40,11 +40,13 @@ InternalSyncIncrement (
> >>      "lock               \n\t"
> >>      "xadd    %%eax, %1  \n\t"
> >>      "inc     %%eax      \n\t"
> >> -    : "=a" (Result),          // %0
> >> +    "mov     %%eax, %0  \n\t"
> >> +    : "=r" (Result),          // %0
> >>        "+m" (*Value)           // %1
> >>      :                         // no inputs that aren't also outputs
> >>      : "memory",
> >> -      "cc"
> >> +      "cc",
> >> +      "eax"
> >>      );
> >>
> >>    return Result;
> >
> > (b) This change is invalid, for two separate reasons.
> >
> > Reason #1: on the clobber list, the AX register should be listed as
> > "a", not as "eax".
> >
> > Reason #2:
> >
> > - For operand %0, we say "use any register in the 'r' class as
> > write-only". (The class "r" means "general register".) And, at the end
> > of the inline assembly, we store EAX manually to that 'r' class
> > register. And we rely on the compiler to store that other general
> > register into Result.
> >
> > - We append the AX register (which should be spelled as "a") to the
> > clobber list. However, the "a" register is itself in the "general
> > register" class, and therefore this clobber list breaks the passage
> > that I quoted above, marked as [1]!
> >
> >
> >> @@ -76,11 +78,13 @@ InternalSyncDecrement (
> >>      "lock                \n\t"
> >>      "xadd    %%eax, %1   \n\t"
> >>      "dec     %%eax       \n\t"
> >> -    : "=a" (Result),           // %0
> >> +    "mov     %%eax, %0  \n\t"
> >> +    : "=r" (Result),           // %0
> >>        "+m" (*Value)            // %1
> >>      :                          // no inputs that aren't also outputs
> >>      : "memory",
> >> -      "cc"
> >> +      "cc",
> >> +      "eax"
> >>      );
> >>
> >>    return Result;
> >>
> >
> > (c) The patch doesn't update the X64 variant.
> >
> > I think we should be clear here that we are working around an XCODE bug.
> > Commit 8a94eb9283fa was different, because the code before that
> > violated the gcc documentation, and so the issue was in the code.
> >
> >
> > I'll comment more on your second email.
> >
> > Thanks,
> > Laszlo
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel