From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <lersek@redhat.com>
Received-SPF: Pass (sender SPF authorized) identity=mailfrom;
 client-ip=209.132.183.28; helo=mx1.redhat.com; envelope-from=lersek@redhat.com;
 receiver=edk2-devel@lists.01.org 
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 758F620337348
 for <edk2-devel@lists.01.org>; Wed,  7 Nov 2018 11:02:58 -0800 (PST)
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com
 [10.5.11.16])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id E2FE990B17;
 Wed,  7 Nov 2018 19:02:57 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-90.rdu2.redhat.com
 [10.10.120.90])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 8ECFA68D24;
 Wed,  7 Nov 2018 19:02:56 +0000 (UTC)
To: "Ni, Ruiyu" <ruiyu.ni@intel.com>,
 Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
 "Kinney, Michael D" <michael.d.kinney@intel.com>,
 "Gao, Liming" <liming.gao@intel.com>
References: <20181107040346.153720-1-ruiyu.ni@intel.com>
 <555b0649-a067-472e-2b81-b52457deb67b@redhat.com>
 <CAKv+Gu8F7rNLgToE16f9kQmgXd89dtPkHF0KFL8sH834G1kH6Q@mail.gmail.com>
 <734D49CCEBEEF84792F5B80ED585239D5BEF5A7F@SHSMSX104.ccr.corp.intel.com>
From: Laszlo Ersek <lersek@redhat.com>
Message-ID: <c27e4236-db33-8656-6850-2a48d7b24dcf@redhat.com>
Date: Wed, 7 Nov 2018 20:02:55 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <734D49CCEBEEF84792F5B80ED585239D5BEF5A7F@SHSMSX104.ccr.corp.intel.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.29]); Wed, 07 Nov 2018 19:02:58 +0000 (UTC)
Subject: Re: [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix InternalSync[De|In]crement
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: EDK II Development  <edk2-devel.lists.01.org>
List-Unsubscribe: <https://lists.01.org/mailman/options/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/edk2-devel/>
List-Post: <mailto:edk2-devel@lists.01.org>
List-Help: <mailto:edk2-devel-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Nov 2018 19:02:59 -0000
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit

On 11/07/18 16:27, Ni, Ruiyu wrote:
> 
> 
>> -----Original Message-----
>> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
>> Sent: Wednesday, November 7, 2018 10:46 PM
>> To: Laszlo Ersek <lersek@redhat.com>
>> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>; edk2-devel@lists.01.org; Kinney, Michael D
>> <michael.d.kinney@intel.com>; Gao, Liming <liming.gao@intel.com>
>> Subject: Re: [edk2] [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix
>> InternalSync[De|In]crement
>>
>> On 7 November 2018 at 15:31, Laszlo Ersek <lersek@redhat.com> wrote:
>>> (+Andrew)
>>>
>>> Hi Ray,
>>>
>>> On 11/07/18 05:03, Ruiyu Ni wrote:
>>>> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1303
>>>>
>>>> XCODE disassembly code of InternalSyncDecrement with today's code is:
>>>>
>>>>   __asm__ __volatile__ (
>>>>     "movl    $1, %%eax  \n\t"
>>>>     "lock               \n\t"
>>>>     "xadd    %%eax, %1  \n\t"
>>>>     "inc     %%eax      \n\t"
>>>>     : "=a" (Result),          // %0
>>>>       "+m" (*Value)           // %1
>>>>     :                         // no inputs that aren't also outputs
>>>>     : "memory",
>>>>       "cc"
>>>>     );
>>>>
>>>>  0:       55      pushl   %ebp
>>>>  1:       89 e5   movl    %esp, %ebp
>>>>  3:       8b 45 08        movl    8(%ebp), %eax
>>>>  6:       b8 01 00 00 00  movl    $1, %eax
>>>>  b:       f0      lock
>>>>  c:       0f c1 00        xaddl   %eax, _InternalSyncIncrement(%eax)
>>>>  f:       40      incl    %eax
>>>> 10:       5d      popl    %ebp
>>>> 11:       c3      retl
>>>>
>>>> %EAX value retrieved in line #3 is overwritten in line #6.
>>>
>>> (a) This looks like an XCODE bug to me. The "=a" constraint on operand
>>> %0 means that Result should be set from eax/rax, and that this operand
>>> is "write only". Here's the gcc documentation:
>>>
>>> "The ordinary output operands must be write-only; GCC assumes that the
>>> values in these operands before the instruction are dead and need not
>>> be generated."
>>>
>>
>> Do we need to use early clobber here? "=&a"?
>>
>> It seems to me that without that, the compiler is free to use eax as input, since it
>> only assumes that eax will be set by the last instruction.
> 
> Ard,
> Thanks for pointing that.
> I found below wordings from GCC documents:
> "Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not
>  overlap an input. Otherwise, GCC may allocate the output operand in the same
>  register as an unrelated input operand, on the assumption that the assembler code
>  consumes its inputs before producing outputs. This assumption may be false if the
>  assembler code actually consists of more than one instruction."
> 
> I think '&' is just for this case.
> 
> @Laszlo,
> Do you agree that a simple patch to add '&' to both Ia32 and X64 version of
> InternalSync[De|In]crement is good enough?

I think so, and I applaud Ard for finding this in the documentation! It
seems like the exact thing we need.

I agree that my commit 8a94eb9283fa ("MdePkg/BaseSynchronizationLib: fix
XADD operands in GCC IA32/X64 assembly", 2018-09-26) was not complete,
due to missing the "&" constraint modifier. Please submit a patch that
adds it.

And, for the commit message, I also recommend:

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 8a94eb9283fa09a30f5f06f0c12cf0ee4e14fbcf

I'll test the patch on my end.

(This is a prime example for a change (= a bugfix) that is permitted
during the hard feature freeze, BTW.)

Thanks!
Laszlo


> 
> I actually tried in XCODE5 toolchain and it works good.
> movl    8(%ebp), %eax
> becomes
> movl    8(%ebp), %ecx
> 
>>
>>
>>> Furthermore, if a register is named in any operand constraint (such as
>>> input, output, input-output), then it cannot, by definition, be listed
>>> in the "clobber list" -- it is *obvious* that the inline assembly will
>>> work with that register. In case of an output-only operand, it's
>>> obvious that the inline assembly will *overwrite* that register, and
>>> the compiler cannot assume *when* that overwrite will happen. Here's
>>> the gcc
>>> documentation:
>>>
>>> "You may not write a clobber description in a way that overlaps with
>>> an input or output operand.  For example, you may not have an operand
>>> describing a register class with one member if you mention that
>>> register in the clobber list.  Variables declared to live in specific
>>> registers [...] and used as asm input or output operands must have no
>>> part mentioned in the clobber description." [1]
>>>
>>> However, XCODE generates such code that depends on EAX as an *input*
>>> operand. That's clearly wrong and violates the constraints in the
>>> operand list. This is an XCODE bug.
>>>
>>> In fact: the situation is worse than that. In the commit message you
>>> spell out that the MOV instruction at offset 6 overwrites the value in
>>> EAX that was just loaded at offset 3. But this is just the small
>>> problem; the *large* problem is the generated XADD instruction itself,
>>> at offset 0xb and 0xc. The binary encoding is, from your commit message:
>>>
>>>   f0 0f c1 00
>>>
>>> and this is *exactly* the problem that my commit 8a94eb9283fa fixed
>>> for gcc! From my commit message:
>>>
>>>     >     439c:       f0 0f c1 00             lock xadd %eax,(%rax)
>>>
>>> Because, it makes *no sense* for XADD to use the AX register for
>>> *both* pointer-to-memory (i.e. address of the destination location
>>> that receives the sum) *and* as the other addend!
>>>
>>> In other words, regardless of how we fill the AX register up-front, an
>>> XADD instruction generated like this *cannot* be right.
>>>
>>>> The patch uses the clobber list to tell GCC that EAX is used in ASM.
>>>>
>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>> Signed-off-by: Ruiyu Ni <ruiyu.ni@intel.com>
>>>> Cc: Liming Gao <liming.gao@intel.com>
>>>> Cc: Michael Kinney <michael.d.kinney@intel.com>
>>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>>> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
>>>> ---
>>>>  MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c | 12
>>>> ++++++++----
>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
>>>> b/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
>>>> index af39bdeb51..0a985529fd 100644
>>>> --- a/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
>>>> +++ b/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
>>>> @@ -40,11 +40,13 @@ InternalSyncIncrement (
>>>>      "lock               \n\t"
>>>>      "xadd    %%eax, %1  \n\t"
>>>>      "inc     %%eax      \n\t"
>>>> -    : "=a" (Result),          // %0
>>>> +    "mov     %%eax, %0  \n\t"
>>>> +    : "=r" (Result),          // %0
>>>>        "+m" (*Value)           // %1
>>>>      :                         // no inputs that aren't also outputs
>>>>      : "memory",
>>>> -      "cc"
>>>> +      "cc",
>>>> +      "eax"
>>>>      );
>>>>
>>>>    return Result;
>>>
>>> (b) This change is invalid, for two separate reasons.
>>>
>>> Reason #1: on the clobber list, the AX register should be listed as
>>> "a", not as "eax".
>>>
>>> Reason #2:
>>>
>>> - For operand %0, we say "use any register in the 'r' class as
>>> write-only". (The class "r" means "general register".) And, at the end
>>> of the inline assembly, we store EAX manually to that 'r' class
>>> register. And we rely on the compiler to store that other general
>>> register into Result.
>>>
>>> - We append the AX register (which should be spelled as "a") to the
>>> clobber list. However, the "a" register is itself in the "general
>>> register" class, and therefore this clobber list breaks the passage
>>> that I quoted above, marked as [1]!
>>>
>>>
>>>> @@ -76,11 +78,13 @@ InternalSyncDecrement (
>>>>      "lock                \n\t"
>>>>      "xadd    %%eax, %1   \n\t"
>>>>      "dec     %%eax       \n\t"
>>>> -    : "=a" (Result),           // %0
>>>> +    "mov     %%eax, %0  \n\t"
>>>> +    : "=r" (Result),           // %0
>>>>        "+m" (*Value)            // %1
>>>>      :                          // no inputs that aren't also outputs
>>>>      : "memory",
>>>> -      "cc"
>>>> +      "cc",
>>>> +      "eax"
>>>>      );
>>>>
>>>>    return Result;
>>>>
>>>
>>> (c) The patch doesn't update the X64 variant.
>>>
>>> I think we should be clear here that we are working around an XCODE bug.
>>> Commit 8a94eb9283fa was different, because the code before that
>>> violated the gcc documentation, and so the issue was in the code.
>>>
>>>
>>> I'll comment more on your second email.
>>>
>>> Thanks,
>>> Laszlo
>>> _______________________________________________
>>> edk2-devel mailing list
>>> edk2-devel@lists.01.org
>>> https://lists.01.org/mailman/listinfo/edk2-devel