From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=209.132.183.28; helo=mx1.redhat.com; envelope-from=lersek@redhat.com; receiver=edk2-devel@lists.01.org Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 758F620337348 for ; Wed, 7 Nov 2018 11:02:58 -0800 (PST) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E2FE990B17; Wed, 7 Nov 2018 19:02:57 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-90.rdu2.redhat.com [10.10.120.90]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8ECFA68D24; Wed, 7 Nov 2018 19:02:56 +0000 (UTC) To: "Ni, Ruiyu" , Ard Biesheuvel Cc: "edk2-devel@lists.01.org" , "Kinney, Michael D" , "Gao, Liming" References: <20181107040346.153720-1-ruiyu.ni@intel.com> <555b0649-a067-472e-2b81-b52457deb67b@redhat.com> <734D49CCEBEEF84792F5B80ED585239D5BEF5A7F@SHSMSX104.ccr.corp.intel.com> From: Laszlo Ersek Message-ID: Date: Wed, 7 Nov 2018 20:02:55 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <734D49CCEBEEF84792F5B80ED585239D5BEF5A7F@SHSMSX104.ccr.corp.intel.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 07 Nov 2018 19:02:58 +0000 (UTC) Subject: Re: [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix InternalSync[De|In]crement X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Nov 2018 19:02:59 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit On 11/07/18 16:27, Ni, Ruiyu wrote: > > >> -----Original Message----- >> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org] >> Sent: Wednesday, November 7, 2018 10:46 PM >> To: Laszlo Ersek >> Cc: Ni, Ruiyu ; edk2-devel@lists.01.org; Kinney, Michael D >> ; Gao, Liming >> Subject: Re: [edk2] [PATCH] MdePkg/BaseSynchronizationLib XCODE: fix >> InternalSync[De|In]crement >> >> On 7 November 2018 at 15:31, Laszlo Ersek wrote: >>> (+Andrew) >>> >>> Hi Ray, >>> >>> On 11/07/18 05:03, Ruiyu Ni wrote: >>>> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1303 >>>> >>>> XCODE disassembly code of InternalSyncDecrement with today's code is: >>>> >>>> __asm__ __volatile__ ( >>>> "movl $1, %%eax \n\t" >>>> "lock \n\t" >>>> "xadd %%eax, %1 \n\t" >>>> "inc %%eax \n\t" >>>> : "=a" (Result), // %0 >>>> "+m" (*Value) // %1 >>>> : // no inputs that aren't also outputs >>>> : "memory", >>>> "cc" >>>> ); >>>> >>>> 0: 55 pushl %ebp >>>> 1: 89 e5 movl %esp, %ebp >>>> 3: 8b 45 08 movl 8(%ebp), %eax >>>> 6: b8 01 00 00 00 movl $1, %eax >>>> b: f0 lock >>>> c: 0f c1 00 xaddl %eax, _InternalSyncIncrement(%eax) >>>> f: 40 incl %eax >>>> 10: 5d popl %ebp >>>> 11: c3 retl >>>> >>>> %EAX value retrieved in line #3 is overwritten in line #6. >>> >>> (a) This looks like an XCODE bug to me. The "=a" constraint on operand >>> %0 means that Result should be set from eax/rax, and that this operand >>> is "write only". Here's the gcc documentation: >>> >>> "The ordinary output operands must be write-only; GCC assumes that the >>> values in these operands before the instruction are dead and need not >>> be generated." >>> >> >> Do we need to use early clobber here? "=&a"? >> >> It seems to me that without that, the compiler is free to use eax as input, since it >> only assumes that eax will be set by the last instruction. > > Ard, > Thanks for pointing that. > I found below wordings from GCC documents: > "Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not > overlap an input. Otherwise, GCC may allocate the output operand in the same > register as an unrelated input operand, on the assumption that the assembler code > consumes its inputs before producing outputs. This assumption may be false if the > assembler code actually consists of more than one instruction." > > I think '&' is just for this case. > > @Laszlo, > Do you agree that a simple patch to add '&' to both Ia32 and X64 version of > InternalSync[De|In]crement is good enough? I think so, and I applaud Ard for finding this in the documentation! It seems like the exact thing we need. I agree that my commit 8a94eb9283fa ("MdePkg/BaseSynchronizationLib: fix XADD operands in GCC IA32/X64 assembly", 2018-09-26) was not complete, due to missing the "&" constraint modifier. Please submit a patch that adds it. And, for the commit message, I also recommend: Suggested-by: Ard Biesheuvel Fixes: 8a94eb9283fa09a30f5f06f0c12cf0ee4e14fbcf I'll test the patch on my end. (This is a prime example for a change (= a bugfix) that is permitted during the hard feature freeze, BTW.) Thanks! Laszlo > > I actually tried in XCODE5 toolchain and it works good. > movl 8(%ebp), %eax > becomes > movl 8(%ebp), %ecx > >> >> >>> Furthermore, if a register is named in any operand constraint (such as >>> input, output, input-output), then it cannot, by definition, be listed >>> in the "clobber list" -- it is *obvious* that the inline assembly will >>> work with that register. In case of an output-only operand, it's >>> obvious that the inline assembly will *overwrite* that register, and >>> the compiler cannot assume *when* that overwrite will happen. Here's >>> the gcc >>> documentation: >>> >>> "You may not write a clobber description in a way that overlaps with >>> an input or output operand. For example, you may not have an operand >>> describing a register class with one member if you mention that >>> register in the clobber list. Variables declared to live in specific >>> registers [...] and used as asm input or output operands must have no >>> part mentioned in the clobber description." [1] >>> >>> However, XCODE generates such code that depends on EAX as an *input* >>> operand. That's clearly wrong and violates the constraints in the >>> operand list. This is an XCODE bug. >>> >>> In fact: the situation is worse than that. In the commit message you >>> spell out that the MOV instruction at offset 6 overwrites the value in >>> EAX that was just loaded at offset 3. But this is just the small >>> problem; the *large* problem is the generated XADD instruction itself, >>> at offset 0xb and 0xc. The binary encoding is, from your commit message: >>> >>> f0 0f c1 00 >>> >>> and this is *exactly* the problem that my commit 8a94eb9283fa fixed >>> for gcc! From my commit message: >>> >>> > 439c: f0 0f c1 00 lock xadd %eax,(%rax) >>> >>> Because, it makes *no sense* for XADD to use the AX register for >>> *both* pointer-to-memory (i.e. address of the destination location >>> that receives the sum) *and* as the other addend! >>> >>> In other words, regardless of how we fill the AX register up-front, an >>> XADD instruction generated like this *cannot* be right. >>> >>>> The patch uses the clobber list to tell GCC that EAX is used in ASM. >>>> >>>> Contributed-under: TianoCore Contribution Agreement 1.1 >>>> Signed-off-by: Ruiyu Ni >>>> Cc: Liming Gao >>>> Cc: Michael Kinney >>>> Cc: Laszlo Ersek >>>> Cc: Philippe Mathieu-Daudé >>>> --- >>>> MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c | 12 >>>> ++++++++---- >>>> 1 file changed, 8 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c >>>> b/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c >>>> index af39bdeb51..0a985529fd 100644 >>>> --- a/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c >>>> +++ b/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c >>>> @@ -40,11 +40,13 @@ InternalSyncIncrement ( >>>> "lock \n\t" >>>> "xadd %%eax, %1 \n\t" >>>> "inc %%eax \n\t" >>>> - : "=a" (Result), // %0 >>>> + "mov %%eax, %0 \n\t" >>>> + : "=r" (Result), // %0 >>>> "+m" (*Value) // %1 >>>> : // no inputs that aren't also outputs >>>> : "memory", >>>> - "cc" >>>> + "cc", >>>> + "eax" >>>> ); >>>> >>>> return Result; >>> >>> (b) This change is invalid, for two separate reasons. >>> >>> Reason #1: on the clobber list, the AX register should be listed as >>> "a", not as "eax". >>> >>> Reason #2: >>> >>> - For operand %0, we say "use any register in the 'r' class as >>> write-only". (The class "r" means "general register".) And, at the end >>> of the inline assembly, we store EAX manually to that 'r' class >>> register. And we rely on the compiler to store that other general >>> register into Result. >>> >>> - We append the AX register (which should be spelled as "a") to the >>> clobber list. However, the "a" register is itself in the "general >>> register" class, and therefore this clobber list breaks the passage >>> that I quoted above, marked as [1]! >>> >>> >>>> @@ -76,11 +78,13 @@ InternalSyncDecrement ( >>>> "lock \n\t" >>>> "xadd %%eax, %1 \n\t" >>>> "dec %%eax \n\t" >>>> - : "=a" (Result), // %0 >>>> + "mov %%eax, %0 \n\t" >>>> + : "=r" (Result), // %0 >>>> "+m" (*Value) // %1 >>>> : // no inputs that aren't also outputs >>>> : "memory", >>>> - "cc" >>>> + "cc", >>>> + "eax" >>>> ); >>>> >>>> return Result; >>>> >>> >>> (c) The patch doesn't update the X64 variant. >>> >>> I think we should be clear here that we are working around an XCODE bug. >>> Commit 8a94eb9283fa was different, because the code before that >>> violated the gcc documentation, and so the issue was in the code. >>> >>> >>> I'll comment more on your second email. >>> >>> Thanks, >>> Laszlo >>> _______________________________________________ >>> edk2-devel mailing list >>> edk2-devel@lists.01.org >>> https://lists.01.org/mailman/listinfo/edk2-devel