From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 988158221C for ; Tue, 21 Feb 2017 09:07:19 -0800 (PST) Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0803A7FB6A; Tue, 21 Feb 2017 17:07:20 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-47.phx2.redhat.com [10.3.116.47]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v1LH7H0L032447; Tue, 21 Feb 2017 12:07:17 -0500 To: Anthony PERARD References: <20161201152819.8341-1-anthony.perard@citrix.com> <53c67cb5-e947-8979-7738-288cc83f374b@redhat.com> <20161202160201.GA1848@perard.uk.xensource.com> <9c0c9f43-a297-179f-2d57-fa5d8fab3763@redhat.com> <20170221163922.GC1867@perard.uk.xensource.com> Cc: Jordan Justen , "Gao, Liming" , "Zhu, Yonghong" , Ard Biesheuvel , edk2-devel@ml01.01.org, Rebecca Cran , Konrad Rzeszutek Wilk From: Laszlo Ersek Message-ID: <97214320-6054-8034-7667-6edde4debd80@redhat.com> Date: Tue, 21 Feb 2017 18:07:15 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1 MIME-Version: 1.0 In-Reply-To: <20170221163922.GC1867@perard.uk.xensource.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 21 Feb 2017 17:07:20 +0000 (UTC) Subject: Re: [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2017 17:07:19 -0000 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit CC Rebecca & Konrad On 02/21/17 17:39, Anthony PERARD wrote: > On Sat, Dec 03, 2016 at 06:59:28PM +0100, Laszlo Ersek wrote: >> On 12/02/16 20:26, Laszlo Ersek wrote: >>> On 12/02/16 17:02, Anthony PERARD wrote: >>>> On Thu, Dec 01, 2016 at 07:43:24PM +0100, Laszlo Ersek wrote: >>>>> On 12/01/16 16:28, Anthony PERARD wrote: >>>>>> Hi, >>>>>> >>>>>> That might be only with the Xen part of OVMF but now that the GCC5 >>>>>> toolchains is used with my gcc (6.2.1 20160830, Arch Linux), OVMF fail >>>>>> to boot in Xen guests. >>>>>> >>>> [...] >>>>>> >>>>>> Removing the gcc option -flto in only the XenBusDxe module makes OVMF >>>>>> boot. >>>>>> >>>>>> While trying to debug that, I've added some debug prints (in this module >>>>>> and in XenPvBlkDxe), and the exception could change and become a "page >>>>>> fault" instead, or even an assert failure in the PrintLib, that was the >>>>>> ASSERT(Buffer != NULL) at I think >>>>>> MdePkg/Library/BasePrintLib/PrintLibInternal.c:366 >>>>>> >>>>>> Adding EFIAPI to internal functions in XenBusDxe makes things work >>>>>> again. My guest is that gcc would bypass (optimise) an exported >>>>>> functions and call directly an internal one but without reordering the >>>>>> arguments (EFIAPI vs nothing). >>>>>> >>>>>> Does that make sense? >>>>> >>>>> If "-b NOOPT" works for you, I'd prefer that as a temporary solution >>>>> (until the root cause is found and addressed) to the XenBusDxe patches. >>>> >>>> That works, using GCC49 (with gcc 6.2.1) works as well. >>>> >>>>> Hrpmf, wait a second, I do see something interesting: in this series you >>>>> *are* modifying APIs declared in a library class header (namely >>>>> "OvmfPkg/Include/Library/XenHypercallLib.h"). Such functions (public >>>>> libraries) *are* required to specify EFIAPI. >>>>> >>>>> What happens if you apply patch #1 only? >>>> >>>> With only XenHypercallLib changes, the error is the same. >>>> >>>> But I did find the minimum change needed, it envolve a function with a >>>> VA_LIST as argument. >>>> >>>> With only the following patch, OVMF works again. >>>> >>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.c b/OvmfPkg/XenBusDxe/XenStore.c >>>> index 1666c4b..85b0956 100644 >>>> --- a/OvmfPkg/XenBusDxe/XenStore.c >>>> +++ b/OvmfPkg/XenBusDxe/XenStore.c >>>> @@ -1307,6 +1307,7 @@ XenStoreTransactionEnd ( >>>> } >>>> >>>> XENSTORE_STATUS >>>> +EFIAPI >>>> XenStoreVSPrint ( >>>> IN CONST XENSTORE_TRANSACTION *Transaction, >>>> IN CONST CHAR8 *DirectoryPath, >>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.h b/OvmfPkg/XenBusDxe/XenStore.h >>>> index c9d4c65..33bb647 100644 >>>> --- a/OvmfPkg/XenBusDxe/XenStore.h >>>> +++ b/OvmfPkg/XenBusDxe/XenStore.h >>>> @@ -209,6 +209,7 @@ XenStoreSPrint ( >>>> indicating the type of write failure. >>>> **/ >>>> XENSTORE_STATUS >>>> +EFIAPI >>>> XenStoreVSPrint ( >>>> IN CONST XENSTORE_TRANSACTION *Transaction, >>>> IN CONST CHAR8 *DirectoryPath, >>>> IN CONST CHAR8 *Node, >>>> IN CONST CHAR8 *FormatString, >>>> IN VA_LIST Marker >>>> ); >>>> >>>> >>>> I think the exception happen when this function is called via >>>> XENBUS_PROTOCOL->XsPrintf() from XenPvBlockFrontInitialization() in >>>> OvmfPkg/XenPvBlkDxe/BlockFront.c >>>> >>> >>> It used to be a known requirement / limitation that all functions with >>> variable argument lists had to be EFIAPI, regardless of cross-module >>> use. However, commit 48d5f9a551a93acb45f272dda879b0ab5a504e36 changed >>> that, and varargs should "just work" now. I suspect this is a >>> __builtin_ms_va_* regression in gcc-6. Thank you for narrowing it down. >>> It might make sense to report a bug in the upstream gcc tracker. >>> >>> ... Oh wow, this is a known gcc bug! See: >>> >>> https://lists.01.org/pipermail/edk2-devel/2016-August/001018.html >>> >>> Upstream gcc BZ was >>> apparently solved for "Target Milestone: 6.3" (your version is 6.2.1). >>> So we'll either need a GCC6 toolchain in BaseTools that drops -flto, in >>> order to work around this gcc issue, or we'll have to ask gcc-6 users to >>> use at least gcc-6.3. >>> >>> Oh wait, gcc-6.3 hasn't been released yet. We need the BaseTools >>> workaround then. >> >> I think I got confused in parts of the above; I got some details wrong. >> Namely, commit 48d5f9a551a9 did not remove the requirement/limitation >> that all varargs functions have to be EFIAPI. Said commit only changed >> how the VA_*() macros would be implemented. >> >> The two caller functions of XenStoreVSPrint(), namely XenStoreSPrint() >> and XenBusXenStoreSPrint(), are varargs functions, but they are already >> EFIAPI. So the requirement/limitation (which was unaffected by >> 48d5f9a551a9) is actually satisfied / considered in XenBusDxe. >> >> The XenStoreVSPrint() function, which you identified as the breaking >> part, is *not* a varargs function itself, so it needn't be EFIAPI. It >> simply receives a VA_LIST parameter (which is "__builtin_ms_va_list", >> from commit 48d5f9a551a9), and (a) copies it with VA_COPY() for passing >> the copy to SPrintLengthAsciiFormat(), (b) passes the original parameter >> to AsciiVSPrint(). In turn both of those functions call the common >> BasePrintLibSPrintMarker() function. >> >> Comment says, >> >>> This is bug report that the specialized >>> __builtin_ms_va_{list,start,end,copy} builtins have stopped working >>> when -flto is used. They worked until gcc 5.3, both with or without >>> -flto. In gcc 6.1 with -flto, the canonical iterator __builtin_va_arg >>> ignores them and works on a sysv_va_list. To be precise, it's >>> __builtin_va_arg in the context of -flto that's broken, not the >>> specialized builtins. __builtin_va_arg has always been a polymorphic >>> builtin that changes its behavior based on the type of va_list it's >>> given as an argument. Without this polymorphic behavior, there's no >>> way to iterate over an ms_va_list. >> >> Apparently, when BasePrintLibSPrintMarker() finally calls VA_ARG() (== >> __builtin_va_arg(), from commit 48d5f9a551a9) on Marker / Marker2, with >> LTO enabled, __builtin_va_arg() fails to notice what context >> VaListMarker comes from: >> - __builtin_ms_va_start() in XenStoreSPrint() and XenBusXenStoreSPrint(), or >> - __builtin_ms_va_copy() in XenStoreVSPrint(). >> >> So I think we *are* being hit by gcc BZ#70955, and making >> XenStoreVSPrint() EFIAPI only masks the issue. Comment >> seems relevant: >> >>> The change with GCC 6 is that the builtins are now lowered during >>> link-time optimization rather than at compile-time. Thus the abi >>> selection bits are possibly not transfered correctly (type merging?). >>> I remember the business was quite ugly, but eventually we just miss to >>> properly transfer the function attribute. >> >> The end result for edk2 remains the same (= BaseTools should work around >> this gcc issue with a new GCC6 toolchain that drops -flto, unless >> gcc-6.3 is about to become available to users real quick). I just wanted >> to point out that my earlier statement "commit 48d5f9a551a9 had removed >> the need for varargs functions to be EFIAPI" was incorrect -- varargs >> functions still must be EFIAPI (and XenBusDxe conforms, see >> XenStoreSPrint() and XenBusXenStoreSPrint()). > > Hi Laszlo, > > Now that gcc 6.3 is out, the bug described in the thread strikes again. > Building OVMF with -flto result in Page-Fault or General Protection > fault, due to the way va_args are used in XenStoreVSPrint(). > > Also, now I've tried to build OVMF with gcc 5.4, same result, using > -flto result in a firmware that does not work. > > > I've tried to create a small programme that use the va_args in the same > way, and compiled-test it with different gcc (gcc 4.9.2, gcc 5.4, gcc > 6.3), then depending on the options use, it does not work or it works: > > Don't work (prog segv or wrong output): > gcc -o prog va_main.c va_test.c > gcc -o prog -flto va_main.c va_test.c > gcc -Os -o prog -flto va_main.c va_test.c > > Work: > gcc -Os -o prog va_main.c va_test.c > > I'll attach va_main.c and va_test.c. > > So, should I add EFIAPI to XenStoreVSPrint, as it is using VA_COPY? > Hm, please help me jog my memory... If I remember correctly, this is still a GCC bug, one that we suppressed for gcc-6.2 with your patch as follows: > commit 432f1d83f77acf92d52ef18d2cee6dbf7c5b9b86 > Author: Anthony PERARD > Date: Tue Dec 6 12:03:25 2016 +0000 > > OvmfPkg/build.sh: Use GCC49 toolchains with GCC 6.[0-2] > > The goal of the patch is to avoid using -flto with GCC 6.0 to 6.2. > > This is to workaround a GCC bug: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955 > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Anthony PERARD > Reviewed-by: Laszlo Ersek > Regression-tested-by: Laszlo Ersek > > diff --git a/OvmfPkg/build.sh b/OvmfPkg/build.sh > index 95fe8fb07647..b6e936056ca0 100755 > --- a/OvmfPkg/build.sh > +++ b/OvmfPkg/build.sh > @@ -102,7 +102,7 @@ case `uname` in > 4.8.*) > TARGET_TOOLS=GCC48 > ;; > - 4.9.*) > + 4.9.*|6.[0-2].*) > TARGET_TOOLS=GCC49 > ;; > *) Do I understand correctly that the gcc bug has not been fixed in gcc-6.3, and -- because we don't suppress it for gcc-6.3 as the above expression does not match -- it causes problems again? You also mention gcc-5.4 as problematic. I think we haven't received such reports about gcc-5 versions up to and including gcc-5.3 (that's why GCC5 is the default selection in "OvmfPkg/build.sh"). Do you mean that the gcc bug has now been "backported" from the gcc-6 series to the gcc-5 series (starting with gcc-5.4)? If that's the case, then I suggest flipping "OvmfPkg/build.sh" from black-listing gcc versions for -flto to white-listing. In other words, assume that -flto is generally broken with GCC, except for a few known versions: 5.0 through 5.3 inclusive. Those versions should trigger the use of the GCC5 toolchain, and everything else (5.4+, 6.*, 4.9.*) should use GCC49. I don't feel comfortable about adding EFIAPI to XenStoreVSPrint just because it takes a VA_LIST parameter -- note: it is *not* a varargs function itself! --; the same issue might hit elsewhere in the edk2 tree at any time, outside of OvmfPkg too. Would the gcc white-listing work for you? Note that the white-listing would practically undo Konrad's commit 2667ad40919a ("OvmfPkg/build.sh: Make GCC5 the default toolchain, catch GCC43 and earlier", 2016-11-23), but given the recent gcc developments (gcc-6.3 has not fixed the gcc bug, and the bug has even surfaced in gcc-5.4), I think it would be justified. Thanks Laszlo