From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from SMTP02.CITRIX.COM (smtp02.citrix.com [66.165.176.63]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id CCE63821CB for ; Tue, 21 Feb 2017 09:53:05 -0800 (PST) X-IronPort-AV: E=Sophos;i="5.35,190,1484006400"; d="scan'208";a="417551670" Date: Tue, 21 Feb 2017 17:53:03 +0000 From: Anthony PERARD To: Laszlo Ersek CC: Jordan Justen , "Gao, Liming" , "Zhu, Yonghong" , Ard Biesheuvel , , Rebecca Cran , Konrad Rzeszutek Wilk Message-ID: <20170221175303.GD1867@perard.uk.xensource.com> References: <20161201152819.8341-1-anthony.perard@citrix.com> <53c67cb5-e947-8979-7738-288cc83f374b@redhat.com> <20161202160201.GA1848@perard.uk.xensource.com> <9c0c9f43-a297-179f-2d57-fa5d8fab3763@redhat.com> <20170221163922.GC1867@perard.uk.xensource.com> <97214320-6054-8034-7667-6edde4debd80@redhat.com> MIME-Version: 1.0 In-Reply-To: <97214320-6054-8034-7667-6edde4debd80@redhat.com> User-Agent: Mutt/1.7.2 (2016-11-26) Subject: Re: [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2017 17:53:06 -0000 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline On Tue, Feb 21, 2017 at 06:07:15PM +0100, Laszlo Ersek wrote: > CC Rebecca & Konrad > > On 02/21/17 17:39, Anthony PERARD wrote: > > On Sat, Dec 03, 2016 at 06:59:28PM +0100, Laszlo Ersek wrote: > >> On 12/02/16 20:26, Laszlo Ersek wrote: > >>> On 12/02/16 17:02, Anthony PERARD wrote: > >>>> On Thu, Dec 01, 2016 at 07:43:24PM +0100, Laszlo Ersek wrote: > >>>>> On 12/01/16 16:28, Anthony PERARD wrote: > >>>>>> Hi, > >>>>>> > >>>>>> That might be only with the Xen part of OVMF but now that the GCC5 > >>>>>> toolchains is used with my gcc (6.2.1 20160830, Arch Linux), OVMF fail > >>>>>> to boot in Xen guests. > >>>>>> > >>>> [...] > >>>>>> > >>>>>> Removing the gcc option -flto in only the XenBusDxe module makes OVMF > >>>>>> boot. > >>>>>> > >>>>>> While trying to debug that, I've added some debug prints (in this module > >>>>>> and in XenPvBlkDxe), and the exception could change and become a "page > >>>>>> fault" instead, or even an assert failure in the PrintLib, that was the > >>>>>> ASSERT(Buffer != NULL) at I think > >>>>>> MdePkg/Library/BasePrintLib/PrintLibInternal.c:366 > >>>>>> > >>>>>> Adding EFIAPI to internal functions in XenBusDxe makes things work > >>>>>> again. My guest is that gcc would bypass (optimise) an exported > >>>>>> functions and call directly an internal one but without reordering the > >>>>>> arguments (EFIAPI vs nothing). > >>>>>> > >>>>>> Does that make sense? > >>>>> > >>>>> If "-b NOOPT" works for you, I'd prefer that as a temporary solution > >>>>> (until the root cause is found and addressed) to the XenBusDxe patches. > >>>> > >>>> That works, using GCC49 (with gcc 6.2.1) works as well. > >>>> > >>>>> Hrpmf, wait a second, I do see something interesting: in this series you > >>>>> *are* modifying APIs declared in a library class header (namely > >>>>> "OvmfPkg/Include/Library/XenHypercallLib.h"). Such functions (public > >>>>> libraries) *are* required to specify EFIAPI. > >>>>> > >>>>> What happens if you apply patch #1 only? > >>>> > >>>> With only XenHypercallLib changes, the error is the same. > >>>> > >>>> But I did find the minimum change needed, it envolve a function with a > >>>> VA_LIST as argument. > >>>> > >>>> With only the following patch, OVMF works again. > >>>> > >>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.c b/OvmfPkg/XenBusDxe/XenStore.c > >>>> index 1666c4b..85b0956 100644 > >>>> --- a/OvmfPkg/XenBusDxe/XenStore.c > >>>> +++ b/OvmfPkg/XenBusDxe/XenStore.c > >>>> @@ -1307,6 +1307,7 @@ XenStoreTransactionEnd ( > >>>> } > >>>> > >>>> XENSTORE_STATUS > >>>> +EFIAPI > >>>> XenStoreVSPrint ( > >>>> IN CONST XENSTORE_TRANSACTION *Transaction, > >>>> IN CONST CHAR8 *DirectoryPath, > >>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.h b/OvmfPkg/XenBusDxe/XenStore.h > >>>> index c9d4c65..33bb647 100644 > >>>> --- a/OvmfPkg/XenBusDxe/XenStore.h > >>>> +++ b/OvmfPkg/XenBusDxe/XenStore.h > >>>> @@ -209,6 +209,7 @@ XenStoreSPrint ( > >>>> indicating the type of write failure. > >>>> **/ > >>>> XENSTORE_STATUS > >>>> +EFIAPI > >>>> XenStoreVSPrint ( > >>>> IN CONST XENSTORE_TRANSACTION *Transaction, > >>>> IN CONST CHAR8 *DirectoryPath, > >>>> IN CONST CHAR8 *Node, > >>>> IN CONST CHAR8 *FormatString, > >>>> IN VA_LIST Marker > >>>> ); > >>>> > >>>> > >>>> I think the exception happen when this function is called via > >>>> XENBUS_PROTOCOL->XsPrintf() from XenPvBlockFrontInitialization() in > >>>> OvmfPkg/XenPvBlkDxe/BlockFront.c > >>>> > >>> > >>> It used to be a known requirement / limitation that all functions with > >>> variable argument lists had to be EFIAPI, regardless of cross-module > >>> use. However, commit 48d5f9a551a93acb45f272dda879b0ab5a504e36 changed > >>> that, and varargs should "just work" now. I suspect this is a > >>> __builtin_ms_va_* regression in gcc-6. Thank you for narrowing it down. > >>> It might make sense to report a bug in the upstream gcc tracker. > >>> > >>> ... Oh wow, this is a known gcc bug! See: > >>> > >>> https://lists.01.org/pipermail/edk2-devel/2016-August/001018.html > >>> > >>> Upstream gcc BZ was > >>> apparently solved for "Target Milestone: 6.3" (your version is 6.2.1). > >>> So we'll either need a GCC6 toolchain in BaseTools that drops -flto, in > >>> order to work around this gcc issue, or we'll have to ask gcc-6 users to > >>> use at least gcc-6.3. > >>> > >>> Oh wait, gcc-6.3 hasn't been released yet. We need the BaseTools > >>> workaround then. > >> > >> I think I got confused in parts of the above; I got some details wrong. > >> Namely, commit 48d5f9a551a9 did not remove the requirement/limitation > >> that all varargs functions have to be EFIAPI. Said commit only changed > >> how the VA_*() macros would be implemented. > >> > >> The two caller functions of XenStoreVSPrint(), namely XenStoreSPrint() > >> and XenBusXenStoreSPrint(), are varargs functions, but they are already > >> EFIAPI. So the requirement/limitation (which was unaffected by > >> 48d5f9a551a9) is actually satisfied / considered in XenBusDxe. > >> > >> The XenStoreVSPrint() function, which you identified as the breaking > >> part, is *not* a varargs function itself, so it needn't be EFIAPI. It > >> simply receives a VA_LIST parameter (which is "__builtin_ms_va_list", > >> from commit 48d5f9a551a9), and (a) copies it with VA_COPY() for passing > >> the copy to SPrintLengthAsciiFormat(), (b) passes the original parameter > >> to AsciiVSPrint(). In turn both of those functions call the common > >> BasePrintLibSPrintMarker() function. > >> > >> Comment says, > >> > >>> This is bug report that the specialized > >>> __builtin_ms_va_{list,start,end,copy} builtins have stopped working > >>> when -flto is used. They worked until gcc 5.3, both with or without > >>> -flto. In gcc 6.1 with -flto, the canonical iterator __builtin_va_arg > >>> ignores them and works on a sysv_va_list. To be precise, it's > >>> __builtin_va_arg in the context of -flto that's broken, not the > >>> specialized builtins. __builtin_va_arg has always been a polymorphic > >>> builtin that changes its behavior based on the type of va_list it's > >>> given as an argument. Without this polymorphic behavior, there's no > >>> way to iterate over an ms_va_list. > >> > >> Apparently, when BasePrintLibSPrintMarker() finally calls VA_ARG() (== > >> __builtin_va_arg(), from commit 48d5f9a551a9) on Marker / Marker2, with > >> LTO enabled, __builtin_va_arg() fails to notice what context > >> VaListMarker comes from: > >> - __builtin_ms_va_start() in XenStoreSPrint() and XenBusXenStoreSPrint(), or > >> - __builtin_ms_va_copy() in XenStoreVSPrint(). > >> > >> So I think we *are* being hit by gcc BZ#70955, and making > >> XenStoreVSPrint() EFIAPI only masks the issue. Comment > >> seems relevant: > >> > >>> The change with GCC 6 is that the builtins are now lowered during > >>> link-time optimization rather than at compile-time. Thus the abi > >>> selection bits are possibly not transfered correctly (type merging?). > >>> I remember the business was quite ugly, but eventually we just miss to > >>> properly transfer the function attribute. > >> > >> The end result for edk2 remains the same (= BaseTools should work around > >> this gcc issue with a new GCC6 toolchain that drops -flto, unless > >> gcc-6.3 is about to become available to users real quick). I just wanted > >> to point out that my earlier statement "commit 48d5f9a551a9 had removed > >> the need for varargs functions to be EFIAPI" was incorrect -- varargs > >> functions still must be EFIAPI (and XenBusDxe conforms, see > >> XenStoreSPrint() and XenBusXenStoreSPrint()). > > > > Hi Laszlo, > > > > Now that gcc 6.3 is out, the bug described in the thread strikes again. > > Building OVMF with -flto result in Page-Fault or General Protection > > fault, due to the way va_args are used in XenStoreVSPrint(). > > > > Also, now I've tried to build OVMF with gcc 5.4, same result, using > > -flto result in a firmware that does not work. > > > > > > I've tried to create a small programme that use the va_args in the same > > way, and compiled-test it with different gcc (gcc 4.9.2, gcc 5.4, gcc > > 6.3), then depending on the options use, it does not work or it works: > > > > Don't work (prog segv or wrong output): > > gcc -o prog va_main.c va_test.c > > gcc -o prog -flto va_main.c va_test.c > > gcc -Os -o prog -flto va_main.c va_test.c > > > > Work: > > gcc -Os -o prog va_main.c va_test.c > > > > I'll attach va_main.c and va_test.c. > > > > So, should I add EFIAPI to XenStoreVSPrint, as it is using VA_COPY? > > > > Hm, please help me jog my memory... > > If I remember correctly, this is still a GCC bug, one that we suppressed for gcc-6.2 with your patch as follows: Yes. > > commit 432f1d83f77acf92d52ef18d2cee6dbf7c5b9b86 > > Author: Anthony PERARD > > Date: Tue Dec 6 12:03:25 2016 +0000 > > > > OvmfPkg/build.sh: Use GCC49 toolchains with GCC 6.[0-2] > > > > The goal of the patch is to avoid using -flto with GCC 6.0 to 6.2. > > > > This is to workaround a GCC bug: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955 > > > > Contributed-under: TianoCore Contribution Agreement 1.0 > > Signed-off-by: Anthony PERARD > > Reviewed-by: Laszlo Ersek > > Regression-tested-by: Laszlo Ersek > > > > diff --git a/OvmfPkg/build.sh b/OvmfPkg/build.sh > > index 95fe8fb07647..b6e936056ca0 100755 > > --- a/OvmfPkg/build.sh > > +++ b/OvmfPkg/build.sh > > @@ -102,7 +102,7 @@ case `uname` in > > 4.8.*) > > TARGET_TOOLS=GCC48 > > ;; > > - 4.9.*) > > + 4.9.*|6.[0-2].*) > > TARGET_TOOLS=GCC49 > > ;; > > *) > > Do I understand correctly that the gcc bug has not been fixed in gcc-6.3, and -- because we don't suppress it for gcc-6.3 as the above expression does not match -- it causes problems again? The bug describe in the GCC bugzilla is probably fix, but the test-case does not make use of __builtin_ms_va_copy. > You also mention gcc-5.4 as problematic. I think we haven't received such reports about gcc-5 versions up to and including gcc-5.3 (that's why GCC5 is the default selection in "OvmfPkg/build.sh"). Do you mean that the gcc bug has now been "backported" from the gcc-6 series to the gcc-5 series (starting with gcc-5.4)? I don't know the state of gcc-5.0 to gcc-5.3, I have never tested -flto with gcc-5.x (until now), I would say they are also problematic until proven otherwise. > If that's the case, then I suggest flipping "OvmfPkg/build.sh" from black-listing gcc versions for -flto to white-listing. In other words, assume that -flto is generally broken with GCC, except for a few known versions: 5.0 through 5.3 inclusive. Those versions should trigger the use of the GCC5 toolchain, and everything else (5.4+, 6.*, 4.9.*) should use GCC49. > > I don't feel comfortable about adding EFIAPI to XenStoreVSPrint just because it takes a VA_LIST parameter -- note: it is *not* a varargs function itself! --; the same issue might hit elsewhere in the edk2 tree at any time, outside of OvmfPkg too. >>From the different tests I've done, I feel more like VA_COPY might be the issue, but I don't know how __builtin_ms_va_* are supposed to be used. > Would the gcc white-listing work for you? > > Note that the white-listing would practically undo Konrad's commit 2667ad40919a ("OvmfPkg/build.sh: Make GCC5 the default toolchain, catch GCC43 and earlier", 2016-11-23), but given the recent gcc developments (gcc-6.3 has not fixed the gcc bug, and the bug has even surfaced in gcc-5.4), I think it would be justified. Do be honnest, I don't think the toolchain GCC5 has ever been tested with gcc-5.x and the module XenBusDxe. I think most people that want to start OVMF under Xen are likely to build it with gcc-4.9 or already had gcc-6.x when OVMF switch to the GCC5 toolchain by default. -- Anthony PERARD