From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from SMTP02.CITRIX.COM (smtp02.citrix.com [66.165.176.63]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id DEAAC821CD for ; Tue, 21 Feb 2017 08:44:14 -0800 (PST) X-IronPort-AV: E=Sophos;i="5.35,190,1484006400"; d="c'?scan'208";a="417529579" Date: Tue, 21 Feb 2017 16:39:22 +0000 From: Anthony PERARD To: Laszlo Ersek CC: Jordan Justen , "Gao, Liming" , "Zhu, Yonghong" , Ard Biesheuvel , Message-ID: <20170221163922.GC1867@perard.uk.xensource.com> References: <20161201152819.8341-1-anthony.perard@citrix.com> <53c67cb5-e947-8979-7738-288cc83f374b@redhat.com> <20161202160201.GA1848@perard.uk.xensource.com> <9c0c9f43-a297-179f-2d57-fa5d8fab3763@redhat.com> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/1.7.2 (2016-11-26) X-Content-Filtered-By: Mailman/MimeDel 2.1.21 Subject: Re: [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2017 16:44:15 -0000 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline On Sat, Dec 03, 2016 at 06:59:28PM +0100, Laszlo Ersek wrote: > On 12/02/16 20:26, Laszlo Ersek wrote: > > On 12/02/16 17:02, Anthony PERARD wrote: > >> On Thu, Dec 01, 2016 at 07:43:24PM +0100, Laszlo Ersek wrote: > >>> On 12/01/16 16:28, Anthony PERARD wrote: > >>>> Hi, > >>>> > >>>> That might be only with the Xen part of OVMF but now that the GCC5 > >>>> toolchains is used with my gcc (6.2.1 20160830, Arch Linux), OVMF fail > >>>> to boot in Xen guests. > >>>> > >> [...] > >>>> > >>>> Removing the gcc option -flto in only the XenBusDxe module makes OVMF > >>>> boot. > >>>> > >>>> While trying to debug that, I've added some debug prints (in this module > >>>> and in XenPvBlkDxe), and the exception could change and become a "page > >>>> fault" instead, or even an assert failure in the PrintLib, that was the > >>>> ASSERT(Buffer != NULL) at I think > >>>> MdePkg/Library/BasePrintLib/PrintLibInternal.c:366 > >>>> > >>>> Adding EFIAPI to internal functions in XenBusDxe makes things work > >>>> again. My guest is that gcc would bypass (optimise) an exported > >>>> functions and call directly an internal one but without reordering the > >>>> arguments (EFIAPI vs nothing). > >>>> > >>>> Does that make sense? > >>> > >>> If "-b NOOPT" works for you, I'd prefer that as a temporary solution > >>> (until the root cause is found and addressed) to the XenBusDxe patches. > >> > >> That works, using GCC49 (with gcc 6.2.1) works as well. > >> > >>> Hrpmf, wait a second, I do see something interesting: in this series you > >>> *are* modifying APIs declared in a library class header (namely > >>> "OvmfPkg/Include/Library/XenHypercallLib.h"). Such functions (public > >>> libraries) *are* required to specify EFIAPI. > >>> > >>> What happens if you apply patch #1 only? > >> > >> With only XenHypercallLib changes, the error is the same. > >> > >> But I did find the minimum change needed, it envolve a function with a > >> VA_LIST as argument. > >> > >> With only the following patch, OVMF works again. > >> > >> diff --git a/OvmfPkg/XenBusDxe/XenStore.c b/OvmfPkg/XenBusDxe/XenStore.c > >> index 1666c4b..85b0956 100644 > >> --- a/OvmfPkg/XenBusDxe/XenStore.c > >> +++ b/OvmfPkg/XenBusDxe/XenStore.c > >> @@ -1307,6 +1307,7 @@ XenStoreTransactionEnd ( > >> } > >> > >> XENSTORE_STATUS > >> +EFIAPI > >> XenStoreVSPrint ( > >> IN CONST XENSTORE_TRANSACTION *Transaction, > >> IN CONST CHAR8 *DirectoryPath, > >> diff --git a/OvmfPkg/XenBusDxe/XenStore.h b/OvmfPkg/XenBusDxe/XenStore.h > >> index c9d4c65..33bb647 100644 > >> --- a/OvmfPkg/XenBusDxe/XenStore.h > >> +++ b/OvmfPkg/XenBusDxe/XenStore.h > >> @@ -209,6 +209,7 @@ XenStoreSPrint ( > >> indicating the type of write failure. > >> **/ > >> XENSTORE_STATUS > >> +EFIAPI > >> XenStoreVSPrint ( > >> IN CONST XENSTORE_TRANSACTION *Transaction, > >> IN CONST CHAR8 *DirectoryPath, > >> IN CONST CHAR8 *Node, > >> IN CONST CHAR8 *FormatString, > >> IN VA_LIST Marker > >> ); > >> > >> > >> I think the exception happen when this function is called via > >> XENBUS_PROTOCOL->XsPrintf() from XenPvBlockFrontInitialization() in > >> OvmfPkg/XenPvBlkDxe/BlockFront.c > >> > > > > It used to be a known requirement / limitation that all functions with > > variable argument lists had to be EFIAPI, regardless of cross-module > > use. However, commit 48d5f9a551a93acb45f272dda879b0ab5a504e36 changed > > that, and varargs should "just work" now. I suspect this is a > > __builtin_ms_va_* regression in gcc-6. Thank you for narrowing it down. > > It might make sense to report a bug in the upstream gcc tracker. > > > > ... Oh wow, this is a known gcc bug! See: > > > > https://lists.01.org/pipermail/edk2-devel/2016-August/001018.html > > > > Upstream gcc BZ was > > apparently solved for "Target Milestone: 6.3" (your version is 6.2.1). > > So we'll either need a GCC6 toolchain in BaseTools that drops -flto, in > > order to work around this gcc issue, or we'll have to ask gcc-6 users to > > use at least gcc-6.3. > > > > Oh wait, gcc-6.3 hasn't been released yet. We need the BaseTools > > workaround then. > > I think I got confused in parts of the above; I got some details wrong. > Namely, commit 48d5f9a551a9 did not remove the requirement/limitation > that all varargs functions have to be EFIAPI. Said commit only changed > how the VA_*() macros would be implemented. > > The two caller functions of XenStoreVSPrint(), namely XenStoreSPrint() > and XenBusXenStoreSPrint(), are varargs functions, but they are already > EFIAPI. So the requirement/limitation (which was unaffected by > 48d5f9a551a9) is actually satisfied / considered in XenBusDxe. > > The XenStoreVSPrint() function, which you identified as the breaking > part, is *not* a varargs function itself, so it needn't be EFIAPI. It > simply receives a VA_LIST parameter (which is "__builtin_ms_va_list", > from commit 48d5f9a551a9), and (a) copies it with VA_COPY() for passing > the copy to SPrintLengthAsciiFormat(), (b) passes the original parameter > to AsciiVSPrint(). In turn both of those functions call the common > BasePrintLibSPrintMarker() function. > > Comment says, > > > This is bug report that the specialized > > __builtin_ms_va_{list,start,end,copy} builtins have stopped working > > when -flto is used. They worked until gcc 5.3, both with or without > > -flto. In gcc 6.1 with -flto, the canonical iterator __builtin_va_arg > > ignores them and works on a sysv_va_list. To be precise, it's > > __builtin_va_arg in the context of -flto that's broken, not the > > specialized builtins. __builtin_va_arg has always been a polymorphic > > builtin that changes its behavior based on the type of va_list it's > > given as an argument. Without this polymorphic behavior, there's no > > way to iterate over an ms_va_list. > > Apparently, when BasePrintLibSPrintMarker() finally calls VA_ARG() (== > __builtin_va_arg(), from commit 48d5f9a551a9) on Marker / Marker2, with > LTO enabled, __builtin_va_arg() fails to notice what context > VaListMarker comes from: > - __builtin_ms_va_start() in XenStoreSPrint() and XenBusXenStoreSPrint(), or > - __builtin_ms_va_copy() in XenStoreVSPrint(). > > So I think we *are* being hit by gcc BZ#70955, and making > XenStoreVSPrint() EFIAPI only masks the issue. Comment > seems relevant: > > > The change with GCC 6 is that the builtins are now lowered during > > link-time optimization rather than at compile-time. Thus the abi > > selection bits are possibly not transfered correctly (type merging?). > > I remember the business was quite ugly, but eventually we just miss to > > properly transfer the function attribute. > > The end result for edk2 remains the same (= BaseTools should work around > this gcc issue with a new GCC6 toolchain that drops -flto, unless > gcc-6.3 is about to become available to users real quick). I just wanted > to point out that my earlier statement "commit 48d5f9a551a9 had removed > the need for varargs functions to be EFIAPI" was incorrect -- varargs > functions still must be EFIAPI (and XenBusDxe conforms, see > XenStoreSPrint() and XenBusXenStoreSPrint()). Hi Laszlo, Now that gcc 6.3 is out, the bug described in the thread strikes again. Building OVMF with -flto result in Page-Fault or General Protection fault, due to the way va_args are used in XenStoreVSPrint(). Also, now I've tried to build OVMF with gcc 5.4, same result, using -flto result in a firmware that does not work. I've tried to create a small programme that use the va_args in the same way, and compiled-test it with different gcc (gcc 4.9.2, gcc 5.4, gcc 6.3), then depending on the options use, it does not work or it works: Don't work (prog segv or wrong output): gcc -o prog va_main.c va_test.c gcc -o prog -flto va_main.c va_test.c gcc -Os -o prog -flto va_main.c va_test.c Work: gcc -Os -o prog va_main.c va_test.c I'll attach va_main.c and va_test.c. So, should I add EFIAPI to XenStoreVSPrint, as it is using VA_COPY? -- Anthony PERARD