public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Anthony PERARD <anthony.perard@citrix.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>,
	"Gao, Liming" <liming.gao@intel.com>,
	"Zhu, Yonghong" <yonghong.zhu@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	<edk2-devel@ml01.01.org>, Rebecca Cran <rebecca@bluestop.org>,
	Konrad Rzeszutek Wilk <konrad@kernel.org>
Subject: Re: [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5
Date: Tue, 21 Feb 2017 17:53:03 +0000	[thread overview]
Message-ID: <20170221175303.GD1867@perard.uk.xensource.com> (raw)
In-Reply-To: <97214320-6054-8034-7667-6edde4debd80@redhat.com>

On Tue, Feb 21, 2017 at 06:07:15PM +0100, Laszlo Ersek wrote:
> CC Rebecca & Konrad
> 
> On 02/21/17 17:39, Anthony PERARD wrote:
> > On Sat, Dec 03, 2016 at 06:59:28PM +0100, Laszlo Ersek wrote:
> >> On 12/02/16 20:26, Laszlo Ersek wrote:
> >>> On 12/02/16 17:02, Anthony PERARD wrote:
> >>>> On Thu, Dec 01, 2016 at 07:43:24PM +0100, Laszlo Ersek wrote:
> >>>>> On 12/01/16 16:28, Anthony PERARD wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> That might be only with the Xen part of OVMF but now that the GCC5
> >>>>>> toolchains is used with my gcc (6.2.1 20160830, Arch Linux), OVMF fail
> >>>>>> to boot in Xen guests.
> >>>>>>
> >>>> [...]
> >>>>>>
> >>>>>> Removing the gcc option -flto in only the XenBusDxe module makes OVMF
> >>>>>> boot.
> >>>>>>
> >>>>>> While trying to debug that, I've added some debug prints (in this module
> >>>>>> and in XenPvBlkDxe), and the exception could change and become a "page
> >>>>>> fault" instead, or even an assert failure in the PrintLib, that was the
> >>>>>> ASSERT(Buffer != NULL) at I think
> >>>>>> MdePkg/Library/BasePrintLib/PrintLibInternal.c:366
> >>>>>>
> >>>>>> Adding EFIAPI to internal functions in XenBusDxe makes things work
> >>>>>> again.  My guest is that gcc would bypass (optimise) an exported
> >>>>>> functions and call directly an internal one but without reordering the
> >>>>>> arguments (EFIAPI vs nothing).
> >>>>>>
> >>>>>> Does that make sense?
> >>>>>
> >>>>> If "-b NOOPT" works for you, I'd prefer that as a temporary solution
> >>>>> (until the root cause is found and addressed) to the XenBusDxe patches.
> >>>>
> >>>> That works, using GCC49 (with gcc 6.2.1) works as well.
> >>>>
> >>>>> Hrpmf, wait a second, I do see something interesting: in this series you
> >>>>> *are* modifying APIs declared in a library class header (namely
> >>>>> "OvmfPkg/Include/Library/XenHypercallLib.h"). Such functions (public
> >>>>> libraries) *are* required to specify EFIAPI.
> >>>>>
> >>>>> What happens if you apply patch #1 only?
> >>>>
> >>>> With only XenHypercallLib changes, the error is the same.
> >>>>
> >>>> But I did find the minimum change needed, it envolve a function with a
> >>>> VA_LIST as argument.
> >>>>
> >>>> With only the following patch, OVMF works again.
> >>>>
> >>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.c b/OvmfPkg/XenBusDxe/XenStore.c
> >>>> index 1666c4b..85b0956 100644
> >>>> --- a/OvmfPkg/XenBusDxe/XenStore.c
> >>>> +++ b/OvmfPkg/XenBusDxe/XenStore.c
> >>>> @@ -1307,6 +1307,7 @@ XenStoreTransactionEnd (
> >>>>  }
> >>>>  
> >>>>  XENSTORE_STATUS
> >>>> +EFIAPI
> >>>>  XenStoreVSPrint (
> >>>>    IN CONST XENSTORE_TRANSACTION *Transaction,
> >>>>    IN CONST CHAR8           *DirectoryPath,
> >>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.h b/OvmfPkg/XenBusDxe/XenStore.h
> >>>> index c9d4c65..33bb647 100644
> >>>> --- a/OvmfPkg/XenBusDxe/XenStore.h
> >>>> +++ b/OvmfPkg/XenBusDxe/XenStore.h
> >>>> @@ -209,6 +209,7 @@ XenStoreSPrint (
> >>>>             indicating the type of write failure.
> >>>>  **/
> >>>>  XENSTORE_STATUS
> >>>> +EFIAPI
> >>>>  XenStoreVSPrint (
> >>>>    IN CONST XENSTORE_TRANSACTION *Transaction,
> >>>>    IN CONST CHAR8           *DirectoryPath,
> >>>>    IN CONST CHAR8           *Node,
> >>>>    IN CONST CHAR8           *FormatString,
> >>>>    IN VA_LIST               Marker
> >>>>    );
> >>>>
> >>>>
> >>>> I think the exception happen when this function is called via
> >>>> XENBUS_PROTOCOL->XsPrintf() from XenPvBlockFrontInitialization() in
> >>>> OvmfPkg/XenPvBlkDxe/BlockFront.c
> >>>>
> >>>
> >>> It used to be a known requirement / limitation that all functions with
> >>> variable argument lists had to be EFIAPI, regardless of cross-module
> >>> use. However, commit 48d5f9a551a93acb45f272dda879b0ab5a504e36 changed
> >>> that, and varargs should "just work" now. I suspect this is a
> >>> __builtin_ms_va_* regression in gcc-6. Thank you for narrowing it down.
> >>> It might make sense to report a bug in the upstream gcc tracker.
> >>>
> >>> ... Oh wow, this is a known gcc bug! See:
> >>>
> >>> https://lists.01.org/pipermail/edk2-devel/2016-August/001018.html
> >>>
> >>> Upstream gcc BZ <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955> was
> >>> apparently solved for "Target Milestone: 6.3" (your version is 6.2.1).
> >>> So we'll either need a GCC6 toolchain in BaseTools that drops -flto, in
> >>> order to work around this gcc issue, or we'll have to ask gcc-6 users to
> >>> use at least gcc-6.3.
> >>>
> >>> Oh wait, gcc-6.3 hasn't been released yet. We need the BaseTools
> >>> workaround then.
> >>
> >> I think I got confused in parts of the above; I got some details wrong.
> >> Namely, commit 48d5f9a551a9 did not remove the requirement/limitation
> >> that all varargs functions have to be EFIAPI. Said commit only changed
> >> how the VA_*() macros would be implemented.
> >>
> >> The two caller functions of XenStoreVSPrint(), namely XenStoreSPrint()
> >> and XenBusXenStoreSPrint(), are varargs functions, but they are already
> >> EFIAPI. So the requirement/limitation (which was unaffected by
> >> 48d5f9a551a9) is actually satisfied / considered in XenBusDxe.
> >>
> >> The XenStoreVSPrint() function, which you identified as the breaking
> >> part, is *not* a varargs function itself, so it needn't be EFIAPI. It
> >> simply receives a VA_LIST parameter (which is "__builtin_ms_va_list",
> >> from commit 48d5f9a551a9), and (a) copies it with VA_COPY() for passing
> >> the copy to SPrintLengthAsciiFormat(), (b) passes the original parameter
> >> to AsciiVSPrint(). In turn both of those functions call the common
> >> BasePrintLibSPrintMarker() function.
> >>
> >> Comment <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955#c6> says,
> >>
> >>> This is bug report that the specialized
> >>> __builtin_ms_va_{list,start,end,copy} builtins have stopped working
> >>> when -flto is used.  They worked until gcc 5.3, both with or without
> >>> -flto.  In gcc 6.1 with -flto, the canonical iterator __builtin_va_arg
> >>> ignores them and works on a sysv_va_list.  To be precise, it's
> >>> __builtin_va_arg in the context of -flto that's broken, not the
> >>> specialized builtins.  __builtin_va_arg has always been a polymorphic
> >>> builtin that changes its behavior based on the type of va_list it's
> >>> given as an argument.  Without this polymorphic behavior, there's no
> >>> way to iterate over an ms_va_list.
> >>
> >> Apparently, when BasePrintLibSPrintMarker() finally calls VA_ARG() (==
> >> __builtin_va_arg(), from commit 48d5f9a551a9) on Marker / Marker2, with
> >> LTO enabled, __builtin_va_arg() fails to notice what context
> >> VaListMarker comes from:
> >> - __builtin_ms_va_start() in XenStoreSPrint() and XenBusXenStoreSPrint(), or
> >> - __builtin_ms_va_copy() in XenStoreVSPrint().
> >>
> >> So I think we *are* being hit by gcc BZ#70955, and making
> >> XenStoreVSPrint() EFIAPI only masks the issue. Comment
> >> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955#c7> seems relevant:
> >>
> >>> The change with GCC 6 is that the builtins are now lowered during
> >>> link-time optimization rather than at compile-time.  Thus the abi
> >>> selection bits are possibly not transfered correctly (type merging?).
> >>> I remember the business was quite ugly, but eventually we just miss to
> >>> properly transfer the function attribute.
> >>
> >> The end result for edk2 remains the same (= BaseTools should work around
> >> this gcc issue with a new GCC6 toolchain that drops -flto, unless
> >> gcc-6.3 is about to become available to users real quick). I just wanted
> >> to point out that my earlier statement "commit 48d5f9a551a9 had removed
> >> the need for varargs functions to be EFIAPI" was incorrect -- varargs
> >> functions still must be EFIAPI (and XenBusDxe conforms, see
> >> XenStoreSPrint() and XenBusXenStoreSPrint()).
> > 
> > Hi Laszlo,
> > 
> > Now that gcc 6.3 is out, the bug described in the thread strikes again.
> > Building OVMF with -flto result in Page-Fault or General Protection
> > fault, due to the way va_args are used in XenStoreVSPrint().
> > 
> > Also, now I've tried to build OVMF with gcc 5.4, same result, using
> > -flto result in a firmware that does not work.
> > 
> > 
> > I've tried to create a small programme that use the va_args in the same
> > way, and compiled-test it with different gcc (gcc 4.9.2, gcc 5.4, gcc
> > 6.3), then depending on the options use, it does not work or it works:
> > 
> > Don't work (prog segv or wrong output):
> > gcc -o prog va_main.c va_test.c
> > gcc -o prog -flto va_main.c va_test.c
> > gcc -Os -o prog -flto va_main.c va_test.c
> > 
> > Work:
> > gcc -Os -o prog va_main.c va_test.c
> > 
> > I'll attach va_main.c and va_test.c.
> > 
> > So, should I add EFIAPI to XenStoreVSPrint, as it is using VA_COPY?
> > 
> 
> Hm, please help me jog my memory...
> 
> If I remember correctly, this is still a GCC bug, one that we suppressed for gcc-6.2 with your patch as follows:

Yes.

> > commit 432f1d83f77acf92d52ef18d2cee6dbf7c5b9b86
> > Author: Anthony PERARD <anthony.perard@citrix.com>
> > Date:   Tue Dec 6 12:03:25 2016 +0000
> >
> >     OvmfPkg/build.sh: Use GCC49 toolchains with GCC 6.[0-2]
> >
> >     The goal of the patch is to avoid using -flto with GCC 6.0 to 6.2.
> >
> >     This is to workaround a GCC bug:
> >     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955
> >
> >     Contributed-under: TianoCore Contribution Agreement 1.0
> >     Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> >     Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> >     Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
> >
> > diff --git a/OvmfPkg/build.sh b/OvmfPkg/build.sh
> > index 95fe8fb07647..b6e936056ca0 100755
> > --- a/OvmfPkg/build.sh
> > +++ b/OvmfPkg/build.sh
> > @@ -102,7 +102,7 @@ case `uname` in
> >        4.8.*)
> >          TARGET_TOOLS=GCC48
> >          ;;
> > -      4.9.*)
> > +      4.9.*|6.[0-2].*)
> >          TARGET_TOOLS=GCC49
> >          ;;
> >        *)
> 
> Do I understand correctly that the gcc bug has not been fixed in gcc-6.3, and -- because we don't suppress it for gcc-6.3 as the above expression does not match -- it causes problems again?

The bug describe in the GCC bugzilla is probably fix, but the test-case
does not make use of __builtin_ms_va_copy.

> You also mention gcc-5.4 as problematic. I think we haven't received such reports about gcc-5 versions up to and including gcc-5.3 (that's why GCC5 is the default selection in "OvmfPkg/build.sh"). Do you mean that the gcc bug has now been "backported" from the gcc-6 series to the gcc-5 series (starting with gcc-5.4)?

I don't know the state of gcc-5.0 to gcc-5.3, I have never tested -flto
with gcc-5.x (until now), I would say they are also problematic until
proven otherwise.

> If that's the case, then I suggest flipping "OvmfPkg/build.sh" from black-listing gcc versions for -flto to white-listing. In other words, assume that -flto is generally broken with GCC, except for a few known versions: 5.0 through 5.3 inclusive. Those versions should trigger the use of the GCC5 toolchain, and everything else (5.4+, 6.*, 4.9.*) should use GCC49.
> 
> I don't feel comfortable about adding EFIAPI to XenStoreVSPrint just because it takes a VA_LIST parameter -- note: it is *not* a varargs function itself! --; the same issue might hit elsewhere in the edk2 tree at any time, outside of OvmfPkg too.

>From the different tests I've done, I feel more like VA_COPY might be
the issue, but I don't know how __builtin_ms_va_* are supposed to be
used.

> Would the gcc white-listing work for you?
> 
> Note that the white-listing would practically undo Konrad's commit 2667ad40919a ("OvmfPkg/build.sh: Make GCC5 the default toolchain, catch GCC43 and earlier", 2016-11-23), but given the recent gcc developments (gcc-6.3 has not fixed the gcc bug, and the bug has even surfaced in gcc-5.4), I think it would be justified.

Do be honnest, I don't think the toolchain GCC5 has ever been tested
with gcc-5.x and the module XenBusDxe. I think most people that want to
start OVMF under Xen are likely to build it with gcc-4.9 or already had
gcc-6.x when OVMF switch to the GCC5 toolchain by default.

-- 
Anthony PERARD


  reply	other threads:[~2017-02-21 17:53 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-01 15:28 [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 Anthony PERARD
2016-12-01 15:28 ` [PATCH 1/4] OvmfPkg/XenHypercallLib: Add EFIAPI Anthony PERARD
2016-12-01 15:28 ` [PATCH 2/4] OvmfPkg/XenBusDxe: Add EFIAPI to XenEventChannelNotify Anthony PERARD
2016-12-01 15:28 ` [PATCH 3/4] OvmfPkg/XenBusDxe: Add EFIAPI to XenStore functions Anthony PERARD
2016-12-01 15:28 ` [PATCH 4/4] OvmfPkg/XenBusDxe: Add EFIAPI to XenGrantTable{Grant, End}Access Anthony PERARD
2016-12-01 18:43 ` [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 Laszlo Ersek
2016-12-01 20:06   ` Jordan Justen
2016-12-01 20:54     ` Laszlo Ersek
2016-12-02  0:58       ` Jordan Justen
2016-12-02  9:45         ` Laszlo Ersek
2016-12-02  4:36   ` Gao, Liming
2016-12-02 10:00     ` Laszlo Ersek
2016-12-02 16:02   ` Anthony PERARD
2016-12-02 19:26     ` Laszlo Ersek
2016-12-03 17:59       ` Laszlo Ersek
2016-12-05  2:55         ` Gao, Liming
2016-12-05 10:09           ` Laszlo Ersek
2017-02-21 16:39         ` Anthony PERARD
2017-02-21 17:07           ` Laszlo Ersek
2017-02-21 17:53             ` Anthony PERARD [this message]
2017-02-21 19:02               ` Laszlo Ersek
2017-02-21 19:08                 ` Rebecca Cran
2017-02-21 22:45                   ` Jordan Justen
2017-02-21 23:59                     ` Laszlo Ersek
2017-02-22 14:16                       ` Gao, Liming
2017-02-22  8:54                 ` Gao, Liming
2017-02-23 10:19                   ` Laszlo Ersek
2017-02-23 12:43                     ` Anthony PERARD
2017-02-23 13:00                     ` Gao, Liming

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170221175303.GD1867@perard.uk.xensource.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox