From: Laszlo Ersek <lersek@redhat.com>
To: Anthony PERARD <anthony.perard@citrix.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>,
"Gao, Liming" <liming.gao@intel.com>,
"Zhu, Yonghong" <yonghong.zhu@intel.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
edk2-devel@ml01.01.org, Rebecca Cran <rebecca@bluestop.org>,
Konrad Rzeszutek Wilk <konrad@kernel.org>
Subject: Re: [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5
Date: Tue, 21 Feb 2017 18:07:15 +0100 [thread overview]
Message-ID: <97214320-6054-8034-7667-6edde4debd80@redhat.com> (raw)
In-Reply-To: <20170221163922.GC1867@perard.uk.xensource.com>
CC Rebecca & Konrad
On 02/21/17 17:39, Anthony PERARD wrote:
> On Sat, Dec 03, 2016 at 06:59:28PM +0100, Laszlo Ersek wrote:
>> On 12/02/16 20:26, Laszlo Ersek wrote:
>>> On 12/02/16 17:02, Anthony PERARD wrote:
>>>> On Thu, Dec 01, 2016 at 07:43:24PM +0100, Laszlo Ersek wrote:
>>>>> On 12/01/16 16:28, Anthony PERARD wrote:
>>>>>> Hi,
>>>>>>
>>>>>> That might be only with the Xen part of OVMF but now that the GCC5
>>>>>> toolchains is used with my gcc (6.2.1 20160830, Arch Linux), OVMF fail
>>>>>> to boot in Xen guests.
>>>>>>
>>>> [...]
>>>>>>
>>>>>> Removing the gcc option -flto in only the XenBusDxe module makes OVMF
>>>>>> boot.
>>>>>>
>>>>>> While trying to debug that, I've added some debug prints (in this module
>>>>>> and in XenPvBlkDxe), and the exception could change and become a "page
>>>>>> fault" instead, or even an assert failure in the PrintLib, that was the
>>>>>> ASSERT(Buffer != NULL) at I think
>>>>>> MdePkg/Library/BasePrintLib/PrintLibInternal.c:366
>>>>>>
>>>>>> Adding EFIAPI to internal functions in XenBusDxe makes things work
>>>>>> again. My guest is that gcc would bypass (optimise) an exported
>>>>>> functions and call directly an internal one but without reordering the
>>>>>> arguments (EFIAPI vs nothing).
>>>>>>
>>>>>> Does that make sense?
>>>>>
>>>>> If "-b NOOPT" works for you, I'd prefer that as a temporary solution
>>>>> (until the root cause is found and addressed) to the XenBusDxe patches.
>>>>
>>>> That works, using GCC49 (with gcc 6.2.1) works as well.
>>>>
>>>>> Hrpmf, wait a second, I do see something interesting: in this series you
>>>>> *are* modifying APIs declared in a library class header (namely
>>>>> "OvmfPkg/Include/Library/XenHypercallLib.h"). Such functions (public
>>>>> libraries) *are* required to specify EFIAPI.
>>>>>
>>>>> What happens if you apply patch #1 only?
>>>>
>>>> With only XenHypercallLib changes, the error is the same.
>>>>
>>>> But I did find the minimum change needed, it envolve a function with a
>>>> VA_LIST as argument.
>>>>
>>>> With only the following patch, OVMF works again.
>>>>
>>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.c b/OvmfPkg/XenBusDxe/XenStore.c
>>>> index 1666c4b..85b0956 100644
>>>> --- a/OvmfPkg/XenBusDxe/XenStore.c
>>>> +++ b/OvmfPkg/XenBusDxe/XenStore.c
>>>> @@ -1307,6 +1307,7 @@ XenStoreTransactionEnd (
>>>> }
>>>>
>>>> XENSTORE_STATUS
>>>> +EFIAPI
>>>> XenStoreVSPrint (
>>>> IN CONST XENSTORE_TRANSACTION *Transaction,
>>>> IN CONST CHAR8 *DirectoryPath,
>>>> diff --git a/OvmfPkg/XenBusDxe/XenStore.h b/OvmfPkg/XenBusDxe/XenStore.h
>>>> index c9d4c65..33bb647 100644
>>>> --- a/OvmfPkg/XenBusDxe/XenStore.h
>>>> +++ b/OvmfPkg/XenBusDxe/XenStore.h
>>>> @@ -209,6 +209,7 @@ XenStoreSPrint (
>>>> indicating the type of write failure.
>>>> **/
>>>> XENSTORE_STATUS
>>>> +EFIAPI
>>>> XenStoreVSPrint (
>>>> IN CONST XENSTORE_TRANSACTION *Transaction,
>>>> IN CONST CHAR8 *DirectoryPath,
>>>> IN CONST CHAR8 *Node,
>>>> IN CONST CHAR8 *FormatString,
>>>> IN VA_LIST Marker
>>>> );
>>>>
>>>>
>>>> I think the exception happen when this function is called via
>>>> XENBUS_PROTOCOL->XsPrintf() from XenPvBlockFrontInitialization() in
>>>> OvmfPkg/XenPvBlkDxe/BlockFront.c
>>>>
>>>
>>> It used to be a known requirement / limitation that all functions with
>>> variable argument lists had to be EFIAPI, regardless of cross-module
>>> use. However, commit 48d5f9a551a93acb45f272dda879b0ab5a504e36 changed
>>> that, and varargs should "just work" now. I suspect this is a
>>> __builtin_ms_va_* regression in gcc-6. Thank you for narrowing it down.
>>> It might make sense to report a bug in the upstream gcc tracker.
>>>
>>> ... Oh wow, this is a known gcc bug! See:
>>>
>>> https://lists.01.org/pipermail/edk2-devel/2016-August/001018.html
>>>
>>> Upstream gcc BZ <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955> was
>>> apparently solved for "Target Milestone: 6.3" (your version is 6.2.1).
>>> So we'll either need a GCC6 toolchain in BaseTools that drops -flto, in
>>> order to work around this gcc issue, or we'll have to ask gcc-6 users to
>>> use at least gcc-6.3.
>>>
>>> Oh wait, gcc-6.3 hasn't been released yet. We need the BaseTools
>>> workaround then.
>>
>> I think I got confused in parts of the above; I got some details wrong.
>> Namely, commit 48d5f9a551a9 did not remove the requirement/limitation
>> that all varargs functions have to be EFIAPI. Said commit only changed
>> how the VA_*() macros would be implemented.
>>
>> The two caller functions of XenStoreVSPrint(), namely XenStoreSPrint()
>> and XenBusXenStoreSPrint(), are varargs functions, but they are already
>> EFIAPI. So the requirement/limitation (which was unaffected by
>> 48d5f9a551a9) is actually satisfied / considered in XenBusDxe.
>>
>> The XenStoreVSPrint() function, which you identified as the breaking
>> part, is *not* a varargs function itself, so it needn't be EFIAPI. It
>> simply receives a VA_LIST parameter (which is "__builtin_ms_va_list",
>> from commit 48d5f9a551a9), and (a) copies it with VA_COPY() for passing
>> the copy to SPrintLengthAsciiFormat(), (b) passes the original parameter
>> to AsciiVSPrint(). In turn both of those functions call the common
>> BasePrintLibSPrintMarker() function.
>>
>> Comment <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955#c6> says,
>>
>>> This is bug report that the specialized
>>> __builtin_ms_va_{list,start,end,copy} builtins have stopped working
>>> when -flto is used. They worked until gcc 5.3, both with or without
>>> -flto. In gcc 6.1 with -flto, the canonical iterator __builtin_va_arg
>>> ignores them and works on a sysv_va_list. To be precise, it's
>>> __builtin_va_arg in the context of -flto that's broken, not the
>>> specialized builtins. __builtin_va_arg has always been a polymorphic
>>> builtin that changes its behavior based on the type of va_list it's
>>> given as an argument. Without this polymorphic behavior, there's no
>>> way to iterate over an ms_va_list.
>>
>> Apparently, when BasePrintLibSPrintMarker() finally calls VA_ARG() (==
>> __builtin_va_arg(), from commit 48d5f9a551a9) on Marker / Marker2, with
>> LTO enabled, __builtin_va_arg() fails to notice what context
>> VaListMarker comes from:
>> - __builtin_ms_va_start() in XenStoreSPrint() and XenBusXenStoreSPrint(), or
>> - __builtin_ms_va_copy() in XenStoreVSPrint().
>>
>> So I think we *are* being hit by gcc BZ#70955, and making
>> XenStoreVSPrint() EFIAPI only masks the issue. Comment
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955#c7> seems relevant:
>>
>>> The change with GCC 6 is that the builtins are now lowered during
>>> link-time optimization rather than at compile-time. Thus the abi
>>> selection bits are possibly not transfered correctly (type merging?).
>>> I remember the business was quite ugly, but eventually we just miss to
>>> properly transfer the function attribute.
>>
>> The end result for edk2 remains the same (= BaseTools should work around
>> this gcc issue with a new GCC6 toolchain that drops -flto, unless
>> gcc-6.3 is about to become available to users real quick). I just wanted
>> to point out that my earlier statement "commit 48d5f9a551a9 had removed
>> the need for varargs functions to be EFIAPI" was incorrect -- varargs
>> functions still must be EFIAPI (and XenBusDxe conforms, see
>> XenStoreSPrint() and XenBusXenStoreSPrint()).
>
> Hi Laszlo,
>
> Now that gcc 6.3 is out, the bug described in the thread strikes again.
> Building OVMF with -flto result in Page-Fault or General Protection
> fault, due to the way va_args are used in XenStoreVSPrint().
>
> Also, now I've tried to build OVMF with gcc 5.4, same result, using
> -flto result in a firmware that does not work.
>
>
> I've tried to create a small programme that use the va_args in the same
> way, and compiled-test it with different gcc (gcc 4.9.2, gcc 5.4, gcc
> 6.3), then depending on the options use, it does not work or it works:
>
> Don't work (prog segv or wrong output):
> gcc -o prog va_main.c va_test.c
> gcc -o prog -flto va_main.c va_test.c
> gcc -Os -o prog -flto va_main.c va_test.c
>
> Work:
> gcc -Os -o prog va_main.c va_test.c
>
> I'll attach va_main.c and va_test.c.
>
> So, should I add EFIAPI to XenStoreVSPrint, as it is using VA_COPY?
>
Hm, please help me jog my memory...
If I remember correctly, this is still a GCC bug, one that we suppressed for gcc-6.2 with your patch as follows:
> commit 432f1d83f77acf92d52ef18d2cee6dbf7c5b9b86
> Author: Anthony PERARD <anthony.perard@citrix.com>
> Date: Tue Dec 6 12:03:25 2016 +0000
>
> OvmfPkg/build.sh: Use GCC49 toolchains with GCC 6.[0-2]
>
> The goal of the patch is to avoid using -flto with GCC 6.0 to 6.2.
>
> This is to workaround a GCC bug:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70955
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
>
> diff --git a/OvmfPkg/build.sh b/OvmfPkg/build.sh
> index 95fe8fb07647..b6e936056ca0 100755
> --- a/OvmfPkg/build.sh
> +++ b/OvmfPkg/build.sh
> @@ -102,7 +102,7 @@ case `uname` in
> 4.8.*)
> TARGET_TOOLS=GCC48
> ;;
> - 4.9.*)
> + 4.9.*|6.[0-2].*)
> TARGET_TOOLS=GCC49
> ;;
> *)
Do I understand correctly that the gcc bug has not been fixed in gcc-6.3, and -- because we don't suppress it for gcc-6.3 as the above expression does not match -- it causes problems again?
You also mention gcc-5.4 as problematic. I think we haven't received such reports about gcc-5 versions up to and including gcc-5.3 (that's why GCC5 is the default selection in "OvmfPkg/build.sh"). Do you mean that the gcc bug has now been "backported" from the gcc-6 series to the gcc-5 series (starting with gcc-5.4)?
If that's the case, then I suggest flipping "OvmfPkg/build.sh" from black-listing gcc versions for -flto to white-listing. In other words, assume that -flto is generally broken with GCC, except for a few known versions: 5.0 through 5.3 inclusive. Those versions should trigger the use of the GCC5 toolchain, and everything else (5.4+, 6.*, 4.9.*) should use GCC49.
I don't feel comfortable about adding EFIAPI to XenStoreVSPrint just because it takes a VA_LIST parameter -- note: it is *not* a varargs function itself! --; the same issue might hit elsewhere in the edk2 tree at any time, outside of OvmfPkg too.
Would the gcc white-listing work for you?
Note that the white-listing would practically undo Konrad's commit 2667ad40919a ("OvmfPkg/build.sh: Make GCC5 the default toolchain, catch GCC43 and earlier", 2016-11-23), but given the recent gcc developments (gcc-6.3 has not fixed the gcc bug, and the bug has even surfaced in gcc-5.4), I think it would be justified.
Thanks
Laszlo
next prev parent reply other threads:[~2017-02-21 17:07 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-01 15:28 [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 Anthony PERARD
2016-12-01 15:28 ` [PATCH 1/4] OvmfPkg/XenHypercallLib: Add EFIAPI Anthony PERARD
2016-12-01 15:28 ` [PATCH 2/4] OvmfPkg/XenBusDxe: Add EFIAPI to XenEventChannelNotify Anthony PERARD
2016-12-01 15:28 ` [PATCH 3/4] OvmfPkg/XenBusDxe: Add EFIAPI to XenStore functions Anthony PERARD
2016-12-01 15:28 ` [PATCH 4/4] OvmfPkg/XenBusDxe: Add EFIAPI to XenGrantTable{Grant, End}Access Anthony PERARD
2016-12-01 18:43 ` [PATCH 0/4] Fix runtime issue in XenBusDxe when compiled with GCC5 Laszlo Ersek
2016-12-01 20:06 ` Jordan Justen
2016-12-01 20:54 ` Laszlo Ersek
2016-12-02 0:58 ` Jordan Justen
2016-12-02 9:45 ` Laszlo Ersek
2016-12-02 4:36 ` Gao, Liming
2016-12-02 10:00 ` Laszlo Ersek
2016-12-02 16:02 ` Anthony PERARD
2016-12-02 19:26 ` Laszlo Ersek
2016-12-03 17:59 ` Laszlo Ersek
2016-12-05 2:55 ` Gao, Liming
2016-12-05 10:09 ` Laszlo Ersek
2017-02-21 16:39 ` Anthony PERARD
2017-02-21 17:07 ` Laszlo Ersek [this message]
2017-02-21 17:53 ` Anthony PERARD
2017-02-21 19:02 ` Laszlo Ersek
2017-02-21 19:08 ` Rebecca Cran
2017-02-21 22:45 ` Jordan Justen
2017-02-21 23:59 ` Laszlo Ersek
2017-02-22 14:16 ` Gao, Liming
2017-02-22 8:54 ` Gao, Liming
2017-02-23 10:19 ` Laszlo Ersek
2017-02-23 12:43 ` Anthony PERARD
2017-02-23 13:00 ` Gao, Liming
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=97214320-6054-8034-7667-6edde4debd80@redhat.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox