* PR fails due to OVMF time out
@ 2023-03-31 15:00 Ni, Ray
2023-03-31 19:46 ` Gerd Hoffmann
0 siblings, 1 reply; 6+ messages in thread
From: Ni, Ray @ 2023-03-31 15:00 UTC (permalink / raw)
To: devel@edk2.groups.io
Cc: Kinney, Michael D, Ard Biesheuvel, 'Gerd Hoffmann'
[-- Attachment #1: Type: text/plain, Size: 565 bytes --]
Hi,
I found several of my PRs cannot pass the CI due to OVMF boot timeout.
e.g.: Mktme fix by niruiyu * Pull Request #4072 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4072>
I also one PR from Ard (X64 text reloc fixes by ardbiesheuvel * Pull Request #4216 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4216>) failed as well.
But I remember Mike increased the boot timeout from 1min to 2 mins.
Why does OVMF boot require more time than before?
Is that because Gerd enabled the SMP 4 boot?
Thanks,
Ray
[-- Attachment #2: Type: text/html, Size: 2764 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: PR fails due to OVMF time out
2023-03-31 15:00 PR fails due to OVMF time out Ni, Ray
@ 2023-03-31 19:46 ` Gerd Hoffmann
2023-04-02 18:23 ` Michael D Kinney
0 siblings, 1 reply; 6+ messages in thread
From: Gerd Hoffmann @ 2023-03-31 19:46 UTC (permalink / raw)
To: Ni, Ray; +Cc: devel@edk2.groups.io, Kinney, Michael D, Ard Biesheuvel
On Fri, Mar 31, 2023 at 03:00:37PM +0000, Ni, Ray wrote:
> Hi,
> I found several of my PRs cannot pass the CI due to OVMF boot timeout.
> e.g.: Mktme fix by niruiyu * Pull Request #4072 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4072>
>
> I also one PR from Ard (X64 text reloc fixes by ardbiesheuvel * Pull Request #4216 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4216>) failed as well.
>
> But I remember Mike increased the boot timeout from 1min to 2 mins.
> Why does OVMF boot require more time than before?
>
> Is that because Gerd enabled the SMP 4 boot?
That could be the reason. It happened to work in my CI test run,
but maybe I was just lucky.
take care,
Gerd
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: PR fails due to OVMF time out
2023-03-31 19:46 ` Gerd Hoffmann
@ 2023-04-02 18:23 ` Michael D Kinney
2023-04-03 8:21 ` Ard Biesheuvel
0 siblings, 1 reply; 6+ messages in thread
From: Michael D Kinney @ 2023-04-02 18:23 UTC (permalink / raw)
To: 'Gerd Hoffmann', Ni, Ray
Cc: devel@edk2.groups.io, Ard Biesheuvel, Kinney, Michael D
Hi Gerd,
I have investigated this failure with enabling -smp 4. I think this is an
important feature that should be on by default.
I did find that the results where inconsistent at 2 or 4 cpus on my laptop,
but going 8 or higher would fail consistently. I used -smp 32 for all the
testing below.
First, this failure mode is only seen with SMM_REQUIRE=1 builds. If I
set SMM_REQUIRE=0, then QEMU can boot with different smp settings.
This appears to be a QEMU MP SMM related issue.
I first tried going back in edk2 history looking for a point where smp
boot would work using the version of QEMU used by EDK II CI agents which
is 2021.5.5 (QEMU 6.0.0)
* https://github.com/tianocore/edk2/blob/fc00ff286a541c047b7d343e66ec10890b80d3ea/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml#L142
I went back in edk2 history to the 2017/2018 timeframe and the issue was
still present, and I recall testing this feature back then myself, so I
do not think it is related to edk2 changes.
I then did a binary search through the QEMU releases:
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20170113.exe
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20170808.exe 2.10.0-rc2
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20171122.exe 2.11.0-rc2
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20171211.exe 2.11.0-rc5
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20171217.exe 2.11.0
FAIL: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180321.exe 2.12.0-rc0 - AioContextPolling - Unrelated failure
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180404.exe 2.12.0-rc1
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180711.exe 3.0.0-rc0
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180807.exe 3.0.0-rc4
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180815.exe 3.0.0
FAIL: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20181108.exe 3.1.0-rc0 - Long Delay at BSD Entry
FAIL: https://qemu.weilnetz.de/w64/2019/qemu-w64-setup-20181211.exe 3.1.0 - Long Delay at BDS Entry
FAIL: https://qemu.weilnetz.de/w64/2019/qemu-w64-setup-20190815.exe 4.1.0 - Long Delay at BDS Entry
It appears the failure was introduced in the transition from 3.0.0 -> 3.1.0.
QEMU 3.1 introduced Multi-Threaded TCG for x86 CPUs
* https://www.qemu.org/2018/12/12/qemu-3-1-0/
* https://qemu.readthedocs.io/en/latest/devel/multi-thread-tcg.html#multi-threaded-tcg
The feature can be disabled with the following QEMU command line option:
* -accel tcg,thread=single
* https://doc.ycharbi.fr/fichiers/virtualisation/qemu/documentation/qemu-doc-3.1.0.html
When I apply -accel tcg,thread-single to the QEMU command line, QEMU 3.1.0 passes.
When I apply -accel tcg,thread-single to the QEMU command line, QEMU 6.0.0 passes.
I think we can create a new PR with: -smp 4 -accel tcg,thead-single
I consider this a workaround, and we should investigate why Multi-Threaded
TCG breaks QEMU MP SMM.
Best regards,
Mike
> -----Original Message-----
> From: 'Gerd Hoffmann' <kraxel@redhat.com>
> Sent: Friday, March 31, 2023 12:47 PM
> To: Ni, Ray <ray.ni@intel.com>
> Cc: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Ard Biesheuvel <ardb@kernel.org>
> Subject: Re: PR fails due to OVMF time out
>
> On Fri, Mar 31, 2023 at 03:00:37PM +0000, Ni, Ray wrote:
> > Hi,
> > I found several of my PRs cannot pass the CI due to OVMF boot timeout.
> > e.g.: Mktme fix by niruiyu * Pull Request #4072 * tianocore/edk2
> (github.com)<https://github.com/tianocore/edk2/pull/4072>
> >
> > I also one PR from Ard (X64 text reloc fixes by ardbiesheuvel * Pull Request #4216 * tianocore/edk2
> (github.com)<https://github.com/tianocore/edk2/pull/4216>) failed as well.
> >
> > But I remember Mike increased the boot timeout from 1min to 2 mins.
> > Why does OVMF boot require more time than before?
> >
> > Is that because Gerd enabled the SMP 4 boot?
>
> That could be the reason. It happened to work in my CI test run,
> but maybe I was just lucky.
>
> take care,
> Gerd
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: PR fails due to OVMF time out
2023-04-02 18:23 ` Michael D Kinney
@ 2023-04-03 8:21 ` Ard Biesheuvel
2023-04-03 10:37 ` [edk2-devel] " Gerd Hoffmann
0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2023-04-03 8:21 UTC (permalink / raw)
To: Kinney, Michael D, Laszlo Ersek
Cc: Gerd Hoffmann, Ni, Ray, devel@edk2.groups.io
On Sun, 2 Apr 2023 at 20:23, Kinney, Michael D
<michael.d.kinney@intel.com> wrote:
>
> Hi Gerd,
>
> I have investigated this failure with enabling -smp 4. I think this is an
> important feature that should be on by default.
>
Agreed. And given your investigation below (thanks!), we should be
able to revert the revert as long as we add '-accel
tcg,thread-single', right?
It would be nice if we could also get some coverage for KVM, but for
the time being, this seems to be an improvement already (AFAIU, we
want -smp 4 primarily for the logic used in the code, while there is
very little code that would actually run concurrently, if any)
--
Ard.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] PR fails due to OVMF time out
2023-04-03 8:21 ` Ard Biesheuvel
@ 2023-04-03 10:37 ` Gerd Hoffmann
2023-04-04 20:04 ` Michael D Kinney
0 siblings, 1 reply; 6+ messages in thread
From: Gerd Hoffmann @ 2023-04-03 10:37 UTC (permalink / raw)
To: devel, ardb; +Cc: Kinney, Michael D, Laszlo Ersek, Ni, Ray
On Mon, Apr 03, 2023 at 10:21:27AM +0200, Ard Biesheuvel wrote:
> On Sun, 2 Apr 2023 at 20:23, Kinney, Michael D
> <michael.d.kinney@intel.com> wrote:
> >
> > Hi Gerd,
> >
> > I have investigated this failure with enabling -smp 4. I think this is an
> > important feature that should be on by default.
> >
>
> Agreed. And given your investigation below (thanks!), we should be
> able to revert the revert as long as we add '-accel
> tcg,thread-single', right?
That is how I read it, yes.
> It would be nice if we could also get some coverage for KVM, but for
> the time being, this seems to be an improvement already (AFAIU, we
> want -smp 4 primarily for the logic used in the code, while there is
> very little code that would actually run concurrently, if any)
Yes, it's to improve MpInit test coverage. Most code would continue to
run single-threaded.
take care,
Gerd
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] PR fails due to OVMF time out
2023-04-03 10:37 ` [edk2-devel] " Gerd Hoffmann
@ 2023-04-04 20:04 ` Michael D Kinney
0 siblings, 0 replies; 6+ messages in thread
From: Michael D Kinney @ 2023-04-04 20:04 UTC (permalink / raw)
To: devel@edk2.groups.io, kraxel@redhat.com, ardb@kernel.org
Cc: Laszlo Ersek, Ni, Ray, Kinney, Michael D
Hi Gerd,
Do you want to submit a new email review/PR with this suggested change?
Thanks,
Mike
> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Gerd Hoffmann
> Sent: Monday, April 3, 2023 3:38 AM
> To: devel@edk2.groups.io; ardb@kernel.org
> Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Laszlo Ersek <lersek@redhat.com>; Ni, Ray <ray.ni@intel.com>
> Subject: Re: [edk2-devel] PR fails due to OVMF time out
>
> On Mon, Apr 03, 2023 at 10:21:27AM +0200, Ard Biesheuvel wrote:
> > On Sun, 2 Apr 2023 at 20:23, Kinney, Michael D
> > <michael.d.kinney@intel.com> wrote:
> > >
> > > Hi Gerd,
> > >
> > > I have investigated this failure with enabling -smp 4. I think this is an
> > > important feature that should be on by default.
> > >
> >
> > Agreed. And given your investigation below (thanks!), we should be
> > able to revert the revert as long as we add '-accel
> > tcg,thread-single', right?
>
> That is how I read it, yes.
>
> > It would be nice if we could also get some coverage for KVM, but for
> > the time being, this seems to be an improvement already (AFAIU, we
> > want -smp 4 primarily for the logic used in the code, while there is
> > very little code that would actually run concurrently, if any)
>
> Yes, it's to improve MpInit test coverage. Most code would continue to
> run single-threaded.
>
> take care,
> Gerd
>
>
>
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-04-04 20:05 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-31 15:00 PR fails due to OVMF time out Ni, Ray
2023-03-31 19:46 ` Gerd Hoffmann
2023-04-02 18:23 ` Michael D Kinney
2023-04-03 8:21 ` Ard Biesheuvel
2023-04-03 10:37 ` [edk2-devel] " Gerd Hoffmann
2023-04-04 20:04 ` Michael D Kinney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox