public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* PR fails due to OVMF time out
@ 2023-03-31 15:00 Ni, Ray
  2023-03-31 19:46 ` Gerd Hoffmann
  0 siblings, 1 reply; 6+ messages in thread
From: Ni, Ray @ 2023-03-31 15:00 UTC (permalink / raw)
  To: devel@edk2.groups.io
  Cc: Kinney, Michael D, Ard Biesheuvel, 'Gerd Hoffmann'

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

Hi,
I found several of my PRs cannot pass the CI due to OVMF boot timeout.
e.g.: Mktme fix by niruiyu * Pull Request #4072 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4072>

I also one PR from Ard (X64 text reloc fixes by ardbiesheuvel * Pull Request #4216 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4216>) failed as well.

But I remember Mike increased the boot timeout from 1min to 2 mins.
Why does OVMF boot require more time than before?

Is that because Gerd enabled the SMP 4 boot?

Thanks,
Ray

[-- Attachment #2: Type: text/html, Size: 2764 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: PR fails due to OVMF time out
  2023-03-31 15:00 PR fails due to OVMF time out Ni, Ray
@ 2023-03-31 19:46 ` Gerd Hoffmann
  2023-04-02 18:23   ` Michael D Kinney
  0 siblings, 1 reply; 6+ messages in thread
From: Gerd Hoffmann @ 2023-03-31 19:46 UTC (permalink / raw)
  To: Ni, Ray; +Cc: devel@edk2.groups.io, Kinney, Michael D, Ard Biesheuvel

On Fri, Mar 31, 2023 at 03:00:37PM +0000, Ni, Ray wrote:
> Hi,
> I found several of my PRs cannot pass the CI due to OVMF boot timeout.
> e.g.: Mktme fix by niruiyu * Pull Request #4072 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4072>
> 
> I also one PR from Ard (X64 text reloc fixes by ardbiesheuvel * Pull Request #4216 * tianocore/edk2 (github.com)<https://github.com/tianocore/edk2/pull/4216>) failed as well.
> 
> But I remember Mike increased the boot timeout from 1min to 2 mins.
> Why does OVMF boot require more time than before?
> 
> Is that because Gerd enabled the SMP 4 boot?

That could be the reason.  It happened to work in my CI test run,
but maybe I was just lucky.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: PR fails due to OVMF time out
  2023-03-31 19:46 ` Gerd Hoffmann
@ 2023-04-02 18:23   ` Michael D Kinney
  2023-04-03  8:21     ` Ard Biesheuvel
  0 siblings, 1 reply; 6+ messages in thread
From: Michael D Kinney @ 2023-04-02 18:23 UTC (permalink / raw)
  To: 'Gerd Hoffmann', Ni, Ray
  Cc: devel@edk2.groups.io, Ard Biesheuvel, Kinney, Michael D

Hi Gerd,

I have investigated this failure with enabling -smp 4.  I think this is an
important feature that should be on by default.

I did find that the results where inconsistent at 2 or 4 cpus on my laptop,
but going 8 or higher would fail consistently. I used -smp 32 for all the
testing below.

First, this failure mode is only seen with SMM_REQUIRE=1 builds.  If I
set SMM_REQUIRE=0, then QEMU can boot with different smp settings.
This appears to be a QEMU MP SMM related issue.

I first tried going back in edk2 history looking for a point where smp
boot would work using the version of QEMU used by EDK II CI agents which
is 2021.5.5 (QEMU 6.0.0)
* https://github.com/tianocore/edk2/blob/fc00ff286a541c047b7d343e66ec10890b80d3ea/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml#L142

I went back in edk2 history to the 2017/2018 timeframe and the issue was
still present, and I recall testing this feature back then myself, so I
do not think it is related to edk2 changes.

I then did a binary search through the QEMU releases:
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20170113.exe
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20170808.exe    2.10.0-rc2
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20171122.exe    2.11.0-rc2
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20171211.exe    2.11.0-rc5
PASS: https://qemu.weilnetz.de/w64/2017/qemu-w64-setup-20171217.exe    2.11.0
FAIL: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180321.exe    2.12.0-rc0 - AioContextPolling - Unrelated failure
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180404.exe    2.12.0-rc1
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180711.exe    3.0.0-rc0
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180807.exe    3.0.0-rc4
PASS: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20180815.exe    3.0.0 
FAIL: https://qemu.weilnetz.de/w64/2018/qemu-w64-setup-20181108.exe    3.1.0-rc0  - Long Delay at BSD Entry
FAIL: https://qemu.weilnetz.de/w64/2019/qemu-w64-setup-20181211.exe    3.1.0      - Long Delay at BDS Entry
FAIL: https://qemu.weilnetz.de/w64/2019/qemu-w64-setup-20190815.exe    4.1.0      - Long Delay at BDS Entry

It appears the failure was introduced in the transition from 3.0.0 -> 3.1.0.

QEMU 3.1 introduced Multi-Threaded TCG for x86 CPUs
* https://www.qemu.org/2018/12/12/qemu-3-1-0/
* https://qemu.readthedocs.io/en/latest/devel/multi-thread-tcg.html#multi-threaded-tcg

The feature can be disabled with the following QEMU command line option:
* -accel tcg,thread=single
* https://doc.ycharbi.fr/fichiers/virtualisation/qemu/documentation/qemu-doc-3.1.0.html

When I apply -accel tcg,thread-single to the QEMU command line, QEMU 3.1.0 passes.
When I apply -accel tcg,thread-single to the QEMU command line, QEMU 6.0.0 passes.

I think we can create a new PR with: -smp 4 -accel tcg,thead-single

I consider this a workaround, and we should investigate why Multi-Threaded
TCG breaks QEMU MP SMM.

Best regards,

Mike

> -----Original Message-----
> From: 'Gerd Hoffmann' <kraxel@redhat.com>
> Sent: Friday, March 31, 2023 12:47 PM
> To: Ni, Ray <ray.ni@intel.com>
> Cc: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Ard Biesheuvel <ardb@kernel.org>
> Subject: Re: PR fails due to OVMF time out
> 
> On Fri, Mar 31, 2023 at 03:00:37PM +0000, Ni, Ray wrote:
> > Hi,
> > I found several of my PRs cannot pass the CI due to OVMF boot timeout.
> > e.g.: Mktme fix by niruiyu * Pull Request #4072 * tianocore/edk2
> (github.com)<https://github.com/tianocore/edk2/pull/4072>
> >
> > I also one PR from Ard (X64 text reloc fixes by ardbiesheuvel * Pull Request #4216 * tianocore/edk2
> (github.com)<https://github.com/tianocore/edk2/pull/4216>) failed as well.
> >
> > But I remember Mike increased the boot timeout from 1min to 2 mins.
> > Why does OVMF boot require more time than before?
> >
> > Is that because Gerd enabled the SMP 4 boot?
> 
> That could be the reason.  It happened to work in my CI test run,
> but maybe I was just lucky.
> 
> take care,
>   Gerd


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: PR fails due to OVMF time out
  2023-04-02 18:23   ` Michael D Kinney
@ 2023-04-03  8:21     ` Ard Biesheuvel
  2023-04-03 10:37       ` [edk2-devel] " Gerd Hoffmann
  0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2023-04-03  8:21 UTC (permalink / raw)
  To: Kinney, Michael D, Laszlo Ersek
  Cc: Gerd Hoffmann, Ni, Ray, devel@edk2.groups.io

On Sun, 2 Apr 2023 at 20:23, Kinney, Michael D
<michael.d.kinney@intel.com> wrote:
>
> Hi Gerd,
>
> I have investigated this failure with enabling -smp 4.  I think this is an
> important feature that should be on by default.
>

Agreed. And given your investigation below (thanks!), we should be
able to revert the revert as long as we add '-accel
tcg,thread-single', right?

It would be nice if we could also get some coverage for KVM, but for
the time being, this seems to be an improvement already (AFAIU, we
want -smp 4 primarily for the logic used in the code, while there is
very little code that would actually run concurrently, if any)

-- 
Ard.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [edk2-devel] PR fails due to OVMF time out
  2023-04-03  8:21     ` Ard Biesheuvel
@ 2023-04-03 10:37       ` Gerd Hoffmann
  2023-04-04 20:04         ` Michael D Kinney
  0 siblings, 1 reply; 6+ messages in thread
From: Gerd Hoffmann @ 2023-04-03 10:37 UTC (permalink / raw)
  To: devel, ardb; +Cc: Kinney, Michael D, Laszlo Ersek, Ni, Ray

On Mon, Apr 03, 2023 at 10:21:27AM +0200, Ard Biesheuvel wrote:
> On Sun, 2 Apr 2023 at 20:23, Kinney, Michael D
> <michael.d.kinney@intel.com> wrote:
> >
> > Hi Gerd,
> >
> > I have investigated this failure with enabling -smp 4.  I think this is an
> > important feature that should be on by default.
> >
> 
> Agreed. And given your investigation below (thanks!), we should be
> able to revert the revert as long as we add '-accel
> tcg,thread-single', right?

That is how I read it, yes.

> It would be nice if we could also get some coverage for KVM, but for
> the time being, this seems to be an improvement already (AFAIU, we
> want -smp 4 primarily for the logic used in the code, while there is
> very little code that would actually run concurrently, if any)

Yes, it's to improve MpInit test coverage.  Most code would continue to
run single-threaded.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [edk2-devel] PR fails due to OVMF time out
  2023-04-03 10:37       ` [edk2-devel] " Gerd Hoffmann
@ 2023-04-04 20:04         ` Michael D Kinney
  0 siblings, 0 replies; 6+ messages in thread
From: Michael D Kinney @ 2023-04-04 20:04 UTC (permalink / raw)
  To: devel@edk2.groups.io, kraxel@redhat.com, ardb@kernel.org
  Cc: Laszlo Ersek, Ni, Ray, Kinney, Michael D

Hi Gerd,

Do you want to submit a new email review/PR with this suggested change?

Thanks,

Mike

> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Gerd Hoffmann
> Sent: Monday, April 3, 2023 3:38 AM
> To: devel@edk2.groups.io; ardb@kernel.org
> Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Laszlo Ersek <lersek@redhat.com>; Ni, Ray <ray.ni@intel.com>
> Subject: Re: [edk2-devel] PR fails due to OVMF time out
> 
> On Mon, Apr 03, 2023 at 10:21:27AM +0200, Ard Biesheuvel wrote:
> > On Sun, 2 Apr 2023 at 20:23, Kinney, Michael D
> > <michael.d.kinney@intel.com> wrote:
> > >
> > > Hi Gerd,
> > >
> > > I have investigated this failure with enabling -smp 4.  I think this is an
> > > important feature that should be on by default.
> > >
> >
> > Agreed. And given your investigation below (thanks!), we should be
> > able to revert the revert as long as we add '-accel
> > tcg,thread-single', right?
> 
> That is how I read it, yes.
> 
> > It would be nice if we could also get some coverage for KVM, but for
> > the time being, this seems to be an improvement already (AFAIU, we
> > want -smp 4 primarily for the logic used in the code, while there is
> > very little code that would actually run concurrently, if any)
> 
> Yes, it's to improve MpInit test coverage.  Most code would continue to
> run single-threaded.
> 
> take care,
>   Gerd
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-04-04 20:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-31 15:00 PR fails due to OVMF time out Ni, Ray
2023-03-31 19:46 ` Gerd Hoffmann
2023-04-02 18:23   ` Michael D Kinney
2023-04-03  8:21     ` Ard Biesheuvel
2023-04-03 10:37       ` [edk2-devel] " Gerd Hoffmann
2023-04-04 20:04         ` Michael D Kinney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox