Subject: Re: [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40
To: mitchell.augustin@canonical.com, devel@edk2.groups.io
From: "xpahos via groups.io"
Date: Wed, 20 Nov 2024 01:35:04 -0800
Message-ID: <2000.1732095304681550888@groups.io>
In-Reply-To: <24085.1732055112128290386@groups.io>

Hello, Mitchell.
 
> Thanks for the suggestion. I'm not necessarily saying this patch itself has an issue, just that it is the point in the git history at which this slow boot time issue manifests for us. This may be because the patch does actually fix the other issue I described above related to BAR assignment not working correctly in versions before that patch, despite boot being faster back then. (In those earlier versions, the PCI devices for the GPUs were passed through, but the BAR assignment was erroneous, so we couldn't actually use them - the Nvidia GPU driver would just throw errors.)
 
tl;dr: GPU instances need a huge amount of guest address space for the VM to be able to map their BARs. So the 64-bit MMIO aperture could be too small, and OVMF was rejecting some PCI devices during the initialisation phase. To fix this there is an opt/ovmf/X-PciMmio64Mb fw_cfg option that increases the MMIO aperture size, and the patch you bisected to adds logic that sizes the aperture automatically from the number of physical address bits. As a starting point, I would run an old build of OVMF with its debug log enabled and grep the log for 'rejected' to make sure that no GPUs were taken out of service while OVMF was running.
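If it helps, here is a minimal sketch of both checks on a raw QEMU command line (opt/ovmf/X-PciMmio64Mb and the debugcon flags are standard OVMF/QEMU usage; the log path and the 65536 value are just illustrative). Under libvirt the same flags can be passed through a <qemu:commandline> element.

```
# Route OVMF's debug output to a file (works with DEBUG builds of OVMF,
# which write their log to I/O port 0x402), and enlarge the 64-bit PCI
# MMIO aperture by hand; the value is in MB, so 65536 gives 64 GiB.
# "$@" stands for the rest of the VM definition (disks, GPUs, etc.).
qemu-system-x86_64 \
    -debugcon file:ovmf.log -global isa-debugcon.iobase=0x402 \
    -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536 \
    "$@"

# After boot, check whether OVMF rejected any device resources:
grep -i rejected ovmf.log
```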
 
> After I initially posted here, we also discovered another kernel issue that was contributing to the boot times for this config exceeding 5 minutes - so with that isolated, I can say that my config only takes about 5 minutes for a full boot: 1-2 minutes for `virsh start` (which scales with guest memory allocation), and about 2-3 minutes of time spent on PCIe initialization / BAR assignment for 2 to 4 GPUs (attached). This was still the case when I tried with my GPUs attached in the way you suggested. I'll attach the XML config for that and for my original VM in case I may have configured something incorrectly there.
> With that said, I have a more basic question - do you expect that it should take upwards of 30 seconds after `virsh start` completes before I see any output in `virsh console`, or that PCI devices' memory window assignments in the VM should take 45-90 seconds per passed-through GPU? (given that when the same kernel on the host initializes these devices, it doesn't take nearly this long?)
 
I'm not sure I can help you; we don't use virsh. But the Linux kernel also takes a long time to initialise NVIDIA GPUs when booting with SeaBIOS. Another way to check the boot-time cost is to hot-plug the cards after booting. I don't know how this works in virsh; I made an expect script to emulate hot-plug through QMP:
 
```
#!/bin/bash
# Hot-plug a test PCI device via QMP; qmp-shell ships with QEMU, and
# the device/bus names below are specific to this test setup.
CWD="$(dirname "$(realpath "$0")")"
/usr/bin/expect <<EOF
spawn $CWD/qmp-shell $CWD/qmp.sock
expect "(QEMU)"
send -- "query-pci\r"
expect "(QEMU)"
send -- "device_add driver=pci-gpu-testdev bus=s30 regions=mpx2M vendorid=5555 deviceid=4126\r"
expect "(QEMU)"
EOF
```
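Untested, since we don't use virsh, but something like the following should be the equivalent on your side (note HMP device_add takes comma-separated properties, unlike the space-separated form qmp-shell accepts; the domain name `guest` and the hostdev XML file name are placeholders):

```
# HMP passthrough (libvirt will not track a device added this way):
virsh qemu-monitor-command guest --hmp \
    'device_add pci-gpu-testdev,bus=s30,regions=mpx2M,vendorid=5555,deviceid=4126'

# For a real VFIO GPU, the supported route is a <hostdev> snippet:
virsh attach-device guest gpu-hostdev.xml --live
```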
 
> I'm going to attempt to profile ovmf next to see what part of the code path is taking up the most time, but if you already have an idea of what that might be (and whether it is actually a bug or expected to take that long), that insight would be appreciated.
 
We just started migrating from SeaBIOS to UEFI/Secure Boot, so I know only the parts of the OVMF code used for enumeration/initialisation of PCI devices. I'm not a core edk2 developer, just solving the same problems with starting VMs with GPUs.