From: "xpahos via groups.io" <xpahos=gmail.com@groups.io>
To: mitchell.augustin@canonical.com, devel@edk2.groups.io
Subject: Re: [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40
Date: Wed, 20 Nov 2024 01:35:04 -0800 [thread overview]
Message-ID: <2000.1732095304681550888@groups.io> (raw)
In-Reply-To: <24085.1732055112128290386@groups.io>
[-- Attachment #1: Type: text/plain, Size: 3658 bytes --]
Hello, Mitchell.
> Thanks for the suggestion. I'm not necessarily saying this patch itself has an issue, just that it is the point in the git history at which this slow boot time issue manifests for us. This may be because the patch does actually fix the other issue I described above related to BAR assignment not working correctly in versions before that patch, despite boot being faster back then. (in those earlier versions, the PCI devices for the GPUs were passed through, but the BAR assignment was erroneous, so we couldn't actually use them - the Nvidia GPU driver would just throw errors.)
tl;dr: GPU instances need a very large 64-bit MMIO window so the VM can map their BARs. When the window is too small, OVMF rejects some PCI devices during the initialisation phase. The opt/ovmf/X-PciMmio64Mb option lets you enlarge the MMIO window manually; the patch you bisected to adds functionality that sizes it automatically based on the number of physical address bits. As a starting point, I would run an old build of OVMF and grep its debug log for 'rejected' to make sure that no GPUs were taken out of service while OVMF was running.
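A rough sketch of what the automatic sizing means in practice, based on my reading of the patch description (not a verified copy of the OVMF code): with N physical address bits, the 64-bit MMIO window grows to 2^(N-3) bytes, i.e. one eighth of the guest physical address space.

```shell
#!/bin/sh
# Sketch (assumption from the patch description, not verified against
# the OVMF source): size of the 64-bit MMIO window for a given number
# of guest physical address bits.
physbits=46
# 2^(physbits - 3) bytes, converted to MB.
mmio_mb=$(( (1 << (physbits - 3)) / 1024 / 1024 ))
echo "phys-bits=${physbits} -> 64-bit MMIO window = ${mmio_mb} MB"
```

For older builds without this logic, the same value can be forced by hand with QEMU's fw_cfg knob, e.g. `-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144`; a debug build of OVMF writes its log to I/O port 0x402, so `-debugcon file:ovmf.log -global isa-debugcon.iobase=0x402` captures it for the 'rejected' grep.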
> After I initially posted here, we also discovered another kernel issue that was contributing to the boot times for this config exceeding 5 minutes - so with that isolated, I can say that my config only takes about 5 minutes for a full boot: 1-2 minutes for `virsh start` (which scales with guest memory allocation), and about 2-3 minutes of time spent on PCIe initialization / BAR assignment for 2 to 4 GPUs (attached). This was still the case when I tried with my GPUs attached in the way you suggested. I'll attach the xml config for that and for my original VM in case I may have configured something incorrectly there.
> With that said, I have a more basic question - do you expect that it should take upwards of 30 seconds after `virsh start` completes before I see any output in `virsh console`, or that PCI devices' memory window assignments in the VM should take 45-90 seconds per passed-through GPU? (given that when the same kernel on the host initializes these devices, it doesn't take nearly this long?)
I'm not sure I can help you - we don't use virsh. But the Linux kernel also takes a long time to initialise NVIDIA GPUs even with SeaBIOS. Another way to measure the boot time is to hot-plug the cards after booting. I don't know how this works in virsh; I wrote an expect script to emulate hot-plug:
```
#!/bin/bash
CWD="$(dirname "$(realpath "$0")")"
/usr/bin/expect <<EOF
spawn $CWD/qmp-shell $CWD/qmp.sock
# Wait for the qmp-shell prompt before sending each command.
expect "(QEMU) "
send -- "query-pci\r"
expect "(QEMU) "
send -- "device_add driver=pci-gpu-testdev bus=s30 regions=mpx2M vendorid=5555 deviceid=4126\r"
expect "(QEMU) "
EOF
```
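For virsh, I believe the closest equivalent of the device_add above is `virsh attach-device` with a hostdev XML fragment. This is an untested sketch; the PCI address 0000:c1:00.0 and the domain name "mydomain" are placeholders you would replace with your own:

```shell
#!/bin/sh
# Untested sketch: hot-plug a passed-through PCI GPU into a running
# libvirt domain. The PCI address and domain name are placeholders.
cat > gpu-hostdev.xml <<'XML'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0xc1' slot='0x00' function='0x0'/>
  </source>
</hostdev>
XML
# Guarded so the sketch is harmless where libvirt is not installed.
command -v virsh >/dev/null \
  && virsh attach-device mydomain gpu-hostdev.xml --live \
  || echo "virsh not available; XML written to gpu-hostdev.xml"
```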
> I'm going to attempt to profile ovmf next to see what part of the code path is taking up the most time, but if you already have an idea of what that might be (and whether it is actually a bug or expected to take that long), that insight would be appreciated.
We have only just started migrating from SeaBIOS to UEFI/Secure Boot, so I only know the parts of the OVMF code used for enumeration/initialisation of PCI devices. I'm not a core edk2 developer, just solving the same problems with starting VMs with GPUs.
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120803): https://edk2.groups.io/g/devel/message/120803
Mute This Topic: https://groups.io/mt/109651206/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
Thread overview: 20+ messages
2024-11-14 16:46 [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40 mitchell.augustin via groups.io
2024-11-18 21:14 ` xpahos via groups.io
2024-11-19 22:25 ` mitchell.augustin via groups.io
2024-11-20 9:35 ` xpahos via groups.io [this message]
2024-11-20 11:26 ` Gerd Hoffmann via groups.io
2024-11-20 15:20 ` mitchell.augustin via groups.io
2024-11-20 20:00 ` mitchell.augustin via groups.io
2024-11-21 12:32 ` Gerd Hoffmann via groups.io
2024-11-22 0:23 ` mitchell.augustin via groups.io
2024-11-22 10:35 ` Gerd Hoffmann via groups.io
2024-11-22 17:38 ` Brian J. Johnson via groups.io
2024-11-22 22:32 ` mitchell.augustin via groups.io
2024-11-24 2:05 ` mitchell.augustin via groups.io
2024-11-25 11:47 ` Gerd Hoffmann via groups.io
2024-11-25 19:58 ` mitchell.augustin via groups.io
2024-11-26 8:09 ` Gerd Hoffmann via groups.io
2024-11-26 22:27 ` mitchell.augustin via groups.io
2024-12-04 14:56 ` mitchell.augustin via groups.io
2024-11-25 11:18 ` Gerd Hoffmann via groups.io
2024-11-18 21:32 ` Ard Biesheuvel via groups.io