Hello!
We've identified an issue with OVMF that makes VM boot considerably slower (typically 10+ minutes longer) under at least the following conditions:
 * CPU passthrough is used
 * Host has phys-bits > 40
 * GPU PCI passthrough is used
This slowdown was not present before commit https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650 and is still present in the latest upstream edk2 version. Without that patch, we are only able to utilize passed-through Nvidia GPUs when the kernel options `pci=nocrs pci=realloc` are set in the guest. With the patch, we no longer need those kernel opts in the guest, but PCI enumeration and BAR assignment of the passed-through GPUs (and some other boot steps that may or may not be related) proceed extremely slowly.
I tested the following virt-install command on our DGX H100, running upstream OVMF @ https://github.com/tianocore/edk2/commit/ef4f3aa3f7e3c28c7f0e1a3c35711f1a85becd71 built with verbose debug output enabled, to identify where the boot process appeared to spend the most time. I have attached the full logs from that VM (hidon-slow-ovmf-verbose.txt), as well as a "human view" of what that process looked like to me (h100-verbose-vm-logs.txt), since the console output did not include accurate wall-clock timestamps.
 
I also confirmed that the same issue occurs under the same conditions on our DGX Station A100 with a slightly different VM config (which I can provide if necessary), so it likely affects any host with enough phys-bits when the CPU is passed through.
 
Full lscpu output for the DGX H100 is attached (lscpu-h100.txt). With CPU passthrough, the guest VM reported the same address sizes as the host.
For the A100 Station, lscpu on the host reported `Address sizes: 43 bits physical, 48 bits virtual` (I can provide the rest of the lscpu output if it would be relevant). Strangely, despite CPU passthrough being enabled there as well, the guest reported `Address sizes: 48 bits physical, 48 bits virtual`.
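To make the host/guest comparison above easier to repeat, here is a minimal sketch that parses the `Address sizes:` line from lscpu output, converts the physical width to addressable TiB (2^40 bytes = 1 TiB), and checks it against the > 40 phys-bits condition listed earlier. The helper names (`parse_address_sizes`, `describe`, `phys_bits_mismatch`) are hypothetical, and the 40-bit threshold is only the condition observed in this report, not a confirmed cutoff inside OVMF.

```python
import re

# Matches the lscpu line, e.g. "Address sizes:   43 bits physical, 48 bits virtual".
_ADDR_RE = re.compile(r"Address sizes:\s*(\d+)\s+bits physical,\s*(\d+)\s+bits virtual")

# Assumption: threshold taken from the reported conditions (phys-bits > 40),
# not from the OVMF source.
PHYS_BITS_THRESHOLD = 40


def parse_address_sizes(lscpu_output: str) -> tuple[int, int]:
    """Return (physical_bits, virtual_bits) parsed from lscpu output."""
    match = _ADDR_RE.search(lscpu_output)
    if match is None:
        raise ValueError("no 'Address sizes:' line found in lscpu output")
    return int(match.group(1)), int(match.group(2))


def describe(label: str, lscpu_output: str) -> str:
    """Summarize the address widths and whether phys-bits exceed the threshold."""
    phys, virt = parse_address_sizes(lscpu_output)
    tib = 2**phys / 2**40  # addressable physical space in TiB
    flag = "above" if phys > PHYS_BITS_THRESHOLD else "at or below"
    return (f"{label}: {phys} phys bits ({tib:g} TiB), {virt} virt bits; "
            f"{flag} the {PHYS_BITS_THRESHOLD}-bit threshold")


def phys_bits_mismatch(host_output: str, guest_output: str) -> bool:
    """True if the guest reports a different physical width than the host."""
    return parse_address_sizes(host_output)[0] != parse_address_sizes(guest_output)[0]


if __name__ == "__main__":
    # Values reported above for the DGX Station A100.
    host = "Address sizes:                   43 bits physical, 48 bits virtual"
    guest = "Address sizes:                   48 bits physical, 48 bits virtual"
    print(describe("A100 host", host))
    print(describe("A100 guest", guest))
    print("host/guest phys-bits mismatch:", phys_bits_mismatch(host, guest))
```

Running `lscpu` on the host and inside the guest and feeding both outputs to `phys_bits_mismatch` would flag the unexpected 43 vs. 48 bit difference seen on the A100 Station.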
 
Please let me know if there is any clarification or other information I can provide that could help you debug this issue. Thanks,
Mitchell Augustin
Groups.io message #120789