Re: [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40

public inbox for devel@edk2.groups.io

From: "mitchell.augustin via groups.io" <mitchell.augustin=canonical.com@groups.io>
To: "Gerd Hoffmann" <kraxel@redhat.com>, devel@edk2.groups.io
Subject: Re: [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40
Date: Wed, 20 Nov 2024 07:20:39 -0800
Message-ID: <18748.1732116039858528046@groups.io>
In-Reply-To: <byqr63sxho6xgwkgivs5wfvt6t6wnantv54p4bhcrlayvhvh2p@zsl3m7hxfcyf>

@Gerd

> Do you also see the slowdown without the GPU in a otherwise identical
> guest configuration?

No - without the GPUs, the entire boot process takes less than 30 seconds, both before and after the dynamic mmio window size patch ( https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650 ).

> Looks quite high to me.  What amount of guest memory we are talking
> about?

It is a pretty large memory allocation - over 900GB - so I'm not surprised that the initial allocation during `virsh start` takes a while when PCIe devices are passed through, since that allocation has to happen at init time. `virsh start` takes the same amount of time with or without the dynamic mmio window size patch, but its time does scale with the amount of memory allocated (which I would expect, given that the time-consuming part is just that memory allocation).

> More details would be helpful indeed.  Is that a general overall
> slowdown?  Is it some specific part which takes alot of time?

The part of the kernel boot that I highlighted in https://edk2.groups.io/g/devel/attachment/120801/2/this-part-takes-2-3-minutes.txt (which I think is PCIe device initialization and BAR assignment) is the part that seems slower than it should be. Each section of that log starting with "acpiphp: Slot <slot> registered" takes roughly 15 seconds, so the whole section adds up to a few minutes. That part does not scale with the amount of memory allocated, only with the number of GPUs passed through (in this log I had 4 GPUs attached, IIRC).
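To put numbers on the per-slot timing rather than estimating it by eye, something like the following sketch could be run over a saved `dmesg` capture. It assumes the usual `[ seconds.micros ] message` timestamp prefix and treats any line containing both "acpiphp: Slot" and "registered" as the start of a section; the function name `slot_section_durations` and the `marker` parameter are made up for illustration and are not part of any existing tool.

```python
import re

# Matches a dmesg-style line such as "[    4.650955] pci 0000:00:01.5: ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def slot_section_durations(lines, marker="acpiphp: Slot"):
    """Return a list of (marker_message, seconds) pairs.

    A section runs from one marker line to the next marker line; the
    final section runs to the last timestamped line in the input.
    Lines without a timestamp prefix are ignored.
    """
    starts = []      # (timestamp, message) for every marker line
    last_ts = None   # timestamp of the last parsable line seen
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        last_ts = ts
        if marker in msg and "registered" in msg:
            starts.append((ts, msg))

    durations = []
    for i, (ts, msg) in enumerate(starts):
        # End of this section: next marker, or end of log for the last one.
        end = starts[i + 1][0] if i + 1 < len(starts) else last_ts
        durations.append((msg, end - ts))
    return durations
```

Passing it the lines of the guest's boot log (for example an open file object) gives one duration per slot, which makes it easier to check whether the cost is really constant per GPU.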

Without the dynamic mmio window size patch, if I set my guest kernel to use `pci=nocrs pci=realloc`, this boot slowdown disappears and I am able to use the GPU with some conditions (details below).

@xpahos:

> This patch adds functionality that automatically adjusts the MMIO size based on the number of physical bits. As a starting point, I would try running an old build of OVMF and running grep on ‘rejected’ to make sure that no GPUs were taken out of service while OVMF was running.

I haven't looked for this in the OVMF debug output, but what you say here seems plausible, given that my VMs without the dynamic mmio window size patch throw many errors like this during guest kernel boot:
[    4.650955] pci 0000:00:01.5: BAR 15: no space for [mem size 0x3000000000 64bit pref]
[    4.651700] pci 0000:00:01.5: BAR 15: failed to assign [mem size 0x3000000000 64bit pref]

(Subsequently, the GPUs are not usable in those VMs, although the PCI devices are still present.) So it would make sense if the fast boot time in those versions is simply due to the kernel "giving up" on all of those right away, before the slow path starts. What still confuses me is why this part ( https://edk2.groups.io/g/devel/attachment/120801/2/this-part-takes-2-3-minutes.txt ) is not slow when I use a version of OVMF with the dynamic mmio window size patch reverted but with `pci=realloc pci=nocrs` set on my guest kernel. Under those circumstances, I have a fast boot time and my passed-through GPUs work, although I do still see some output like this during Linux boot:
[    4.592009] pci 0000:06:00.0: can't claim BAR 0 [mem 0xffffffffff000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
[    4.593477] pci 0000:06:00.0: can't claim BAR 2 [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
[    4.593817] pci 0000:06:00.0: can't claim BAR 4 [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
and sometimes loading the Nvidia driver introduces some brief lockups ( https://pastebin.ubuntu.com/p/J3TH3S7Xhd/ ).
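For scale, the sizes in the messages quoted above can be decoded with a few lines of Python. This is only a sketch for reading the logs: the helper names `bar_size_bytes` and `human` are made up here, and the regular expressions assume the `[mem size 0x...]` and `[mem 0x...-0x...]` forms exactly as they appear in this message.

```python
import re

SIZE_RE = re.compile(r"\[mem size (0x[0-9a-fA-F]+)")
RANGE_RE = re.compile(r"\[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")


def bar_size_bytes(line):
    """Return the size in bytes implied by a kernel PCI log line, or None."""
    m = SIZE_RE.search(line)
    if m:
        return int(m.group(1), 16)
    m = RANGE_RE.search(line)
    if m:
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        return end - start + 1  # the printed range is inclusive
    return None


def human(n):
    """Format a byte count using the largest binary unit that divides it."""
    for unit, shift in (("GiB", 30), ("MiB", 20), ("KiB", 10)):
        if n >= (1 << shift) and n % (1 << shift) == 0:
            return f"{n >> shift} {unit}"
    return f"{n} B"


# Log lines copied from this message.
LOG = [
    "[    4.650955] pci 0000:00:01.5: BAR 15: no space for [mem size 0x3000000000 64bit pref]",
    "[    4.592009] pci 0000:06:00.0: can't claim BAR 0 [mem 0xffffffffff000000-0xffffffffffffffff 64bit pref]: no compatible bridge window",
    "[    4.593477] pci 0000:06:00.0: can't claim BAR 2 [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]: no compatible bridge window",
    "[    4.593817] pci 0000:06:00.0: can't claim BAR 4 [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]: no compatible bridge window",
]

for line in LOG:
    print(human(bar_size_bytes(line)))
```

This prints 192 GiB for the bridge window that could not be placed, and 16 MiB, 128 GiB and 32 MiB for BARs 0, 2 and 4 of the GPU.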

> But the linux kernel also takes a long time to initialise NVIDIA GPU using SeaBIOS

This is good to know... given this and the above, I'm starting to wonder if it might actually be a kernel issue...


Thread overview: 23+ messages
2024-11-14 16:46 [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40 mitchell.augustin via groups.io
2024-11-18 21:14 ` xpahos via groups.io
2024-11-19 22:25   ` mitchell.augustin via groups.io
2024-11-20  9:35     ` xpahos via groups.io
2024-11-20 11:26     ` Gerd Hoffmann via groups.io
2024-11-20 15:20       ` mitchell.augustin via groups.io [this message]
2024-11-20 20:00         ` mitchell.augustin via groups.io
2024-11-21 12:32         ` Gerd Hoffmann via groups.io
2024-11-22  0:23           ` mitchell.augustin via groups.io
2024-11-22 10:35             ` Gerd Hoffmann via groups.io
2024-11-22 17:38               ` Brian J. Johnson via groups.io
2024-11-22 22:32               ` mitchell.augustin via groups.io
2024-11-24  2:05                 ` mitchell.augustin via groups.io
2024-11-25 11:47                   ` Gerd Hoffmann via groups.io
2024-11-25 19:58                     ` mitchell.augustin via groups.io
2024-11-26  8:09                       ` Gerd Hoffmann via groups.io
2024-11-26 22:27                         ` mitchell.augustin via groups.io
2024-12-04 14:56                           ` mitchell.augustin via groups.io
2025-02-14 23:59                             ` mitchell.augustin via groups.io
2025-03-13  0:49                               ` mitchell.augustin via groups.io
2025-06-20 20:56                                 ` mitchell.augustin via groups.io
2024-11-25 11:18                 ` Gerd Hoffmann via groups.io
2024-11-18 21:32 ` Ard Biesheuvel via groups.io
