From: "Ard Biesheuvel via groups.io"
Date: Mon, 18 Nov 2024 22:32:44 +0100
Subject: Re: [edk2-devel] [BUG] Extremely slow boot times with CPU and GPU passthrough and host phys-bits > 40
To: devel@edk2.groups.io, mitchell.augustin@canonical.com, Gerd Hoffmann
Reply-To: devel@edk2.groups.io,ardb@kernel.org

(cc Gerd)

On Mon, 18 Nov 2024 at 20:26, mitchell.augustin via groups.io wrote:
>
> Hello!
> We've identified an issue with OVMF that causes the boot time of VMs to be considerably slower (usually taking 10+ minutes more) under (at least) the following conditions:
> * CPU passthrough is used
> * Host has phys-bits > 40
> * GPU PCI passthrough is used
> This slowdown was not present before commit https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650 and is still present in the latest upstream edk2 version. Without that patch, we are only able to utilize passed-through Nvidia GPUs when the kernel options `pci=nocrs pci=realloc` are set in the guest.
> With the patch, we no longer need those kernel opts in the guest, but PCI enumeration and BAR assignment of the passed-through GPUs (and some other boot steps that may or may not be related) proceed extremely slowly.
> I tested the following virt-install command on our DGX H100, running upstream OVMF @ https://github.com/tianocore/edk2/commit/ef4f3aa3f7e3c28c7f0e1a3c35711f1a85becd71 built with verbose debug output enabled to identify areas where the boot process appeared to be spending the most time. I have attached the full logs from that VM (hidon-slow-ovmf-verbose.txt) as well as a "human view" of what that process looked like to me, since I did not have accurate wall-clock timestamps in the console output (h100-verbose-vm-logs.txt).
>
> I also confirmed that this same issue is present under the same conditions as above on our DGX Station A100 when using a slightly different VM config (which I can provide if necessary), so it likely affects any host with enough physbits, when the CPU is passed through.
>
> Full lscpu output for DGX H100 is attached (as lscpu-h100.txt). In the guest VM, the address sizes were the same when CPU passthrough was used.
> For the A100 station, I logged Address sizes: 43 bits physical, 48 bits virtual from lscpu (I can get the rest of the lscpu output as well if it would be relevant). Strangely though, despite CPU passthrough being enabled there as well, the guest saw Address sizes: 48 bits physical, 48 bits virtual.
>
> Please let me know if there is any clarification or other information I can provide that could help you debug this issue. Thanks,
> Mitchell Augustin
>
> Attachments:
>
> hidon-slow-ovmf-verbose.txt
> h100-verbose-vm-logs.txt
> lscpu-h100.txt
>
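For anyone trying to check whether a host or guest falls under the "phys-bits > 40" condition from the report, here is a minimal Python sketch that pulls the physical and virtual address widths out of an "Address sizes" line, as printed by lscpu or /proc/cpuinfo. The function names and the default threshold of 40 are illustrative assumptions taken from the wording of the report; they are not part of edk2, QEMU, or any existing tool.

```python
import re

# Matches the "Address sizes" line as printed by lscpu or /proc/cpuinfo, e.g.
#   "Address sizes: 43 bits physical, 48 bits virtual"
ADDRESS_SIZES_RE = re.compile(
    r"address sizes\s*:\s*(\d+)\s+bits physical,\s*(\d+)\s+bits virtual",
    re.IGNORECASE,
)


def parse_address_sizes(text):
    """Return (physical_bits, virtual_bits), or None if no such line is found."""
    match = ADDRESS_SIZES_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exceeds_phys_bits(text, threshold=40):
    """True if the reported physical address width is above the threshold.

    The default of 40 mirrors the condition named in the report; it is an
    assumption for illustration, not a value read from OVMF.
    """
    sizes = parse_address_sizes(text)
    return sizes is not None and sizes[0] > threshold


if __name__ == "__main__":
    # The two lines quoted in the report for the DGX Station A100.
    host = "Address sizes: 43 bits physical, 48 bits virtual"
    guest = "Address sizes: 48 bits physical, 48 bits virtual"
    print("host :", parse_address_sizes(host), exceeds_phys_bits(host))
    print("guest:", parse_address_sizes(guest), exceeds_phys_bits(guest))
```

Feeding it the output of lscpu from both the host and the guest makes the mismatch described for the A100 station (43 bits on the host, 48 bits in the guest) easy to spot.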