From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id 2049CD804CC for ; Tue, 31 Oct 2023 22:36:43 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=xtkT9tFx1EESBwO94XQur/oM/fQ5u9A7IRwnl8V82TI=; c=relaxed/simple; d=groups.io; h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:Reply-To:References:In-Reply-To:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Language:Content-Type:Content-Transfer-Encoding; s=20140610; t=1698791802; v=1; b=q9CbgPgFNbLI//gzz6JPVwfaYgJG/SMfOvUde1OHqfSODzLBntDcxGQlB7CeHpEDxLznZZRn KvxZLV4jNHAei07uqPzBdoejCmzK0a4FXdetd7F6GISDDgp7aUbp538Oxfw12qm2ihXLORcgzSG vE/8z/jXS/Th9X3cSNDVLuS8= X-Received: by 127.0.0.2 with SMTP id Y94OYY7687511xm5iyrCNs8A; Tue, 31 Oct 2023 15:36:42 -0700 X-Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mx.groups.io with SMTP id smtpd.web10.9084.1698791802099997655 for ; Tue, 31 Oct 2023 15:36:42 -0700 X-Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-529-ycKPT5QxNK6A5jPIgSv1tg-1; Tue, 31 Oct 2023 18:36:37 -0400 X-MC-Unique: ycKPT5QxNK6A5jPIgSv1tg-1 X-Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 72C751C18CCA; Tue, 31 Oct 2023 22:36:36 +0000 (UTC) X-Received: from [10.39.195.34] (unknown [10.39.195.34]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B9C3210F51; Tue, 31 Oct 2023 22:36:34 +0000 (UTC) Message-ID: Date: Tue, 31 Oct 2023 23:36:33 +0100 MIME-Version: 1.0 Subject: Re: [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev From: "Laszlo Ersek" To: devel@edk2.groups.io, dun.tan@intel.com, "thomas.lendacky@amd.com" , "leo.duran@amd.com" , "brijesh.singh@amd.com" , "Chang, Abner" , "michael.roth@amd.com" , "Attar, AbdulLateef (Abdul Lateef)" Cc: "Ni, Ray" , "Yao, Jiewen" Reply-To: devel@edk2.groups.io,lersek@redhat.com References: <20231027054300.1382-1-dun.tan@intel.com> <3692a402-fede-9534-0517-72298ff2aff8@redhat.com> <0a9e392d-3c51-1992-9c18-f11a0aaf9fe6@redhat.com> In-Reply-To: <0a9e392d-3c51-1992-9c18-f11a0aaf9fe6@redhat.com> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.5 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: X-Gm-Message-State: FH6X5NUvbAZDDc6mVZzzljaAx7686176AA= Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b=q9CbgPgF; spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=redhat.com (policy=none) On 10/31/23 18:06, Laszlo Ersek wrote: > (3) During all this discussion here, I have been *distinctly* > recalling that I've experienced an *actual performance regression* > sometime, after we had unwittingly switched from the REP variant to > the non-REP variant. > > And now I have actually found it. I'm going to quote the commit > message in full: > >> commit fb8b54694c53e73e1b6a098686e908f54f9bb7a9 >> Author: Laszlo Ersek >> Date: Thu Apr 7 22:28:38 2016 +0200 >> >> UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes of IO ports >> >> * Short description: >> >> The CpuIoServiceRead() and CpuIoServiceWrite() functions transfer = data >> between memory and IO ports with individual Io(Read|Write)(8|16|32= ) >> function calls, each in an appropriately set up loop. >> >> On the Ia32 and X64 platforms however, FIFO reads and writes can b= e >> optimized, by coding them in assembly, and delegating the loop to = the >> CPU, with the REP prefix. >> >> On KVM virtualization hosts, this difference has a huge performanc= e >> impact: if the loop is open-coded, then the virtual machine traps = to the >> hypervisor on every single UINT8 / UINT16 / UINT32 transfer, where= as >> with the REP prefix, KVM can transfer up to a page of data per VM = trap. >> This is especially noticeable with IDE PIO transfers, where all th= e data >> are squeezed through IO ports. >> >> * Long description: >> >> The RootBridgeIoIoRW() function in >> >> PcAtChipsetPkg/PciHostBridgeDxe/PciRootBridgeIo.c >> >> used to have the exact same IO port acces optimization, dating bac= k >> verbatim to commit 1fd376d9792: >> >> PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM FIFO I/O read/write >> performance >> >> OvmfPkg cloned the "PcAtChipsetPkg/PciHostBridgeDxe" driver (for >> unrelated reasons), and inherited the optimization from PcAtChipse= tPkg. >> >> The "PcAtChipsetPkg/PciHostBridgeDxe" driver was ultimately remove= d in >> commit 111d79db47: >> >> PcAtChipsetPkg/PciHostBridge: Remove PciHostBridge driver >> >> and OvmfPkg too was rebased to the new core Pci Host Bridge Driver= , in >> commit 4014885ffd: >> >> OvmfPkg: switch to MdeModulePkg/Bus/Pci/PciHostBridgeDxe >> >> This caused the optimization to go lost. Namely, the >> RootBridgeIoIoRead() and RootBridgeIoIoWrite() functions in the ne= w core >> Pci Host Bridge Driver delegate IO port accesses to >> EFI_CPU_IO2_PROTOCOL. And, in OvmfPkg (and likely most other Ia32 = / X64 >> edk2 platforms), this protocol is provided by "UefiCpuPkg/CpuIo2Dx= e", >> which lacks the optimization. >> >> Therefore, this patch ports the C source code logic from commit >> 1fd376d9792 (see above) to "UefiCpuPkg/CpuIo2Dxe", plus it ports t= he >> NASM-converted assembly helper functions from OvmfPkg commits >> 6026bf460037 and ace1d0517b65: >> >> OvmfPkg PciHostBridgeDxe: Convert Ia32/IoFifo.asm to NASM >> >> OvmfPkg PciHostBridgeDxe: Convert X64/IoFifo.asm to NASM >> >> In order to support the MSFT and INTEL toolchains as well, the *.a= sm >> files are ported from OvmfPkg as well, immediately from before the= above >> conversion (that is, at 6026bf460037^). >> >> * Notes about the port: >> >> - The write and read branches from commit 1fd376d9792 are split to= the >> separate functions CpuIoServiceWrite() and CpuIoServiceRead(). >> >> - The EfiPciWidthUintXX constants are replaced with EfiCpuIoWidthU= intXX. >> >> - The cast expression "(UINTN) Address" is replaced with >> "(UINTN)Address" (i.e., no space), because that's how the receiv= ing >> functions spell it as well. >> >> - The labels in the switch statements are unindented by one level,= to >> match the edk2 coding style (and the rest of UefiCpuPkg) better. >> >> * The first signoff belongs to Jordan, because he authored all of >> 1fd376d9792, 6026bf460037 and ace1d0517b65. >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Jordan Justen >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Laszlo Ersek >> Ref: https://www.redhat.com/archives/vfio-users/2016-April/msg00029.= html >> Reported-by: Mark >> Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/10424/focus= =3D10432 >> Reported-by: Jordan Justen >> Cc: Jordan Justen >> Cc: Ruiyu Ni >> Cc: Jeff Fan >> Cc: Mark >> Tested-by: Mark >> Reviewed-by: Jeff Fan > > This is "hard evidence". Per bullet (2), Jordan had added the FIFO / > REP variants to EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL, for some > (non-specific) performance boost. and when that optimization was later > lost, due to a mistake, we saw IDE PIO performance *tank* on QEMU/KVM. > > Therefore, I am now being reminded that the best performance > regression test for this particular series is to boot an operating > system using IDE (PIO) emulation, with OVMF! Because CpuIo2Dxe still > calls the FIFO APIs. > > That's what I'm going to do next -- later this week, if everything > goes well. I ask for your patience until I complete that regression > test. Thankfully, we captured the original bug report in the commit message above, in the first "Ref:" tag. That link still works today, and it is helpful for setting up a reproducer. So, this is my reproducer: (0) The QEMU version is a recent development version (built at git commit a95260486aa7, "Merge tag 'pull-tcg-20231023' of https://gitlab.com/rth7680/qemu into staging", 2023-10-23). My host kernel (KVM) is from RHEL-9.2: version 5.14.0-284.30.1.el9_2.x86_64. (1) build OVMF X64 for the DEBUG target (2) launch QEMU as follows: qemu-system-x86_64 \ -M pc \ -enable-kvm \ -m 4096 \ -drive if=3Dpflash,format=3Draw,unit=3D0,file=3DOVMF_CODE.4m.fd,readonly= =3Don \ -drive if=3Dpflash,format=3Draw,unit=3D1,file=3DOVMF_VARS.4m.fd,snapshot= =3Don \ -debugcon file:debug.log \ -global isa-debugcon.iobase=3D0x402 \ -cdrom en-us_windows_server_2019_updated_aug_2021_x64_dvd_a6431a28.iso The original bug report used a Windows 7 installer ISO; my reproducer uses a more recent Windows ISO: Windows Server 2019. Steps (1) and (2) were once performed at current edk2 master (ccbe2e938386), and then another time with this patch series applied on top. When the Windows ISO boots, and I hit Enter to confirm booting from the ISO, two "Loading files..." progress bars are displayed, one after the other. I measured the duration of these progress bars, in both scenarios. With OVMF built at edk2 master, - the first "Loading files..." progress bar is hardly noticeable to the human eye, - the second "Loading files..." progress bar takes 28 seconds to complete. With this patch set applied on top of master, - the first "Loading files..." progress bar takes 2.3 seconds, - the second "Loading files..." progress bar takes 7 minutes and 44.4 seconds (464.4 seconds), meaning a slowdown factor of ~16.5. So... please don't merge this. The bottleneck still exists. We still need the "rep outsb" optimization on KVM. Thanks, Laszlo -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#110455): https://edk2.groups.io/g/devel/message/110455 Mute This Topic: https://groups.io/mt/102215661/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/19134562= 12/xyzzy [rebecca@openfw.io] -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-