public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Laszlo Ersek" <lersek@redhat.com>
To: devel@edk2.groups.io, dun.tan@intel.com,
	"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"brijesh.singh@amd.com" <brijesh.singh@amd.com>,
	"Chang, Abner" <abner.chang@amd.com>,
	"michael.roth@amd.com" <michael.roth@amd.com>,
	"Attar, AbdulLateef (Abdul Lateef)" <AbdulLateef.Attar@amd.com>
Cc: "Ni, Ray" <ray.ni@intel.com>, "Yao, Jiewen" <jiewen.yao@intel.com>
Subject: Re: [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev
Date: Tue, 31 Oct 2023 23:36:33 +0100	[thread overview]
Message-ID: <da6d80a7-9e3a-7c3f-fe17-1257b0f90009@redhat.com> (raw)
In-Reply-To: <0a9e392d-3c51-1992-9c18-f11a0aaf9fe6@redhat.com>

On 10/31/23 18:06, Laszlo Ersek wrote:

> (3) During all this discussion here, I have been *distinctly*
> recalling that I've experienced an *actual performance regression*
> sometime, after we had unwittingly switched from the REP variant to
> the non-REP variant.
>
> And now I have actually found it. I'm going to quote the commit
> message in full:
>
>> commit fb8b54694c53e73e1b6a098686e908f54f9bb7a9
>> Author: Laszlo Ersek <lersek@redhat.com>
>> Date:   Thu Apr 7 22:28:38 2016 +0200
>>
>>     UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes of IO ports
>>
>>     * Short description:
>>
>>       The CpuIoServiceRead() and CpuIoServiceWrite() functions transfer data
>>       between memory and IO ports with individual Io(Read|Write)(8|16|32)
>>       function calls, each in an appropriately set up loop.
>>
>>       On the Ia32 and X64 platforms however, FIFO reads and writes can be
>>       optimized, by coding them in assembly, and delegating the loop to the
>>       CPU, with the REP prefix.
>>
>>       On KVM virtualization hosts, this difference has a huge performance
>>       impact: if the loop is open-coded, then the virtual machine traps to the
>>       hypervisor on every single UINT8 / UINT16 / UINT32 transfer, whereas
>>       with the REP prefix, KVM can transfer up to a page of data per VM trap.
>>       This is especially noticeable with IDE PIO transfers, where all the data
>>       are squeezed through IO ports.
>>
>>     * Long description:
>>
>>       The RootBridgeIoIoRW() function in
>>
>>         PcAtChipsetPkg/PciHostBridgeDxe/PciRootBridgeIo.c
>>
>>       used to have the exact same IO port acces optimization, dating back
>>       verbatim to commit 1fd376d9792:
>>
>>         PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM FIFO I/O read/write
>>           performance
>>
>>       OvmfPkg cloned the "PcAtChipsetPkg/PciHostBridgeDxe" driver (for
>>       unrelated reasons), and inherited the optimization from PcAtChipsetPkg.
>>
>>       The "PcAtChipsetPkg/PciHostBridgeDxe" driver was ultimately removed in
>>       commit 111d79db47:
>>
>>         PcAtChipsetPkg/PciHostBridge: Remove PciHostBridge driver
>>
>>       and OvmfPkg too was rebased to the new core Pci Host Bridge Driver, in
>>       commit 4014885ffd:
>>
>>         OvmfPkg: switch to MdeModulePkg/Bus/Pci/PciHostBridgeDxe
>>
>>       This caused the optimization to go lost. Namely, the
>>       RootBridgeIoIoRead() and RootBridgeIoIoWrite() functions in the new core
>>       Pci Host Bridge Driver delegate IO port accesses to
>>       EFI_CPU_IO2_PROTOCOL. And, in OvmfPkg (and likely most other Ia32 / X64
>>       edk2 platforms), this protocol is provided by "UefiCpuPkg/CpuIo2Dxe",
>>       which lacks the optimization.
>>
>>       Therefore, this patch ports the C source code logic from commit
>>       1fd376d9792 (see above) to "UefiCpuPkg/CpuIo2Dxe", plus it ports the
>>       NASM-converted assembly helper functions from OvmfPkg commits
>>       6026bf460037 and ace1d0517b65:
>>
>>         OvmfPkg PciHostBridgeDxe: Convert Ia32/IoFifo.asm to NASM
>>
>>         OvmfPkg PciHostBridgeDxe: Convert X64/IoFifo.asm to NASM
>>
>>       In order to support the MSFT and INTEL toolchains as well, the *.asm
>>       files are ported from OvmfPkg as well, immediately from before the above
>>       conversion (that is, at 6026bf460037^).
>>
>>     * Notes about the port:
>>
>>       - The write and read branches from commit 1fd376d9792 are split to the
>>         separate functions CpuIoServiceWrite() and CpuIoServiceRead().
>>
>>       - The EfiPciWidthUintXX constants are replaced with EfiCpuIoWidthUintXX.
>>
>>       - The cast expression "(UINTN) Address" is replaced with
>>         "(UINTN)Address" (i.e., no space), because that's how the receiving
>>         functions spell it as well.
>>
>>       - The labels in the switch statements are unindented by one level, to
>>         match the edk2 coding style (and the rest of UefiCpuPkg) better.
>>
>>     * The first signoff belongs to Jordan, because he authored all of
>>       1fd376d9792, 6026bf460037 and ace1d0517b65.
>>
>>     Contributed-under: TianoCore Contribution Agreement 1.0
>>     Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
>>     Contributed-under: TianoCore Contribution Agreement 1.0
>>     Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>     Ref: https://www.redhat.com/archives/vfio-users/2016-April/msg00029.html
>>     Reported-by: Mark <kram321@gmail.com>
>>     Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/10424/focus=10432
>>     Reported-by: Jordan Justen <jordan.l.justen@intel.com>
>>     Cc: Jordan Justen <jordan.l.justen@intel.com>
>>     Cc: Ruiyu Ni <ruiyu.ni@intel.com>
>>     Cc: Jeff Fan <jeff.fan@intel.com>
>>     Cc: Mark <kram321@gmail.com>
>>     Tested-by: Mark <kram321@gmail.com>
>>     Reviewed-by: Jeff Fan <jeff.fan@intel.com>
>
> This is "hard evidence". Per bullet (2), Jordan had added the FIFO /
> REP variants to EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL, for some
> (non-specific) performance boost. and when that optimization was later
> lost, due to a mistake, we saw IDE PIO performance *tank* on QEMU/KVM.
>
> Therefore, I am now being reminded that the best performance
> regression test for this particular series is to boot an operating
> system using IDE (PIO) emulation, with OVMF! Because CpuIo2Dxe still
> calls the FIFO APIs.
>
> That's what I'm going to do next -- later this week, if everything
> goes well. I ask for your patience until I complete that regression
> test.

Thankfully, we captured the original bug report in the commit message
above, in the first "Ref:" tag. That link still works today, and it is
helpful for setting up a reproducer.

So, this is my reproducer:

(0) The QEMU version is a recent development version (built at git
commit a95260486aa7, "Merge tag 'pull-tcg-20231023' of
https://gitlab.com/rth7680/qemu into staging", 2023-10-23).

My host kernel (KVM) is from RHEL-9.2: version
5.14.0-284.30.1.el9_2.x86_64.

(1) build OVMF X64 for the DEBUG target

(2) launch QEMU as follows:

qemu-system-x86_64 \
  -M pc \
  -enable-kvm \
  -m 4096 \
  -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.4m.fd,readonly=on \
  -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.4m.fd,snapshot=on \
  -debugcon file:debug.log \
  -global isa-debugcon.iobase=0x402 \
  -cdrom en-us_windows_server_2019_updated_aug_2021_x64_dvd_a6431a28.iso

The original bug report used a Windows 7 installer ISO; my reproducer
uses a more recent Windows ISO: Windows Server 2019.

Steps (1) and (2) were once performed at current edk2 master
(ccbe2e938386), and then another time with this patch series applied on
top.

When the Windows ISO boots, and I hit Enter to confirm booting from the
ISO, two "Loading files..." progress bars are displayed, one after the
other. I measured the duration of these progress bars, in both
scenarios.

With OVMF built at edk2 master,

- the first "Loading files..." progress bar is hardly noticeable to the
  human eye,

- the second "Loading files..." progress bar takes 28 seconds to
  complete.

With this patch set applied on top of master,

- the first "Loading files..." progress bar takes 2.3 seconds,

- the second "Loading files..." progress bar takes 7 minutes and 44.4
  seconds (464.4 seconds), meaning a slowdown factor of ~16.5.

So... please don't merge this. The bottleneck still exists. We still
need the "rep outsb" optimization on KVM.

Thanks,
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#110455): https://edk2.groups.io/g/devel/message/110455
Mute This Topic: https://groups.io/mt/102215661/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



  reply	other threads:[~2023-10-31 22:36 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-27  5:42 [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 1/7] MdePkg: Create TdxLibNull.inf instance duntan
2023-10-27  5:54   ` Ni, Ray
2023-10-27  5:56     ` Ni, Ray
2023-10-27  6:32       ` duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 2/7] MdePkg: Add CcProbeLibNull and TdxLibNull implement duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 3/7] MdePkg: simplify IoRead/WriteFifo in IoLibFifo.c duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 4/7] MdePkg:support Tdx and sev in BaseIoLibIntrinsic duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 5/7] OvmfPkg: Add CcProbeLib in PlatformInitLib.inf duntan
2023-10-30 14:37   ` Ard Biesheuvel
2023-10-27  5:42 ` [edk2-devel] [PATCH 6/7] OvmfPkg: use BaseIoLibIntrinsic.inf in dsc files duntan
2023-10-30 14:36   ` Ard Biesheuvel
2023-10-31 12:30     ` Laszlo Ersek
2023-10-31 13:19       ` Ard Biesheuvel
2023-10-27  5:43 ` [edk2-devel] [PATCH 7/7] MdePkg:remove BaseIoLibIntrinsicSev related code duntan
2023-10-27  5:49 ` [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev Yao, Jiewen
2023-10-27  6:31   ` duntan
2023-10-27  7:06     ` Yao, Jiewen
2023-10-27  7:34       ` duntan
2023-10-27  8:05         ` duntan
2023-10-27 21:31           ` Lendacky, Thomas via groups.io
2023-10-28 11:40             ` Laszlo Ersek
2023-10-31  9:56               ` duntan
2023-10-31 17:06                 ` Laszlo Ersek
2023-10-31 22:36                   ` Laszlo Ersek [this message]
2023-11-02  8:49                     ` duntan
2023-11-02 10:42                       ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da6d80a7-9e3a-7c3f-fe17-1257b0f90009@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox