public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Laszlo Ersek" <lersek@redhat.com>
To: devel@edk2.groups.io, dun.tan@intel.com,
	"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"brijesh.singh@amd.com" <brijesh.singh@amd.com>,
	"Chang, Abner" <abner.chang@amd.com>,
	"michael.roth@amd.com" <michael.roth@amd.com>,
	"Attar, AbdulLateef (Abdul Lateef)" <AbdulLateef.Attar@amd.com>
Cc: "Ni, Ray" <ray.ni@intel.com>, "Yao, Jiewen" <jiewen.yao@intel.com>
Subject: Re: [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev
Date: Tue, 31 Oct 2023 18:06:22 +0100	[thread overview]
Message-ID: <0a9e392d-3c51-1992-9c18-f11a0aaf9fe6@redhat.com> (raw)
In-Reply-To: <BN9PR11MB548375E76BDEF23AD5693B17E5A0A@BN9PR11MB5483.namprd11.prod.outlook.com>

On 10/31/23 10:56, duntan wrote:
> Hi Tom,
> Thanks for your testing and comments. I'll modify the patch commit
> message and code according to your comments.
> Yes the patch changes the behavior for IoReadFifo/ IoWriteFifo API for
> non-Tdx and non-SEV case. Looking forward to more comments from the
> community.
>
> Hi Laszlo,
> May I know the actually performance gap data in the case that you
> mentioned?

Good call to have me measure this.

I cannot reproduce the performance hit *at all* with the isa-debugcon
device (that is, through the OVMF debug log).

I built OVMF X64 with PcdDebugPrintErrorLevel set to 0x8040004F, at
commit a671a14e63fd. That was the baseline. I built both NOOPT and
DEBUG.

Then I repeated the same, with your patches applied.

I cannot show any human-perceptible boot time difference.

Note that the NOOPT comparison excludes LTO (i.e., compiler advances) as
the potential factor that might mask the impact of your patches. I also
disassembled some binaries producing debug output, and indeed, without
your patches, we have rep outsb in them, and with your patches, we
don't. Yet, there is no performance difference.

This is striking news.

I want to understand why we have thought thus far that individual IO
Port traps are extremely expensive. I've now found evidence going back
as far as 2012 -- the year when I joined edk2!

Here are some facts:


(1) in commit f1ec65ba24e1 ("OvmfPkg: Add QemuFwCfgLib library class and
implementation", 2012-05-30) -- note the date! --, Jordan added the
initial implementation of QemuFwCfgLib. That version already used an
open-coded IoReadFifo8 function.

There was no particular explanation captured in the commit message.

I also happen to have the entire 2012 edk2-devel mailing list traffic
saved locally (well, the messages posted after I joined), and the
discussions around the various versions of the patch in question don't
seem to specifically justify the REP variant. So, we cannot conclude
much from this piece of evidence.


(2) Then I decided to search that entire year's traffic (i.e., 2012's)
for the word "fifo". I found relevant hits:

- In commit 1fd376d97922 ("PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM
  FIFO I/O read/write performance", 2012-06-01) -- note the year again!
  --, Jordan brought the FIFO variants to the
  EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL implementation in edk2. The code was
  duplicated.

  Note that Jordan clearly stated in the commit message:

> KVM can substantially boost the speed of the rep insb/insw/insl
> and rep outsb/outsw/outsl instructions by transferring up to
> a page of data per VM trap.

- On the same day when Jordan posted the patch that would become
  above-noted commit 1fd376d97922, Jordan also asked on the list
  (subject "IoLib I/O FIFO read/write routines"):

> Regarding the routines now defined in PcAtChipsetPkg/PciHostBridgeDxe/IoFifo.h.
> IoReadFifo8, IoReadFifo16, IoReadFifo32,
> IoWriteFifo8, IoWriteFifo16, and IoWriteFifo32
>
> Would it make sense to consider adding these to MdePkg/Include/Library/IoLib.h?

  Again, this was still in 2012.

So here we can conclude that Jordan saw some specific perf boost due to
employing the FIFO variants in EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL, but the
use case was not documented.


(3) During all this discussion here, I have been *distinctly* recalling
that I've experienced an *actual performance regression* sometime, after
we had unwittingly switched from the REP variant to the non-REP variant.

And now I have actually found it. I'm going to quote the commit message
in full:

> commit fb8b54694c53e73e1b6a098686e908f54f9bb7a9
> Author: Laszlo Ersek <lersek@redhat.com>
> Date:   Thu Apr 7 22:28:38 2016 +0200
>
>     UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes of IO ports
>
>     * Short description:
>
>       The CpuIoServiceRead() and CpuIoServiceWrite() functions transfer data
>       between memory and IO ports with individual Io(Read|Write)(8|16|32)
>       function calls, each in an appropriately set up loop.
>
>       On the Ia32 and X64 platforms however, FIFO reads and writes can be
>       optimized, by coding them in assembly, and delegating the loop to the
>       CPU, with the REP prefix.
>
>       On KVM virtualization hosts, this difference has a huge performance
>       impact: if the loop is open-coded, then the virtual machine traps to the
>       hypervisor on every single UINT8 / UINT16 / UINT32 transfer, whereas
>       with the REP prefix, KVM can transfer up to a page of data per VM trap.
>       This is especially noticeable with IDE PIO transfers, where all the data
>       are squeezed through IO ports.
>
>     * Long description:
>
>       The RootBridgeIoIoRW() function in
>
>         PcAtChipsetPkg/PciHostBridgeDxe/PciRootBridgeIo.c
>
>       used to have the exact same IO port acces optimization, dating back
>       verbatim to commit 1fd376d9792:
>
>         PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM FIFO I/O read/write
>           performance
>
>       OvmfPkg cloned the "PcAtChipsetPkg/PciHostBridgeDxe" driver (for
>       unrelated reasons), and inherited the optimization from PcAtChipsetPkg.
>
>       The "PcAtChipsetPkg/PciHostBridgeDxe" driver was ultimately removed in
>       commit 111d79db47:
>
>         PcAtChipsetPkg/PciHostBridge: Remove PciHostBridge driver
>
>       and OvmfPkg too was rebased to the new core Pci Host Bridge Driver, in
>       commit 4014885ffd:
>
>         OvmfPkg: switch to MdeModulePkg/Bus/Pci/PciHostBridgeDxe
>
>       This caused the optimization to go lost. Namely, the
>       RootBridgeIoIoRead() and RootBridgeIoIoWrite() functions in the new core
>       Pci Host Bridge Driver delegate IO port accesses to
>       EFI_CPU_IO2_PROTOCOL. And, in OvmfPkg (and likely most other Ia32 / X64
>       edk2 platforms), this protocol is provided by "UefiCpuPkg/CpuIo2Dxe",
>       which lacks the optimization.
>
>       Therefore, this patch ports the C source code logic from commit
>       1fd376d9792 (see above) to "UefiCpuPkg/CpuIo2Dxe", plus it ports the
>       NASM-converted assembly helper functions from OvmfPkg commits
>       6026bf460037 and ace1d0517b65:
>
>         OvmfPkg PciHostBridgeDxe: Convert Ia32/IoFifo.asm to NASM
>
>         OvmfPkg PciHostBridgeDxe: Convert X64/IoFifo.asm to NASM
>
>       In order to support the MSFT and INTEL toolchains as well, the *.asm
>       files are ported from OvmfPkg as well, immediately from before the above
>       conversion (that is, at 6026bf460037^).
>
>     * Notes about the port:
>
>       - The write and read branches from commit 1fd376d9792 are split to the
>         separate functions CpuIoServiceWrite() and CpuIoServiceRead().
>
>       - The EfiPciWidthUintXX constants are replaced with EfiCpuIoWidthUintXX.
>
>       - The cast expression "(UINTN) Address" is replaced with
>         "(UINTN)Address" (i.e., no space), because that's how the receiving
>         functions spell it as well.
>
>       - The labels in the switch statements are unindented by one level, to
>         match the edk2 coding style (and the rest of UefiCpuPkg) better.
>
>     * The first signoff belongs to Jordan, because he authored all of
>       1fd376d9792, 6026bf460037 and ace1d0517b65.
>
>     Contributed-under: TianoCore Contribution Agreement 1.0
>     Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
>     Contributed-under: TianoCore Contribution Agreement 1.0
>     Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>     Ref: https://www.redhat.com/archives/vfio-users/2016-April/msg00029.html
>     Reported-by: Mark <kram321@gmail.com>
>     Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/10424/focus=10432
>     Reported-by: Jordan Justen <jordan.l.justen@intel.com>
>     Cc: Jordan Justen <jordan.l.justen@intel.com>
>     Cc: Ruiyu Ni <ruiyu.ni@intel.com>
>     Cc: Jeff Fan <jeff.fan@intel.com>
>     Cc: Mark <kram321@gmail.com>
>     Tested-by: Mark <kram321@gmail.com>
>     Reviewed-by: Jeff Fan <jeff.fan@intel.com>

This is "hard evidence". Per bullet (2), Jordan had added the FIFO / REP
variants to EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL, for some (non-specific)
performance boost. and when that optimization was later lost, due to a
mistake, we saw IDE PIO performance *tank* on QEMU/KVM.

Therefore, I am now being reminded that the best performance regression
test for this particular series is to boot an operating system using IDE
(PIO) emulation, with OVMF! Because CpuIo2Dxe still calls the FIFO APIs.

That's what I'm going to do next -- later this week, if everything goes
well. I ask for your patience until I complete that regression test.


(4) In commit 19c6d9feaaf8 ("MdePkg: Expand BaseIoLibIntrinsic (IoLib
class) library", 2017-01-17), the duplicated FIFO variants were
upstreamed to the IoLib instance(s) in MdePkg.

Subsequently, commit b6d11d7c4678 ("MdePkg: BaseIoLibIntrinsic (IoLib
class) library", 2017-04-13) introduced the SEV fallback.

And then commit 97353a9c914d ("OvmfPkg: Update dsc to use IoLib from
BaseIoLibIntrinsicSev.inf", 2017-07-10) put the SEV fallback to use in
OVMF.

In commit 80886a695377 ("OvmfPkg/PlatformDebugLibIoPort: write messages
with IoWriteFifo8()", 2017-09-11), we switched PlatformDebugLibIoPort as
well over to the REP variant (which had been by then centralized). In
fact, using the REP fifo write in OVMF's DebugLib instance was
recommended by Jordan already in 2012! He wrote (while reviewing the
patch that would become commit b90aefa9e46c, "OvmfPkg: add support for
debug console on port 0x402", 2012-07-26):

> I'm not certain, but I think it might be worth-while to make the
> str-len call, and then use the I/O fifo call.
>
> It could reduce the trapping count by ~30x for writing a debug string,
> and we do write quite a lot of DEBUG... (I'm not sure it is enough to
> make a noticeable difference though.)

So that was what commit 80886a695377 implemented some five years later,
citing that "In the general case this speeds up logging".


(5) I also distinctly remember witnessing the debug logging speed
difference between SEV and non-SEV guests. (Well, this is not a "fact",
but a memory.)

In commit c09d9571300a ("OvmfPkg: save on I/O port accesses when the
debug port is not in use", 2017-11-17), we even quantified this. The
commit message says, "therefore, in SEV guests, the boot time impact is
huge (about 45 seconds _additional_ time spent writing to the debug
port)".


So, minimally commit fb8b54694c53 from bullet (3) is *hard evidence*
that this difference in performance used to exist, regardless of SEV.

As I said above, I will try to retest the same use case soon, with this
patch set. If the performance difference turns out to be invisible now,
then I can only chalk that up to QEMU/KVM technology having advanced a
lot -- and in that case, I will rescind my NACK.

To summarize, my operating memory was correct, and I wasn't wrong to
NACK this patch set; at the same time, you are also very correct to
point out that we need to measure *afresh*.

Back to your email:

On 10/31/23 10:56, duntan wrote:
> Also, if a DEBUG binary is used, I think the boot time impact might be
> acceptable?

We exclusively build and ship DEBUG binaries. RELEASE binaries contain
no ASSERTs. Many edk2 core modules -- unfortunately -- only employ
ASSERTs as error checks, so a RELEASE build means that those core
modules are built without error checks. Which is unacceptable.

(But even if all those modules had proper, explicit error checking, we'd
still want to preserve all the ASSERT()s.)

I'll report back later with the IDE PIO performance test results.

Thanks,
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#110436): https://edk2.groups.io/g/devel/message/110436
Mute This Topic: https://groups.io/mt/102215661/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



  reply	other threads:[~2023-10-31 17:06 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-27  5:42 [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 1/7] MdePkg: Create TdxLibNull.inf instance duntan
2023-10-27  5:54   ` Ni, Ray
2023-10-27  5:56     ` Ni, Ray
2023-10-27  6:32       ` duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 2/7] MdePkg: Add CcProbeLibNull and TdxLibNull implement duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 3/7] MdePkg: simplify IoRead/WriteFifo in IoLibFifo.c duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 4/7] MdePkg:support Tdx and sev in BaseIoLibIntrinsic duntan
2023-10-27  5:42 ` [edk2-devel] [PATCH 5/7] OvmfPkg: Add CcProbeLib in PlatformInitLib.inf duntan
2023-10-30 14:37   ` Ard Biesheuvel
2023-10-27  5:42 ` [edk2-devel] [PATCH 6/7] OvmfPkg: use BaseIoLibIntrinsic.inf in dsc files duntan
2023-10-30 14:36   ` Ard Biesheuvel
2023-10-31 12:30     ` Laszlo Ersek
2023-10-31 13:19       ` Ard Biesheuvel
2023-10-27  5:43 ` [edk2-devel] [PATCH 7/7] MdePkg:remove BaseIoLibIntrinsicSev related code duntan
2023-10-27  5:49 ` [edk2-devel] [PATCH 0/7] Support Tdx and sev in BaseIoLibIntrinsic and remove BaseIoLibIntrinsicSev Yao, Jiewen
2023-10-27  6:31   ` duntan
2023-10-27  7:06     ` Yao, Jiewen
2023-10-27  7:34       ` duntan
2023-10-27  8:05         ` duntan
2023-10-27 21:31           ` Lendacky, Thomas via groups.io
2023-10-28 11:40             ` Laszlo Ersek
2023-10-31  9:56               ` duntan
2023-10-31 17:06                 ` Laszlo Ersek [this message]
2023-10-31 22:36                   ` Laszlo Ersek
2023-11-02  8:49                     ` duntan
2023-11-02 10:42                       ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a9e392d-3c51-1992-9c18-f11a0aaf9fe6@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox