* [PATCH v3 00/16] ArmVirtPkg/ArmVirtQemu: Performance streamlining
@ 2022-09-26 8:23 Ard Biesheuvel
From: Ard Biesheuvel @ 2022-09-26 8:23 UTC
To: devel; +Cc: Ard Biesheuvel, Leif Lindholm, Alexander Graf
We currently do a substantial amount of processing before enabling the
MMU and caches, which is not only bad for performance but also fragile,
as it requires cache coherency to be managed in software.
It also means that when running under virtualization, the hypervisor
must do a non-trivial amount of work to ensure that the host's cached
view of memory is consistent with the guest's uncached view.
So let's update the ArmVirtQemu early boot sequence to improve the
situation:
- modify the page table building logic to avoid the MMU disable/enable
unless really necessary, i.e., only when the entry in question maps
itself, or the code that performs the actual update;
- map any regions that cover page tables in memory eagerly down to
pages, so that we will not need to split them later, and be forced to
go through the MMU-off path to unmap and remap them;
- allow the asm helper routine that lives in the MemoryInit XIP PEIM to
be exposed via a HOB so we can fall back to it from DXE;
- use a compile-time generated ID map that covers the first bank of NOR
flash, the first MMIO region (for the UART), and the first 128 MiB of
DRAM, and switch to it straight out of reset.
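As a rough illustration of what such a static ID map covers, the sketch below models the three regions as a lookup table. It is not the contents of IdMap.S: the base addresses and sizes are assumptions based on the usual QEMU "virt" layout (flash at 0x0, PL011 UART at 0x09000000, DRAM at 0x40000000), the 2 MiB MMIO window is a guess, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical memory kinds; a real map would encode MAIR attribute indices. */
typedef enum { MEM_UNMAPPED, MEM_FLASH, MEM_DEVICE, MEM_DRAM } mem_kind_t;

typedef struct {
  uint64_t   base;  /* identity mapped: virtual address == physical address */
  uint64_t   size;
  mem_kind_t kind;
} idmap_region_t;

/* Assumed layout, not taken from the patch series. */
static const idmap_region_t k_idmap[] = {
  { 0x00000000ULL, 64 * MIB,  MEM_FLASH  },  /* first NOR flash bank (assumed 64 MiB) */
  { 0x09000000ULL, 2 * MIB,   MEM_DEVICE },  /* MMIO window holding the UART (assumed) */
  { 0x40000000ULL, 128 * MIB, MEM_DRAM   },  /* first 128 MiB of DRAM */
};

/* Return the kind of memory the static map provides for addr. */
mem_kind_t idmap_lookup(uint64_t addr) {
  for (size_t i = 0; i < sizeof k_idmap / sizeof k_idmap[0]; i++) {
    const idmap_region_t *r = &k_idmap[i];
    assert(r->size != 0);  /* sanity check on the table itself */
    if (addr >= r->base && addr - r->base < r->size) {
      return r->kind;
    }
  }
  return MEM_UNMAPPED;
}
```

Anything outside these three windows is unmapped until the full page tables are built later in boot, which is why the early code has to stay within them.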
The resulting build no longer performs any non-coherent memory accesses
via the data side, and only relies on instruction fetches before the MMU
is enabled. It also avoids any cache maintenance to the PoC.
Changes since v2:
- drop shadow page table approach - it only works at EL1, and is a bit
more intrusive than needed; instead, do a proper break-before-make
(BBM) unless the break unmaps the page table itself or the code that
is modifying it;
- add a couple of only tangentially related performance streamlining
changes, to avoid dispatching and shadowing drivers that we don't need.
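The break-before-make rule above can be sketched roughly as follows. This is a simplified model, not the ArmMmuLib code: the type and function names are hypothetical, the TLB invalidation is a stub, and the overlap test stands in for whatever check the library actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { UPDATE_BBM, UPDATE_MMU_OFF } update_path_t;

/* Hypothetical half-open address range [base, base + size). */
typedef struct {
  uint64_t base;
  uint64_t size;
} range_t;

static bool ranges_overlap(range_t a, range_t b) {
  assert(a.size != 0 && b.size != 0);
  return (a.base <= b.base) ? (b.base - a.base < a.size)
                            : (a.base - b.base < b.size);
}

/*
 * Plain BBM is only safe when breaking the entry cannot unmap the page
 * tables being edited or the code doing the edit; otherwise fall back
 * to the MMU-off path.
 */
update_path_t choose_update_path(range_t mapped, range_t page_tables,
                                 range_t update_code) {
  if (ranges_overlap(mapped, page_tables) ||
      ranges_overlap(mapped, update_code)) {
    return UPDATE_MMU_OFF;
  }
  return UPDATE_BBM;
}

/* Stub: real hardware needs barriers and a TLB invalidate here. */
static void tlb_invalidate_stub(void) {}

void replace_entry_bbm(volatile uint64_t *entry, uint64_t new_value) {
  *entry = 0;             /* break: install an invalid entry */
  tlb_invalidate_stub();  /* drop any cached copy of the old translation */
  *entry = new_value;     /* make: install the new entry */
}
```

The point of the split is that the expensive MMU-off path is taken only for the few entries where the break step would pull the rug out from under the update itself.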
Changes since v1:
- coding style tweaks to placate our CI overlord
- drop -mstrict-align which is no longer needed now that all C code runs
with the MMU and caches on
Cc: Leif Lindholm <quic_llindhol@quicinc.com>
Cc: Alexander Graf <agraf@csgraf.de>
Ard Biesheuvel (16):
ArmVirtPkg: remove EbcDxe from all platforms
ArmVirtPkg: do not enable iSCSI driver by default
ArmVirtPkg: make EFI_LOADER_DATA non-executable
ArmVirtPkg/ArmVirtQemu: wire up timeout PCD to Timeout variable
ArmPkg/ArmMmuLib: don't replace table entries with block entries
ArmPkg/ArmMmuLib: Disable and re-enable MMU only when needed
ArmPkg/ArmMmuLib: permit initial configuration with MMU enabled
ArmPkg/ArmMmuLib: Reuse XIP MMU routines when splitting entries
ArmPlatformPkg/PrePeiCore: permit entry with the MMU enabled
ArmVirtPkg/ArmVirtQemu: implement ArmPlatformLib with static ID map
ArmVirtPkg/ArmVirtQemu: use first 128 MiB as permanent PEI memory
ArmVirtPkg/ArmVirtQemu: enable initial ID map at early boot
ArmVirtPkg/ArmVirtQemu: Drop unused variable PEIM
ArmVirtPkg/ArmVirtQemu: avoid shadowing PEIMs unless necessary
ArmVirtPkg/QemuVirtMemInfoLib: use HOB not PCD to record the memory size
ArmVirtPkg/ArmVirtQemu: omit PCD PEIM unless TPM support is enabled
ArmPkg/ArmPkg.dec | 2 +
ArmPkg/Include/Library/ArmMmuLib.h | 7 +-
ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c | 191 +++++++++++++-------
ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S | 43 ++++-
ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuPeiLibConstructor.c | 17 ++
ArmPkg/Library/ArmMmuLib/ArmMmuBaseLib.inf | 4 +
ArmPkg/Library/ArmMmuLib/ArmMmuPeiLib.inf | 4 +
ArmPlatformPkg/PrePeiCore/PrePeiCore.c | 22 ++-
ArmVirtPkg/ArmVirt.dsc.inc | 7 +-
ArmVirtPkg/ArmVirtCloudHv.fdf | 5 -
ArmVirtPkg/ArmVirtPkg.dec | 1 +
ArmVirtPkg/ArmVirtQemu.dsc | 53 ++++--
ArmVirtPkg/ArmVirtQemu.fdf | 5 +-
ArmVirtPkg/ArmVirtQemuFvMain.fdf.inc | 5 -
ArmVirtPkg/ArmVirtQemuKernel.dsc | 1 -
ArmVirtPkg/ArmVirtXen.fdf | 5 -
ArmVirtPkg/Library/ArmPlatformLibQemu/AArch64/ArmPlatformHelper.S | 115 ++++++++++++
ArmVirtPkg/Library/ArmPlatformLibQemu/ArmPlatformLibQemu.c | 64 +++++++
ArmVirtPkg/Library/ArmPlatformLibQemu/ArmPlatformLibQemu.inf | 40 ++++
ArmVirtPkg/Library/ArmPlatformLibQemu/IdMap.S | 57 ++++++
ArmVirtPkg/Library/ArmVirtMemoryInitPeiLib/ArmVirtMemoryInitPeiLib.c | 14 +-
ArmVirtPkg/Library/ArmVirtMemoryInitPeiLib/ArmVirtMemoryInitPeiLib.inf | 1 +
ArmVirtPkg/Library/QemuVirtMemInfoLib/QemuVirtMemInfoLib.c | 35 +++-
ArmVirtPkg/Library/QemuVirtMemInfoLib/QemuVirtMemInfoLib.inf | 5 +-
ArmVirtPkg/Library/QemuVirtMemInfoLib/QemuVirtMemInfoPeiLib.inf | 8 +-
ArmVirtPkg/Library/QemuVirtMemInfoLib/QemuVirtMemInfoPeiLibConstructor.c | 30 +--
ArmVirtPkg/MemoryInitPei/MemoryInitPeim.c | 104 +++++++++++
ArmVirtPkg/{Library/ArmVirtMemoryInitPeiLib/ArmVirtMemoryInitPeiLib.inf => MemoryInitPei/MemoryInitPeim.inf} | 36 ++--
28 files changed, 714 insertions(+), 167 deletions(-)
create mode 100644 ArmVirtPkg/Library/ArmPlatformLibQemu/AArch64/ArmPlatformHelper.S
create mode 100644 ArmVirtPkg/Library/ArmPlatformLibQemu/ArmPlatformLibQemu.c
create mode 100644 ArmVirtPkg/Library/ArmPlatformLibQemu/ArmPlatformLibQemu.inf
create mode 100644 ArmVirtPkg/Library/ArmPlatformLibQemu/IdMap.S
create mode 100644 ArmVirtPkg/MemoryInitPei/MemoryInitPeim.c
copy ArmVirtPkg/{Library/ArmVirtMemoryInitPeiLib/ArmVirtMemoryInitPeiLib.inf => MemoryInitPei/MemoryInitPeim.inf} (64%)
--
2.35.1