* [PATCH V2 0/6] Enable SMM page level protection.
@ 2016-11-04 9:30 Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
` (7 more replies)
0 siblings, 8 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek
==== below is V2 description ====
1) PiSmmCpu: resolve the OVMF multiple-processor boot hang issue.
2) PiSmmCpu: Add debug info when StartupAp() fails.
3) PiSmmCpu: Add ASSERT for AllocatePages().
4) PiSmmCpu: Add protection detail in commit message.
5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
==== below is V1 description ====
This patch series enables SMM page-level protection.
Features are:
1) PiSmmCore reports SMM PE image code/data information
in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
and sets XD for data pages and RO for code pages.
3) PiSmmCpu enables Static Paging for X64 according to
PcdCpuSmmStaticPageTable. If it is true, 1G paging is used above 4G
as long as it is supported.
4) PiSmmCpu sets important data structures to be read-only,
such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
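Feature 2 above can be pictured with a small sketch. This is not the PiSmmCpu code from this series. The function name SketchApplyAttributes and the PTE_* macro names are hypothetical, introduced only for illustration. The EFI_MEMORY_* values are the ones defined by the UEFI specification, and the bit positions are the usual x86 PAE/IA-32e page-table bits (R/W is bit 1, XD is bit 63).

```c
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_XP 0x0000000000004000ULL
#define EFI_MEMORY_RO 0x0000000000020000ULL

/* x86 page-table entry bits (PAE / IA-32e paging). Names are illustrative. */
#define PTE_RW (1ULL << 1)   /* 1 = writable, 0 = read-only */
#define PTE_XD (1ULL << 63)  /* 1 = execute disabled        */

/*
 * Hypothetical helper: translate memory-map attributes into page-table
 * entry bits. RO clears the R/W bit, XP sets the XD bit; other bits are
 * left untouched.
 */
uint64_t SketchApplyAttributes(uint64_t Pte, uint64_t Attribute)
{
    if ((Attribute & EFI_MEMORY_RO) != 0) {
        Pte &= ~PTE_RW;
    }
    if ((Attribute & EFI_MEMORY_XP) != 0) {
        Pte |= PTE_XD;
    }
    return Pte;
}
```

Under these assumptions, a code page ends up present but not writable, and a data page stays writable but cannot be executed.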
Tested platforms:
1) Intel internal platform (X64).
2) EDKII Quark IA32
3) EDKII Vlv2 X64
4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
Jiewen Yao (6):
MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
QuarkPlatformPkg/dsc: enable Smm paging protection.
MdeModulePkg/Core/PiSmmCore/Dispatcher.c | 66 +
MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c | 1509 ++++++++++++++++++++
MdeModulePkg/Core/PiSmmCore/Page.c | 775 +++++++++-
MdeModulePkg/Core/PiSmmCore/PiSmmCore.c | 40 +
MdeModulePkg/Core/PiSmmCore/PiSmmCore.h | 91 ++
MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf | 2 +
MdeModulePkg/Core/PiSmmCore/Pool.c | 16 +
MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h | 51 +
MdeModulePkg/MdeModulePkg.dec | 3 +
QuarkPlatformPkg/Quark.dsc | 6 +
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c | 71 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S | 67 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm | 68 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm | 70 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S | 226 +--
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm | 36 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm | 36 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 37 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c | 4 +-
UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 127 +-
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c | 142 +-
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 156 +-
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf | 5 +-
UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c | 871 +++++++++++
UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c | 39 +-
UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h | 15 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c | 274 +++-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S | 51 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm | 54 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm | 61 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S | 250 +---
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm | 35 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm | 31 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 30 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c | 7 +-
UefiCpuPkg/UefiCpuPkg.dec | 8 +
36 files changed, 4529 insertions(+), 801 deletions(-)
create mode 100644 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
create mode 100644 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
create mode 100644 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
--
2.7.4.windows.1
* [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
@ 2016-11-04 9:30 ` Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
` (6 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek
This table describes the SMM memory attributes.
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h | 51 ++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h b/MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
new file mode 100644
index 0000000..317eae1
--- /dev/null
+++ b/MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
@@ -0,0 +1,51 @@
+/** @file
+ Define the GUID of the EDKII PI SMM memory attribute table, which
+ is published by PI SMM Core.
+
+Copyright (c) 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials are licensed and made available under
+the terms and conditions of the BSD License that accompanies this distribution.
+The full text of the license may be found at
+http://opensource.org/licenses/bsd-license.php.
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+
+**/
+
+#ifndef _PI_SMM_MEMORY_ATTRIBUTES_TABLE_H_
+#define _PI_SMM_MEMORY_ATTRIBUTES_TABLE_H_
+
+#define EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE_GUID {\
+ 0x6b9fd3f7, 0x16df, 0x45e8, {0xbd, 0x39, 0xb9, 0x4a, 0x66, 0x54, 0x1a, 0x5d} \
+}
+
+//
+// The PI SMM memory attribute table contains the SMM memory map for SMM image.
+//
+// This table is installed to SMST as SMM configuration table.
+//
+// This table is published at gEfiSmmEndOfDxeProtocolGuid notification, because
+// no more SMM drivers should be loaded after that. The EfiRuntimeServicesCode
+// region should not change any more.
+//
+// This table is published if and only if all SMM PE/COFF images have sections aligned
+// as specified in UEFI specification Section 2.3. For example, IA32/X64 alignment is 4KiB.
+//
+// If this table is published, the EfiRuntimeServicesCode contains code only
+// and it is EFI_MEMORY_RO; the EfiRuntimeServicesData contains data only
+// and it is EFI_MEMORY_XP.
+//
+typedef struct {
+ UINT32 Version;
+ UINT32 NumberOfEntries;
+ UINT32 DescriptorSize;
+ UINT32 Reserved;
+//EFI_MEMORY_DESCRIPTOR Entry[1];
+} EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE;
+
+#define EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE_VERSION 0x00000001
+
+extern EFI_GUID gEdkiiPiSmmMemoryAttributesTableGuid;
+
+#endif
--
2.7.4.windows.1
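A consumer of the header above has to step through the descriptors using DescriptorSize rather than sizeof, since the entries follow the fixed header and may be larger than the structure the consumer was compiled against. The following is a minimal sketch of such a walk, not code from this series. SketchDescriptor, SketchTable and SketchCountPages are hypothetical stand-ins; the field layout is assumed to mirror EFI_MEMORY_DESCRIPTOR and the table header in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_MEMORY_XP 0x0000000000004000ULL
#define EFI_MEMORY_RO 0x0000000000020000ULL

/* Stand-in for EFI_MEMORY_DESCRIPTOR (same field order). */
typedef struct {
    uint32_t Type;
    uint64_t PhysicalStart;
    uint64_t VirtualStart;
    uint64_t NumberOfPages;
    uint64_t Attribute;
} SketchDescriptor;

/* Stand-in for EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE; entries follow it. */
typedef struct {
    uint32_t Version;
    uint32_t NumberOfEntries;
    uint32_t DescriptorSize;
    uint32_t Reserved;
} SketchTable;

/* Sum NumberOfPages over all entries that carry any bit in Attribute. */
uint64_t SketchCountPages(const SketchTable *Table, uint64_t Attribute)
{
    const uint8_t *Cursor = (const uint8_t *)(Table + 1);
    uint64_t       Pages  = 0;
    uint32_t       Index;

    /* The stride must be at least as large as the fields we read. */
    assert(Table->DescriptorSize >= sizeof(SketchDescriptor));

    for (Index = 0; Index < Table->NumberOfEntries; Index++) {
        const SketchDescriptor *Entry = (const SketchDescriptor *)Cursor;
        if ((Entry->Attribute & Attribute) != 0) {
            Pages += Entry->NumberOfPages;
        }
        Cursor += Table->DescriptorSize;  /* advance by DescriptorSize, not sizeof */
    }
    return Pages;
}
```

Advancing by DescriptorSize is the same convention as NEXT_MEMORY_DESCRIPTOR, so a reader built against an older descriptor layout would still step over newer, larger entries correctly.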
* [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
@ 2016-11-04 9:30 ` Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
` (5 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek
This table describes the SMM memory attributes.
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
MdeModulePkg/MdeModulePkg.dec | 3 +++
1 file changed, 3 insertions(+)
diff --git a/MdeModulePkg/MdeModulePkg.dec b/MdeModulePkg/MdeModulePkg.dec
index 74b8700..99a028f 100644
--- a/MdeModulePkg/MdeModulePkg.dec
+++ b/MdeModulePkg/MdeModulePkg.dec
@@ -355,6 +355,9 @@
## Include/Guid/PiSmmCommunicationRegionTable.h
gEdkiiPiSmmCommunicationRegionTableGuid = { 0x4e28ca50, 0xd582, 0x44ac, {0xa1, 0x1f, 0xe3, 0xd5, 0x65, 0x26, 0xdb, 0x34}}
+ ## Include/Guid/PiSmmMemoryAttributesTable.h
+ gEdkiiPiSmmMemoryAttributesTableGuid = { 0x6b9fd3f7, 0x16df, 0x45e8, {0xbd, 0x39, 0xb9, 0x4a, 0x66, 0x54, 0x1a, 0x5d}}
+
[Ppis]
## Include/Ppi/AtaController.h
gPeiAtaControllerPpiGuid = { 0xa45e60d1, 0xc719, 0x44aa, { 0xb0, 0x7a, 0xaa, 0x77, 0x7f, 0x85, 0x90, 0x6d }}
--
2.7.4.windows.1
* [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
@ 2016-11-04 9:30 ` Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
` (4 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek
This patch installs the LoadedImage protocol into the SMM
protocol database, so that SMM image information can be
retrieved easily to construct the PiSmmMemoryAttributes table.
The table is produced at the SmmEndOfDxe event, so that
the consumer (PiSmmCpu) can consult it
to set memory attributes in the page table.
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
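Note for readers: the split step in MemoryAttributesTable.c turns one image range into alternating DATA and CODE records, which is why the patch budgets up to 2 * CodeSegmentCount + 1 records per image. The sketch below illustrates that idea only. It is not the patch code, which walks linked lists of IMAGE_PROPERTIES_RECORD and writes EFI_MEMORY_DESCRIPTOR entries. SketchRange, SketchRecord and SketchSplitImage are hypothetical names, and the code sections are assumed to be sorted, non-overlapping, page aligned and inside the image.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

enum { SKETCH_DATA, SKETCH_CODE };

typedef struct { uint64_t Base; uint64_t Size; } SketchRange;
typedef struct { int Kind; uint64_t Start; uint64_t Pages; } SketchRecord;

/*
 * Split Image into DATA/CODE records around the given code sections.
 * Out must have room for 2 * Count + 1 records.
 * Returns the number of records actually written.
 */
size_t SketchSplitImage(SketchRange Image, const SketchRange *Code,
                        size_t Count, SketchRecord *Out)
{
    uint64_t Cursor = Image.Base;
    uint64_t End    = Image.Base + Image.Size;
    size_t   Used   = 0;
    size_t   Index;

    for (Index = 0; Index < Count; Index++) {
        /* DATA gap in front of this code section, if any. */
        if (Code[Index].Base > Cursor) {
            Out[Used++] = (SketchRecord){ SKETCH_DATA, Cursor,
                (Code[Index].Base - Cursor) / SKETCH_PAGE_SIZE };
        }
        /* The code section itself. */
        Out[Used++] = (SketchRecord){ SKETCH_CODE, Code[Index].Base,
            Code[Index].Size / SKETCH_PAGE_SIZE };
        Cursor = Code[Index].Base + Code[Index].Size;
    }
    /* Trailing DATA after the last code section, if any. */
    if (Cursor < End) {
        Out[Used++] = (SketchRecord){ SKETCH_DATA, Cursor,
            (End - Cursor) / SKETCH_PAGE_SIZE };
    }
    return Used;
}
```

An image with one code section in the middle yields three records (DATA, CODE, DATA), which matches the 2 * 1 + 1 upper bound. Fewer records come out when a code section starts at the image base or ends at the image end.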
MdeModulePkg/Core/PiSmmCore/Dispatcher.c | 66 +
MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c | 1509 ++++++++++++++++++++
MdeModulePkg/Core/PiSmmCore/Page.c | 775 +++++++++-
MdeModulePkg/Core/PiSmmCore/PiSmmCore.c | 40 +
MdeModulePkg/Core/PiSmmCore/PiSmmCore.h | 91 ++
MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf | 2 +
MdeModulePkg/Core/PiSmmCore/Pool.c | 16 +
7 files changed, 2473 insertions(+), 26 deletions(-)
diff --git a/MdeModulePkg/Core/PiSmmCore/Dispatcher.c b/MdeModulePkg/Core/PiSmmCore/Dispatcher.c
index 87f4617..1bddaf1 100644
--- a/MdeModulePkg/Core/PiSmmCore/Dispatcher.c
+++ b/MdeModulePkg/Core/PiSmmCore/Dispatcher.c
@@ -580,6 +580,11 @@ SmmLoadImage (
DriverEntry->LoadedImage->SystemTable = gST;
DriverEntry->LoadedImage->DeviceHandle = DeviceHandle;
+ DriverEntry->SmmLoadedImage.Revision = EFI_LOADED_IMAGE_PROTOCOL_REVISION;
+ DriverEntry->SmmLoadedImage.ParentHandle = gSmmCorePrivate->SmmIplImageHandle;
+ DriverEntry->SmmLoadedImage.SystemTable = gST;
+ DriverEntry->SmmLoadedImage.DeviceHandle = DeviceHandle;
+
//
// Make an EfiBootServicesData buffer copy of FilePath
//
@@ -599,6 +604,25 @@ SmmLoadImage (
DriverEntry->LoadedImage->ImageDataType = EfiRuntimeServicesData;
//
+ // Make a buffer copy of FilePath
+ //
+ Status = SmmAllocatePool (EfiRuntimeServicesData, GetDevicePathSize(FilePath), (VOID **)&DriverEntry->SmmLoadedImage.FilePath);
+ if (EFI_ERROR (Status)) {
+ if (Buffer != NULL) {
+ gBS->FreePool (Buffer);
+ }
+ gBS->FreePool (DriverEntry->LoadedImage->FilePath);
+ SmmFreePages (DstBuffer, PageCount);
+ return Status;
+ }
+ CopyMem (DriverEntry->SmmLoadedImage.FilePath, FilePath, GetDevicePathSize(FilePath));
+
+ DriverEntry->SmmLoadedImage.ImageBase = (VOID *)(UINTN)DriverEntry->ImageBuffer;
+ DriverEntry->SmmLoadedImage.ImageSize = ImageContext.ImageSize;
+ DriverEntry->SmmLoadedImage.ImageCodeType = EfiRuntimeServicesCode;
+ DriverEntry->SmmLoadedImage.ImageDataType = EfiRuntimeServicesData;
+
+ //
// Create a new image handle in the UEFI handle database for the SMM Driver
//
DriverEntry->ImageHandle = NULL;
@@ -608,6 +632,17 @@ SmmLoadImage (
NULL
);
+ //
+ // Create a new image handle in the SMM handle database for the SMM Driver
+ //
+ DriverEntry->SmmImageHandle = NULL;
+ Status = SmmInstallProtocolInterface (
+ &DriverEntry->SmmImageHandle,
+ &gEfiLoadedImageProtocolGuid,
+ EFI_NATIVE_INTERFACE,
+ &DriverEntry->SmmLoadedImage
+ );
+
PERF_START (DriverEntry->ImageHandle, "LoadImage:", NULL, Tick);
PERF_END (DriverEntry->ImageHandle, "LoadImage:", NULL, 0);
@@ -896,6 +931,16 @@ SmmDispatcher (
}
gBS->FreePool (DriverEntry->LoadedImage);
}
+ Status = SmmUninstallProtocolInterface (
+ DriverEntry->SmmImageHandle,
+ &gEfiLoadedImageProtocolGuid,
+ &DriverEntry->SmmLoadedImage
+ );
+ if (!EFI_ERROR(Status)) {
+ if (DriverEntry->SmmLoadedImage.FilePath != NULL) {
+ SmmFreePool (DriverEntry->SmmLoadedImage.FilePath);
+ }
+ }
}
REPORT_STATUS_CODE_WITH_EXTENDED_DATA (
@@ -1327,6 +1372,27 @@ SmmDriverDispatchHandler (
mSmmCoreLoadedImage->DeviceHandle = FvHandle;
}
+ if (mSmmCoreDriverEntry->SmmLoadedImage.FilePath == NULL) {
+ //
+ // Maybe one special FV contains only one SMM_CORE module, so its device path must
+ // be initialized completely.
+ //
+ EfiInitializeFwVolDevicepathNode (&mFvDevicePath.File, &NameGuid);
+ SetDevicePathEndNode (&mFvDevicePath.End);
+
+ //
+ // Make a buffer copy of FilePath
+ //
+ Status = SmmAllocatePool (
+ EfiRuntimeServicesData,
+ GetDevicePathSize ((EFI_DEVICE_PATH_PROTOCOL *)&mFvDevicePath),
+ (VOID **)&mSmmCoreDriverEntry->SmmLoadedImage.FilePath
+ );
+ ASSERT_EFI_ERROR (Status);
+ CopyMem (mSmmCoreDriverEntry->SmmLoadedImage.FilePath, &mFvDevicePath, GetDevicePathSize((EFI_DEVICE_PATH_PROTOCOL *)&mFvDevicePath));
+
+ mSmmCoreDriverEntry->SmmLoadedImage.DeviceHandle = FvHandle;
+ }
} else {
SmmAddToDriverList (Fv, FvHandle, &NameGuid);
}
diff --git a/MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c b/MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
new file mode 100644
index 0000000..3a5a2c8
--- /dev/null
+++ b/MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
@@ -0,0 +1,1509 @@
+/** @file
+ PI SMM MemoryAttributes support
+
+Copyright (c) 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials
+are licensed and made available under the terms and conditions of the BSD License
+which accompanies this distribution. The full text of the license may be found at
+http://opensource.org/licenses/bsd-license.php
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+
+**/
+
+#include <PiDxe.h>
+#include <Library/BaseLib.h>
+#include <Library/BaseMemoryLib.h>
+#include <Library/MemoryAllocationLib.h>
+#include <Library/UefiBootServicesTableLib.h>
+#include <Library/SmmServicesTableLib.h>
+#include <Library/DebugLib.h>
+#include <Library/PcdLib.h>
+
+#include <Library/PeCoffLib.h>
+#include <Library/PeCoffGetEntryPointLib.h>
+
+#include <Guid/PiSmmMemoryAttributesTable.h>
+
+#include "PiSmmCore.h"
+
+#define PREVIOUS_MEMORY_DESCRIPTOR(MemoryDescriptor, Size) \
+ ((EFI_MEMORY_DESCRIPTOR *)((UINT8 *)(MemoryDescriptor) - (Size)))
+
+#define IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE SIGNATURE_32 ('I','P','R','C')
+
+typedef struct {
+ UINT32 Signature;
+ LIST_ENTRY Link;
+ EFI_PHYSICAL_ADDRESS CodeSegmentBase;
+ UINT64 CodeSegmentSize;
+} IMAGE_PROPERTIES_RECORD_CODE_SECTION;
+
+#define IMAGE_PROPERTIES_RECORD_SIGNATURE SIGNATURE_32 ('I','P','R','D')
+
+typedef struct {
+ UINT32 Signature;
+ LIST_ENTRY Link;
+ EFI_PHYSICAL_ADDRESS ImageBase;
+ UINT64 ImageSize;
+ UINTN CodeSegmentCount;
+ LIST_ENTRY CodeSegmentList;
+} IMAGE_PROPERTIES_RECORD;
+
+#define IMAGE_PROPERTIES_PRIVATE_DATA_SIGNATURE SIGNATURE_32 ('I','P','P','D')
+
+typedef struct {
+ UINT32 Signature;
+ UINTN ImageRecordCount;
+ UINTN CodeSegmentCountMax;
+ LIST_ENTRY ImageRecordList;
+} IMAGE_PROPERTIES_PRIVATE_DATA;
+
+IMAGE_PROPERTIES_PRIVATE_DATA mImagePropertiesPrivateData = {
+ IMAGE_PROPERTIES_PRIVATE_DATA_SIGNATURE,
+ 0,
+ 0,
+ INITIALIZE_LIST_HEAD_VARIABLE (mImagePropertiesPrivateData.ImageRecordList)
+};
+
+#define EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA BIT0
+
+UINT64 mMemoryProtectionAttribute = EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA;
+
+//
+// Below functions are for MemoryMap
+//
+
+/**
+ Converts a number of EFI_PAGEs to a size in bytes.
+
+ NOTE: Do not use EFI_PAGES_TO_SIZE because it handles UINTN only.
+
+ @param[in] Pages The number of EFI_PAGES.
+
+ @return The number of bytes associated with the number of EFI_PAGEs specified
+ by Pages.
+**/
+STATIC
+UINT64
+EfiPagesToSize (
+ IN UINT64 Pages
+ )
+{
+ return LShiftU64 (Pages, EFI_PAGE_SHIFT);
+}
+
+/**
+ Converts a size, in bytes, to a number of EFI_PAGEs.
+
+ NOTE: Do not use EFI_SIZE_TO_PAGES because it handles UINTN only.
+
+ @param[in] Size A size in bytes.
+
+ @return The number of EFI_PAGEs associated with the number of bytes specified
+ by Size.
+
+**/
+STATIC
+UINT64
+EfiSizeToPages (
+ IN UINT64 Size
+ )
+{
+ return RShiftU64 (Size, EFI_PAGE_SHIFT) + ((((UINTN)Size) & EFI_PAGE_MASK) ? 1 : 0);
+}
+
+/**
+ Check the consistency of Smm memory attributes table.
+
+ @param[in] MemoryAttributesTable PI SMM memory attributes table
+**/
+VOID
+SmmMemoryAttributesTableConsistencyCheck (
+ IN EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE *MemoryAttributesTable
+ )
+{
+ EFI_MEMORY_DESCRIPTOR *MemoryMap;
+ UINTN MemoryMapEntryCount;
+ UINTN DescriptorSize;
+ UINTN Index;
+ UINT64 Address;
+
+ Address = 0;
+ MemoryMapEntryCount = MemoryAttributesTable->NumberOfEntries;
+ DescriptorSize = MemoryAttributesTable->DescriptorSize;
+ MemoryMap = (EFI_MEMORY_DESCRIPTOR *)(MemoryAttributesTable + 1);
+ for (Index = 0; Index < MemoryMapEntryCount; Index++) {
+ if (Address != 0) {
+ ASSERT (Address == MemoryMap->PhysicalStart);
+ }
+ Address = MemoryMap->PhysicalStart + EfiPagesToSize(MemoryMap->NumberOfPages);
+ MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+ }
+}
+
+/**
+ Sort memory map entries based upon PhysicalStart, from low to high.
+
+ @param[in] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[in] MemoryMapSize Size, in bytes, of the MemoryMap buffer.
+ @param[in] DescriptorSize Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+SortMemoryMap (
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ IN UINTN MemoryMapSize,
+ IN UINTN DescriptorSize
+ )
+{
+ EFI_MEMORY_DESCRIPTOR *MemoryMapEntry;
+ EFI_MEMORY_DESCRIPTOR *NextMemoryMapEntry;
+ EFI_MEMORY_DESCRIPTOR *MemoryMapEnd;
+ EFI_MEMORY_DESCRIPTOR TempMemoryMap;
+
+ MemoryMapEntry = MemoryMap;
+ NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+ MemoryMapEnd = (EFI_MEMORY_DESCRIPTOR *) ((UINT8 *) MemoryMap + MemoryMapSize);
+ while (MemoryMapEntry < MemoryMapEnd) {
+ while (NextMemoryMapEntry < MemoryMapEnd) {
+ if (MemoryMapEntry->PhysicalStart > NextMemoryMapEntry->PhysicalStart) {
+ CopyMem (&TempMemoryMap, MemoryMapEntry, sizeof(EFI_MEMORY_DESCRIPTOR));
+ CopyMem (MemoryMapEntry, NextMemoryMapEntry, sizeof(EFI_MEMORY_DESCRIPTOR));
+ CopyMem (NextMemoryMapEntry, &TempMemoryMap, sizeof(EFI_MEMORY_DESCRIPTOR));
+ }
+
+ NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (NextMemoryMapEntry, DescriptorSize);
+ }
+
+ MemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+ NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+ }
+
+ return ;
+}
+
+/**
+ Merge contiguous memory map entries that have the same attributes.
+
+ @param[in, out] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[in, out] MemoryMapSize A pointer to the size, in bytes, of the
+ MemoryMap buffer. On input, this is the size of
+ the current memory map. On output,
+ it is the size of new memory map after merge.
+ @param[in] DescriptorSize Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+MergeMemoryMap (
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ IN OUT UINTN *MemoryMapSize,
+ IN UINTN DescriptorSize
+ )
+{
+ EFI_MEMORY_DESCRIPTOR *MemoryMapEntry;
+ EFI_MEMORY_DESCRIPTOR *MemoryMapEnd;
+ UINT64 MemoryBlockLength;
+ EFI_MEMORY_DESCRIPTOR *NewMemoryMapEntry;
+ EFI_MEMORY_DESCRIPTOR *NextMemoryMapEntry;
+
+ MemoryMapEntry = MemoryMap;
+ NewMemoryMapEntry = MemoryMap;
+ MemoryMapEnd = (EFI_MEMORY_DESCRIPTOR *) ((UINT8 *) MemoryMap + *MemoryMapSize);
+ while ((UINTN)MemoryMapEntry < (UINTN)MemoryMapEnd) {
+ CopyMem (NewMemoryMapEntry, MemoryMapEntry, sizeof(EFI_MEMORY_DESCRIPTOR));
+ NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+
+ do {
+ MemoryBlockLength = (UINT64) (EfiPagesToSize (MemoryMapEntry->NumberOfPages));
+ if (((UINTN)NextMemoryMapEntry < (UINTN)MemoryMapEnd) &&
+ (MemoryMapEntry->Type == NextMemoryMapEntry->Type) &&
+ (MemoryMapEntry->Attribute == NextMemoryMapEntry->Attribute) &&
+ ((MemoryMapEntry->PhysicalStart + MemoryBlockLength) == NextMemoryMapEntry->PhysicalStart)) {
+ MemoryMapEntry->NumberOfPages += NextMemoryMapEntry->NumberOfPages;
+ if (NewMemoryMapEntry != MemoryMapEntry) {
+ NewMemoryMapEntry->NumberOfPages += NextMemoryMapEntry->NumberOfPages;
+ }
+
+ NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (NextMemoryMapEntry, DescriptorSize);
+ continue;
+ } else {
+ MemoryMapEntry = PREVIOUS_MEMORY_DESCRIPTOR (NextMemoryMapEntry, DescriptorSize);
+ break;
+ }
+ } while (TRUE);
+
+ MemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+ NewMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (NewMemoryMapEntry, DescriptorSize);
+ }
+
+ *MemoryMapSize = (UINTN)NewMemoryMapEntry - (UINTN)MemoryMap;
+
+ return ;
+}
+
+/**
+ Enforce memory map attributes.
+ This function sets EfiRuntimeServicesCode to be EFI_MEMORY_RO and EfiRuntimeServicesData to be EFI_MEMORY_XP.
+
+ @param[in, out] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[in] MemoryMapSize Size, in bytes, of the MemoryMap buffer.
+ @param[in] DescriptorSize Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+EnforceMemoryMapAttribute (
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ IN UINTN MemoryMapSize,
+ IN UINTN DescriptorSize
+ )
+{
+ EFI_MEMORY_DESCRIPTOR *MemoryMapEntry;
+ EFI_MEMORY_DESCRIPTOR *MemoryMapEnd;
+
+ MemoryMapEntry = MemoryMap;
+ MemoryMapEnd = (EFI_MEMORY_DESCRIPTOR *) ((UINT8 *) MemoryMap + MemoryMapSize);
+ while ((UINTN)MemoryMapEntry < (UINTN)MemoryMapEnd) {
+ switch (MemoryMapEntry->Type) {
+ case EfiRuntimeServicesCode:
+ MemoryMapEntry->Attribute |= EFI_MEMORY_RO;
+ break;
+ case EfiRuntimeServicesData:
+ MemoryMapEntry->Attribute |= EFI_MEMORY_XP;
+ break;
+ }
+
+ MemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+ }
+
+ return ;
+}
+
+/**
+ Return the first image record whose [ImageBase, ImageSize] is covered by [Buffer, Length].
+
+ @param[in] Buffer Start Address
+ @param[in] Length Address length
+
+ @return first image record covered by [buffer, length]
+**/
+STATIC
+IMAGE_PROPERTIES_RECORD *
+GetImageRecordByAddress (
+ IN EFI_PHYSICAL_ADDRESS Buffer,
+ IN UINT64 Length
+ )
+{
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ LIST_ENTRY *ImageRecordLink;
+ LIST_ENTRY *ImageRecordList;
+
+ ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+ for (ImageRecordLink = ImageRecordList->ForwardLink;
+ ImageRecordLink != ImageRecordList;
+ ImageRecordLink = ImageRecordLink->ForwardLink) {
+ ImageRecord = CR (
+ ImageRecordLink,
+ IMAGE_PROPERTIES_RECORD,
+ Link,
+ IMAGE_PROPERTIES_RECORD_SIGNATURE
+ );
+
+ if ((Buffer <= ImageRecord->ImageBase) &&
+ (Buffer + Length >= ImageRecord->ImageBase + ImageRecord->ImageSize)) {
+ return ImageRecord;
+ }
+ }
+
+ return NULL;
+}
+
+/**
+ Set the memory map to new entries, according to one old entry,
+ based upon PE code section and data section in image record
+
+ @param[in] ImageRecord An image record whose [ImageBase, ImageSize] is covered
+ by the old memory map entry.
+ @param[in, out] NewRecord A pointer to several new memory map entries.
+ The caller guarantees the buffer size is 1 +
+ (SplitRecordCount * DescriptorSize), calculated
+ below.
+ @param[in] OldRecord A pointer to one old memory map entry.
+ @param[in] DescriptorSize Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+
+ @return The number of new memory map entries written to NewRecord.
+**/
+STATIC
+UINTN
+SetNewRecord (
+ IN IMAGE_PROPERTIES_RECORD *ImageRecord,
+ IN OUT EFI_MEMORY_DESCRIPTOR *NewRecord,
+ IN EFI_MEMORY_DESCRIPTOR *OldRecord,
+ IN UINTN DescriptorSize
+ )
+{
+ EFI_MEMORY_DESCRIPTOR TempRecord;
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+ LIST_ENTRY *ImageRecordCodeSectionLink;
+ LIST_ENTRY *ImageRecordCodeSectionEndLink;
+ LIST_ENTRY *ImageRecordCodeSectionList;
+ UINTN NewRecordCount;
+ UINT64 PhysicalEnd;
+ UINT64 ImageEnd;
+
+ CopyMem (&TempRecord, OldRecord, sizeof(EFI_MEMORY_DESCRIPTOR));
+ PhysicalEnd = TempRecord.PhysicalStart + EfiPagesToSize(TempRecord.NumberOfPages);
+ NewRecordCount = 0;
+
+ ImageRecordCodeSectionList = &ImageRecord->CodeSegmentList;
+
+ ImageRecordCodeSectionLink = ImageRecordCodeSectionList->ForwardLink;
+ ImageRecordCodeSectionEndLink = ImageRecordCodeSectionList;
+ while (ImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+ ImageRecordCodeSection = CR (
+ ImageRecordCodeSectionLink,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+ Link,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+ );
+ ImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+
+ if (TempRecord.PhysicalStart <= ImageRecordCodeSection->CodeSegmentBase) {
+ //
+ // DATA
+ //
+ NewRecord->Type = EfiRuntimeServicesData;
+ NewRecord->PhysicalStart = TempRecord.PhysicalStart;
+ NewRecord->VirtualStart = 0;
+ NewRecord->NumberOfPages = EfiSizeToPages(ImageRecordCodeSection->CodeSegmentBase - NewRecord->PhysicalStart);
+ NewRecord->Attribute = TempRecord.Attribute | EFI_MEMORY_XP;
+ if (NewRecord->NumberOfPages != 0) {
+ NewRecord = NEXT_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+ NewRecordCount ++;
+ }
+
+ //
+ // CODE
+ //
+ NewRecord->Type = EfiRuntimeServicesCode;
+ NewRecord->PhysicalStart = ImageRecordCodeSection->CodeSegmentBase;
+ NewRecord->VirtualStart = 0;
+ NewRecord->NumberOfPages = EfiSizeToPages(ImageRecordCodeSection->CodeSegmentSize);
+ NewRecord->Attribute = (TempRecord.Attribute & (~EFI_MEMORY_XP)) | EFI_MEMORY_RO;
+ if (NewRecord->NumberOfPages != 0) {
+ NewRecord = NEXT_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+ NewRecordCount ++;
+ }
+
+ TempRecord.PhysicalStart = ImageRecordCodeSection->CodeSegmentBase + EfiPagesToSize (EfiSizeToPages(ImageRecordCodeSection->CodeSegmentSize));
+ TempRecord.NumberOfPages = EfiSizeToPages(PhysicalEnd - TempRecord.PhysicalStart);
+ if (TempRecord.NumberOfPages == 0) {
+ break;
+ }
+ }
+ }
+
+ ImageEnd = ImageRecord->ImageBase + ImageRecord->ImageSize;
+
+ //
+ // Final DATA
+ //
+ if (TempRecord.PhysicalStart < ImageEnd) {
+ NewRecord->Type = EfiRuntimeServicesData;
+ NewRecord->PhysicalStart = TempRecord.PhysicalStart;
+ NewRecord->VirtualStart = 0;
+ NewRecord->NumberOfPages = EfiSizeToPages (ImageEnd - TempRecord.PhysicalStart);
+ NewRecord->Attribute = TempRecord.Attribute | EFI_MEMORY_XP;
+ NewRecordCount ++;
+ }
+
+ return NewRecordCount;
+}
+
+/**
+ Return the max number of new split entries, according to one old entry,
+ based upon PE code section and data section.
+
+ @param[in] OldRecord A pointer to one old memory map entry.
+
+ @retval 0 No entry needs to be split.
+ @return The max number of new split entries.
+**/
+STATIC
+UINTN
+GetMaxSplitRecordCount (
+ IN EFI_MEMORY_DESCRIPTOR *OldRecord
+ )
+{
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ UINTN SplitRecordCount;
+ UINT64 PhysicalStart;
+ UINT64 PhysicalEnd;
+
+ SplitRecordCount = 0;
+ PhysicalStart = OldRecord->PhysicalStart;
+ PhysicalEnd = OldRecord->PhysicalStart + EfiPagesToSize(OldRecord->NumberOfPages);
+
+ do {
+ ImageRecord = GetImageRecordByAddress (PhysicalStart, PhysicalEnd - PhysicalStart);
+ if (ImageRecord == NULL) {
+ break;
+ }
+ SplitRecordCount += (2 * ImageRecord->CodeSegmentCount + 1);
+ PhysicalStart = ImageRecord->ImageBase + ImageRecord->ImageSize;
+ } while ((ImageRecord != NULL) && (PhysicalStart < PhysicalEnd));
+
+ if (SplitRecordCount != 0) {
+ SplitRecordCount--;
+ }
+
+ return SplitRecordCount;
+}
+
+/**
+ Split the memory map to new entries, according to one old entry,
+ based upon PE code section and data section.
+
+ @param[in] OldRecord A pointer to one old memory map entry.
+ @param[in, out] NewRecord A pointer to several new memory map entries.
+ The caller guarantees the buffer size is 1 +
+ (SplitRecordCount * DescriptorSize), calculated
+ below.
+ @param[in] MaxSplitRecordCount The max number of split entries.
+ @param[in] DescriptorSize Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+
+ @retval 0 No entry is split.
+ @return The actual number of split records.
+**/
+STATIC
+UINTN
+SplitRecord (
+ IN EFI_MEMORY_DESCRIPTOR *OldRecord,
+ IN OUT EFI_MEMORY_DESCRIPTOR *NewRecord,
+ IN UINTN MaxSplitRecordCount,
+ IN UINTN DescriptorSize
+ )
+{
+ EFI_MEMORY_DESCRIPTOR TempRecord;
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ IMAGE_PROPERTIES_RECORD *NewImageRecord;
+ UINT64 PhysicalStart;
+ UINT64 PhysicalEnd;
+ UINTN NewRecordCount;
+ UINTN TotalNewRecordCount;
+
+ if (MaxSplitRecordCount == 0) {
+ CopyMem (NewRecord, OldRecord, DescriptorSize);
+ return 0;
+ }
+
+ TotalNewRecordCount = 0;
+
+ //
+ // Override previous record
+ //
+ CopyMem (&TempRecord, OldRecord, sizeof(EFI_MEMORY_DESCRIPTOR));
+ PhysicalStart = TempRecord.PhysicalStart;
+ PhysicalEnd = TempRecord.PhysicalStart + EfiPagesToSize(TempRecord.NumberOfPages);
+
+ ImageRecord = NULL;
+ do {
+ NewImageRecord = GetImageRecordByAddress (PhysicalStart, PhysicalEnd - PhysicalStart);
+ if (NewImageRecord == NULL) {
+ //
+ // No more image covered by this range, stop
+ //
+ if ((PhysicalEnd > PhysicalStart) && (ImageRecord != NULL)) {
+ //
+ // If there is still address space left in this record, it needs an entry.
+ //
+ NewRecord = PREVIOUS_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+ if (NewRecord->Type == EfiRuntimeServicesData) {
+ //
+ // Last record is DATA, just merge it.
+ //
+ NewRecord->NumberOfPages = EfiSizeToPages(PhysicalEnd - NewRecord->PhysicalStart);
+ } else {
+ //
+ // Last record is CODE, create a new DATA entry.
+ //
+ NewRecord = NEXT_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+ NewRecord->Type = EfiRuntimeServicesData;
+ NewRecord->PhysicalStart = TempRecord.PhysicalStart;
+ NewRecord->VirtualStart = 0;
+ NewRecord->NumberOfPages = TempRecord.NumberOfPages;
+ NewRecord->Attribute = TempRecord.Attribute | EFI_MEMORY_XP;
+ TotalNewRecordCount ++;
+ }
+ }
+ break;
+ }
+ ImageRecord = NewImageRecord;
+
+ //
+ // Set new record
+ //
+ NewRecordCount = SetNewRecord (ImageRecord, NewRecord, &TempRecord, DescriptorSize);
+ TotalNewRecordCount += NewRecordCount;
+ NewRecord = (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)NewRecord + NewRecordCount * DescriptorSize);
+
+ //
+ // Update PhysicalStart, in order to exclude the image buffer already split.
+ //
+ PhysicalStart = ImageRecord->ImageBase + ImageRecord->ImageSize;
+ TempRecord.PhysicalStart = PhysicalStart;
+ TempRecord.NumberOfPages = EfiSizeToPages (PhysicalEnd - PhysicalStart);
+ } while ((ImageRecord != NULL) && (PhysicalStart < PhysicalEnd));
+
+ return TotalNewRecordCount - 1;
+}
+
+/**
+ Split the original memory map, and add more entries to describe PE code section and data section.
+ This function will set EfiRuntimeServicesData to be EFI_MEMORY_XP.
+ Finally, this function will merge entries with the same attributes.
+
+ NOTE: It assumes PE code/data sections are page aligned.
+ NOTE: It assumes enough entries are prepared for the new memory map.
+
+ Split table:
+ +---------------+
+ | Record X |
+ +---------------+
+ | Record RtCode |
+ +---------------+
+ | Record Y |
+ +---------------+
+ ==>
+ +---------------+
+ | Record X |
+ +---------------+ ----
+ | Record RtData | |
+ +---------------+ |
+ | Record RtCode | |-> PE/COFF1
+ +---------------+ |
+ | Record RtData | |
+ +---------------+ ----
+ | Record RtData | |
+ +---------------+ |
+ | Record RtCode | |-> PE/COFF2
+ +---------------+ |
+ | Record RtData | |
+ +---------------+ ----
+ | Record Y |
+ +---------------+
+
+ @param[in, out] MemoryMapSize A pointer to the size, in bytes, of the
+ MemoryMap buffer. On input, this is the size of
+ old MemoryMap before split. The actual buffer
+ size of MemoryMap is MemoryMapSize +
+ (AdditionalRecordCount * DescriptorSize) calculated
+ below. On output, it is the size of new MemoryMap
+ after split.
+ @param[in, out] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[in] DescriptorSize Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+SplitTable (
+ IN OUT UINTN *MemoryMapSize,
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ IN UINTN DescriptorSize
+ )
+{
+ INTN IndexOld;
+ INTN IndexNew;
+ UINTN MaxSplitRecordCount;
+ UINTN RealSplitRecordCount;
+ UINTN TotalSplitRecordCount;
+ UINTN AdditionalRecordCount;
+
+ AdditionalRecordCount = (2 * mImagePropertiesPrivateData.CodeSegmentCountMax + 1) * mImagePropertiesPrivateData.ImageRecordCount;
+
+ TotalSplitRecordCount = 0;
+ //
+ // Let old record point to end of valid MemoryMap buffer.
+ //
+ IndexOld = ((*MemoryMapSize) / DescriptorSize) - 1;
+ //
+ // Let new record point to end of full MemoryMap buffer.
+ //
+ IndexNew = ((*MemoryMapSize) / DescriptorSize) - 1 + AdditionalRecordCount;
+ for (; IndexOld >= 0; IndexOld--) {
+ MaxSplitRecordCount = GetMaxSplitRecordCount ((EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + IndexOld * DescriptorSize));
+ //
+ // Split this MemoryMap record
+ //
+ IndexNew -= MaxSplitRecordCount;
+ RealSplitRecordCount = SplitRecord (
+ (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + IndexOld * DescriptorSize),
+ (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + IndexNew * DescriptorSize),
+ MaxSplitRecordCount,
+ DescriptorSize
+ );
+ //
+ // Adjust IndexNew according to real split.
+ //
+ CopyMem (
+ ((UINT8 *)MemoryMap + (IndexNew + MaxSplitRecordCount - RealSplitRecordCount) * DescriptorSize),
+ ((UINT8 *)MemoryMap + IndexNew * DescriptorSize),
+ RealSplitRecordCount * DescriptorSize
+ );
+ IndexNew = IndexNew + MaxSplitRecordCount - RealSplitRecordCount;
+ TotalSplitRecordCount += RealSplitRecordCount;
+ IndexNew --;
+ }
+ //
+ // Move all records to the beginning.
+ //
+ CopyMem (
+ MemoryMap,
+ (UINT8 *)MemoryMap + (AdditionalRecordCount - TotalSplitRecordCount) * DescriptorSize,
+ (*MemoryMapSize) + TotalSplitRecordCount * DescriptorSize
+ );
+
+ *MemoryMapSize = (*MemoryMapSize) + DescriptorSize * TotalSplitRecordCount;
+
+ //
+ // Sort from low to high (Just in case)
+ //
+ SortMemoryMap (MemoryMap, *MemoryMapSize, DescriptorSize);
+
+ //
+ // Set RuntimeData to XP
+ //
+ EnforceMemoryMapAttribute (MemoryMap, *MemoryMapSize, DescriptorSize);
+
+ //
+ // Merge same type to save entry size
+ //
+ MergeMemoryMap (MemoryMap, MemoryMapSize, DescriptorSize);
+
+ return ;
+}
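
The back-to-front walk in SplitTable() can be sketched in isolation. This is a
simplified illustration, not the patch's code: expand_in_place() and its
"value v becomes v copies" rule are hypothetical stand-ins for SplitRecord(),
and the sketch writes each result directly at the tail instead of reserving the
maximum and compacting afterwards. It assumes every value is at least 1 and
that capacity is at least the sum of all values.

```c
#include <stddef.h>
#include <string.h>

/*
 * Expand an array in place, processing entries from last to first.
 * Each entry v is replaced by v copies of itself (hypothetical stand-in
 * for splitting one memory descriptor into several).
 * Returns the number of entries after expansion.
 */
static size_t expand_in_place(int *buf, size_t count, size_t capacity)
{
    size_t write = capacity;   /* index of the lowest slot written so far */
    size_t i;
    size_t j;

    for (i = count; i-- > 0; ) {
        int v = buf[i];        /* read before the tail can overwrite it */
        size_t pieces = (size_t)v;

        write -= pieces;
        for (j = 0; j < pieces; j++) {
            buf[write + j] = v;
        }
    }

    /* Move the expanded entries to the beginning of the buffer. */
    memmove(buf, buf + write, (capacity - write) * sizeof *buf);
    return capacity - write;
}
```

Walking backwards is what keeps unread entries safe: the write position can
never drop below the index of the entry being read as long as the buffer has
the assumed headroom.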
+
+/**
+ This function implements GetMemoryMap() with memory attributes table support.
+
+ It calls the original GetMemoryMap() to get the original memory map information, then
+ adds the additional memory map entries for PE code/data separation.
+
+ @param[in, out] MemoryMapSize A pointer to the size, in bytes, of the
+ MemoryMap buffer. On input, this is the size of
+ the buffer allocated by the caller. On output,
+ it is the size of the buffer returned by the
+ firmware if the buffer was large enough, or the
+ size of the buffer needed to contain the map if
+ the buffer was too small.
+ @param[in, out] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[out] MapKey A pointer to the location in which firmware
+ returns the key for the current memory map.
+ @param[out] DescriptorSize A pointer to the location in which firmware
+ returns the size, in bytes, of an individual
+ EFI_MEMORY_DESCRIPTOR.
+ @param[out] DescriptorVersion A pointer to the location in which firmware
+ returns the version number associated with the
+ EFI_MEMORY_DESCRIPTOR.
+
+ @retval EFI_SUCCESS The memory map was returned in the MemoryMap
+ buffer.
+ @retval EFI_BUFFER_TOO_SMALL The MemoryMap buffer was too small. The current
+ buffer size needed to hold the memory map is
+ returned in MemoryMapSize.
+ @retval EFI_INVALID_PARAMETER One of the parameters has an invalid value.
+
+**/
+STATIC
+EFI_STATUS
+EFIAPI
+SmmCoreGetMemoryMapMemoryAttributesTable (
+ IN OUT UINTN *MemoryMapSize,
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ OUT UINTN *MapKey,
+ OUT UINTN *DescriptorSize,
+ OUT UINT32 *DescriptorVersion
+ )
+{
+ EFI_STATUS Status;
+ UINTN OldMemoryMapSize;
+ UINTN AdditionalRecordCount;
+
+ //
+ // If PE code/data is not aligned, just return.
+ //
+ if ((mMemoryProtectionAttribute & EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA) == 0) {
+ return SmmCoreGetMemoryMap (MemoryMapSize, MemoryMap, MapKey, DescriptorSize, DescriptorVersion);
+ }
+
+ if (MemoryMapSize == NULL) {
+ return EFI_INVALID_PARAMETER;
+ }
+
+ AdditionalRecordCount = (2 * mImagePropertiesPrivateData.CodeSegmentCountMax + 1) * mImagePropertiesPrivateData.ImageRecordCount;
+
+ OldMemoryMapSize = *MemoryMapSize;
+ Status = SmmCoreGetMemoryMap (MemoryMapSize, MemoryMap, MapKey, DescriptorSize, DescriptorVersion);
+ if (Status == EFI_BUFFER_TOO_SMALL) {
+ *MemoryMapSize = *MemoryMapSize + (*DescriptorSize) * AdditionalRecordCount;
+ } else if (Status == EFI_SUCCESS) {
+ if (OldMemoryMapSize - *MemoryMapSize < (*DescriptorSize) * AdditionalRecordCount) {
+ *MemoryMapSize = *MemoryMapSize + (*DescriptorSize) * AdditionalRecordCount;
+ //
+ // Need update status to buffer too small
+ //
+ Status = EFI_BUFFER_TOO_SMALL;
+ } else {
+ //
+ // Split PE code/data
+ //
+ ASSERT(MemoryMap != NULL);
+ SplitTable (MemoryMapSize, MemoryMap, *DescriptorSize);
+ }
+ }
+
+ return Status;
+}
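
The size negotiation above can be checked with a small worked example. This is
a sketch with hypothetical helper names (additional_record_count,
required_map_size, has_split_headroom); only the formula
(2 * CodeSegmentCountMax + 1) * ImageRecordCount and the headroom comparison
are taken from the patch.

```c
#include <stddef.h>

/*
 * Worst case, each image turns one descriptor into
 * DATA, CODE, DATA, ..., CODE, DATA: 2 * max_code_segments + 1 pieces.
 */
static size_t additional_record_count(size_t max_code_segments, size_t image_count)
{
    return (2 * max_code_segments + 1) * image_count;
}

/* Buffer size reported to the caller when the buffer is too small. */
static size_t required_map_size(size_t base_map_size, size_t descriptor_size,
                                size_t max_code_segments, size_t image_count)
{
    return base_map_size +
           descriptor_size * additional_record_count(max_code_segments, image_count);
}

/*
 * Returns 1 when a caller buffer of buffer_size bytes leaves enough room
 * beyond the unsplit map (base_map_size bytes) for the worst-case split.
 */
static int has_split_headroom(size_t buffer_size, size_t base_map_size,
                              size_t descriptor_size,
                              size_t max_code_segments, size_t image_count)
{
    if (buffer_size < base_map_size) {
        return 0;
    }
    return (buffer_size - base_map_size) >=
           descriptor_size * additional_record_count(max_code_segments, image_count);
}
```

For example, with 3 images of at most 1 code segment each and 48-byte
descriptors, a 480-byte unsplit map needs 480 + 9 * 48 = 912 bytes.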
+
+//
+// Below functions are for ImageRecord
+//
+
+/**
+ Set MemoryProtectionAttribute according to PE/COFF image section alignment.
+
+ @param[in] SectionAlignment PE/COFF section alignment
+**/
+STATIC
+VOID
+SetMemoryAttributesTableSectionAlignment (
+ IN UINT32 SectionAlignment
+ )
+{
+ if (((SectionAlignment & (EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT - 1)) != 0) &&
+ ((mMemoryProtectionAttribute & EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA) != 0)) {
+ DEBUG ((DEBUG_VERBOSE, "SMM SetMemoryAttributesTableSectionAlignment - Clear\n"));
+ mMemoryProtectionAttribute &= ~((UINT64)EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA);
+ }
+}
+
+/**
+ Swap two code sections in image record.
+
+ @param[in] FirstImageRecordCodeSection first code section in image record
+ @param[in] SecondImageRecordCodeSection second code section in image record
+**/
+STATIC
+VOID
+SwapImageRecordCodeSection (
+ IN IMAGE_PROPERTIES_RECORD_CODE_SECTION *FirstImageRecordCodeSection,
+ IN IMAGE_PROPERTIES_RECORD_CODE_SECTION *SecondImageRecordCodeSection
+ )
+{
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION TempImageRecordCodeSection;
+
+ TempImageRecordCodeSection.CodeSegmentBase = FirstImageRecordCodeSection->CodeSegmentBase;
+ TempImageRecordCodeSection.CodeSegmentSize = FirstImageRecordCodeSection->CodeSegmentSize;
+
+ FirstImageRecordCodeSection->CodeSegmentBase = SecondImageRecordCodeSection->CodeSegmentBase;
+ FirstImageRecordCodeSection->CodeSegmentSize = SecondImageRecordCodeSection->CodeSegmentSize;
+
+ SecondImageRecordCodeSection->CodeSegmentBase = TempImageRecordCodeSection.CodeSegmentBase;
+ SecondImageRecordCodeSection->CodeSegmentSize = TempImageRecordCodeSection.CodeSegmentSize;
+}
+
+/**
+ Sort code section in image record, based upon CodeSegmentBase from low to high.
+
+ @param[in] ImageRecord image record to be sorted
+**/
+STATIC
+VOID
+SortImageRecordCodeSection (
+ IN IMAGE_PROPERTIES_RECORD *ImageRecord
+ )
+{
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *NextImageRecordCodeSection;
+ LIST_ENTRY *ImageRecordCodeSectionLink;
+ LIST_ENTRY *NextImageRecordCodeSectionLink;
+ LIST_ENTRY *ImageRecordCodeSectionEndLink;
+ LIST_ENTRY *ImageRecordCodeSectionList;
+
+ ImageRecordCodeSectionList = &ImageRecord->CodeSegmentList;
+
+ ImageRecordCodeSectionLink = ImageRecordCodeSectionList->ForwardLink;
+ NextImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+ ImageRecordCodeSectionEndLink = ImageRecordCodeSectionList;
+ while (ImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+ ImageRecordCodeSection = CR (
+ ImageRecordCodeSectionLink,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+ Link,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+ );
+ while (NextImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+ NextImageRecordCodeSection = CR (
+ NextImageRecordCodeSectionLink,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+ Link,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+ );
+ if (ImageRecordCodeSection->CodeSegmentBase > NextImageRecordCodeSection->CodeSegmentBase) {
+ SwapImageRecordCodeSection (ImageRecordCodeSection, NextImageRecordCodeSection);
+ }
+ NextImageRecordCodeSectionLink = NextImageRecordCodeSectionLink->ForwardLink;
+ }
+
+ ImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+ NextImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+ }
+}
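
The sort above leaves the list links untouched and swaps only the payload
fields. A minimal standalone sketch of that idea, assuming a hypothetical
singly linked circular list with a head sentinel (the Node type and
sort_by_base name are illustrative, not from the patch):

```c
#include <stdint.h>

typedef struct Node {
    struct Node *next;   /* circular: the last node points back to the head */
    uint64_t     base;
    uint64_t     size;
} Node;

/*
 * Sort nodes by base, low to high, by swapping payloads between nodes.
 * head is a sentinel and carries no payload.
 */
static void sort_by_base(Node *head)
{
    Node *cur;
    Node *other;

    for (cur = head->next; cur != head; cur = cur->next) {
        for (other = cur->next; other != head; other = other->next) {
            if (cur->base > other->base) {
                uint64_t tmp_base = cur->base;
                uint64_t tmp_size = cur->size;

                cur->base   = other->base;
                cur->size   = other->size;
                other->base = tmp_base;
                other->size = tmp_size;
            }
        }
    }
}
```

Swapping payloads costs O(n^2) comparisons but avoids relinking nodes, which is
reasonable for the handful of code sections a PE image usually has.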
+
+/**
+ Check if code section in image record is valid.
+
+ @param[in] ImageRecord image record to be checked
+
+ @retval TRUE image record is valid
+ @retval FALSE image record is invalid
+**/
+STATIC
+BOOLEAN
+IsImageRecordCodeSectionValid (
+ IN IMAGE_PROPERTIES_RECORD *ImageRecord
+ )
+{
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *LastImageRecordCodeSection;
+ LIST_ENTRY *ImageRecordCodeSectionLink;
+ LIST_ENTRY *ImageRecordCodeSectionEndLink;
+ LIST_ENTRY *ImageRecordCodeSectionList;
+
+ DEBUG ((DEBUG_VERBOSE, "SMM ImageCode SegmentCount - 0x%x\n", ImageRecord->CodeSegmentCount));
+
+ ImageRecordCodeSectionList = &ImageRecord->CodeSegmentList;
+
+ ImageRecordCodeSectionLink = ImageRecordCodeSectionList->ForwardLink;
+ ImageRecordCodeSectionEndLink = ImageRecordCodeSectionList;
+ LastImageRecordCodeSection = NULL;
+ while (ImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+ ImageRecordCodeSection = CR (
+ ImageRecordCodeSectionLink,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+ Link,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+ );
+ if (ImageRecordCodeSection->CodeSegmentSize == 0) {
+ return FALSE;
+ }
+ if (ImageRecordCodeSection->CodeSegmentBase < ImageRecord->ImageBase) {
+ return FALSE;
+ }
+ if (ImageRecordCodeSection->CodeSegmentBase >= MAX_ADDRESS - ImageRecordCodeSection->CodeSegmentSize) {
+ return FALSE;
+ }
+ if ((ImageRecordCodeSection->CodeSegmentBase + ImageRecordCodeSection->CodeSegmentSize) > (ImageRecord->ImageBase + ImageRecord->ImageSize)) {
+ return FALSE;
+ }
+ if (LastImageRecordCodeSection != NULL) {
+ if ((LastImageRecordCodeSection->CodeSegmentBase + LastImageRecordCodeSection->CodeSegmentSize) > ImageRecordCodeSection->CodeSegmentBase) {
+ return FALSE;
+ }
+ }
+
+ LastImageRecordCodeSection = ImageRecordCodeSection;
+ ImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+ }
+
+ return TRUE;
+}
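
The checks performed above can be restated over a plain array. This is a
sketch under stated assumptions: Range and code_ranges_valid are hypothetical
names, the code ranges are already sorted by base, and UINT64_MAX stands in
for MAX_ADDRESS.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} Range;

/*
 * Returns 1 when every code range is non-empty, lies inside the image,
 * does not wrap the address space, and does not overlap its predecessor.
 * The code array must be sorted by base, low to high.
 */
static int code_ranges_valid(const Range *image, const Range *code, size_t count)
{
    const Range *last = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        const Range *cur = &code[i];

        if (cur->size == 0) {
            return 0;                               /* empty section */
        }
        if (cur->base < image->base) {
            return 0;                               /* starts before the image */
        }
        if (cur->base >= UINT64_MAX - cur->size) {
            return 0;                               /* base + size would wrap */
        }
        if (cur->base + cur->size > image->base + image->size) {
            return 0;                               /* ends after the image */
        }
        if (last != NULL && last->base + last->size > cur->base) {
            return 0;                               /* overlaps previous range */
        }
        last = cur;
    }
    return 1;
}
```

The overlap test only compares neighbours, which is why the sort has to run
before the validation.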
+
+/**
+ Swap two image records.
+
+ @param[in] FirstImageRecord first image record.
+ @param[in] SecondImageRecord second image record.
+**/
+STATIC
+VOID
+SwapImageRecord (
+ IN IMAGE_PROPERTIES_RECORD *FirstImageRecord,
+ IN IMAGE_PROPERTIES_RECORD *SecondImageRecord
+ )
+{
+ IMAGE_PROPERTIES_RECORD TempImageRecord;
+
+ TempImageRecord.ImageBase = FirstImageRecord->ImageBase;
+ TempImageRecord.ImageSize = FirstImageRecord->ImageSize;
+ TempImageRecord.CodeSegmentCount = FirstImageRecord->CodeSegmentCount;
+
+ FirstImageRecord->ImageBase = SecondImageRecord->ImageBase;
+ FirstImageRecord->ImageSize = SecondImageRecord->ImageSize;
+ FirstImageRecord->CodeSegmentCount = SecondImageRecord->CodeSegmentCount;
+
+ SecondImageRecord->ImageBase = TempImageRecord.ImageBase;
+ SecondImageRecord->ImageSize = TempImageRecord.ImageSize;
+ SecondImageRecord->CodeSegmentCount = TempImageRecord.CodeSegmentCount;
+
+ SwapListEntries (&FirstImageRecord->CodeSegmentList, &SecondImageRecord->CodeSegmentList);
+}
+
+/**
+ Sort image record based upon the ImageBase from low to high.
+**/
+STATIC
+VOID
+SortImageRecord (
+ VOID
+ )
+{
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ IMAGE_PROPERTIES_RECORD *NextImageRecord;
+ LIST_ENTRY *ImageRecordLink;
+ LIST_ENTRY *NextImageRecordLink;
+ LIST_ENTRY *ImageRecordEndLink;
+ LIST_ENTRY *ImageRecordList;
+
+ ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+ ImageRecordLink = ImageRecordList->ForwardLink;
+ NextImageRecordLink = ImageRecordLink->ForwardLink;
+ ImageRecordEndLink = ImageRecordList;
+ while (ImageRecordLink != ImageRecordEndLink) {
+ ImageRecord = CR (
+ ImageRecordLink,
+ IMAGE_PROPERTIES_RECORD,
+ Link,
+ IMAGE_PROPERTIES_RECORD_SIGNATURE
+ );
+ while (NextImageRecordLink != ImageRecordEndLink) {
+ NextImageRecord = CR (
+ NextImageRecordLink,
+ IMAGE_PROPERTIES_RECORD,
+ Link,
+ IMAGE_PROPERTIES_RECORD_SIGNATURE
+ );
+ if (ImageRecord->ImageBase > NextImageRecord->ImageBase) {
+ SwapImageRecord (ImageRecord, NextImageRecord);
+ }
+ NextImageRecordLink = NextImageRecordLink->ForwardLink;
+ }
+
+ ImageRecordLink = ImageRecordLink->ForwardLink;
+ NextImageRecordLink = ImageRecordLink->ForwardLink;
+ }
+}
+
+/**
+ Dump image record.
+**/
+STATIC
+VOID
+DumpImageRecord (
+ VOID
+ )
+{
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ LIST_ENTRY *ImageRecordLink;
+ LIST_ENTRY *ImageRecordList;
+ UINTN Index;
+
+ ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+ for (ImageRecordLink = ImageRecordList->ForwardLink, Index= 0;
+ ImageRecordLink != ImageRecordList;
+ ImageRecordLink = ImageRecordLink->ForwardLink, Index++) {
+ ImageRecord = CR (
+ ImageRecordLink,
+ IMAGE_PROPERTIES_RECORD,
+ Link,
+ IMAGE_PROPERTIES_RECORD_SIGNATURE
+ );
+ DEBUG ((DEBUG_VERBOSE, "SMM Image[%d]: 0x%016lx - 0x%016lx\n", Index, ImageRecord->ImageBase, ImageRecord->ImageSize));
+ }
+}
+
+/**
+ Insert image record.
+
+ @param[in] DriverEntry Driver information
+**/
+VOID
+SmmInsertImageRecord (
+ IN EFI_SMM_DRIVER_ENTRY *DriverEntry
+ )
+{
+ VOID *ImageAddress;
+ EFI_IMAGE_DOS_HEADER *DosHdr;
+ UINT32 PeCoffHeaderOffset;
+ UINT32 SectionAlignment;
+ EFI_IMAGE_SECTION_HEADER *Section;
+ EFI_IMAGE_OPTIONAL_HEADER_PTR_UNION Hdr;
+ UINT8 *Name;
+ UINTN Index;
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ CHAR8 *PdbPointer;
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+ UINT16 Magic;
+
+ DEBUG ((DEBUG_VERBOSE, "SMM InsertImageRecord - 0x%x\n", DriverEntry));
+ DEBUG ((DEBUG_VERBOSE, "SMM InsertImageRecord - 0x%016lx - 0x%08x\n", DriverEntry->ImageBuffer, DriverEntry->NumberOfPage));
+
+ ImageRecord = AllocatePool (sizeof(*ImageRecord));
+ if (ImageRecord == NULL) {
+ return ;
+ }
+ ImageRecord->Signature = IMAGE_PROPERTIES_RECORD_SIGNATURE;
+
+ DEBUG ((DEBUG_VERBOSE, "SMM ImageRecordCount - 0x%x\n", mImagePropertiesPrivateData.ImageRecordCount));
+
+ //
+ // Step 1: record whole region
+ //
+ ImageRecord->ImageBase = DriverEntry->ImageBuffer;
+ ImageRecord->ImageSize = EFI_PAGES_TO_SIZE(DriverEntry->NumberOfPage);
+
+ ImageAddress = (VOID *)(UINTN)DriverEntry->ImageBuffer;
+
+ PdbPointer = PeCoffLoaderGetPdbPointer ((VOID*) (UINTN) ImageAddress);
+ if (PdbPointer != NULL) {
+ DEBUG ((DEBUG_VERBOSE, "SMM Image - %a\n", PdbPointer));
+ }
+
+ //
+ // Check PE/COFF image
+ //
+ DosHdr = (EFI_IMAGE_DOS_HEADER *) (UINTN) ImageAddress;
+ PeCoffHeaderOffset = 0;
+ if (DosHdr->e_magic == EFI_IMAGE_DOS_SIGNATURE) {
+ PeCoffHeaderOffset = DosHdr->e_lfanew;
+ }
+
+ Hdr.Pe32 = (EFI_IMAGE_NT_HEADERS32 *)((UINT8 *) (UINTN) ImageAddress + PeCoffHeaderOffset);
+ if (Hdr.Pe32->Signature != EFI_IMAGE_NT_SIGNATURE) {
+ DEBUG ((DEBUG_VERBOSE, "SMM Hdr.Pe32->Signature invalid - 0x%x\n", Hdr.Pe32->Signature));
+ goto Finish;
+ }
+
+ //
+ // Get SectionAlignment
+ //
+ if (Hdr.Pe32->FileHeader.Machine == IMAGE_FILE_MACHINE_IA64 && Hdr.Pe32->OptionalHeader.Magic == EFI_IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
+ //
+ // NOTE: Some versions of Linux ELILO for Itanium have an incorrect magic value
+ // in the PE/COFF Header. If the MachineType is Itanium(IA64) and the
+ // Magic value in the OptionalHeader is EFI_IMAGE_NT_OPTIONAL_HDR32_MAGIC
+ // then override the magic value to EFI_IMAGE_NT_OPTIONAL_HDR64_MAGIC
+ //
+ Magic = EFI_IMAGE_NT_OPTIONAL_HDR64_MAGIC;
+ } else {
+ //
+ // Get the magic value from the PE/COFF Optional Header
+ //
+ Magic = Hdr.Pe32->OptionalHeader.Magic;
+ }
+ if (Magic == EFI_IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
+ SectionAlignment = Hdr.Pe32->OptionalHeader.SectionAlignment;
+ } else {
+ SectionAlignment = Hdr.Pe32Plus->OptionalHeader.SectionAlignment;
+ }
+
+ SetMemoryAttributesTableSectionAlignment (SectionAlignment);
+ if ((SectionAlignment & (EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT - 1)) != 0) {
+ DEBUG ((DEBUG_ERROR, "SMM !!!!!!!! InsertImageRecord - Section Alignment(0x%x) is not %dK !!!!!!!!\n",
+ SectionAlignment, EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT >> 10));
+ PdbPointer = PeCoffLoaderGetPdbPointer ((VOID*) (UINTN) ImageAddress);
+ if (PdbPointer != NULL) {
+ DEBUG ((DEBUG_ERROR, "SMM !!!!!!!! Image - %a !!!!!!!!\n", PdbPointer));
+ }
+ goto Finish;
+ }
+
+ Section = (EFI_IMAGE_SECTION_HEADER *) (
+ (UINT8 *) (UINTN) ImageAddress +
+ PeCoffHeaderOffset +
+ sizeof(UINT32) +
+ sizeof(EFI_IMAGE_FILE_HEADER) +
+ Hdr.Pe32->FileHeader.SizeOfOptionalHeader
+ );
+ ImageRecord->CodeSegmentCount = 0;
+ InitializeListHead (&ImageRecord->CodeSegmentList);
+ for (Index = 0; Index < Hdr.Pe32->FileHeader.NumberOfSections; Index++) {
+ Name = Section[Index].Name;
+ DEBUG ((
+ DEBUG_VERBOSE,
+ "SMM Section - '%c%c%c%c%c%c%c%c'\n",
+ Name[0],
+ Name[1],
+ Name[2],
+ Name[3],
+ Name[4],
+ Name[5],
+ Name[6],
+ Name[7]
+ ));
+
+ if ((Section[Index].Characteristics & EFI_IMAGE_SCN_CNT_CODE) != 0) {
+ DEBUG ((DEBUG_VERBOSE, "SMM VirtualSize - 0x%08x\n", Section[Index].Misc.VirtualSize));
+ DEBUG ((DEBUG_VERBOSE, "SMM VirtualAddress - 0x%08x\n", Section[Index].VirtualAddress));
+ DEBUG ((DEBUG_VERBOSE, "SMM SizeOfRawData - 0x%08x\n", Section[Index].SizeOfRawData));
+ DEBUG ((DEBUG_VERBOSE, "SMM PointerToRawData - 0x%08x\n", Section[Index].PointerToRawData));
+ DEBUG ((DEBUG_VERBOSE, "SMM PointerToRelocations - 0x%08x\n", Section[Index].PointerToRelocations));
+ DEBUG ((DEBUG_VERBOSE, "SMM PointerToLinenumbers - 0x%08x\n", Section[Index].PointerToLinenumbers));
+ DEBUG ((DEBUG_VERBOSE, "SMM NumberOfRelocations - 0x%08x\n", Section[Index].NumberOfRelocations));
+ DEBUG ((DEBUG_VERBOSE, "SMM NumberOfLinenumbers - 0x%08x\n", Section[Index].NumberOfLinenumbers));
+ DEBUG ((DEBUG_VERBOSE, "SMM Characteristics - 0x%08x\n", Section[Index].Characteristics));
+
+ //
+ // Step 2: record code section
+ //
+ ImageRecordCodeSection = AllocatePool (sizeof(*ImageRecordCodeSection));
+ if (ImageRecordCodeSection == NULL) {
+ return ;
+ }
+ ImageRecordCodeSection->Signature = IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE;
+
+ ImageRecordCodeSection->CodeSegmentBase = (UINTN)ImageAddress + Section[Index].VirtualAddress;
+ ImageRecordCodeSection->CodeSegmentSize = Section[Index].SizeOfRawData;
+
+ DEBUG ((DEBUG_VERBOSE, "SMM ImageCode: 0x%016lx - 0x%016lx\n", ImageRecordCodeSection->CodeSegmentBase, ImageRecordCodeSection->CodeSegmentSize));
+
+ InsertTailList (&ImageRecord->CodeSegmentList, &ImageRecordCodeSection->Link);
+ ImageRecord->CodeSegmentCount++;
+ }
+ }
+
+ if (ImageRecord->CodeSegmentCount == 0) {
+ SetMemoryAttributesTableSectionAlignment (1);
+ DEBUG ((DEBUG_ERROR, "SMM !!!!!!!! InsertImageRecord - CodeSegmentCount is 0 !!!!!!!!\n"));
+ PdbPointer = PeCoffLoaderGetPdbPointer ((VOID*) (UINTN) ImageAddress);
+ if (PdbPointer != NULL) {
+ DEBUG ((DEBUG_ERROR, "SMM !!!!!!!! Image - %a !!!!!!!!\n", PdbPointer));
+ }
+ goto Finish;
+ }
+
+ //
+ // Final
+ //
+ SortImageRecordCodeSection (ImageRecord);
+ //
+ // Check that all code sections lie within ImageBase/Size and do not overlap
+ //
+ if (!IsImageRecordCodeSectionValid (ImageRecord)) {
+ DEBUG ((DEBUG_ERROR, "SMM IsImageRecordCodeSectionValid - FAIL\n"));
+ goto Finish;
+ }
+
+ InsertTailList (&mImagePropertiesPrivateData.ImageRecordList, &ImageRecord->Link);
+ mImagePropertiesPrivateData.ImageRecordCount++;
+
+ SortImageRecord ();
+
+ if (mImagePropertiesPrivateData.CodeSegmentCountMax < ImageRecord->CodeSegmentCount) {
+ mImagePropertiesPrivateData.CodeSegmentCountMax = ImageRecord->CodeSegmentCount;
+ }
+
+Finish:
+ return ;
+}
+
+/**
+ Find image record according to image base and size.
+
+ @param[in] ImageBase Base of PE image
+ @param[in] ImageSize Size of PE image
+
+ @return image record
+**/
+STATIC
+IMAGE_PROPERTIES_RECORD *
+FindImageRecord (
+ IN EFI_PHYSICAL_ADDRESS ImageBase,
+ IN UINT64 ImageSize
+ )
+{
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ LIST_ENTRY *ImageRecordLink;
+ LIST_ENTRY *ImageRecordList;
+
+ ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+ for (ImageRecordLink = ImageRecordList->ForwardLink;
+ ImageRecordLink != ImageRecordList;
+ ImageRecordLink = ImageRecordLink->ForwardLink) {
+ ImageRecord = CR (
+ ImageRecordLink,
+ IMAGE_PROPERTIES_RECORD,
+ Link,
+ IMAGE_PROPERTIES_RECORD_SIGNATURE
+ );
+
+ if ((ImageBase == ImageRecord->ImageBase) &&
+ (ImageSize == ImageRecord->ImageSize)) {
+ return ImageRecord;
+ }
+ }
+
+ return NULL;
+}
+
+/**
+ Remove Image record.
+
+ @param[in] DriverEntry Driver information
+**/
+VOID
+SmmRemoveImageRecord (
+ IN EFI_SMM_DRIVER_ENTRY *DriverEntry
+ )
+{
+ IMAGE_PROPERTIES_RECORD *ImageRecord;
+ LIST_ENTRY *CodeSegmentListHead;
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+
+ DEBUG ((DEBUG_VERBOSE, "SMM RemoveImageRecord - 0x%x\n", DriverEntry));
+ DEBUG ((DEBUG_VERBOSE, "SMM RemoveImageRecord - 0x%016lx - 0x%08x\n", DriverEntry->ImageBuffer, DriverEntry->NumberOfPage));
+
+ ImageRecord = FindImageRecord (DriverEntry->ImageBuffer, EFI_PAGES_TO_SIZE(DriverEntry->NumberOfPage));
+ if (ImageRecord == NULL) {
+ DEBUG ((DEBUG_ERROR, "SMM !!!!!!!! ImageRecord not found !!!!!!!!\n"));
+ return ;
+ }
+
+ CodeSegmentListHead = &ImageRecord->CodeSegmentList;
+ while (!IsListEmpty (CodeSegmentListHead)) {
+ ImageRecordCodeSection = CR (
+ CodeSegmentListHead->ForwardLink,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+ Link,
+ IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+ );
+ RemoveEntryList (&ImageRecordCodeSection->Link);
+ FreePool (ImageRecordCodeSection);
+ }
+
+ RemoveEntryList (&ImageRecord->Link);
+ FreePool (ImageRecord);
+ mImagePropertiesPrivateData.ImageRecordCount--;
+}
+
+/**
+ Publish MemoryAttributesTable to SMM configuration table.
+**/
+VOID
+PublishMemoryAttributesTable (
+ VOID
+ )
+{
+ UINTN MemoryMapSize;
+ EFI_MEMORY_DESCRIPTOR *MemoryMap;
+ UINTN MapKey;
+ UINTN DescriptorSize;
+ UINT32 DescriptorVersion;
+ UINTN Index;
+ EFI_STATUS Status;
+ UINTN RuntimeEntryCount;
+ EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE *MemoryAttributesTable;
+ EFI_MEMORY_DESCRIPTOR *MemoryAttributesEntry;
+ UINTN MemoryAttributesTableSize;
+
+ MemoryMapSize = 0;
+ MemoryMap = NULL;
+ Status = SmmCoreGetMemoryMapMemoryAttributesTable (
+ &MemoryMapSize,
+ MemoryMap,
+ &MapKey,
+ &DescriptorSize,
+ &DescriptorVersion
+ );
+ ASSERT (Status == EFI_BUFFER_TOO_SMALL);
+
+ do {
+ DEBUG ((DEBUG_INFO, "MemoryMapSize - 0x%x\n", MemoryMapSize));
+ MemoryMap = AllocatePool (MemoryMapSize);
+ ASSERT (MemoryMap != NULL);
+ DEBUG ((DEBUG_INFO, "MemoryMap - 0x%x\n", MemoryMap));
+
+ Status = SmmCoreGetMemoryMapMemoryAttributesTable (
+ &MemoryMapSize,
+ MemoryMap,
+ &MapKey,
+ &DescriptorSize,
+ &DescriptorVersion
+ );
+ if (EFI_ERROR (Status)) {
+ FreePool (MemoryMap);
+ }
+ } while (Status == EFI_BUFFER_TOO_SMALL);
+
+ //
+ // Allocate MemoryAttributesTable
+ //
+ RuntimeEntryCount = MemoryMapSize/DescriptorSize;
+ MemoryAttributesTableSize = sizeof(EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE) + DescriptorSize * RuntimeEntryCount;
+ MemoryAttributesTable = AllocatePool (MemoryAttributesTableSize);
+ ASSERT (MemoryAttributesTable != NULL);
+ MemoryAttributesTable->Version = EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE_VERSION;
+ MemoryAttributesTable->NumberOfEntries = (UINT32)RuntimeEntryCount;
+ MemoryAttributesTable->DescriptorSize = (UINT32)DescriptorSize;
+ MemoryAttributesTable->Reserved = 0;
+ DEBUG ((DEBUG_INFO, "MemoryAttributesTable:\n"));
+ DEBUG ((DEBUG_INFO, " Version - 0x%08x\n", MemoryAttributesTable->Version));
+ DEBUG ((DEBUG_INFO, " NumberOfEntries - 0x%08x\n", MemoryAttributesTable->NumberOfEntries));
+ DEBUG ((DEBUG_INFO, " DescriptorSize - 0x%08x\n", MemoryAttributesTable->DescriptorSize));
+ MemoryAttributesEntry = (EFI_MEMORY_DESCRIPTOR *)(MemoryAttributesTable + 1);
+ for (Index = 0; Index < MemoryMapSize/DescriptorSize; Index++) {
+ CopyMem (MemoryAttributesEntry, MemoryMap, DescriptorSize);
+ DEBUG ((DEBUG_INFO, "Entry (0x%x)\n", MemoryAttributesEntry));
+ DEBUG ((DEBUG_INFO, " Type - 0x%x\n", MemoryAttributesEntry->Type));
+ DEBUG ((DEBUG_INFO, " PhysicalStart - 0x%016lx\n", MemoryAttributesEntry->PhysicalStart));
+ DEBUG ((DEBUG_INFO, " VirtualStart - 0x%016lx\n", MemoryAttributesEntry->VirtualStart));
+ DEBUG ((DEBUG_INFO, " NumberOfPages - 0x%016lx\n", MemoryAttributesEntry->NumberOfPages));
+ DEBUG ((DEBUG_INFO, " Attribute - 0x%016lx\n", MemoryAttributesEntry->Attribute));
+ MemoryAttributesEntry = NEXT_MEMORY_DESCRIPTOR(MemoryAttributesEntry, DescriptorSize);
+
+ MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+ }
+
+ Status = gSmst->SmmInstallConfigurationTable (gSmst, &gEdkiiPiSmmMemoryAttributesTableGuid, MemoryAttributesTable, MemoryAttributesTableSize);
+ ASSERT_EFI_ERROR (Status);
+}
+
+/**
+ This function returns whether the image is inside SMRAM.
+
+ @param[in] LoadedImage LoadedImage protocol instance for an image.
+
+ @retval TRUE the image is inside SMRAM.
+ @retval FALSE the image is outside SMRAM.
+**/
+BOOLEAN
+IsImageInsideSmram (
+ IN EFI_LOADED_IMAGE_PROTOCOL *LoadedImage
+ )
+{
+ UINTN Index;
+
+ for (Index = 0; Index < mFullSmramRangeCount; Index++) {
+ if ((mFullSmramRanges[Index].PhysicalStart <= (UINTN)LoadedImage->ImageBase)&&
+ (mFullSmramRanges[Index].PhysicalStart + mFullSmramRanges[Index].PhysicalSize >= (UINTN)LoadedImage->ImageBase + LoadedImage->ImageSize)) {
+ return TRUE;
+ }
+ }
+
+ return FALSE;
+}
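
The containment test above reduces to two comparisons per SMRAM range. A
minimal sketch, assuming hypothetical names (Region, inside_any) and ignoring
address wrap-around just as the original comparison does:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start;
    uint64_t size;
} Region;

/*
 * Returns 1 when [start, start + size) lies entirely inside at least one
 * of the given regions, 0 otherwise.
 */
static int inside_any(const Region *regions, size_t count,
                      uint64_t start, uint64_t size)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (regions[i].start <= start &&
            regions[i].start + regions[i].size >= start + size) {
            return 1;
        }
    }
    return 0;
}
```

An image that straddles two adjacent regions is reported as outside, because
each region is tested on its own.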
+
+/**
+ This function installs all SMM image record information.
+**/
+VOID
+SmmInstallImageRecord (
+ VOID
+ )
+{
+ EFI_STATUS Status;
+ UINTN NoHandles;
+ EFI_HANDLE *HandleBuffer;
+ EFI_LOADED_IMAGE_PROTOCOL *LoadedImage;
+ UINTN Index;
+ EFI_SMM_DRIVER_ENTRY DriverEntry;
+
+ Status = SmmLocateHandleBuffer (
+ ByProtocol,
+ &gEfiLoadedImageProtocolGuid,
+ NULL,
+ &NoHandles,
+ &HandleBuffer
+ );
+ if (EFI_ERROR (Status)) {
+ return ;
+ }
+
+ for (Index = 0; Index < NoHandles; Index++) {
+ Status = gSmst->SmmHandleProtocol (
+ HandleBuffer[Index],
+ &gEfiLoadedImageProtocolGuid,
+ (VOID **)&LoadedImage
+ );
+ if (EFI_ERROR (Status)) {
+ continue;
+ }
+ DEBUG ((DEBUG_VERBOSE, "LoadedImage - 0x%x 0x%x ", LoadedImage->ImageBase, LoadedImage->ImageSize));
+ {
+ VOID *PdbPointer;
+ PdbPointer = PeCoffLoaderGetPdbPointer (LoadedImage->ImageBase);
+ if (PdbPointer != NULL) {
+ DEBUG ((DEBUG_VERBOSE, "(%a) ", PdbPointer));
+ }
+ }
+ DEBUG ((DEBUG_VERBOSE, "\n"));
+ ZeroMem (&DriverEntry, sizeof(DriverEntry));
+ DriverEntry.ImageBuffer = (UINTN)LoadedImage->ImageBase;
+ DriverEntry.NumberOfPage = EFI_SIZE_TO_PAGES((UINTN)LoadedImage->ImageSize);
+ SmmInsertImageRecord (&DriverEntry);
+ }
+
+ FreePool (HandleBuffer);
+}
+
+/**
+ Install MemoryAttributesTable.
+
+ @param[in] Protocol Points to the protocol's unique identifier.
+ @param[in] Interface Points to the interface instance.
+ @param[in] Handle The handle on which the interface was installed.
+
+ @retval EFI_SUCCESS Notification runs successfully.
+**/
+EFI_STATUS
+EFIAPI
+SmmInstallMemoryAttributesTable (
+ IN CONST EFI_GUID *Protocol,
+ IN VOID *Interface,
+ IN EFI_HANDLE Handle
+ )
+{
+ SmmInstallImageRecord ();
+
+ DEBUG ((DEBUG_INFO, "SMM MemoryProtectionAttribute - 0x%016lx\n", mMemoryProtectionAttribute));
+ if ((mMemoryProtectionAttribute & EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA) == 0) {
+ return EFI_SUCCESS;
+ }
+
+ DEBUG ((DEBUG_VERBOSE, "SMM Total Image Count - 0x%x\n", mImagePropertiesPrivateData.ImageRecordCount));
+ DEBUG ((DEBUG_VERBOSE, "SMM Dump ImageRecord:\n"));
+ DumpImageRecord ();
+
+ PublishMemoryAttributesTable ();
+
+ return EFI_SUCCESS;
+}
+
+/**
+ Initialize MemoryAttributesTable support.
+**/
+VOID
+EFIAPI
+SmmCoreInitializeMemoryAttributesTable (
+ VOID
+ )
+{
+ EFI_STATUS Status;
+ VOID *Registration;
+
+ Status = gSmst->SmmRegisterProtocolNotify (
+ &gEfiSmmEndOfDxeProtocolGuid,
+ SmmInstallMemoryAttributesTable,
+ &Registration
+ );
+ ASSERT_EFI_ERROR (Status);
+
+ return ;
+}
diff --git a/MdeModulePkg/Core/PiSmmCore/Page.c b/MdeModulePkg/Core/PiSmmCore/Page.c
index 5c04e8c..5f19d7e 100644
--- a/MdeModulePkg/Core/PiSmmCore/Page.c
+++ b/MdeModulePkg/Core/PiSmmCore/Page.c
@@ -2,22 +2,572 @@
SMM Memory page management functions.
Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
- This program and the accompanying materials are licensed and made available
- under the terms and conditions of the BSD License which accompanies this
- distribution. The full text of the license may be found at
- http://opensource.org/licenses/bsd-license.php
+ This program and the accompanying materials are licensed and made available
+ under the terms and conditions of the BSD License which accompanies this
+ distribution. The full text of the license may be found at
+ http://opensource.org/licenses/bsd-license.php
- THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
- WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+ THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
**/
#include "PiSmmCore.h"
+#include <Library/SmmServicesTableLib.h>
#define TRUNCATE_TO_PAGES(a) ((a) >> EFI_PAGE_SHIFT)
LIST_ENTRY mSmmMemoryMap = INITIALIZE_LIST_HEAD_VARIABLE (mSmmMemoryMap);
+//
+// For GetMemoryMap()
+//
+
+#define MEMORY_MAP_SIGNATURE SIGNATURE_32('m','m','a','p')
+typedef struct {
+ UINTN Signature;
+ LIST_ENTRY Link;
+
+ BOOLEAN FromStack;
+ EFI_MEMORY_TYPE Type;
+ UINT64 Start;
+ UINT64 End;
+
+} MEMORY_MAP;
+
+LIST_ENTRY gMemoryMap = INITIALIZE_LIST_HEAD_VARIABLE (gMemoryMap);
+
+
+#define MAX_MAP_DEPTH 6
+
+///
+/// mMapDepth - depth of new descriptor stack
+///
+UINTN mMapDepth = 0;
+///
+/// mMapStack - space to use as temp storage to build new map descriptors
+///
+MEMORY_MAP mMapStack[MAX_MAP_DEPTH];
+UINTN mFreeMapStack = 0;
+///
+/// This list maintains the free memory map entries
+///
+LIST_ENTRY mFreeMemoryMapEntryList = INITIALIZE_LIST_HEAD_VARIABLE (mFreeMemoryMapEntryList);
+
+/**
+ Allocates pages from the memory map.
+
+ @param[in] Type The type of allocation to perform.
+ @param[in] MemoryType The type of memory to turn the allocated pages
+ into.
+ @param[in] NumberOfPages The number of pages to allocate.
+ @param[out] Memory A pointer to receive the base allocated memory
+ address.
+ @param[in] AddRegion TRUE if this memory is a newly added region.
+
+ @retval EFI_INVALID_PARAMETER Parameters violate checking rules defined in spec.
+ @retval EFI_NOT_FOUND Could not allocate pages that match the requirement.
+ @retval EFI_OUT_OF_RESOURCES Not enough pages to allocate.
+ @retval EFI_SUCCESS Pages successfully allocated.
+
+**/
+EFI_STATUS
+SmmInternalAllocatePagesEx (
+ IN EFI_ALLOCATE_TYPE Type,
+ IN EFI_MEMORY_TYPE MemoryType,
+ IN UINTN NumberOfPages,
+ OUT EFI_PHYSICAL_ADDRESS *Memory,
+ IN BOOLEAN AddRegion
+ );
+
+/**
+ Internal function. Dequeue a descriptor entry from the mFreeMemoryMapEntryList.
+ If the list is empty, then allocate a new page to refill the list.
+ Note that this algorithm for allocating memory map descriptors has the property
+ that the memory allocated for memory entries always grows, and is never really freed.
+
+ @return The memory map descriptor dequeued from the mFreeMemoryMapEntryList
+
+**/
+MEMORY_MAP *
+AllocateMemoryMapEntry (
+ VOID
+ )
+{
+ EFI_PHYSICAL_ADDRESS Mem;
+ EFI_STATUS Status;
+ MEMORY_MAP* FreeDescriptorEntries;
+ MEMORY_MAP* Entry;
+ UINTN Index;
+
+ //DEBUG((DEBUG_INFO, "AllocateMemoryMapEntry\n"));
+
+ if (IsListEmpty (&mFreeMemoryMapEntryList)) {
+ //DEBUG((DEBUG_INFO, "mFreeMemoryMapEntryList is empty\n"));
+ //
+ // The list is empty, so allocate one page to refill the list
+ //
+ Status = SmmInternalAllocatePagesEx (
+ AllocateAnyPages,
+ EfiRuntimeServicesData,
+ EFI_SIZE_TO_PAGES(DEFAULT_PAGE_ALLOCATION),
+ &Mem,
+ TRUE
+ );
+ ASSERT_EFI_ERROR (Status);
+ if(!EFI_ERROR (Status)) {
+ FreeDescriptorEntries = (MEMORY_MAP *)(UINTN)Mem;
+ //DEBUG((DEBUG_INFO, "New FreeDescriptorEntries - 0x%x\n", FreeDescriptorEntries));
+ //
+ // Enqueue the free memory map entries into the list
+ //
+ for (Index = 0; Index< DEFAULT_PAGE_ALLOCATION / sizeof(MEMORY_MAP); Index++) {
+ FreeDescriptorEntries[Index].Signature = MEMORY_MAP_SIGNATURE;
+ InsertTailList (&mFreeMemoryMapEntryList, &FreeDescriptorEntries[Index].Link);
+ }
+ } else {
+ return NULL;
+ }
+ }
+ //
+ // dequeue the first descriptor from the list
+ //
+ Entry = CR (mFreeMemoryMapEntryList.ForwardLink, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ RemoveEntryList (&Entry->Link);
+
+ return Entry;
+}
+
+
+/**
+ Internal function. Moves any memory descriptors that are on the
+ temporary descriptor stack to heap.
+
+**/
+VOID
+CoreFreeMemoryMapStack (
+ VOID
+ )
+{
+ MEMORY_MAP *Entry;
+
+ //
+ // If already freeing the map stack, then return
+ //
+ if (mFreeMapStack != 0) {
+ ASSERT (FALSE);
+ return ;
+ }
+
+ //
+ // Move the temporary memory descriptor stack into pool
+ //
+ mFreeMapStack += 1;
+
+ while (mMapDepth != 0) {
+ //
+ // Dequeue a memory map entry from mFreeMemoryMapEntryList
+ //
+ Entry = AllocateMemoryMapEntry ();
+ ASSERT (Entry);
+
+ //
+ // Update to proper entry
+ //
+ mMapDepth -= 1;
+
+ if (mMapStack[mMapDepth].Link.ForwardLink != NULL) {
+
+ CopyMem (Entry , &mMapStack[mMapDepth], sizeof (MEMORY_MAP));
+ Entry->FromStack = FALSE;
+
+ //
+ // Move this entry to general memory
+ //
+ InsertTailList (&mMapStack[mMapDepth].Link, &Entry->Link);
+ RemoveEntryList (&mMapStack[mMapDepth].Link);
+ mMapStack[mMapDepth].Link.ForwardLink = NULL;
+ }
+ }
+
+ mFreeMapStack -= 1;
+}
+
+/**
+ Insert new entry into memory map.
+
+ @param[in] Link The old memory map entry to be linked.
+ @param[in] Start The start address of new memory map entry.
+ @param[in] End The end address of new memory map entry.
+ @param[in] Type The type of new memory map entry.
+ @param[in] Next TRUE if the new entry is inserted after the old entry, FALSE if before it.
+ @param[in] AddRegion TRUE if this memory is a newly added region.
+**/
+VOID
+InsertNewEntry (
+ IN LIST_ENTRY *Link,
+ IN UINT64 Start,
+ IN UINT64 End,
+ IN EFI_MEMORY_TYPE Type,
+ IN BOOLEAN Next,
+ IN BOOLEAN AddRegion
+ )
+{
+ MEMORY_MAP *Entry;
+
+ Entry = &mMapStack[mMapDepth];
+ mMapDepth += 1;
+ ASSERT (mMapDepth < MAX_MAP_DEPTH);
+ Entry->FromStack = TRUE;
+
+ Entry->Signature = MEMORY_MAP_SIGNATURE;
+ Entry->Type = Type;
+ Entry->Start = Start;
+ Entry->End = End;
+ if (Next) {
+ InsertHeadList (Link, &Entry->Link);
+ } else {
+ InsertTailList (Link, &Entry->Link);
+ }
+}
+
+/**
+ Remove old entry from memory map.
+
+ @param[in] Entry Memory map entry to be removed.
+**/
+VOID
+RemoveOldEntry (
+ IN MEMORY_MAP *Entry
+ )
+{
+ RemoveEntryList (&Entry->Link);
+ if (!Entry->FromStack) {
+ InsertTailList (&mFreeMemoryMapEntryList, &Entry->Link);
+ }
+}
+
+/**
+ Update SMM memory map entry.
+
+ @param[in] Type The type of allocation to perform.
+ @param[in] Memory The base of memory address.
+ @param[in] NumberOfPages The number of pages to allocate.
+ @param[in] AddRegion TRUE if this memory is a newly added region.
+**/
+VOID
+ConvertSmmMemoryMapEntry (
+ IN EFI_MEMORY_TYPE Type,
+ IN EFI_PHYSICAL_ADDRESS Memory,
+ IN UINTN NumberOfPages,
+ IN BOOLEAN AddRegion
+ )
+{
+ LIST_ENTRY *Link;
+ MEMORY_MAP *Entry;
+ MEMORY_MAP *NextEntry;
+ LIST_ENTRY *NextLink;
+ MEMORY_MAP *PreviousEntry;
+ LIST_ENTRY *PreviousLink;
+ EFI_PHYSICAL_ADDRESS Start;
+ EFI_PHYSICAL_ADDRESS End;
+
+ Start = Memory;
+ End = Memory + EFI_PAGES_TO_SIZE(NumberOfPages) - 1;
+
+ //
+ // Exclude memory region
+ //
+ Link = gMemoryMap.ForwardLink;
+ while (Link != &gMemoryMap) {
+ Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ Link = Link->ForwardLink;
+
+ //
+ // ---------------------------------------------------
+ // | +----------+ +------+ +------+ +------+ |
+ // ---|gMemoryMap|---|Entry1|---|Entry2|---|Entry3|---
+ // +----------+ ^ +------+ +------+ +------+
+ // |
+ // +------+
+ // |EntryX|
+ // +------+
+ //
+ if (Entry->Start > End) {
+ if ((Entry->Start == End + 1) && (Entry->Type == Type)) {
+ Entry->Start = Start;
+ return ;
+ }
+ InsertNewEntry (
+ &Entry->Link,
+ Start,
+ End,
+ Type,
+ FALSE,
+ AddRegion
+ );
+ return ;
+ }
+
+ if ((Entry->Start <= Start) && (Entry->End >= End)) {
+ if (Entry->Type != Type) {
+ if (Entry->Start < Start) {
+ //
+ // ---------------------------------------------------
+ // | +----------+ +------+ +------+ +------+ |
+ // ---|gMemoryMap|---|Entry1|---|EntryX|---|Entry3|---
+ // +----------+ +------+ ^ +------+ +------+
+ // |
+ // +------+
+ // |EntryA|
+ // +------+
+ //
+ InsertNewEntry (
+ &Entry->Link,
+ Entry->Start,
+ Start - 1,
+ Entry->Type,
+ FALSE,
+ AddRegion
+ );
+ }
+ if (Entry->End > End) {
+ //
+ // ---------------------------------------------------
+ // | +----------+ +------+ +------+ +------+ |
+ // ---|gMemoryMap|---|Entry1|---|EntryX|---|Entry3|---
+ // +----------+ +------+ +------+ ^ +------+
+ // |
+ // +------+
+ // |EntryZ|
+ // +------+
+ //
+ InsertNewEntry (
+ &Entry->Link,
+ End + 1,
+ Entry->End,
+ Entry->Type,
+ TRUE,
+ AddRegion
+ );
+ }
+ //
+ // Update this node
+ //
+ Entry->Start = Start;
+ Entry->End = End;
+ Entry->Type = Type;
+
+ //
+ // Check adjacent
+ //
+ NextLink = Entry->Link.ForwardLink;
+ if (NextLink != &gMemoryMap) {
+ NextEntry = CR (NextLink, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ //
+ // ---------------------------------------------------
+ // | +----------+ +------+ +-----------------+ |
+ // ---|gMemoryMap|---|Entry1|---|EntryX Entry3|---
+ // +----------+ +------+ +-----------------+
+ //
+ if ((Entry->Type == NextEntry->Type) && (Entry->End + 1 == NextEntry->Start)) {
+ Entry->End = NextEntry->End;
+ RemoveOldEntry (NextEntry);
+ }
+ }
+ PreviousLink = Entry->Link.BackLink;
+ if (PreviousLink != &gMemoryMap) {
+ PreviousEntry = CR (PreviousLink, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ //
+ // ---------------------------------------------------
+ // | +----------+ +-----------------+ +------+ |
+ // ---|gMemoryMap|---|Entry1 EntryX|---|Entry3|---
+ // +----------+ +-----------------+ +------+
+ //
+ if ((PreviousEntry->Type == Entry->Type) && (PreviousEntry->End + 1 == Entry->Start)) {
+ PreviousEntry->End = Entry->End;
+ RemoveOldEntry (Entry);
+ }
+ }
+ }
+ return ;
+ }
+ }
+
+ //
+ // ---------------------------------------------------
+ // | +----------+ +------+ +------+ +------+ |
+ // ---|gMemoryMap|---|Entry1|---|Entry2|---|Entry3|---
+ // +----------+ +------+ +------+ +------+ ^
+ // |
+ // +------+
+ // |EntryX|
+ // +------+
+ //
+ Link = gMemoryMap.BackLink;
+ if (Link != &gMemoryMap) {
+ Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ if ((Entry->End + 1 == Start) && (Entry->Type == Type)) {
+ Entry->End = End;
+ return ;
+ }
+ }
+ InsertNewEntry (
+ &gMemoryMap,
+ Start,
+ End,
+ Type,
+ FALSE,
+ AddRegion
+ );
+ return ;
+}
+
+/**
+ Return the count of Smm memory map entry.
+
+ @return The count of Smm memory map entry.
+**/
+UINTN
+GetSmmMemoryMapEntryCount (
+ VOID
+ )
+{
+ LIST_ENTRY *Link;
+ UINTN Count;
+
+ Count = 0;
+ Link = gMemoryMap.ForwardLink;
+ while (Link != &gMemoryMap) {
+ Link = Link->ForwardLink;
+ Count++;
+ }
+ return Count;
+}
+
+/**
+ Dump Smm memory map entry.
+**/
+VOID
+DumpSmmMemoryMapEntry (
+ VOID
+ )
+{
+ LIST_ENTRY *Link;
+ MEMORY_MAP *Entry;
+ EFI_PHYSICAL_ADDRESS Last;
+
+ Last = 0;
+ DEBUG ((DEBUG_INFO, "DumpSmmMemoryMapEntry:\n"));
+ Link = gMemoryMap.ForwardLink;
+ while (Link != &gMemoryMap) {
+ Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ Link = Link->ForwardLink;
+
+ if ((Last != 0) && (Last != (UINT64)-1)) {
+ if (Last + 1 != Entry->Start) {
+ Last = (UINT64)-1;
+ } else {
+ Last = Entry->End;
+ }
+ } else if (Last == 0) {
+ Last = Entry->End;
+ }
+
+ DEBUG ((DEBUG_INFO, "Entry (Link - 0x%x)\n", &Entry->Link));
+ DEBUG ((DEBUG_INFO, " Signature - 0x%x\n", Entry->Signature));
+ DEBUG ((DEBUG_INFO, " Link.ForwardLink - 0x%x\n", Entry->Link.ForwardLink));
+ DEBUG ((DEBUG_INFO, " Link.BackLink - 0x%x\n", Entry->Link.BackLink));
+ DEBUG ((DEBUG_INFO, " Type - 0x%x\n", Entry->Type));
+ DEBUG ((DEBUG_INFO, " Start - 0x%016lx\n", Entry->Start));
+ DEBUG ((DEBUG_INFO, " End - 0x%016lx\n", Entry->End));
+ }
+
+ ASSERT (Last != (UINT64)-1);
+}
+
+/**
+ Dump Smm memory map.
+**/
+VOID
+DumpSmmMemoryMap (
+ VOID
+ )
+{
+ LIST_ENTRY *Node;
+ FREE_PAGE_LIST *Pages;
+
+ DEBUG ((DEBUG_INFO, "DumpSmmMemoryMap\n"));
+
+ Pages = NULL;
+ Node = mSmmMemoryMap.ForwardLink;
+ while (Node != &mSmmMemoryMap) {
+ Pages = BASE_CR (Node, FREE_PAGE_LIST, Link);
+ DEBUG ((DEBUG_INFO, "Pages - 0x%x\n", Pages));
+ DEBUG ((DEBUG_INFO, "Pages->NumberOfPages - 0x%x\n", Pages->NumberOfPages));
+ Node = Node->ForwardLink;
+ }
+}
+
+/**
+ Check if a Smm base~length is in Smm memory map.
+
+ @param[in] Base The base address of Smm memory to be checked.
+ @param[in] Length The length of Smm memory to be checked.
+
+ @retval TRUE Smm base~length is in smm memory map.
+ @retval FALSE Smm base~length is not in smm memory map.
+**/
+BOOLEAN
+SmmMemoryMapConsistencyCheckRange (
+ IN EFI_PHYSICAL_ADDRESS Base,
+ IN UINTN Length
+ )
+{
+ LIST_ENTRY *Link;
+ MEMORY_MAP *Entry;
+ BOOLEAN Result;
+
+ Result = FALSE;
+ Link = gMemoryMap.ForwardLink;
+ while (Link != &gMemoryMap) {
+ Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ Link = Link->ForwardLink;
+
+ if (Entry->Type != EfiConventionalMemory) {
+ continue;
+ }
+ if (Entry->Start == Base && Entry->End == Base + Length - 1) {
+ Result = TRUE;
+ break;
+ }
+ }
+
+ return Result;
+}
+
+/**
+ Check the consistency of Smm memory map.
+**/
+VOID
+SmmMemoryMapConsistencyCheck (
+ VOID
+ )
+{
+ LIST_ENTRY *Node;
+ FREE_PAGE_LIST *Pages;
+ BOOLEAN Result;
+
+ Pages = NULL;
+ Node = mSmmMemoryMap.ForwardLink;
+ while (Node != &mSmmMemoryMap) {
+ Pages = BASE_CR (Node, FREE_PAGE_LIST, Link);
+ Result = SmmMemoryMapConsistencyCheckRange ((EFI_PHYSICAL_ADDRESS)(UINTN)Pages, (UINTN)EFI_PAGES_TO_SIZE(Pages->NumberOfPages));
+ ASSERT (Result);
+ Node = Node->ForwardLink;
+ }
+}
+
/**
Internal Function. Allocate n pages from given free page node.
@@ -131,12 +681,13 @@ InternalAllocAddress (
/**
Allocates pages from the memory map.
- @param Type The type of allocation to perform.
- @param MemoryType The type of memory to turn the allocated pages
- into.
- @param NumberOfPages The number of pages to allocate.
- @param Memory A pointer to receive the base allocated memory
- address.
+ @param[in] Type The type of allocation to perform.
+ @param[in] MemoryType The type of memory to turn the allocated pages
+ into.
+ @param[in] NumberOfPages The number of pages to allocate.
+ @param[out] Memory A pointer to receive the base allocated memory
+ address.
+ @param[in] AddRegion TRUE if this memory is a newly added region.
@retval EFI_INVALID_PARAMETER Parameters violate checking rules defined in spec.
@retval EFI_NOT_FOUND Could not allocate pages match the requirement.
@@ -145,12 +696,12 @@ InternalAllocAddress (
**/
EFI_STATUS
-EFIAPI
-SmmInternalAllocatePages (
+SmmInternalAllocatePagesEx (
IN EFI_ALLOCATE_TYPE Type,
IN EFI_MEMORY_TYPE MemoryType,
IN UINTN NumberOfPages,
- OUT EFI_PHYSICAL_ADDRESS *Memory
+ OUT EFI_PHYSICAL_ADDRESS *Memory,
+ IN BOOLEAN AddRegion
)
{
UINTN RequestedAddress;
@@ -179,7 +730,7 @@ SmmInternalAllocatePages (
);
if (*Memory == (UINTN)-1) {
return EFI_OUT_OF_RESOURCES;
- }
+ }
break;
case AllocateAddress:
*Memory = InternalAllocAddress (
@@ -194,12 +745,49 @@ SmmInternalAllocatePages (
default:
return EFI_INVALID_PARAMETER;
}
+
+ //
+ // Update SmmMemoryMap here.
+ //
+ ConvertSmmMemoryMapEntry (MemoryType, *Memory, NumberOfPages, AddRegion);
+ if (!AddRegion) {
+ CoreFreeMemoryMapStack();
+ }
+
return EFI_SUCCESS;
}
/**
Allocates pages from the memory map.
+ @param[in] Type The type of allocation to perform.
+ @param[in] MemoryType The type of memory to turn the allocated pages
+ into.
+ @param[in] NumberOfPages The number of pages to allocate.
+ @param[out] Memory A pointer to receive the base allocated memory
+ address.
+
+ @retval EFI_INVALID_PARAMETER Parameters violate checking rules defined in spec.
+ @retval EFI_NOT_FOUND Could not allocate pages match the requirement.
+ @retval EFI_OUT_OF_RESOURCES Not enough pages to allocate.
+ @retval EFI_SUCCESS Pages successfully allocated.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmInternalAllocatePages (
+ IN EFI_ALLOCATE_TYPE Type,
+ IN EFI_MEMORY_TYPE MemoryType,
+ IN UINTN NumberOfPages,
+ OUT EFI_PHYSICAL_ADDRESS *Memory
+ )
+{
+ return SmmInternalAllocatePagesEx (Type, MemoryType, NumberOfPages, Memory, FALSE);
+}
+
+/**
+ Allocates pages from the memory map.
+
@param Type The type of allocation to perform.
@param MemoryType The type of memory to turn the allocated pages
into.
@@ -268,8 +856,9 @@ InternalMergeNodes (
/**
Frees previous allocated pages.
- @param Memory Base address of memory being freed.
- @param NumberOfPages The number of pages to free.
+ @param[in] Memory Base address of memory being freed.
+ @param[in] NumberOfPages The number of pages to free.
+ @param[in] AddRegion TRUE if this memory is a newly added region.
@retval EFI_NOT_FOUND Could not find the entry that covers the range.
@retval EFI_INVALID_PARAMETER Address not aligned.
@@ -277,10 +866,10 @@ InternalMergeNodes (
**/
EFI_STATUS
-EFIAPI
-SmmInternalFreePages (
+SmmInternalFreePagesEx (
IN EFI_PHYSICAL_ADDRESS Memory,
- IN UINTN NumberOfPages
+ IN UINTN NumberOfPages,
+ IN BOOLEAN AddRegion
)
{
LIST_ENTRY *Node;
@@ -326,12 +915,41 @@ SmmInternalFreePages (
InternalMergeNodes (Pages);
}
+ //
+ // Update SmmMemoryMap here.
+ //
+ ConvertSmmMemoryMapEntry (EfiConventionalMemory, Memory, NumberOfPages, AddRegion);
+ if (!AddRegion) {
+ CoreFreeMemoryMapStack();
+ }
+
return EFI_SUCCESS;
}
/**
Frees previous allocated pages.
+ @param[in] Memory Base address of memory being freed.
+ @param[in] NumberOfPages The number of pages to free.
+
+ @retval EFI_NOT_FOUND Could not find the entry that covers the range.
+ @retval EFI_INVALID_PARAMETER Address not aligned.
+ @return EFI_SUCCESS Pages successfully freed.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmInternalFreePages (
+ IN EFI_PHYSICAL_ADDRESS Memory,
+ IN UINTN NumberOfPages
+ )
+{
+ return SmmInternalFreePagesEx (Memory, NumberOfPages, FALSE);
+}
+
+/**
+ Frees previous allocated pages.
+
@param Memory Base address of memory being freed.
@param NumberOfPages The number of pages to free.
@@ -383,16 +1001,121 @@ SmmAddMemoryRegion (
UINTN AlignedMemBase;
//
- // Do not add memory regions that is already allocated, needs testing, or needs ECC initialization
+ // Add EfiRuntimeServicesData for memory regions that are already allocated, need testing, or need ECC initialization
//
if ((Attributes & (EFI_ALLOCATED | EFI_NEEDS_TESTING | EFI_NEEDS_ECC_INITIALIZATION)) != 0) {
- return;
+ Type = EfiRuntimeServicesData;
+ } else {
+ Type = EfiConventionalMemory;
}
-
+
+ DEBUG ((DEBUG_INFO, "SmmAddMemoryRegion\n"));
+ DEBUG ((DEBUG_INFO, " MemBase - 0x%lx\n", MemBase));
+ DEBUG ((DEBUG_INFO, " MemLength - 0x%lx\n", MemLength));
+ DEBUG ((DEBUG_INFO, " Type - 0x%x\n", Type));
+ DEBUG ((DEBUG_INFO, " Attributes - 0x%lx\n", Attributes));
+
//
// Align range on an EFI_PAGE_SIZE boundary
- //
+ //
AlignedMemBase = (UINTN)(MemBase + EFI_PAGE_MASK) & ~EFI_PAGE_MASK;
MemLength -= AlignedMemBase - MemBase;
- SmmFreePages (AlignedMemBase, TRUNCATE_TO_PAGES ((UINTN)MemLength));
+ if (Type == EfiConventionalMemory) {
+ SmmInternalFreePagesEx (AlignedMemBase, TRUNCATE_TO_PAGES ((UINTN)MemLength), TRUE);
+ } else {
+ ConvertSmmMemoryMapEntry (EfiRuntimeServicesData, AlignedMemBase, TRUNCATE_TO_PAGES ((UINTN)MemLength), TRUE);
+ }
+
+ CoreFreeMemoryMapStack ();
+}
+
+/**
+ This function returns a copy of the current memory map. The map is an array of
+ memory descriptors, each of which describes a contiguous block of memory.
+
+ @param[in, out] MemoryMapSize A pointer to the size, in bytes, of the
+ MemoryMap buffer. On input, this is the size of
+ the buffer allocated by the caller. On output,
+ it is the size of the buffer returned by the
+ firmware if the buffer was large enough, or the
+ size of the buffer needed to contain the map if
+ the buffer was too small.
+ @param[in, out] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[out] MapKey A pointer to the location in which firmware
+ returns the key for the current memory map.
+ @param[out] DescriptorSize A pointer to the location in which firmware
+ returns the size, in bytes, of an individual
+ EFI_MEMORY_DESCRIPTOR.
+ @param[out] DescriptorVersion A pointer to the location in which firmware
+ returns the version number associated with the
+ EFI_MEMORY_DESCRIPTOR.
+
+ @retval EFI_SUCCESS The memory map was returned in the MemoryMap
+ buffer.
+ @retval EFI_BUFFER_TOO_SMALL The MemoryMap buffer was too small. The current
+ buffer size needed to hold the memory map is
+ returned in MemoryMapSize.
+ @retval EFI_INVALID_PARAMETER One of the parameters has an invalid value.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmCoreGetMemoryMap (
+ IN OUT UINTN *MemoryMapSize,
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ OUT UINTN *MapKey,
+ OUT UINTN *DescriptorSize,
+ OUT UINT32 *DescriptorVersion
+ )
+{
+ UINTN Count;
+ LIST_ENTRY *Link;
+ MEMORY_MAP *Entry;
+ UINTN Size;
+ UINTN BufferSize;
+
+ Size = sizeof (EFI_MEMORY_DESCRIPTOR);
+
+ //
+ // Make sure Size != sizeof(EFI_MEMORY_DESCRIPTOR). This will
+ // prevent people from having pointer math bugs in their code.
+ // now you have to use *DescriptorSize to make things work.
+ //
+ Size += sizeof(UINT64) - (Size % sizeof (UINT64));
+
+ if (DescriptorSize != NULL) {
+ *DescriptorSize = Size;
+ }
+
+ if (DescriptorVersion != NULL) {
+ *DescriptorVersion = EFI_MEMORY_DESCRIPTOR_VERSION;
+ }
+
+ Count = GetSmmMemoryMapEntryCount ();
+ BufferSize = Size * Count;
+ if (*MemoryMapSize < BufferSize) {
+ *MemoryMapSize = BufferSize;
+ return EFI_BUFFER_TOO_SMALL;
+ }
+
+ *MemoryMapSize = BufferSize;
+ if (MemoryMap == NULL) {
+ return EFI_INVALID_PARAMETER;
+ }
+
+ ZeroMem (MemoryMap, BufferSize);
+ Link = gMemoryMap.ForwardLink;
+ while (Link != &gMemoryMap) {
+ Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+ Link = Link->ForwardLink;
+
+ MemoryMap->Type = Entry->Type;
+ MemoryMap->PhysicalStart = Entry->Start;
+ MemoryMap->NumberOfPages = RShiftU64 (Entry->End - Entry->Start + 1, EFI_PAGE_SHIFT);
+
+ MemoryMap = NEXT_MEMORY_DESCRIPTOR (MemoryMap, Size);
+ }
+
+ return EFI_SUCCESS;
}
diff --git a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c
index 2bdb19c..b877a33 100644
--- a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c
+++ b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c
@@ -87,6 +87,8 @@ SMM_CORE_SMI_HANDLERS mSmmCoreSmiHandlers[] = {
UINTN mFullSmramRangeCount;
EFI_SMRAM_DESCRIPTOR *mFullSmramRanges;
+EFI_SMM_DRIVER_ENTRY *mSmmCoreDriverEntry;
+
EFI_LOADED_IMAGE_PROTOCOL *mSmmCoreLoadedImage;
/**
@@ -564,6 +566,42 @@ SmmCoreInstallLoadedImage (
);
ASSERT_EFI_ERROR (Status);
+ //
+ // Allocate a Loaded Image Protocol in SMM
+ //
+ Status = SmmAllocatePool (EfiRuntimeServicesData, sizeof(EFI_SMM_DRIVER_ENTRY), (VOID **)&mSmmCoreDriverEntry);
+ ASSERT_EFI_ERROR(Status);
+
+ ZeroMem (mSmmCoreDriverEntry, sizeof(EFI_SMM_DRIVER_ENTRY));
+ //
+ // Fill in the remaining fields of the Loaded Image Protocol instance.
+ //
+ mSmmCoreDriverEntry->Signature = EFI_SMM_DRIVER_ENTRY_SIGNATURE;
+ mSmmCoreDriverEntry->SmmLoadedImage.Revision = EFI_LOADED_IMAGE_PROTOCOL_REVISION;
+ mSmmCoreDriverEntry->SmmLoadedImage.ParentHandle = gSmmCorePrivate->SmmIplImageHandle;
+ mSmmCoreDriverEntry->SmmLoadedImage.SystemTable = gST;
+
+ mSmmCoreDriverEntry->SmmLoadedImage.ImageBase = (VOID *)(UINTN)gSmmCorePrivate->PiSmmCoreImageBase;
+ mSmmCoreDriverEntry->SmmLoadedImage.ImageSize = gSmmCorePrivate->PiSmmCoreImageSize;
+ mSmmCoreDriverEntry->SmmLoadedImage.ImageCodeType = EfiRuntimeServicesCode;
+ mSmmCoreDriverEntry->SmmLoadedImage.ImageDataType = EfiRuntimeServicesData;
+
+ mSmmCoreDriverEntry->ImageEntryPoint = gSmmCorePrivate->PiSmmCoreEntryPoint;
+ mSmmCoreDriverEntry->ImageBuffer = gSmmCorePrivate->PiSmmCoreImageBase;
+ mSmmCoreDriverEntry->NumberOfPage = EFI_SIZE_TO_PAGES((UINTN)gSmmCorePrivate->PiSmmCoreImageSize);
+
+ //
+ // Create a new image handle in the SMM handle database for the SMM Driver
+ //
+ mSmmCoreDriverEntry->SmmImageHandle = NULL;
+ Status = SmmInstallProtocolInterface (
+ &mSmmCoreDriverEntry->SmmImageHandle,
+ &gEfiLoadedImageProtocolGuid,
+ EFI_NATIVE_INTERFACE,
+ &mSmmCoreDriverEntry->SmmLoadedImage
+ );
+ ASSERT_EFI_ERROR(Status);
+
return ;
}
@@ -636,5 +674,7 @@ SmmMain (
SmmCoreInstallLoadedImage ();
+ SmmCoreInitializeMemoryAttributesTable ();
+
return EFI_SUCCESS;
}
diff --git a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h
index f46ee72..e2fee54 100644
--- a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h
+++ b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h
@@ -110,6 +110,8 @@ typedef struct {
// Image Page Number
//
UINTN NumberOfPage;
+ EFI_HANDLE SmmImageHandle;
+ EFI_LOADED_IMAGE_PROTOCOL SmmLoadedImage;
} EFI_SMM_DRIVER_ENTRY;
#define EFI_HANDLE_SIGNATURE SIGNATURE_32('h','n','d','l')
@@ -551,6 +553,38 @@ SmmLocateProtocol (
);
/**
+ Function returns an array of handles that support the requested protocol
+ in a buffer allocated from pool. This is a version of SmmLocateHandle()
+ that allocates a buffer for the caller.
+
+ @param SearchType Specifies which handle(s) are to be returned.
+ @param Protocol Provides the protocol to search by. This
+ parameter is only valid for SearchType
+ ByProtocol.
+ @param SearchKey Supplies the search key depending on the
+ SearchType.
+ @param NumberHandles The number of handles returned in Buffer.
+ @param Buffer A pointer to the buffer to return the requested
+ array of handles that support Protocol.
+
+ @retval EFI_SUCCESS The result array of handles was returned.
+ @retval EFI_NOT_FOUND No handles match the search.
+ @retval EFI_OUT_OF_RESOURCES There is not enough pool memory to store the
+ matching results.
+ @retval EFI_INVALID_PARAMETER One or more parameters are not valid.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmLocateHandleBuffer (
+ IN EFI_LOCATE_SEARCH_TYPE SearchType,
+ IN EFI_GUID *Protocol OPTIONAL,
+ IN VOID *SearchKey OPTIONAL,
+ IN OUT UINTN *NumberHandles,
+ OUT EFI_HANDLE **Buffer
+ );
+
+/**
Manage SMI of a particular type.
@param HandlerType Points to the handler type or NULL for root SMI handlers.
@@ -980,9 +1014,66 @@ SmramProfileReadyToLock (
VOID
);
+/**
+ Initialize MemoryAttributes support.
+**/
+VOID
+EFIAPI
+SmmCoreInitializeMemoryAttributesTable (
+ VOID
+ );
+
+/**
+ This function returns a copy of the current memory map. The map is an array of
+ memory descriptors, each of which describes a contiguous block of memory.
+
+ @param[in, out] MemoryMapSize A pointer to the size, in bytes, of the
+ MemoryMap buffer. On input, this is the size of
+ the buffer allocated by the caller. On output,
+ it is the size of the buffer returned by the
+ firmware if the buffer was large enough, or the
+ size of the buffer needed to contain the map if
+ the buffer was too small.
+ @param[in, out] MemoryMap A pointer to the buffer in which firmware places
+ the current memory map.
+ @param[out] MapKey A pointer to the location in which firmware
+ returns the key for the current memory map.
+ @param[out] DescriptorSize A pointer to the location in which firmware
+ returns the size, in bytes, of an individual
+ EFI_MEMORY_DESCRIPTOR.
+ @param[out] DescriptorVersion A pointer to the location in which firmware
+ returns the version number associated with the
+ EFI_MEMORY_DESCRIPTOR.
+
+ @retval EFI_SUCCESS The memory map was returned in the MemoryMap
+ buffer.
+ @retval EFI_BUFFER_TOO_SMALL The MemoryMap buffer was too small. The current
+ buffer size needed to hold the memory map is
+ returned in MemoryMapSize.
+ @retval EFI_INVALID_PARAMETER One of the parameters has an invalid value.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmCoreGetMemoryMap (
+ IN OUT UINTN *MemoryMapSize,
+ IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,
+ OUT UINTN *MapKey,
+ OUT UINTN *DescriptorSize,
+ OUT UINT32 *DescriptorVersion
+ );
+
+///
+/// For generic EFI machines make the default allocations 4K aligned
+///
+#define EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT (EFI_PAGE_SIZE)
+#define DEFAULT_PAGE_ALLOCATION (EFI_PAGE_SIZE)
+
extern UINTN mFullSmramRangeCount;
extern EFI_SMRAM_DESCRIPTOR *mFullSmramRanges;
+extern EFI_SMM_DRIVER_ENTRY *mSmmCoreDriverEntry;
+
extern EFI_LOADED_IMAGE_PROTOCOL *mSmmCoreLoadedImage;
//
diff --git a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf
index 1f73cbb..c256e90 100644
--- a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf
+++ b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf
@@ -38,6 +38,7 @@
Smi.c
InstallConfigurationTable.c
SmramProfileRecord.c
+ MemoryAttributesTable.c
[Packages]
MdePkg/MdePkg.dec
@@ -96,6 +97,7 @@
gEdkiiMemoryProfileGuid
## SOMETIMES_PRODUCES ## GUID # Install protocol
gEdkiiSmmMemoryProfileGuid
+ gEdkiiPiSmmMemoryAttributesTableGuid ## SOMETIMES_PRODUCES ## SystemTable
[UserExtensions.TianoCore."ExtraFiles"]
PiSmmCoreExtra.uni
diff --git a/MdeModulePkg/Core/PiSmmCore/Pool.c b/MdeModulePkg/Core/PiSmmCore/Pool.c
index 02dab01..dcfd13e 100644
--- a/MdeModulePkg/Core/PiSmmCore/Pool.c
+++ b/MdeModulePkg/Core/PiSmmCore/Pool.c
@@ -86,8 +86,24 @@ SmmInitializeMemoryServices (
}
//
// Initialize free SMRAM regions
+ // Free memory must be added first, to let gSmmMemoryMap record data
//
for (Index = 0; Index < SmramRangeCount; Index++) {
+ if ((SmramRanges[Index].RegionState & (EFI_ALLOCATED | EFI_NEEDS_TESTING | EFI_NEEDS_ECC_INITIALIZATION)) != 0) {
+ continue;
+ }
+ SmmAddMemoryRegion (
+ SmramRanges[Index].CpuStart,
+ SmramRanges[Index].PhysicalSize,
+ EfiConventionalMemory,
+ SmramRanges[Index].RegionState
+ );
+ }
+
+ for (Index = 0; Index < SmramRangeCount; Index++) {
+ if ((SmramRanges[Index].RegionState & (EFI_ALLOCATED | EFI_NEEDS_TESTING | EFI_NEEDS_ECC_INITIALIZATION)) == 0) {
+ continue;
+ }
SmmAddMemoryRegion (
SmramRanges[Index].CpuStart,
SmramRanges[Index].PhysicalSize,
--
2.7.4.windows.1
* [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
` (2 preceding siblings ...)
2016-11-04 9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
@ 2016-11-04 9:30 ` Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
` (3 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek
If enabled, SMM will not use on-demand paging.
SMM will build a static page table for all memory.
The page table size depends on two things:
1) The 1G paging capability.
2) The whole system memory/MMIO addressing capability.
A) If the system only supports 2M paging:
When the whole memory/MMIO is 32bit, we only need 1+1+4=6 pages for 4G.
When the whole memory/MMIO is 39bit, we need 1+1+512 pages (~ 2M).
When the whole memory/MMIO is 48bit, we need 1+512+512*512 pages (~ 1G).
B) If the system supports 1G paging:
When the whole memory/MMIO is 32bit, we only need 1+1+4=6 pages for 4G.
(We still generate 2M pages for maintenance considerations.)
When the whole memory/MMIO is 39bit, we still need 6 pages.
(We set up 1G paging for >4G.)
When the whole memory/MMIO is 48bit, we need 1+512+4 pages (~ 2M).
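The footprint arithmetic can be sketched as a small standalone calculation.
This is only an illustration, not code from the patch: the function name
static_page_table_pages and its parameters are hypothetical, and it assumes
4-level paging in which every 4K paging structure holds 512 eight-byte
entries, 2M pages are always used below 4G, and 1G pages (when available)
are used above 4G.

```c
#include <assert.h>
#include <stdint.h>

/* Each 4 KiB paging structure holds 512 eight-byte entries. */
#define ENTRIES_PER_TABLE 512ULL

/*
 * Hypothetical helper, not part of the patch: count the 4 KiB pages a
 * static 4-level page table needs to cover 2^address_bits bytes.
 * Memory below 4G is always mapped with 2M pages; above 4G, 1G pages
 * are used when has_1g_pages is nonzero.
 */
uint64_t
static_page_table_pages (unsigned address_bits, int has_1g_pages)
{
  uint64_t gib_regions;
  uint64_t pdpt_pages;
  uint64_t pd_pages;

  assert (address_bits >= 32 && address_bits <= 48);

  /* Number of 1 GiB regions in the address space. */
  gib_regions = 1ULL << (address_bits - 30);
  /* One PDPT page per 512 GiB, rounded up. */
  pdpt_pages = (gib_regions + ENTRIES_PER_TABLE - 1) / ENTRIES_PER_TABLE;
  /* One page directory per 1 GiB region that is mapped with 2M pages. */
  pd_pages = has_1g_pages ? 4 : gib_regions;

  /* The leading 1 is the single PML4 page. */
  return 1 + pdpt_pages + pd_pages;
}
```

Under these assumptions, static_page_table_pages (32, 0) evaluates to 6 and
static_page_table_pages (39, 1) also evaluates to 6, since with 1G paging only
the first 4G needs page directories.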
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
UefiCpuPkg/UefiCpuPkg.dec | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/UefiCpuPkg/UefiCpuPkg.dec b/UefiCpuPkg/UefiCpuPkg.dec
index 8674533..a110820 100644
--- a/UefiCpuPkg/UefiCpuPkg.dec
+++ b/UefiCpuPkg/UefiCpuPkg.dec
@@ -199,6 +199,14 @@
# @Prompt The specified AP target C-state for Mwait.
gUefiCpuPkgTokenSpaceGuid.PcdCpuApTargetCstate|0|UINT8|0x00000007
+ ## Indicates if SMM uses static page table.
+ # If enabled, SMM will not use on-demand paging. SMM will build static page table for all memory.<BR><BR>
+ # This flag only impacts X64 build, because SMM always builds static page table for IA32.
+ # TRUE - SMM uses static page table for all memory.<BR>
+ # FALSE - SMM uses static page table for below 4G memory and uses on-demand paging for above 4G memory.<BR>
+ # @Prompt Use static page table for all memory in SMM.
+ gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStaticPageTable|TRUE|BOOLEAN|0x3213210D
+
[PcdsDynamic, PcdsDynamicEx]
## Contains the pointer to a CPU S3 data buffer of structure ACPI_CPU_DATA.
# @Prompt The pointer to a CPU S3 data buffer.
--
2.7.4.windows.1
* [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
` (3 preceding siblings ...)
2016-11-04 9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
@ 2016-11-04 9:30 ` Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
` (2 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek
PiSmmCpuDxeSmm consumes SmmAttributesTable and sets up the page table:
1) The code region is marked as read-only and the data region as non-executable,
if the PE image is 4K aligned.
2) Important data structures, such as the GDT and IDT, are set to RO.
3) SmmSaveState is set to non-executable,
and SmmEntrypoint is set to read-only.
4) If the static page table is supported, the page table itself is read-only.
We use the page table to protect other components, and itself.
If we use dynamic paging, we can still provide *partial* protection,
and rely on the page table not being modified by other components.
The XD enabling code is moved to SmiEntry to let NX take effect.
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
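Note for reviewers (not part of the commit message): the protection rules
listed above amount to a mapping from memory attributes to page-table entry
bits. The following is a minimal standalone sketch of that mapping, not the
code in SmmCpuMemoryManagement.c. ApplyAttributes and the PTE_* names are
hypothetical; the EFI_MEMORY_* values are the ones defined by the UEFI
specification, and the bit positions are the architectural x86 ones.

```c
#include <assert.h>
#include <stdint.h>

/* UEFI memory attribute values (as defined by the UEFI specification). */
#define EFI_MEMORY_XP 0x0000000000004000ULL /* execute-protected */
#define EFI_MEMORY_RO 0x0000000000020000ULL /* read-only */

/* Architectural x86 page-table entry bits (PAE and 4-level paging). */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)  /* 1 = writable */
#define PTE_NX      (1ULL << 63) /* 1 = no-execute; requires EFER.NXE */

/* Hypothetical helper: return Entry with the requested protections applied. */
uint64_t
ApplyAttributes (uint64_t Entry, uint64_t Attributes)
{
  assert ((Entry & PTE_PRESENT) != 0); /* only meaningful for mapped pages */

  if ((Attributes & EFI_MEMORY_RO) != 0) {
    Entry &= ~PTE_RW;   /* code pages: drop write permission */
  }
  if ((Attributes & EFI_MEMORY_XP) != 0) {
    Entry |= PTE_NX;    /* data pages: forbid instruction fetch */
  }
  return Entry;
}
```

Under these assumptions a code page would be passed EFI_MEMORY_RO and a data
page EFI_MEMORY_XP. Clearing R/W only stops supervisor writes while CR0.WP is
set, and NX only takes effect once EFER.NXE is enabled, which is why the entry
stubs set both before running C code.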
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c | 71 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S | 67 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm | 68 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm | 70 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S | 226 +----
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm | 36 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm | 36 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 37 +-
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c | 4 +-
UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 127 ++-
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c | 142 +++-
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 156 +++-
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf | 5 +-
UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c | 871 ++++++++++++++++++++
UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c | 39 +-
UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h | 15 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c | 274 +++++-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S | 51 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm | 54 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm | 61 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S | 250 +-----
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm | 35 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm | 31 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 30 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c | 7 +-
25 files changed, 1988 insertions(+), 775 deletions(-)
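As a reading aid for the SmiEntry changes below, this standalone sketch spells
out the constants the entry stubs use. The macro and function names are
illustrative and not taken from the patch; the bit positions are the
architectural ones, and ClearXdDisable is only a model of the "and dx, 0xFFFB"
step, not the stub itself.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bits set by "or ebx, 0x80010023" in the SMI entry stubs. */
#define CR0_PE (1u << 0)   /* protected mode enable */
#define CR0_MP (1u << 1)   /* monitor coprocessor */
#define CR0_NE (1u << 5)   /* numeric error */
#define CR0_WP (1u << 16)  /* write protect */
#define CR0_PG (1u << 31)  /* paging */

#define EFER_NXE (1u << 11)            /* same value as MSR_EFER_XD (0x800) */
#define MISC_ENABLE_XD_DISABLE_BIT 34  /* IA32_MISC_ENABLE[34] */

/* Compile-time check that the named bits add up to the literal in the patch. */
static_assert ((CR0_PG | CR0_WP | CR0_NE | CR0_MP | CR0_PE) == 0x80010023u,
               "CR0 mask mismatch");

uint32_t
SmiEntryCr0Mask (void)
{
  return CR0_PG | CR0_WP | CR0_NE | CR0_MP | CR0_PE;
}

/* rdmsr returns MSR bits 63:32 in EDX, so MSR bit 34 is EDX bit 2 (BIT2). */
uint32_t
XdDisableMaskInEdx (void)
{
  return 1u << (MISC_ENABLE_XD_DISABLE_BIT - 32);
}

/* Model of the entry path: clear XD Disable in the high dword of the MSR. */
uint32_t
ClearXdDisable (uint32_t Edx)
{
  return Edx & ~XdDisableMaskInEdx ();
}
```

The stubs push the original EDX before clearing the bit and pop it before rsm,
so the XD Disable setting seen outside SMM is restored on exit.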
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
index a871bef..65f09e5 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
@@ -58,7 +58,7 @@ SmmInitPageTable (
if (FeaturePcdGet (PcdCpuSmmStackGuard)) {
InitializeIDTSmmStackGuard ();
}
- return Gen4GPageTable (0, TRUE);
+ return Gen4GPageTable (TRUE);
}
/**
@@ -99,7 +99,7 @@ SmiPFHandler (
if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
(PFAddress >= mCpuHotPlugData.SmrrBase) &&
(PFAddress < (mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize))) {
- DEBUG ((EFI_D_ERROR, "SMM stack overflow!\n"));
+ DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
CpuDeadLoop ();
}
@@ -109,7 +109,7 @@ SmiPFHandler (
if ((PFAddress < mCpuHotPlugData.SmrrBase) ||
(PFAddress >= mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize)) {
if ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0) {
- DEBUG ((EFI_D_ERROR, "Code executed on IP(0x%x) out of SMM range after SMM is locked!\n", PFAddress));
+ DEBUG ((DEBUG_ERROR, "Code executed on IP(0x%x) out of SMM range after SMM is locked!\n", PFAddress));
DEBUG_CODE (
DumpModuleInfoByIp (*(UINTN *)(UINTN)SystemContext.SystemContextIa32->Esp);
);
@@ -128,3 +128,68 @@ SmiPFHandler (
ReleaseSpinLock (mPFLock);
}
+
+/**
+ This function sets memory attribute for page table.
+**/
+VOID
+SetPageTableAttributes (
+ VOID
+ )
+{
+ UINTN Index2;
+ UINTN Index3;
+ UINT64 *L1PageTable;
+ UINT64 *L2PageTable;
+ UINT64 *L3PageTable;
+ BOOLEAN IsSplitted;
+ BOOLEAN PageTableSplitted;
+
+ DEBUG ((DEBUG_INFO, "SetPageTableAttributes\n"));
+
+ //
+ // Disable write protection, because we need to mark the page table as write-protected.
+ // We need to *write* to page table memory in order to mark it *read only*.
+ //
+ AsmWriteCr0 (AsmReadCr0() & ~CR0_WP);
+
+ do {
+ DEBUG ((DEBUG_INFO, "Start...\n"));
+ PageTableSplitted = FALSE;
+
+ L3PageTable = (UINT64 *)GetPageTableBase ();
+
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L3PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+ for (Index3 = 0; Index3 < 4; Index3++) {
+ L2PageTable = (UINT64 *)(UINTN)(L3PageTable[Index3] & PAGING_4K_ADDRESS_MASK_64);
+ if (L2PageTable == NULL) {
+ continue;
+ }
+
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L2PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+ for (Index2 = 0; Index2 < SIZE_4KB/sizeof(UINT64); Index2++) {
+ if ((L2PageTable[Index2] & IA32_PG_PS) != 0) {
+ // 2M
+ continue;
+ }
+ L1PageTable = (UINT64 *)(UINTN)(L2PageTable[Index2] & PAGING_4K_ADDRESS_MASK_64);
+ if (L1PageTable == NULL) {
+ continue;
+ }
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L1PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+ }
+ }
+ } while (PageTableSplitted);
+
+ //
+ // Enable write protection after the page table is updated.
+ //
+ AsmWriteCr0 (AsmReadCr0() | CR0_WP);
+
+ return ;
+}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S
index ec5b9a0..93f11e2 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S
@@ -1,6 +1,6 @@
#------------------------------------------------------------------------------
#
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
# This program and the accompanying materials
# are licensed and made available under the terms and conditions of the BSD License
# which accompanies this distribution. The full text of the license may be found at
@@ -24,9 +24,13 @@ ASM_GLOBAL ASM_PFX(gcSmiHandlerSize)
ASM_GLOBAL ASM_PFX(gSmiCr3)
ASM_GLOBAL ASM_PFX(gSmiStack)
ASM_GLOBAL ASM_PFX(gSmbase)
+ASM_GLOBAL ASM_PFX(mXdSupported)
ASM_GLOBAL ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))
ASM_GLOBAL ASM_PFX(gSmiHandlerIdtr)
+.equ MSR_EFER, 0xc0000080
+.equ MSR_EFER_XD, 0x800
+
.equ DSC_OFFSET, 0xfb00
.equ DSC_GDTPTR, 0x30
.equ DSC_GDTSIZ, 0x38
@@ -122,8 +126,39 @@ L11:
orl $BIT10, %eax
L12: # as cr4.PGE is not set here, refresh cr3
movl %eax, %cr4 # in PreModifyMtrrs() to flush TLB.
+
+ cmpb $0, ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))
+ jz L5
+# Load TSS
+ movb $0x89, (TSS_SEGMENT + 5)(%ebp) # clear busy flag
+ movl $TSS_SEGMENT, %eax
+ ltrw %ax
+L5:
+
+# enable NXE if supported
+ .byte 0xb0 # mov al, imm8
+ASM_PFX(mXdSupported): .byte 1
+ cmpb $0, %al
+ jz L14
+#
+# Check XD disable bit
+#
+ movl $MSR_IA32_MISC_ENABLE, %ecx
+ rdmsr
+ pushl %edx # save MSR_IA32_MISC_ENABLE[63-32]
+ testl $BIT2, %edx # MSR_IA32_MISC_ENABLE[34]
+ jz L13
+ andw $0x0FFFB, %dx # clear XD Disable bit if it is set
+ wrmsr
+L13:
+ movl $MSR_EFER, %ecx
+ rdmsr
+ orw $MSR_EFER_XD,%ax # enable NXE
+ wrmsr
+L14:
+
movl %cr0, %ebx
- orl $0x080010000, %ebx # enable paging + WP
+ orl $0x080010023, %ebx # enable paging + WP + NE + MP + PE
movl %ebx, %cr0
leal DSC_OFFSET(%edi),%ebx
movw DSC_DS(%ebx),%ax
@@ -135,35 +170,35 @@ L12: # as cr4.PGE is not set here, refresh
movw DSC_SS(%ebx),%ax
movl %eax, %ss
- cmpb $0, ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))
- jz L5
-
-# Load TSS
- movb $0x89, (TSS_SEGMENT + 5)(%ebp) # clear busy flag
- movl $TSS_SEGMENT, %eax
- ltrw %ax
-L5:
-
# jmp _SmiHandler # instruction is not needed
_SmiHandler:
- movl (%esp), %ebx
+ movl 4(%esp), %ebx
pushl %ebx
movl $ASM_PFX(CpuSmmDebugEntry), %eax
call *%eax
- popl %ecx
-
+ addl $4, %esp
+
pushl %ebx
movl $ASM_PFX(SmiRendezvous), %eax
call *%eax
- popl %ecx
+ addl $4, %esp
pushl %ebx
movl $ASM_PFX(CpuSmmDebugExit), %eax
call *%eax
- popl %ecx
+ addl $4, %esp
+
+ popl %edx # get saved MSR_IA32_MISC_ENABLE[63-32]
+ testl $BIT2, %edx
+ jz L16
+ movl $MSR_IA32_MISC_ENABLE, %ecx
+ rdmsr
+ orw $BIT2, %dx # set XD Disable bit if it was set before entering into SMM
+ wrmsr
+L16:
rsm
ASM_PFX(gcSmiHandlerSize): .word . - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm
index ac1a9b4..1e5db55 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm
@@ -1,5 +1,5 @@
;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
; This program and the accompanying materials
; are licensed and made available under the terms and conditions of the BSD License
; which accompanies this distribution. The full text of the license may be found at
@@ -22,6 +22,10 @@
.model flat,C
.xmm
+MSR_IA32_MISC_ENABLE EQU 1A0h
+MSR_EFER EQU 0c0000080h
+MSR_EFER_XD EQU 0800h
+
DSC_OFFSET EQU 0fb00h
DSC_GDTPTR EQU 30h
DSC_GDTSIZ EQU 38h
@@ -43,6 +47,7 @@ EXTERNDEF gcSmiHandlerSize:WORD
EXTERNDEF gSmiCr3:DWORD
EXTERNDEF gSmiStack:DWORD
EXTERNDEF gSmbase:DWORD
+EXTERNDEF mXdSupported:BYTE
EXTERNDEF FeaturePcdGet (PcdCpuSmmStackGuard):BYTE
EXTERNDEF gSmiHandlerIdtr:FWORD
@@ -128,8 +133,39 @@ gSmiCr3 DD ?
or eax, BIT10
@@: ; as cr4.PGE is not set here, refresh cr3
mov cr4, eax ; in PreModifyMtrrs() to flush TLB.
+
+ cmp FeaturePcdGet (PcdCpuSmmStackGuard), 0
+ jz @F
+; Load TSS
+ mov byte ptr [ebp + TSS_SEGMENT + 5], 89h ; clear busy flag
+ mov eax, TSS_SEGMENT
+ ltr ax
+@@:
+
+; enable NXE if supported
+ DB 0b0h ; mov al, imm8
+mXdSupported DB 1
+ cmp al, 0
+ jz @SkipXd
+;
+; Check XD disable bit
+;
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ push edx ; save MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2 ; MSR_IA32_MISC_ENABLE[34]
+ jz @f
+ and dx, 0FFFBh ; clear XD Disable bit if it is set
+ wrmsr
+@@:
+ mov ecx, MSR_EFER
+ rdmsr
+ or ax, MSR_EFER_XD ; enable NXE
+ wrmsr
+@SkipXd:
+
mov ebx, cr0
- or ebx, 080010000h ; enable paging + WP
+ or ebx, 080010023h ; enable paging + WP + NE + MP + PE
mov cr0, ebx
lea ebx, [edi + DSC_OFFSET]
mov ax, [ebx + DSC_DS]
@@ -141,34 +177,34 @@ gSmiCr3 DD ?
mov ax, [ebx + DSC_SS]
mov ss, eax
- cmp FeaturePcdGet (PcdCpuSmmStackGuard), 0
- jz @F
-
-; Load TSS
- mov byte ptr [ebp + TSS_SEGMENT + 5], 89h ; clear busy flag
- mov eax, TSS_SEGMENT
- ltr ax
-@@:
; jmp _SmiHandler ; instruction is not needed
_SmiHandler PROC
- mov ebx, [esp] ; CPU Index
-
+ mov ebx, [esp + 4] ; CPU Index
push ebx
mov eax, CpuSmmDebugEntry
call eax
- pop ecx
+ add esp, 4
push ebx
mov eax, SmiRendezvous
call eax
- pop ecx
-
+ add esp, 4
+
push ebx
mov eax, CpuSmmDebugExit
call eax
- pop ecx
+ add esp, 4
+ pop edx ; get saved MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2
+ jz @f
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ or dx, BIT2 ; set XD Disable bit if it was set before entering into SMM
+ wrmsr
+
+@@:
rsm
_SmiHandler ENDP
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm
index 4fb0c13..2d81dde 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm
@@ -18,6 +18,10 @@
;
;-------------------------------------------------------------------------------
+%define MSR_IA32_MISC_ENABLE 0x1A0
+%define MSR_EFER 0xc0000080
+%define MSR_EFER_XD 0x800
+
%define DSC_OFFSET 0xfb00
%define DSC_GDTPTR 0x30
%define DSC_GDTSIZ 0x38
@@ -40,6 +44,7 @@ global ASM_PFX(gcSmiHandlerSize)
global ASM_PFX(gSmiCr3)
global ASM_PFX(gSmiStack)
global ASM_PFX(gSmbase)
+global ASM_PFX(mXdSupported)
extern ASM_PFX(gSmiHandlerIdtr)
SECTION .text
@@ -56,7 +61,7 @@ _SmiEntryPoint:
mov ebp, eax ; ebp = GDT base
o32 lgdt [cs:bx] ; lgdt fword ptr cs:[bx]
mov ax, PROTECT_MODE_CS
- mov [cs:bx-0x2],ax
+ mov [cs:bx-0x2],ax
DB 0x66, 0xbf ; mov edi, SMBASE
ASM_PFX(gSmbase): DD 0
lea eax, [edi + (@32bit - _SmiEntryPoint) + 0x8000]
@@ -66,7 +71,7 @@ ASM_PFX(gSmbase): DD 0
or ebx, 0x23
mov cr0, ebx
jmp dword 0x0:0x0
-_GdtDesc:
+_GdtDesc:
DW 0
DD 0
@@ -115,8 +120,39 @@ ASM_PFX(gSmiCr3): DD 0
or eax, BIT10
.4: ; as cr4.PGE is not set here, refresh cr3
mov cr4, eax ; in PreModifyMtrrs() to flush TLB.
+
+ cmp byte [dword ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))], 0
+ jz .6
+; Load TSS
+ mov byte [ebp + TSS_SEGMENT + 5], 0x89 ; clear busy flag
+ mov eax, TSS_SEGMENT
+ ltr ax
+.6:
+
+; enable NXE if supported
+ DB 0b0h ; mov al, imm8
+ASM_PFX(mXdSupported): DB 1
+ cmp al, 0
+ jz @SkipXd
+;
+; Check XD disable bit
+;
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ push edx ; save MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2 ; MSR_IA32_MISC_ENABLE[34]
+ jz .5
+ and dx, 0xFFFB ; clear XD Disable bit if it is set
+ wrmsr
+.5:
+ mov ecx, MSR_EFER
+ rdmsr
+ or ax, MSR_EFER_XD ; enable NXE
+ wrmsr
+@SkipXd:
+
mov ebx, cr0
- or ebx, 0x080010000 ; enable paging + WP
+ or ebx, 0x80010023 ; enable paging + WP + NE + MP + PE
mov cr0, ebx
lea ebx, [edi + DSC_OFFSET]
mov ax, [ebx + DSC_DS]
@@ -128,35 +164,35 @@ ASM_PFX(gSmiCr3): DD 0
mov ax, [ebx + DSC_SS]
mov ss, eax
- cmp byte [dword ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))], 0
- jz .5
-
-; Load TSS
- mov byte [ebp + TSS_SEGMENT + 5], 0x89 ; clear busy flag
- mov eax, TSS_SEGMENT
- ltr ax
-.5:
; jmp _SmiHandler ; instruction is not needed
global ASM_PFX(SmiHandler)
ASM_PFX(SmiHandler):
- mov ebx, [esp] ; CPU Index
-
+ mov ebx, [esp + 4] ; CPU Index
push ebx
mov eax, ASM_PFX(CpuSmmDebugEntry)
call eax
- pop ecx
+ add esp, 4
push ebx
mov eax, ASM_PFX(SmiRendezvous)
call eax
- pop ecx
-
+ add esp, 4
+
push ebx
mov eax, ASM_PFX(CpuSmmDebugExit)
call eax
- pop ecx
+ add esp, 4
+
+ pop edx ; get saved MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2
+ jz .7
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ or dx, BIT2 ; set XD Disable bit if it was set before entering into SMM
+ wrmsr
+.7:
rsm
ASM_PFX(gcSmiHandlerSize): DW $ - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S
index 4130bf5..cf5ef82 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S
@@ -1,6 +1,6 @@
#------------------------------------------------------------------------------
#
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
# This program and the accompanying materials
# are licensed and made available under the terms and conditions of the BSD License
# which accompanies this distribution. The full text of the license may be found at
@@ -24,6 +24,7 @@ ASM_GLOBAL ASM_PFX(PageFaultStubFunction)
ASM_GLOBAL ASM_PFX(gSmiMtrrs)
ASM_GLOBAL ASM_PFX(gcSmiIdtr)
ASM_GLOBAL ASM_PFX(gcSmiGdtr)
+ASM_GLOBAL ASM_PFX(gTaskGateDescriptor)
ASM_GLOBAL ASM_PFX(gcPsd)
ASM_GLOBAL ASM_PFX(FeaturePcdGet (PcdCpuSmmProfileEnable))
@@ -236,207 +237,10 @@ ASM_PFX(gcPsd):
ASM_PFX(gcSmiGdtr): .word GDT_SIZE - 1
.long NullSeg
-ASM_PFX(gcSmiIdtr): .word IDT_SIZE - 1
- .long _SmiIDT
-
-_SmiIDT:
-# The following segment repeats 32 times:
-# No. 1
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 2
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 3
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 4
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 5
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 6
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 7
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 8
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 9
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 10
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 11
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 12
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 13
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 14
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 15
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 16
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 17
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 18
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 19
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 20
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 21
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 22
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 23
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 24
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 25
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 26
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 27
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 28
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 29
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 30
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 31
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-# No. 32
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
-
-.equ IDT_SIZE, . - _SmiIDT
-
-TaskGateDescriptor:
+ASM_PFX(gcSmiIdtr): .word 0
+ .long 0
+
+ASM_PFX(gTaskGateDescriptor):
.word 0 # Reserved
.word EXCEPTION_TSS_SEL # TSS Segment selector
.byte 0 # Reserved
@@ -891,21 +695,3 @@ ASM_PFX(PageFaultStubFunction):
#
clts
iret
-
-ASM_GLOBAL ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
- pushl %ebx
-#
-# If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
-# is a Task Gate Descriptor so that when a Page Fault Exception occurs,
-# the processors can use a known good stack in case stack ran out.
-#
- leal _SmiIDT + 14 * 8, %ebx
- leal TaskGateDescriptor, %edx
- movl (%edx), %eax
- movl %eax, (%ebx)
- movl 4(%edx), %eax
- movl %eax, 4(%ebx)
-
- popl %ebx
- ret
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm
index b4eb492..7b162f8 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm
@@ -1,5 +1,5 @@
;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
; This program and the accompanying materials
; are licensed and made available under the terms and conditions of the BSD License
; which accompanies this distribution. The full text of the license may be found at
@@ -26,6 +26,7 @@ EXTERNDEF PageFaultStubFunction:PROC
EXTERNDEF gSmiMtrrs:QWORD
EXTERNDEF gcSmiIdtr:FWORD
EXTERNDEF gcSmiGdtr:FWORD
+EXTERNDEF gTaskGateDescriptor:QWORD
EXTERNDEF gcPsd:BYTE
EXTERNDEF FeaturePcdGet (PcdCpuSmmProfileEnable):BYTE
@@ -252,20 +253,10 @@ gcSmiGdtr LABEL FWORD
DD offset NullSeg
gcSmiIdtr LABEL FWORD
- DW IDT_SIZE - 1
- DD offset _SmiIDT
-
-_SmiIDT LABEL QWORD
-REPEAT 32
- DW 0 ; Offset 0:15
- DW CODE_SEL ; Segment selector
- DB 0 ; Unused
- DB 8eh ; Interrupt Gate, Present
- DW 0 ; Offset 16:31
- ENDM
-IDT_SIZE = $ - offset _SmiIDT
-
-TaskGateDescriptor LABEL DWORD
+ DW 0
+ DD 0
+
+gTaskGateDescriptor LABEL QWORD
DW 0 ; Reserved
DW EXCEPTION_TSS_SEL ; TSS Segment selector
DB 0 ; Reserved
@@ -720,19 +711,4 @@ PageFaultStubFunction PROC
iretd
PageFaultStubFunction ENDP
-InitializeIDTSmmStackGuard PROC USES ebx
-;
-; If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
-; is a Task Gate Descriptor so that when a Page Fault Exception occurs,
-; the processors can use a known good stack in case stack is ran out.
-;
- lea ebx, _SmiIDT + 14 * 8
- lea edx, TaskGateDescriptor
- mov eax, [edx]
- mov [ebx], eax
- mov eax, [edx + 4]
- mov [ebx + 4], eax
- ret
-InitializeIDTSmmStackGuard ENDP
-
END
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm
index 6a32828..4d58999 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm
@@ -1,5 +1,5 @@
;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
; This program and the accompanying materials
; are licensed and made available under the terms and conditions of the BSD License
; which accompanies this distribution. The full text of the license may be found at
@@ -24,6 +24,7 @@ extern ASM_PFX(SmiPFHandler)
global ASM_PFX(gcSmiIdtr)
global ASM_PFX(gcSmiGdtr)
+global ASM_PFX(gTaskGateDescriptor)
global ASM_PFX(gcPsd)
SECTION .data
@@ -250,21 +251,10 @@ ASM_PFX(gcSmiGdtr):
DD NullSeg
ASM_PFX(gcSmiIdtr):
- DW IDT_SIZE - 1
- DD _SmiIDT
+ DW 0
+ DD 0
-_SmiIDT:
-%rep 32
- DW 0 ; Offset 0:15
- DW CODE_SEL ; Segment selector
- DB 0 ; Unused
- DB 0x8e ; Interrupt Gate, Present
- DW 0 ; Offset 16:31
-%endrep
-
-IDT_SIZE equ $ - _SmiIDT
-
-TaskGateDescriptor:
+ASM_PFX(gTaskGateDescriptor):
DW 0 ; Reserved
DW EXCEPTION_TSS_SEL ; TSS Segment selector
DB 0 ; Reserved
@@ -717,19 +707,3 @@ ASM_PFX(PageFaultStubFunction):
clts
iretd
-global ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
- push ebx
-;
-; If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
-; is a Task Gate Descriptor so that when a Page Fault Exception occurrs,
-; the processors can use a known good stack in case stack is ran out.
-;
- lea ebx, [_SmiIDT + 14 * 8]
- lea edx, [TaskGateDescriptor]
- mov eax, [edx]
- mov [ebx], eax
- mov eax, [edx + 4]
- mov [ebx + 4], eax
- pop ebx
- ret
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
index 545b534..e87bf7b 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
@@ -1,7 +1,7 @@
/** @file
SMM CPU misc functions for Ia32 arch specific.
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2016, Intel Corporation. All rights reserved.<BR>
This program and the accompanying materials
are licensed and made available under the terms and conditions of the BSD License
which accompanies this distribution. The full text of the license may be found at
@@ -14,6 +14,33 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#include "PiSmmCpuDxeSmm.h"
+extern UINT64 gTaskGateDescriptor;
+
+EFI_PHYSICAL_ADDRESS mGdtBuffer;
+UINTN mGdtBufferSize;
+
+/**
+ Initialize IDT for SMM Stack Guard.
+
+**/
+VOID
+EFIAPI
+InitializeIDTSmmStackGuard (
+ VOID
+ )
+{
+ IA32_IDT_GATE_DESCRIPTOR *IdtGate;
+
+ //
+ // If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
+ // is a Task Gate Descriptor so that when a Page Fault Exception occurs,
+ // the processors can use a known good stack in case the stack has run out.
+ //
+ IdtGate = (IA32_IDT_GATE_DESCRIPTOR *)gcSmiIdtr.Base;
+ IdtGate += EXCEPT_IA32_PAGE_FAULT;
+ IdtGate->Uint64 = gTaskGateDescriptor;
+}
+
/**
Initialize Gdt for all processors.
@@ -49,8 +76,10 @@ InitGdt (
gcSmiGdtr.Limit += (UINT16)(2 * sizeof (IA32_SEGMENT_DESCRIPTOR));
GdtTssTableSize = (gcSmiGdtr.Limit + 1 + TSS_SIZE * 2 + 7) & ~7; // 8 bytes aligned
- GdtTssTables = (UINT8*)AllocatePages (EFI_SIZE_TO_PAGES (GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+ mGdtBufferSize = GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus;
+ GdtTssTables = (UINT8*)AllocateCodePages (EFI_SIZE_TO_PAGES (mGdtBufferSize));
ASSERT (GdtTssTables != NULL);
+ mGdtBuffer = (UINTN)GdtTssTables;
GdtTableStepSize = GdtTssTableSize;
for (Index = 0; Index < gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus; Index++) {
@@ -82,8 +111,10 @@ InitGdt (
// Just use original table, AllocatePage and copy them here to make sure GDTs are covered in page memory.
//
GdtTssTableSize = gcSmiGdtr.Limit + 1;
- GdtTssTables = (UINT8*)AllocatePages (EFI_SIZE_TO_PAGES (GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+ mGdtBufferSize = GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus;
+ GdtTssTables = (UINT8*)AllocateCodePages (EFI_SIZE_TO_PAGES (mGdtBufferSize));
ASSERT (GdtTssTables != NULL);
+ mGdtBuffer = (UINTN)GdtTssTables;
GdtTableStepSize = GdtTssTableSize;
for (Index = 0; Index < gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus; Index++) {
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c
index 767cb69..724cd92 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c
@@ -1,7 +1,7 @@
/** @file
IA-32 processor specific functions to enable SMM profile.
-Copyright (c) 2012 - 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2012 - 2016, Intel Corporation. All rights reserved.<BR>
This program and the accompanying materials
are licensed and made available under the terms and conditions of the BSD License
which accompanies this distribution. The full text of the license may be found at
@@ -24,7 +24,7 @@ InitSmmS3Cr3 (
VOID
)
{
- mSmmS3ResumeState->SmmS3Cr3 = Gen4GPageTable (0, TRUE);
+ mSmmS3ResumeState->SmmS3Cr3 = Gen4GPageTable (TRUE);
return ;
}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 12466ef..d0092d2 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -734,14 +734,12 @@ APHandler (
/**
Create 4G PageTable in SMRAM.
- @param ExtraPages Additional page numbers besides for 4G memory
- @param Is32BitPageTable Whether the page table is 32-bit PAE
+ @param[in] Is32BitPageTable Whether the page table is 32-bit PAE
@return PageTable Address
**/
UINT32
Gen4GPageTable (
- IN UINTN ExtraPages,
IN BOOLEAN Is32BitPageTable
)
{
@@ -775,10 +773,10 @@ Gen4GPageTable (
//
// Allocate the page table
//
- PageTable = AllocatePageTableMemory (ExtraPages + 5 + PagesNeeded);
+ PageTable = AllocatePageTableMemory (5 + PagesNeeded);
ASSERT (PageTable != NULL);
- PageTable = (VOID *)((UINTN)PageTable + EFI_PAGES_TO_SIZE (ExtraPages));
+ PageTable = (VOID *)((UINTN)PageTable);
Pte = (UINT64*)PageTable;
//
@@ -903,13 +901,13 @@ SetCacheability (
PageTable[PTIndex] |= (UINT64)Cacheability;
}
-
/**
Schedule a procedure to run on the specified CPU.
- @param Procedure The address of the procedure to run
- @param CpuIndex Target CPU Index
- @param ProcArguments The parameter to pass to the procedure
+ @param[in] Procedure The address of the procedure to run
+ @param[in] CpuIndex Target CPU Index
+ @param[in, out] ProcArguments The parameter to pass to the procedure
+ @param[in] BlockingMode Startup AP in blocking mode or not
@retval EFI_INVALID_PARAMETER CpuNumber not valid
@retval EFI_INVALID_PARAMETER CpuNumber specifying BSP
@@ -919,26 +917,44 @@ SetCacheability (
**/
EFI_STATUS
-EFIAPI
-SmmStartupThisAp (
+InternalSmmStartupThisAp (
IN EFI_AP_PROCEDURE Procedure,
IN UINTN CpuIndex,
- IN OUT VOID *ProcArguments OPTIONAL
+ IN OUT VOID *ProcArguments OPTIONAL,
+ IN BOOLEAN BlockingMode
)
{
- if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
- CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
- !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
- gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove ||
- !AcquireSpinLockOrFail (mSmmMpSyncData->CpuData[CpuIndex].Busy)) {
+ if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus) {
+ DEBUG((DEBUG_ERROR, "CpuIndex(%d) >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus(%d)\n", CpuIndex, gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+ return EFI_INVALID_PARAMETER;
+ }
+ if (CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu) {
+ DEBUG((DEBUG_ERROR, "CpuIndex(%d) == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu\n", CpuIndex));
return EFI_INVALID_PARAMETER;
}
+ if (!(*(mSmmMpSyncData->CpuData[CpuIndex].Present))) {
+ DEBUG((DEBUG_ERROR, "!mSmmMpSyncData->CpuData[%d].Present\n", CpuIndex));
+ return EFI_INVALID_PARAMETER;
+ }
+ if (gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
+ DEBUG((DEBUG_ERROR, "gSmmCpuPrivate->Operation[%d] == SmmCpuRemove\n", CpuIndex));
+ return EFI_INVALID_PARAMETER;
+ }
+
+ if (BlockingMode) {
+ AcquireSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
+ } else {
+ if (!AcquireSpinLockOrFail (mSmmMpSyncData->CpuData[CpuIndex].Busy)) {
+ DEBUG((DEBUG_ERROR, "mSmmMpSyncData->CpuData[%d].Busy\n", CpuIndex));
+ return EFI_INVALID_PARAMETER;
+ }
+ }
mSmmMpSyncData->CpuData[CpuIndex].Procedure = Procedure;
mSmmMpSyncData->CpuData[CpuIndex].Parameter = ProcArguments;
ReleaseSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);
- if (FeaturePcdGet (PcdCpuSmmBlockStartupThisAp)) {
+ if (BlockingMode) {
AcquireSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
ReleaseSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
}
@@ -946,6 +962,56 @@ SmmStartupThisAp (
}
/**
+ Schedule a procedure to run on the specified CPU in blocking mode.
+
+ @param[in] Procedure The address of the procedure to run
+ @param[in] CpuIndex Target CPU Index
+ @param[in, out] ProcArguments The parameter to pass to the procedure
+
+ @retval EFI_INVALID_PARAMETER CpuNumber not valid
+ @retval EFI_INVALID_PARAMETER CpuNumber specifying BSP
+ @retval EFI_INVALID_PARAMETER The AP specified by CpuNumber did not enter SMM
+ @retval EFI_INVALID_PARAMETER The AP specified by CpuNumber is busy
+ @retval EFI_SUCCESS The procedure has been successfully scheduled
+
+**/
+EFI_STATUS
+EFIAPI
+SmmBlockingStartupThisAp (
+ IN EFI_AP_PROCEDURE Procedure,
+ IN UINTN CpuIndex,
+ IN OUT VOID *ProcArguments OPTIONAL
+ )
+{
+ return InternalSmmStartupThisAp(Procedure, CpuIndex, ProcArguments, TRUE);
+}
+
+/**
+ Schedule a procedure to run on the specified CPU.
+
+ @param[in] Procedure The address of the procedure to run
+ @param[in] CpuIndex Target CPU Index
+ @param[in, out] ProcArguments The parameter to pass to the procedure
+
+ @retval EFI_INVALID_PARAMETER CpuIndex not valid
+ @retval EFI_INVALID_PARAMETER CpuIndex specifying BSP
+ @retval EFI_INVALID_PARAMETER The AP specified by CpuIndex did not enter SMM
+ @retval EFI_INVALID_PARAMETER The AP specified by CpuIndex is busy
+ @retval EFI_SUCCESS The procedure has been successfully scheduled
+
+**/
+EFI_STATUS
+EFIAPI
+SmmStartupThisAp (
+ IN EFI_AP_PROCEDURE Procedure,
+ IN UINTN CpuIndex,
+ IN OUT VOID *ProcArguments OPTIONAL
+ )
+{
+ return InternalSmmStartupThisAp (Procedure, CpuIndex, ProcArguments, FeaturePcdGet (PcdCpuSmmBlockStartupThisAp));
+}
+
+/**
This function sets DR6 & DR7 according to SMM save state, before running SMM C code.
They are useful when you want to enable hardware breakpoints in SMM without entry SMM mode.
@@ -1022,8 +1088,6 @@ SmiRendezvous (
BOOLEAN BspInProgress;
UINTN Index;
UINTN Cr2;
- BOOLEAN XdDisableFlag;
- MSR_IA32_MISC_ENABLE_REGISTER MiscEnableMsr;
//
// Save Cr2 because Page Fault exception in SMM may override its value
@@ -1082,20 +1146,6 @@ SmiRendezvous (
InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
}
- //
- // Try to enable XD
- //
- XdDisableFlag = FALSE;
- if (mXdSupported) {
- MiscEnableMsr.Uint64 = AsmReadMsr64 (MSR_IA32_MISC_ENABLE);
- if (MiscEnableMsr.Bits.XD == 1) {
- XdDisableFlag = TRUE;
- MiscEnableMsr.Bits.XD = 0;
- AsmWriteMsr64 (MSR_IA32_MISC_ENABLE, MiscEnableMsr.Uint64);
- }
- ActivateXd ();
- }
-
if (FeaturePcdGet (PcdCpuSmmProfileEnable)) {
ActivateSmmProfile (CpuIndex);
}
@@ -1176,15 +1226,6 @@ SmiRendezvous (
//
while (*mSmmMpSyncData->AllCpusInSync) {
CpuPause ();
- }
-
- //
- // Restore XD
- //
- if (XdDisableFlag) {
- MiscEnableMsr.Uint64 = AsmReadMsr64 (MSR_IA32_MISC_ENABLE);
- MiscEnableMsr.Bits.XD = 1;
- AsmWriteMsr64 (MSR_IA32_MISC_ENABLE, MiscEnableMsr.Uint64);
}
}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
index 852b5c7..8ef6695 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
@@ -113,6 +113,19 @@ InitializeSmmIdt (
EFI_STATUS Status;
BOOLEAN InterruptState;
IA32_DESCRIPTOR DxeIdtr;
+
+ //
+ // There are 32 (not 255) entries in it since only processor
+ // generated exceptions will be handled.
+ //
+ gcSmiIdtr.Limit = (sizeof(IA32_IDT_GATE_DESCRIPTOR) * 32) - 1;
+ //
+ // Allocate page aligned IDT, because it might be set as read only.
+ //
+ gcSmiIdtr.Base = (UINTN)AllocateCodePages (EFI_SIZE_TO_PAGES(gcSmiIdtr.Limit + 1));
+ ASSERT (gcSmiIdtr.Base != 0);
+ ZeroMem ((VOID *)gcSmiIdtr.Base, gcSmiIdtr.Limit + 1);
+
//
// Disable Interrupt and save DXE IDT table
//
@@ -731,9 +744,9 @@ PiCpuSmmEntry (
//
BufferPages = EFI_SIZE_TO_PAGES (SIZE_32KB + TileSize * (mMaxNumberOfCpus - 1));
if ((FamilyId == 4) || (FamilyId == 5)) {
- Buffer = AllocateAlignedPages (BufferPages, SIZE_32KB);
+ Buffer = AllocateAlignedCodePages (BufferPages, SIZE_32KB);
} else {
- Buffer = AllocateAlignedPages (BufferPages, SIZE_4KB);
+ Buffer = AllocateAlignedCodePages (BufferPages, SIZE_4KB);
}
ASSERT (Buffer != NULL);
DEBUG ((EFI_D_INFO, "SMRAM SaveState Buffer (0x%08x, 0x%08x)\n", Buffer, EFI_PAGES_TO_SIZE(BufferPages)));
@@ -1137,6 +1150,17 @@ ConfigSmmCodeAccessCheck (
}
/**
+ Set code region to be read only and data region to be execute disable.
+**/
+VOID
+SetRegionAttributes (
+ VOID
+ )
+{
+ SetMemMapAttributes ();
+}
+
+/**
This API provides a way to allocate memory for page table.
This API can be called more once to allocate memory for page tables.
@@ -1166,6 +1190,109 @@ AllocatePageTableMemory (
}
/**
+ Allocate pages for code.
+
+ @param[in] Pages Number of pages to be allocated.
+
+ @return Allocated memory.
+**/
+VOID *
+AllocateCodePages (
+ IN UINTN Pages
+ )
+{
+ EFI_STATUS Status;
+ EFI_PHYSICAL_ADDRESS Memory;
+
+ if (Pages == 0) {
+ return NULL;
+ }
+
+ Status = gSmst->SmmAllocatePages (AllocateAnyPages, EfiRuntimeServicesCode, Pages, &Memory);
+ if (EFI_ERROR (Status)) {
+ return NULL;
+ }
+ return (VOID *) (UINTN) Memory;
+}
+
+/**
+ Allocate aligned pages for code.
+
+ @param[in] Pages Number of pages to be allocated.
+ @param[in] Alignment The requested alignment of the allocation.
+ Must be a power of two.
+ If Alignment is zero, then byte alignment is used.
+
+ @return Allocated memory.
+**/
+VOID *
+AllocateAlignedCodePages (
+ IN UINTN Pages,
+ IN UINTN Alignment
+ )
+{
+ EFI_STATUS Status;
+ EFI_PHYSICAL_ADDRESS Memory;
+ UINTN AlignedMemory;
+ UINTN AlignmentMask;
+ UINTN UnalignedPages;
+ UINTN RealPages;
+
+ //
+ // Alignment must be a power of two or zero.
+ //
+ ASSERT ((Alignment & (Alignment - 1)) == 0);
+
+ if (Pages == 0) {
+ return NULL;
+ }
+ if (Alignment > EFI_PAGE_SIZE) {
+ //
+ // Calculate the total number of pages since alignment is larger than page size.
+ //
+ AlignmentMask = Alignment - 1;
+ RealPages = Pages + EFI_SIZE_TO_PAGES (Alignment);
+ //
+ // Make sure that Pages plus EFI_SIZE_TO_PAGES (Alignment) does not overflow.
+ //
+ ASSERT (RealPages > Pages);
+
+ Status = gSmst->SmmAllocatePages (AllocateAnyPages, EfiRuntimeServicesCode, RealPages, &Memory);
+ if (EFI_ERROR (Status)) {
+ return NULL;
+ }
+ AlignedMemory = ((UINTN) Memory + AlignmentMask) & ~AlignmentMask;
+ UnalignedPages = EFI_SIZE_TO_PAGES (AlignedMemory - (UINTN) Memory);
+ if (UnalignedPages > 0) {
+ //
+ // Free first unaligned page(s).
+ //
+ Status = gSmst->SmmFreePages (Memory, UnalignedPages);
+ ASSERT_EFI_ERROR (Status);
+ }
+ Memory = (EFI_PHYSICAL_ADDRESS) (AlignedMemory + EFI_PAGES_TO_SIZE (Pages));
+ UnalignedPages = RealPages - Pages - UnalignedPages;
+ if (UnalignedPages > 0) {
+ //
+ // Free last unaligned page(s).
+ //
+ Status = gSmst->SmmFreePages (Memory, UnalignedPages);
+ ASSERT_EFI_ERROR (Status);
+ }
+ } else {
+ //
+ // Do not over-allocate pages in this case.
+ //
+ Status = gSmst->SmmAllocatePages (AllocateAnyPages, EfiRuntimeServicesCode, Pages, &Memory);
+ if (EFI_ERROR (Status)) {
+ return NULL;
+ }
+ AlignedMemory = (UINTN) Memory;
+ }
+ return (VOID *) AlignedMemory;
+}
+
+/**
Perform the remaining tasks.
**/
@@ -1185,6 +1312,17 @@ PerformRemainingTasks (
// Create a mix of 2MB and 4KB page table. Update some memory ranges absent and execute-disable.
//
InitPaging ();
+
+ //
+ // Mark critical region to be read-only in page table
+ //
+ SetRegionAttributes ();
+
+ //
+ // Set page table itself to be read-only
+ //
+ SetPageTableAttributes ();
+
//
// Configure SMM Code Access Check feature if available.
//
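Reviewer note on AllocateAlignedCodePages() above: the over-allocate-then-trim arithmetic can be checked in isolation. The following is a minimal standalone sketch, not part of the patch. It assumes a 4 KB page size and a power-of-two alignment larger than one page, and the names split_aligned, aligned_split, PAGE_SIZE_4K and SIZE_TO_PAGES are hypothetical stand-ins for the EDK II macros and variables used in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for EFI_PAGE_SIZE / EFI_SIZE_TO_PAGES. */
#define PAGE_SIZE_4K      4096u
#define SIZE_TO_PAGES(s)  (((s) + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K)

typedef struct {
  uint64_t aligned_base;  /* first address that satisfies the alignment  */
  uint64_t head_pages;    /* unaligned pages to free before aligned_base */
  uint64_t tail_pages;    /* surplus pages to free after the kept range  */
} aligned_split;

/*
 * Mirror the arithmetic of the "Alignment > EFI_PAGE_SIZE" branch:
 * the caller is assumed to have allocated pages + SIZE_TO_PAGES(alignment)
 * pages starting at raw_base. alignment must be a power of two.
 */
static aligned_split
split_aligned (uint64_t raw_base, uint64_t pages, uint64_t alignment)
{
  aligned_split r;
  uint64_t      mask       = alignment - 1;
  uint64_t      real_pages = pages + SIZE_TO_PAGES (alignment);

  r.aligned_base = (raw_base + mask) & ~mask;
  r.head_pages   = SIZE_TO_PAGES (r.aligned_base - raw_base);
  r.tail_pages   = real_pages - pages - r.head_pages;
  return r;
}
```

For example, with a raw base of 0x7F003000, 10 requested pages and 32 KB alignment, the sketch keeps the range starting at 0x7F008000 and reports 5 head pages and 3 tail pages to give back, so head + kept + tail equals the 18 pages originally allocated.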
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 9b119c8..6a1582b 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -25,6 +25,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#include <Protocol/SmmCpuService.h>
#include <Guid/AcpiS3Context.h>
+#include <Guid/PiSmmMemoryAttributesTable.h>
#include <Library/BaseLib.h>
#include <Library/IoLib.h>
@@ -83,13 +84,38 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#define IA32_PG_PMNT BIT62
#define IA32_PG_NX BIT63
-#define PAGE_ATTRIBUTE_BITS (IA32_PG_RW | IA32_PG_P)
+#define PAGE_ATTRIBUTE_BITS (IA32_PG_D | IA32_PG_A | IA32_PG_U | IA32_PG_RW | IA32_PG_P)
//
// Bits 1, 2, 5, 6 are reserved in the IA32 PAE PDPTE
// X64 PAE PDPTE does not have such restriction
//
#define IA32_PAE_PDPTE_ATTRIBUTE_BITS (IA32_PG_P)
+#define PAGE_PROGATE_BITS (IA32_PG_NX | PAGE_ATTRIBUTE_BITS)
+
+#define PAGING_4K_MASK 0xFFF
+#define PAGING_2M_MASK 0x1FFFFF
+#define PAGING_1G_MASK 0x3FFFFFFF
+
+#define PAGING_PAE_INDEX_MASK 0x1FF
+
+#define PAGING_4K_ADDRESS_MASK_64 0x000FFFFFFFFFF000ull
+#define PAGING_2M_ADDRESS_MASK_64 0x000FFFFFFFE00000ull
+#define PAGING_1G_ADDRESS_MASK_64 0x000FFFFFC0000000ull
+
+typedef enum {
+ PageNone,
+ Page4K,
+ Page2M,
+ Page1G,
+} PAGE_ATTRIBUTE;
+
+typedef struct {
+ PAGE_ATTRIBUTE Attribute;
+ UINT64 Length;
+ UINT64 AddressMask;
+} PAGE_ATTRIBUTE_TABLE;
+
//
// Size of Task-State Segment defined in IA32 Manual
//
@@ -98,6 +124,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#define TSS_IA32_CR3_OFFSET 28
#define TSS_IA32_ESP_OFFSET 56
+#define CR0_WP BIT16
+
//
// Code select value
//
@@ -395,6 +423,8 @@ typedef struct {
} SMM_CPU_SEMAPHORES;
extern IA32_DESCRIPTOR gcSmiGdtr;
+extern EFI_PHYSICAL_ADDRESS mGdtBuffer;
+extern UINTN mGdtBufferSize;
extern IA32_DESCRIPTOR gcSmiIdtr;
extern VOID *gcSmiIdtrPtr;
extern CONST PROCESSOR_SMM_DESCRIPTOR gcPsd;
@@ -414,14 +444,12 @@ extern SPIN_LOCK *mMemoryMappedLock;
/**
Create 4G PageTable in SMRAM.
- @param ExtraPages Additional page numbers besides for 4G memory
- @param Is32BitPageTable Whether the page table is 32-bit PAE
+ @param[in] Is32BitPageTable Whether the page table is 32-bit PAE
@return PageTable Address
**/
UINT32
Gen4GPageTable (
- IN UINTN ExtraPages,
IN BOOLEAN Is32BitPageTable
);
@@ -482,7 +510,7 @@ InitializeIDTSmmStackGuard (
/**
Initialize Gdt for all processors.
-
+
@param[in] Cr3 CR3 value.
@param[out] GdtStepSize The step size for GDT table.
@@ -761,6 +789,96 @@ DumpModuleInfoByIp (
);
/**
+ This function sets memory attribute according to MemoryAttributesTable.
+**/
+VOID
+SetMemMapAttributes (
+ VOID
+ );
+
+/**
+ This function sets memory attribute for page table.
+**/
+VOID
+SetPageTableAttributes (
+ VOID
+ );
+
+/**
+ Return page table base.
+
+ @return page table base.
+**/
+UINTN
+GetPageTableBase (
+ VOID
+ );
+
+/**
+ This function sets the attributes for the memory region specified by BaseAddress and
+ Length from their current attributes to the attributes specified by Attributes.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to set for the memory region.
+ @param[out] IsSplitted TRUE means page table was split. FALSE means page table was not split.
+
+ @retval EFI_SUCCESS The attributes were set for the memory region.
+ @retval EFI_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval EFI_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval EFI_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval EFI_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmSetMemoryAttributesEx (
+ IN EFI_PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes,
+ OUT BOOLEAN *IsSplitted OPTIONAL
+ );
+
+/**
+ This function clears the attributes specified by Attributes for the memory region
+ specified by BaseAddress and Length, leaving all other attributes unchanged.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to clear for the memory region.
+ @param[out] IsSplitted TRUE means page table was split. FALSE means page table was not split.
+
+ @retval EFI_SUCCESS The attributes were cleared for the memory region.
+ @retval EFI_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval EFI_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval EFI_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval EFI_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmClearMemoryAttributesEx (
+ IN EFI_PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes,
+ OUT BOOLEAN *IsSplitted OPTIONAL
+ );
+
+/**
This API provides a way to allocate memory for page table.
This API can be called more once to allocate memory for page tables.
@@ -780,6 +898,34 @@ AllocatePageTableMemory (
IN UINTN Pages
);
+/**
+ Allocate pages for code.
+
+ @param[in] Pages Number of pages to be allocated.
+
+ @return Allocated memory.
+**/
+VOID *
+AllocateCodePages (
+ IN UINTN Pages
+ );
+
+/**
+ Allocate aligned pages for code.
+
+ @param[in] Pages Number of pages to be allocated.
+ @param[in] Alignment The requested alignment of the allocation.
+ Must be a power of two.
+ If Alignment is zero, then byte alignment is used.
+
+ @return Allocated memory.
+**/
+VOID *
+AllocateAlignedCodePages (
+ IN UINTN Pages,
+ IN UINTN Alignment
+ );
+
//
// S3 related global variable and function prototype.
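Reviewer note on the paging masks added to PiSmmCpuDxeSmm.h above: a minimal standalone sketch of how an address is decomposed into per-level table indices and how a physical base is extracted from an entry. It is an illustration only, not part of the patch. The two mask values are copied from the header hunk, while level_index and entry_base are hypothetical helper names; the shifts 12/21/30/39 are the ones GetPageTableEntry() uses later in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the header hunk above. */
#define PAGING_PAE_INDEX_MASK      0x1FFu
#define PAGING_4K_ADDRESS_MASK_64  0x000FFFFFFFFFF000ull

/*
 * Hypothetical helper: return the 9-bit table index for a paging level.
 * level 1 = page table (shift 12), 2 = page directory (shift 21),
 * level 3 = PDPT (shift 30), 4 = PML4 (shift 39).
 */
static unsigned
level_index (uint64_t address, unsigned level)
{
  unsigned shift = 12 + 9 * (level - 1);

  return (unsigned)((address >> shift) & PAGING_PAE_INDEX_MASK);
}

/*
 * Hypothetical helper: strip the attribute bits (P, RW, NX, ...) from an
 * entry and keep only the physical base selected by the given mask.
 */
static uint64_t
entry_base (uint64_t entry, uint64_t address_mask)
{
  return entry & address_mask;
}
```

With these helpers, the address 0x8040201000 yields index 1 at every level, and an entry value of 0x8000000012345067 (NX set, low attribute bits 0x067) masks down to the 4 KB-aligned base 0x12345000.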
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
index 5d598d6..d409edf 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
@@ -4,7 +4,7 @@
# This SMM driver performs SMM initialization, deploy SMM Entry Vector,
# provides CPU specific services in SMM.
#
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
#
# This program and the accompanying materials
# are licensed and made available under the terms and conditions of the BSD License
@@ -44,6 +44,7 @@
SmmProfile.h
SmmProfileInternal.h
SmramSaveState.c
+ SmmCpuMemoryManagement.c
[Sources.Ia32]
Ia32/Semaphore.c
@@ -133,6 +134,7 @@
gEfiGlobalVariableGuid ## SOMETIMES_PRODUCES ## Variable:L"SmmProfileData"
gEfiAcpi20TableGuid ## SOMETIMES_CONSUMES ## SystemTable
gEfiAcpi10TableGuid ## SOMETIMES_CONSUMES ## SystemTable
+ gEdkiiPiSmmMemoryAttributesTableGuid ## CONSUMES ## SystemTable
[FeaturePcd]
gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmDebug ## CONSUMES
@@ -153,6 +155,7 @@
gUefiCpuPkgTokenSpaceGuid.PcdCpuHotPlugDataAddress ## SOMETIMES_PRODUCES
gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmCodeAccessCheckEnable ## CONSUMES
gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode ## CONSUMES
+ gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStaticPageTable ## CONSUMES
gEfiMdeModulePkgTokenSpaceGuid.PcdAcpiS3Enable ## CONSUMES
[Depex]
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
new file mode 100644
index 0000000..4c1f900
--- /dev/null
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
@@ -0,0 +1,871 @@
+/** @file
+
+Copyright (c) 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials
+are licensed and made available under the terms and conditions of the BSD License
+which accompanies this distribution. The full text of the license may be found at
+http://opensource.org/licenses/bsd-license.php
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+
+**/
+
+#include "PiSmmCpuDxeSmm.h"
+
+#define NEXT_MEMORY_DESCRIPTOR(MemoryDescriptor, Size) \
+ ((EFI_MEMORY_DESCRIPTOR *)((UINT8 *)(MemoryDescriptor) + (Size)))
+
+PAGE_ATTRIBUTE_TABLE mPageAttributeTable[] = {
+ {Page4K, SIZE_4KB, PAGING_4K_ADDRESS_MASK_64},
+ {Page2M, SIZE_2MB, PAGING_2M_ADDRESS_MASK_64},
+ {Page1G, SIZE_1GB, PAGING_1G_ADDRESS_MASK_64},
+};
+
+/**
+ Return page table base.
+
+ @return page table base.
+**/
+UINTN
+GetPageTableBase (
+ VOID
+ )
+{
+ return (AsmReadCr3 () & PAGING_4K_ADDRESS_MASK_64);
+}
+
+/**
+ Return length according to page attributes.
+
+ @param[in] PageAttribute The page attribute of the page entry.
+
+ @return The length of page entry.
+**/
+UINTN
+PageAttributeToLength (
+ IN PAGE_ATTRIBUTE PageAttribute
+ )
+{
+ UINTN Index;
+ for (Index = 0; Index < sizeof(mPageAttributeTable)/sizeof(mPageAttributeTable[0]); Index++) {
+ if (PageAttribute == mPageAttributeTable[Index].Attribute) {
+ return (UINTN)mPageAttributeTable[Index].Length;
+ }
+ }
+ return 0;
+}
+
+/**
+ Return address mask according to page attributes.
+
+ @param[in] PageAttribute The page attribute of the page entry.
+
+ @return The address mask of page entry.
+**/
+UINTN
+PageAttributeToMask (
+ IN PAGE_ATTRIBUTE PageAttribute
+ )
+{
+ UINTN Index;
+ for (Index = 0; Index < sizeof(mPageAttributeTable)/sizeof(mPageAttributeTable[0]); Index++) {
+ if (PageAttribute == mPageAttributeTable[Index].Attribute) {
+ return (UINTN)mPageAttributeTable[Index].AddressMask;
+ }
+ }
+ return 0;
+}
+
+/**
+ Return page table entry to match the address.
+
+ @param[in] Address The address to be checked.
+ @param[out] PageAttribute The page attribute of the page entry.
+
+ @return The page entry.
+**/
+VOID *
+GetPageTableEntry (
+ IN PHYSICAL_ADDRESS Address,
+ OUT PAGE_ATTRIBUTE *PageAttribute
+ )
+{
+ UINTN Index1;
+ UINTN Index2;
+ UINTN Index3;
+ UINTN Index4;
+ UINT64 *L1PageTable;
+ UINT64 *L2PageTable;
+ UINT64 *L3PageTable;
+ UINT64 *L4PageTable;
+
+ Index4 = ((UINTN)RShiftU64 (Address, 39)) & PAGING_PAE_INDEX_MASK;
+ Index3 = ((UINTN)Address >> 30) & PAGING_PAE_INDEX_MASK;
+ Index2 = ((UINTN)Address >> 21) & PAGING_PAE_INDEX_MASK;
+ Index1 = ((UINTN)Address >> 12) & PAGING_PAE_INDEX_MASK;
+
+ if (sizeof(UINTN) == sizeof(UINT64)) {
+ L4PageTable = (UINT64 *)GetPageTableBase ();
+ if (L4PageTable[Index4] == 0) {
+ *PageAttribute = PageNone;
+ return NULL;
+ }
+
+ L3PageTable = (UINT64 *)(UINTN)(L4PageTable[Index4] & PAGING_4K_ADDRESS_MASK_64);
+ } else {
+ L3PageTable = (UINT64 *)GetPageTableBase ();
+ }
+ if (L3PageTable[Index3] == 0) {
+ *PageAttribute = PageNone;
+ return NULL;
+ }
+ if ((L3PageTable[Index3] & IA32_PG_PS) != 0) {
+ // 1G
+ *PageAttribute = Page1G;
+ return &L3PageTable[Index3];
+ }
+
+ L2PageTable = (UINT64 *)(UINTN)(L3PageTable[Index3] & PAGING_4K_ADDRESS_MASK_64);
+ if (L2PageTable[Index2] == 0) {
+ *PageAttribute = PageNone;
+ return NULL;
+ }
+ if ((L2PageTable[Index2] & IA32_PG_PS) != 0) {
+ // 2M
+ *PageAttribute = Page2M;
+ return &L2PageTable[Index2];
+ }
+
+ // 4k
+ L1PageTable = (UINT64 *)(UINTN)(L2PageTable[Index2] & PAGING_4K_ADDRESS_MASK_64);
+ if ((L1PageTable[Index1] == 0) && (Address != 0)) {
+ *PageAttribute = PageNone;
+ return NULL;
+ }
+ *PageAttribute = Page4K;
+ return &L1PageTable[Index1];
+}
+
+/**
+ Return memory attributes of page entry.
+
+ @param[in] PageEntry The page entry.
+
+ @return Memory attributes of page entry.
+**/
+UINT64
+GetAttributesFromPageEntry (
+ IN UINT64 *PageEntry
+ )
+{
+ UINT64 Attributes;
+ Attributes = 0;
+ if ((*PageEntry & IA32_PG_P) == 0) {
+ Attributes |= EFI_MEMORY_RP;
+ }
+ if ((*PageEntry & IA32_PG_RW) == 0) {
+ Attributes |= EFI_MEMORY_RO;
+ }
+ if ((*PageEntry & IA32_PG_NX) != 0) {
+ Attributes |= EFI_MEMORY_XP;
+ }
+ return Attributes;
+}
+
+/**
+ Modify memory attributes of page entry.
+
+ @param[in] PageEntry The page entry.
+ @param[in] Attributes The bit mask of attributes to modify for the memory region.
+ @param[in] IsSet TRUE means to set attributes. FALSE means to clear attributes.
+ @param[out] IsModified TRUE means page table modified. FALSE means page table not modified.
+**/
+VOID
+ConvertPageEntryAttribute (
+ IN UINT64 *PageEntry,
+ IN UINT64 Attributes,
+ IN BOOLEAN IsSet,
+ OUT BOOLEAN *IsModified
+ )
+{
+ UINT64 CurrentPageEntry;
+ UINT64 NewPageEntry;
+
+ CurrentPageEntry = *PageEntry;
+ NewPageEntry = CurrentPageEntry;
+ if ((Attributes & EFI_MEMORY_RP) != 0) {
+ if (IsSet) {
+ NewPageEntry &= ~(UINT64)IA32_PG_P;
+ } else {
+ NewPageEntry |= IA32_PG_P;
+ }
+ }
+ if ((Attributes & EFI_MEMORY_RO) != 0) {
+ if (IsSet) {
+ NewPageEntry &= ~(UINT64)IA32_PG_RW;
+ } else {
+ NewPageEntry |= IA32_PG_RW;
+ }
+ }
+ if ((Attributes & EFI_MEMORY_XP) != 0) {
+ if (IsSet) {
+ NewPageEntry |= IA32_PG_NX;
+ } else {
+ NewPageEntry &= ~IA32_PG_NX;
+ }
+ }
+ *PageEntry = NewPageEntry;
+ if (CurrentPageEntry != NewPageEntry) {
+ *IsModified = TRUE;
+ DEBUG ((DEBUG_INFO, "ConvertPageEntryAttribute 0x%lx", CurrentPageEntry));
+ DEBUG ((DEBUG_INFO, "->0x%lx\n", NewPageEntry));
+ } else {
+ *IsModified = FALSE;
+ }
+}
+
+/**
+ This function returns whether the page entry needs to be split.
+
+ @param[in] BaseAddress The base address to be checked.
+ @param[in] Length The length to be checked.
+ @param[in] PageEntry The page entry to be checked.
+ @param[in] PageAttribute The page attribute of the page entry.
+
+ @return The page attribute to split to, or PageNone if no split is needed.
+**/
+PAGE_ATTRIBUTE
+NeedSplitPage (
+ IN PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 *PageEntry,
+ IN PAGE_ATTRIBUTE PageAttribute
+ )
+{
+ UINT64 PageEntryLength;
+
+ PageEntryLength = PageAttributeToLength (PageAttribute);
+
+ if (((BaseAddress & (PageEntryLength - 1)) == 0) && (Length >= PageEntryLength)) {
+ return PageNone;
+ }
+
+ if (((BaseAddress & PAGING_2M_MASK) != 0) || (Length < SIZE_2MB)) {
+ return Page4K;
+ }
+
+ return Page2M;
+}
+
+/**
+ This function splits one page entry into smaller page entries.
+
+ @param[in] PageEntry The page entry to be split.
+ @param[in] PageAttribute The page attribute of the page entry.
+ @param[in] SplitAttribute How to split the page entry.
+
+ @retval RETURN_SUCCESS The page entry was split.
+ @retval RETURN_UNSUPPORTED The page entry cannot be split.
+ @retval RETURN_OUT_OF_RESOURCES Not enough resources to split the page entry.
+**/
+RETURN_STATUS
+SplitPage (
+ IN UINT64 *PageEntry,
+ IN PAGE_ATTRIBUTE PageAttribute,
+ IN PAGE_ATTRIBUTE SplitAttribute
+ )
+{
+ UINT64 BaseAddress;
+ UINT64 *NewPageEntry;
+ UINTN Index;
+
+ ASSERT (PageAttribute == Page2M || PageAttribute == Page1G);
+
+ if (PageAttribute == Page2M) {
+ //
+ // Split 2M to 4K
+ //
+ ASSERT (SplitAttribute == Page4K);
+ if (SplitAttribute == Page4K) {
+ NewPageEntry = AllocatePageTableMemory (1);
+ DEBUG ((DEBUG_INFO, "Split - 0x%p\n", NewPageEntry));
+ if (NewPageEntry == NULL) {
+ return RETURN_OUT_OF_RESOURCES;
+ }
+ BaseAddress = *PageEntry & PAGING_2M_ADDRESS_MASK_64;
+ for (Index = 0; Index < SIZE_4KB / sizeof(UINT64); Index++) {
+ NewPageEntry[Index] = BaseAddress + SIZE_4KB * Index + ((*PageEntry) & PAGE_PROGATE_BITS);
+ }
+ (*PageEntry) = (UINT64)(UINTN)NewPageEntry + ((*PageEntry) & PAGE_PROGATE_BITS);
+ return RETURN_SUCCESS;
+ } else {
+ return RETURN_UNSUPPORTED;
+ }
+ } else if (PageAttribute == Page1G) {
+ //
+ // Split 1G to 2M
+ // No need to support 1G->4K directly; use 1G->2M, then 2M->4K, to get a more compact page table.
+ //
+ ASSERT (SplitAttribute == Page2M || SplitAttribute == Page4K);
+ if ((SplitAttribute == Page2M || SplitAttribute == Page4K)) {
+ NewPageEntry = AllocatePageTableMemory (1);
+ DEBUG ((DEBUG_INFO, "Split - 0x%p\n", NewPageEntry));
+ if (NewPageEntry == NULL) {
+ return RETURN_OUT_OF_RESOURCES;
+ }
+ BaseAddress = *PageEntry & PAGING_1G_ADDRESS_MASK_64;
+ for (Index = 0; Index < SIZE_4KB / sizeof(UINT64); Index++) {
+ NewPageEntry[Index] = BaseAddress + SIZE_2MB * Index + IA32_PG_PS + ((*PageEntry) & PAGE_PROGATE_BITS);
+ }
+ (*PageEntry) = (UINT64)(UINTN)NewPageEntry + ((*PageEntry) & PAGE_PROGATE_BITS);
+ return RETURN_SUCCESS;
+ } else {
+ return RETURN_UNSUPPORTED;
+ }
+ } else {
+ return RETURN_UNSUPPORTED;
+ }
+}
+
+/**
+ This function modifies the page attributes for the memory region specified by BaseAddress and
+ Length from their current attributes to the attributes specified by Attributes.
+
+ Caller should make sure BaseAddress and Length are at a page boundary.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to modify for the memory region.
+ @param[in] IsSet TRUE means to set attributes. FALSE means to clear attributes.
+ @param[out] IsSplitted TRUE means page table was split. FALSE means page table was not split.
+ @param[out] IsModified TRUE means page table modified. FALSE means page table not modified.
+
+ @retval RETURN_SUCCESS The attributes were modified for the memory region.
+ @retval RETURN_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval RETURN_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval RETURN_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval RETURN_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+**/
+RETURN_STATUS
+EFIAPI
+ConvertMemoryPageAttributes (
+ IN PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes,
+ IN BOOLEAN IsSet,
+ OUT BOOLEAN *IsSplitted, OPTIONAL
+ OUT BOOLEAN *IsModified OPTIONAL
+ )
+{
+ UINT64 *PageEntry;
+ PAGE_ATTRIBUTE PageAttribute;
+ UINTN PageEntryLength;
+ PAGE_ATTRIBUTE SplitAttribute;
+ RETURN_STATUS Status;
+ BOOLEAN IsEntryModified;
+
+ ASSERT (Attributes != 0);
+ ASSERT ((Attributes & ~(EFI_MEMORY_RP | EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0);
+
+ ASSERT ((BaseAddress & (SIZE_4KB - 1)) == 0);
+ ASSERT ((Length & (SIZE_4KB - 1)) == 0);
+
+ if (Length == 0) {
+ return RETURN_INVALID_PARAMETER;
+ }
+
+// DEBUG ((DEBUG_ERROR, "ConvertMemoryPageAttributes(%x) - %016lx, %016lx, %02lx\n", IsSet, BaseAddress, Length, Attributes));
+
+ if (IsSplitted != NULL) {
+ *IsSplitted = FALSE;
+ }
+ if (IsModified != NULL) {
+ *IsModified = FALSE;
+ }
+
+ //
+ // The logic below checks for 2M/4K pages to make sure we do not waste memory.
+ //
+ while (Length != 0) {
+ PageEntry = GetPageTableEntry (BaseAddress, &PageAttribute);
+ if (PageEntry == NULL) {
+ return RETURN_UNSUPPORTED;
+ }
+ PageEntryLength = PageAttributeToLength (PageAttribute);
+ SplitAttribute = NeedSplitPage (BaseAddress, Length, PageEntry, PageAttribute);
+ if (SplitAttribute == PageNone) {
+ ConvertPageEntryAttribute (PageEntry, Attributes, IsSet, &IsEntryModified);
+ if (IsEntryModified) {
+ if (IsModified != NULL) {
+ *IsModified = TRUE;
+ }
+ }
+ //
+ // Convert success, move to next
+ //
+ BaseAddress += PageEntryLength;
+ Length -= PageEntryLength;
+ } else {
+ Status = SplitPage (PageEntry, PageAttribute, SplitAttribute);
+ if (RETURN_ERROR (Status)) {
+ return RETURN_UNSUPPORTED;
+ }
+ if (IsSplitted != NULL) {
+ *IsSplitted = TRUE;
+ }
+ if (IsModified != NULL) {
+ *IsModified = TRUE;
+ }
+ //
+ // Only split the current page here.
+ // The conversion succeeds in the next round of the loop.
+ //
+ }
+ }
+
+ return RETURN_SUCCESS;
+}
+
+/**
+ FlushTlb on current processor.
+
+ @param[in,out] Buffer Pointer to private data buffer.
+**/
+VOID
+EFIAPI
+FlushTlbOnCurrentProcessor (
+ IN OUT VOID *Buffer
+ )
+{
+ CpuFlushTlb ();
+}
+
+/**
+ FlushTlb for all processors.
+**/
+VOID
+FlushTlbForAll (
+ VOID
+ )
+{
+ UINTN Index;
+
+ FlushTlbOnCurrentProcessor (NULL);
+
+ for (Index = 0; Index < gSmst->NumberOfCpus; Index++) {
+ if (Index != gSmst->CurrentlyExecutingCpu) {
+ // Force the AP to start up in blocking mode.
+ SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
+ // Do not check return status, because AP might not be present in some corner cases.
+ }
+ }
+}
+
+/**
+ This function sets the attributes for the memory region specified by BaseAddress and
+ Length from their current attributes to the attributes specified by Attributes.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to set for the memory region.
+ @param[out] IsSplitted TRUE means page table was split. FALSE means page table was not split.
+
+ @retval EFI_SUCCESS The attributes were set for the memory region.
+ @retval EFI_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval EFI_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval EFI_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval EFI_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmSetMemoryAttributesEx (
+ IN EFI_PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes,
+ OUT BOOLEAN *IsSplitted OPTIONAL
+ )
+{
+ EFI_STATUS Status;
+ BOOLEAN IsModified;
+
+ Status = ConvertMemoryPageAttributes (BaseAddress, Length, Attributes, TRUE, IsSplitted, &IsModified);
+ if (!EFI_ERROR(Status)) {
+ if (IsModified) {
+ //
+ // Flush TLB as last step
+ //
+ FlushTlbForAll();
+ }
+ }
+
+ return Status;
+}
+
+/**
+ This function clears the attributes specified by Attributes for the memory region
+ specified by BaseAddress and Length, leaving all other attributes unchanged.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to clear for the memory region.
+ @param[out] IsSplitted TRUE means page table was split. FALSE means page table was not split.
+
+ @retval EFI_SUCCESS The attributes were cleared for the memory region.
+ @retval EFI_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval EFI_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval EFI_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval EFI_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmClearMemoryAttributesEx (
+ IN EFI_PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes,
+ OUT BOOLEAN *IsSplitted OPTIONAL
+ )
+{
+ EFI_STATUS Status;
+ BOOLEAN IsModified;
+
+ Status = ConvertMemoryPageAttributes (BaseAddress, Length, Attributes, FALSE, IsSplitted, &IsModified);
+ if (!EFI_ERROR(Status)) {
+ if (IsModified) {
+ //
+ // Flush TLB as last step
+ //
+ FlushTlbForAll();
+ }
+ }
+
+ return Status;
+}
+
+/**
+ This function sets the attributes for the memory region specified by BaseAddress and
+ Length from their current attributes to the attributes specified by Attributes.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to set for the memory region.
+
+ @retval EFI_SUCCESS The attributes were set for the memory region.
+ @retval EFI_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval EFI_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval EFI_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval EFI_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmSetMemoryAttributes (
+ IN EFI_PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes
+ )
+{
+ return SmmSetMemoryAttributesEx (BaseAddress, Length, Attributes, NULL);
+}
+
+/**
+ This function clears the attributes specified by Attributes for the memory region
+ specified by BaseAddress and Length.
+
+ @param[in] BaseAddress The physical address that is the start address of a memory region.
+ @param[in] Length The size in bytes of the memory region.
+ @param[in] Attributes The bit mask of attributes to clear for the memory region.
+
+ @retval EFI_SUCCESS The attributes were cleared for the memory region.
+ @retval EFI_ACCESS_DENIED The attributes for the memory resource range specified by
+ BaseAddress and Length cannot be modified.
+ @retval EFI_INVALID_PARAMETER Length is zero.
+ Attributes specified an illegal combination of attributes that
+ cannot be set together.
+ @retval EFI_OUT_OF_RESOURCES There are not enough system resources to modify the attributes of
+ the memory resource range.
+ @retval EFI_UNSUPPORTED The processor does not support one or more bytes of the memory
+ resource range specified by BaseAddress and Length.
+ The bit mask of attributes is not supported for the memory resource
+ range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmClearMemoryAttributes (
+ IN EFI_PHYSICAL_ADDRESS BaseAddress,
+ IN UINT64 Length,
+ IN UINT64 Attributes
+ )
+{
+ return SmmClearMemoryAttributesEx (BaseAddress, Length, Attributes, NULL);
+}
+
+
+
+/**
+ Retrieves a pointer to the system configuration table from the SMM System Table
+ based on a specified GUID.
+
+ @param[in] TableGuid The pointer to table's GUID type.
+ @param[out] Table The pointer to the table associated with TableGuid in the SMM System Table.
+
+ @retval EFI_SUCCESS A configuration table matching TableGuid was found.
+ @retval EFI_NOT_FOUND A configuration table matching TableGuid could not be found.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmGetSystemConfigurationTable (
+ IN EFI_GUID *TableGuid,
+ OUT VOID **Table
+ )
+{
+ UINTN Index;
+
+ ASSERT (TableGuid != NULL);
+ ASSERT (Table != NULL);
+
+ *Table = NULL;
+ for (Index = 0; Index < gSmst->NumberOfTableEntries; Index++) {
+ if (CompareGuid (TableGuid, &(gSmst->SmmConfigurationTable[Index].VendorGuid))) {
+ *Table = gSmst->SmmConfigurationTable[Index].VendorTable;
+ return EFI_SUCCESS;
+ }
+ }
+
+ return EFI_NOT_FOUND;
+}
+
+/**
+ This function sets SMM save state buffer to be RW and XP.
+**/
+VOID
+PatchSmmSaveStateMap (
+ VOID
+ )
+{
+ UINTN Index;
+ UINTN TileCodeSize;
+ UINTN TileDataSize;
+ UINTN TileSize;
+
+ TileCodeSize = GetSmiHandlerSize ();
+ TileCodeSize = ALIGN_VALUE(TileCodeSize, SIZE_4KB);
+ TileDataSize = sizeof (SMRAM_SAVE_STATE_MAP) + sizeof (PROCESSOR_SMM_DESCRIPTOR);
+ TileDataSize = ALIGN_VALUE(TileDataSize, SIZE_4KB);
+ TileSize = TileDataSize + TileCodeSize - 1;
+ TileSize = 2 * GetPowerOfTwo32 ((UINT32)TileSize);
+
+ DEBUG ((DEBUG_INFO, "PatchSmmSaveStateMap:\n"));
+ for (Index = 0; Index < mMaxNumberOfCpus - 1; Index++) {
+ //
+ // Code
+ //
+ SmmSetMemoryAttributes (
+ mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET,
+ TileCodeSize,
+ EFI_MEMORY_RO
+ );
+ SmmClearMemoryAttributes (
+ mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET,
+ TileCodeSize,
+ EFI_MEMORY_XP
+ );
+
+ //
+ // Data
+ //
+ SmmClearMemoryAttributes (
+ mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET + TileCodeSize,
+ TileSize - TileCodeSize,
+ EFI_MEMORY_RO
+ );
+ SmmSetMemoryAttributes (
+ mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET + TileCodeSize,
+ TileSize - TileCodeSize,
+ EFI_MEMORY_XP
+ );
+ }
+
+ //
+ // Code
+ //
+ SmmSetMemoryAttributes (
+ mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET,
+ TileCodeSize,
+ EFI_MEMORY_RO
+ );
+ SmmClearMemoryAttributes (
+ mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET,
+ TileCodeSize,
+ EFI_MEMORY_XP
+ );
+
+ //
+ // Data
+ //
+ SmmClearMemoryAttributes (
+ mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET + TileCodeSize,
+ SIZE_32KB - TileCodeSize,
+ EFI_MEMORY_RO
+ );
+ SmmSetMemoryAttributes (
+ mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET + TileCodeSize,
+ SIZE_32KB - TileCodeSize,
+ EFI_MEMORY_XP
+ );
+}
+
+/**
+ This function sets GDT/IDT buffer to be RO and XP.
+**/
+VOID
+PatchGdtIdtMap (
+ VOID
+ )
+{
+ EFI_PHYSICAL_ADDRESS BaseAddress;
+ UINTN Size;
+
+ //
+ // GDT
+ //
+ DEBUG ((DEBUG_INFO, "PatchGdtIdtMap - GDT:\n"));
+
+ BaseAddress = mGdtBuffer;
+ Size = ALIGN_VALUE(mGdtBufferSize, SIZE_4KB);
+ SmmSetMemoryAttributes (
+ BaseAddress,
+ Size,
+ EFI_MEMORY_RO
+ );
+ SmmSetMemoryAttributes (
+ BaseAddress,
+ Size,
+ EFI_MEMORY_XP
+ );
+
+ //
+ // IDT
+ //
+ DEBUG ((DEBUG_INFO, "PatchGdtIdtMap - IDT:\n"));
+
+ BaseAddress = gcSmiIdtr.Base;
+ Size = ALIGN_VALUE(gcSmiIdtr.Limit + 1, SIZE_4KB);
+ SmmSetMemoryAttributes (
+ BaseAddress,
+ Size,
+ EFI_MEMORY_RO
+ );
+ SmmSetMemoryAttributes (
+ BaseAddress,
+ Size,
+ EFI_MEMORY_XP
+ );
+}
+
+/**
+ This function sets memory attribute according to MemoryAttributesTable.
+**/
+VOID
+SetMemMapAttributes (
+ VOID
+ )
+{
+ EFI_MEMORY_DESCRIPTOR *MemoryMap;
+ EFI_MEMORY_DESCRIPTOR *MemoryMapStart;
+ UINTN MemoryMapEntryCount;
+ UINTN DescriptorSize;
+ UINTN Index;
+ EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE *MemoryAttributesTable;
+
+ SmmGetSystemConfigurationTable (&gEdkiiPiSmmMemoryAttributesTableGuid, (VOID **)&MemoryAttributesTable);
+ if (MemoryAttributesTable == NULL) {
+ DEBUG ((DEBUG_INFO, "MemoryAttributesTable - NULL\n"));
+ return ;
+ }
+
+ DEBUG ((DEBUG_INFO, "MemoryAttributesTable:\n"));
+ DEBUG ((DEBUG_INFO, " Version - 0x%08x\n", MemoryAttributesTable->Version));
+ DEBUG ((DEBUG_INFO, " NumberOfEntries - 0x%08x\n", MemoryAttributesTable->NumberOfEntries));
+ DEBUG ((DEBUG_INFO, " DescriptorSize - 0x%08x\n", MemoryAttributesTable->DescriptorSize));
+
+ MemoryMapEntryCount = MemoryAttributesTable->NumberOfEntries;
+ DescriptorSize = MemoryAttributesTable->DescriptorSize;
+ MemoryMapStart = (EFI_MEMORY_DESCRIPTOR *)(MemoryAttributesTable + 1);
+ MemoryMap = MemoryMapStart;
+ for (Index = 0; Index < MemoryMapEntryCount; Index++) {
+ DEBUG ((DEBUG_INFO, "Entry (0x%p)\n", MemoryMap));
+ DEBUG ((DEBUG_INFO, " Type - 0x%x\n", MemoryMap->Type));
+ DEBUG ((DEBUG_INFO, " PhysicalStart - 0x%016lx\n", MemoryMap->PhysicalStart));
+ DEBUG ((DEBUG_INFO, " VirtualStart - 0x%016lx\n", MemoryMap->VirtualStart));
+ DEBUG ((DEBUG_INFO, " NumberOfPages - 0x%016lx\n", MemoryMap->NumberOfPages));
+ DEBUG ((DEBUG_INFO, " Attribute - 0x%016lx\n", MemoryMap->Attribute));
+ MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+ }
+
+ MemoryMap = MemoryMapStart;
+ for (Index = 0; Index < MemoryMapEntryCount; Index++) {
+ DEBUG ((DEBUG_INFO, "SetAttribute: Memory Entry - 0x%lx, 0x%lx\n", MemoryMap->PhysicalStart, MemoryMap->NumberOfPages));
+ switch (MemoryMap->Type) {
+ case EfiRuntimeServicesCode:
+ SmmSetMemoryAttributes (
+ MemoryMap->PhysicalStart,
+ EFI_PAGES_TO_SIZE((UINTN)MemoryMap->NumberOfPages),
+ EFI_MEMORY_RO
+ );
+ break;
+ case EfiRuntimeServicesData:
+ SmmSetMemoryAttributes (
+ MemoryMap->PhysicalStart,
+ EFI_PAGES_TO_SIZE((UINTN)MemoryMap->NumberOfPages),
+ EFI_MEMORY_XP
+ );
+ break;
+ default:
+ SmmSetMemoryAttributes (
+ MemoryMap->PhysicalStart,
+ EFI_PAGES_TO_SIZE((UINTN)MemoryMap->NumberOfPages),
+ EFI_MEMORY_XP
+ );
+ break;
+ }
+ MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+ }
+
+ PatchSmmSaveStateMap ();
+ PatchGdtIdtMap ();
+
+ return ;
+}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c
index 329574e..4b7fad2 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c
@@ -30,11 +30,6 @@ UINTN mSmmProfileSize;
UINTN mMsrDsAreaSize = SMM_PROFILE_DTS_SIZE;
//
-// The flag indicates if execute-disable is supported by processor.
-//
-BOOLEAN mXdSupported = TRUE;
-
-//
// The flag indicates if execute-disable is enabled on processor.
//
BOOLEAN mXdEnabled = FALSE;
@@ -529,6 +524,12 @@ InitPaging (
//
continue;
}
+ if ((*Pde & IA32_PG_PS) != 0) {
+ //
+ // This is 1G entry, skip it
+ //
+ continue;
+ }
Pte = (UINT64 *)(UINTN)(*Pde & PHYSICAL_ADDRESS_MASK);
if (Pte == 0) {
continue;
@@ -587,6 +588,15 @@ InitPaging (
//
continue;
}
+ if ((*Pde & IA32_PG_PS) != 0) {
+ //
+ // This is 1G entry, set NX bit and skip it
+ //
+ if (mXdSupported) {
+ *Pde = *Pde | IA32_PG_NX;
+ }
+ continue;
+ }
Pte = (UINT64 *)(UINTN)(*Pde & PHYSICAL_ADDRESS_MASK);
if (Pte == 0) {
continue;
@@ -976,25 +986,6 @@ CheckFeatureSupported (
}
/**
- Enable XD feature.
-
-**/
-VOID
-ActivateXd (
- VOID
- )
-{
- UINT64 MsrRegisters;
-
- MsrRegisters = AsmReadMsr64 (MSR_EFER);
- if ((MsrRegisters & MSR_EFER_XD) != 0) {
- return ;
- }
- MsrRegisters |= MSR_EFER_XD;
- AsmWriteMsr64 (MSR_EFER, MsrRegisters);
-}
-
-/**
Enable single step.
**/
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
index 13ff675..b6fb5cf 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
@@ -97,15 +97,6 @@ CheckFeatureSupported (
);
/**
- Enable XD feature.
-
-**/
-VOID
-ActivateXd (
- VOID
- );
-
-/**
Update page table according to protected memory ranges and the 4KB-page mapped memory ranges.
**/
@@ -114,7 +105,13 @@ InitPaging (
VOID
);
+//
+// The flag indicates if execute-disable is supported by processor.
+//
extern BOOLEAN mXdSupported;
+//
+// The flag indicates if execute-disable is enabled on processor.
+//
extern BOOLEAN mXdEnabled;
#endif // _SMM_PROFILE_H_
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
index 9cee784..b3e50a4 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
@@ -18,6 +18,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#define ACC_MAX_BIT BIT3
LIST_ENTRY mPagePool = INITIALIZE_LIST_HEAD_VARIABLE (mPagePool);
BOOLEAN m1GPageTableSupport = FALSE;
+UINT8 mPhysicalAddressBits;
+BOOLEAN mCpuSmmStaticPageTable;
/**
Check if 1-GByte pages is supported by processor or not.
@@ -86,6 +88,146 @@ GetSubEntriesNum (
}
/**
+ Calculate the maximum supported physical address width, in bits.
+
+ @return The number of supported physical address bits, capped at 48.
+**/
+UINT8
+CalculateMaximumSupportAddress (
+ VOID
+ )
+{
+ UINT32 RegEax;
+ UINT8 PhysicalAddressBits;
+ VOID *Hob;
+
+ //
+ // Get physical address bits supported.
+ //
+ Hob = GetFirstHob (EFI_HOB_TYPE_CPU);
+ if (Hob != NULL) {
+ PhysicalAddressBits = ((EFI_HOB_CPU *) Hob)->SizeOfMemorySpace;
+ } else {
+ AsmCpuid (0x80000000, &RegEax, NULL, NULL, NULL);
+ if (RegEax >= 0x80000008) {
+ AsmCpuid (0x80000008, &RegEax, NULL, NULL, NULL);
+ PhysicalAddressBits = (UINT8) RegEax;
+ } else {
+ PhysicalAddressBits = 36;
+ }
+ }
+
+ //
+ // IA-32e paging translates 48-bit linear addresses to 52-bit physical addresses.
+ //
+ ASSERT (PhysicalAddressBits <= 52);
+ if (PhysicalAddressBits > 48) {
+ PhysicalAddressBits = 48;
+ }
+ return PhysicalAddressBits;
+}
+
+/**
+ Set static page table.
+
+ @param[in] PageTable Address of page table.
+**/
+VOID
+SetStaticPageTable (
+ IN UINTN PageTable
+ )
+{
+ UINT64 PageAddress;
+ UINTN NumberOfPml4EntriesNeeded;
+ UINTN NumberOfPdpEntriesNeeded;
+ UINTN IndexOfPml4Entries;
+ UINTN IndexOfPdpEntries;
+ UINTN IndexOfPageDirectoryEntries;
+ UINT64 *PageMapLevel4Entry;
+ UINT64 *PageMap;
+ UINT64 *PageDirectoryPointerEntry;
+ UINT64 *PageDirectory1GEntry;
+ UINT64 *PageDirectoryEntry;
+
+ if (mPhysicalAddressBits <= 39 ) {
+ NumberOfPml4EntriesNeeded = 1;
+ NumberOfPdpEntriesNeeded = (UINT32)LShiftU64 (1, (mPhysicalAddressBits - 30));
+ } else {
+ NumberOfPml4EntriesNeeded = (UINT32)LShiftU64 (1, (mPhysicalAddressBits - 39));
+ NumberOfPdpEntriesNeeded = 512;
+ }
+
+ //
+ // By architecture only one PageMapLevel4 exists; the caller has already allocated it.
+ //
+ PageMap = (VOID *) PageTable;
+
+ PageMapLevel4Entry = PageMap;
+ PageAddress = 0;
+ for (IndexOfPml4Entries = 0; IndexOfPml4Entries < NumberOfPml4EntriesNeeded; IndexOfPml4Entries++, PageMapLevel4Entry++) {
+ //
+ // Each PML4 entry points to a page of Page Directory Pointer entries.
+ //
+ PageDirectoryPointerEntry = (UINT64 *) ((*PageMapLevel4Entry) & gPhyMask);
+ if (PageDirectoryPointerEntry == NULL) {
+ PageDirectoryPointerEntry = AllocatePageTableMemory (1);
+ ASSERT(PageDirectoryPointerEntry != NULL);
+ ZeroMem (PageDirectoryPointerEntry, EFI_PAGES_TO_SIZE(1));
+
+ *PageMapLevel4Entry = ((UINTN)PageDirectoryPointerEntry & gPhyMask) | PAGE_ATTRIBUTE_BITS;
+ }
+
+ if (m1GPageTableSupport) {
+ PageDirectory1GEntry = PageDirectoryPointerEntry;
+ for (IndexOfPageDirectoryEntries = 0; IndexOfPageDirectoryEntries < 512; IndexOfPageDirectoryEntries++, PageDirectory1GEntry++, PageAddress += SIZE_1GB) {
+ if (IndexOfPml4Entries == 0 && IndexOfPageDirectoryEntries < 4) {
+ //
+ // Skip the < 4G entries
+ //
+ continue;
+ }
+ //
+ // Fill in the Page Directory entries
+ //
+ *PageDirectory1GEntry = (PageAddress & gPhyMask) | IA32_PG_PS | PAGE_ATTRIBUTE_BITS;
+ }
+ } else {
+ PageAddress = BASE_4GB;
+ for (IndexOfPdpEntries = 0; IndexOfPdpEntries < NumberOfPdpEntriesNeeded; IndexOfPdpEntries++, PageDirectoryPointerEntry++) {
+ if (IndexOfPml4Entries == 0 && IndexOfPdpEntries < 4) {
+ //
+ // Skip the < 4G entries
+ //
+ continue;
+ }
+ //
+ // Each Directory Pointer entry points to a page of Page Directory entries.
+ // So allocate space for them and fill them in the IndexOfPageDirectoryEntries loop.
+ //
+ PageDirectoryEntry = (UINT64 *) ((*PageDirectoryPointerEntry) & gPhyMask);
+ if (PageDirectoryEntry == NULL) {
+ PageDirectoryEntry = AllocatePageTableMemory (1);
+ ASSERT(PageDirectoryEntry != NULL);
+ ZeroMem (PageDirectoryEntry, EFI_PAGES_TO_SIZE(1));
+
+ //
+ // Fill in the Page Directory Pointer entry
+ //
+ *PageDirectoryPointerEntry = (UINT64)(UINTN)PageDirectoryEntry | PAGE_ATTRIBUTE_BITS;
+ }
+
+ for (IndexOfPageDirectoryEntries = 0; IndexOfPageDirectoryEntries < 512; IndexOfPageDirectoryEntries++, PageDirectoryEntry++, PageAddress += SIZE_2MB) {
+ //
+ // Fill in the Page Directory entries
+ //
+ *PageDirectoryEntry = (UINT64)PageAddress | IA32_PG_PS | PAGE_ATTRIBUTE_BITS;
+ }
+ }
+ }
+ }
+}
+
+/**
Create PageTable for SMM use.
@return The address of PML4 (to set CR3).
@@ -108,11 +250,17 @@ SmmInitPageTable (
//
InitializeSpinLock (mPFLock);
+ mCpuSmmStaticPageTable = PcdGetBool (PcdCpuSmmStaticPageTable);
m1GPageTableSupport = Is1GPageSupport ();
+ DEBUG ((DEBUG_INFO, "1GPageTableSupport - 0x%x\n", m1GPageTableSupport));
+ DEBUG ((DEBUG_INFO, "PcdCpuSmmStaticPageTable - 0x%x\n", mCpuSmmStaticPageTable));
+
+ mPhysicalAddressBits = CalculateMaximumSupportAddress ();
+ DEBUG ((DEBUG_INFO, "PhysicalAddressBits - 0x%x\n", mPhysicalAddressBits));
//
// Generate PAE page table for the first 4GB memory space
//
- Pages = Gen4GPageTable (PAGE_TABLE_PAGES + 1, FALSE);
+ Pages = Gen4GPageTable (FALSE);
//
// Set IA32_PG_PMNT bit to mask this entry
@@ -125,21 +273,28 @@ SmmInitPageTable (
//
// Fill Page-Table-Level4 (PML4) entry
//
- PTEntry = (UINT64*)(UINTN)(Pages - EFI_PAGES_TO_SIZE (PAGE_TABLE_PAGES + 1));
- *PTEntry = Pages + PAGE_ATTRIBUTE_BITS;
+ PTEntry = (UINT64*)AllocatePageTableMemory (1);
+ ASSERT (PTEntry != NULL);
+ *PTEntry = Pages | PAGE_ATTRIBUTE_BITS;
ZeroMem (PTEntry + 1, EFI_PAGE_SIZE - sizeof (*PTEntry));
+
//
// Set sub-entries number
//
SetSubEntriesNum (PTEntry, 3);
- //
- // Add remaining pages to page pool
- //
- FreePage = (LIST_ENTRY*)(PTEntry + EFI_PAGE_SIZE / sizeof (*PTEntry));
- while ((UINTN)FreePage < Pages) {
- InsertTailList (&mPagePool, FreePage);
- FreePage += EFI_PAGE_SIZE / sizeof (*FreePage);
+ if (mCpuSmmStaticPageTable) {
+ SetStaticPageTable ((UINTN)PTEntry);
+ } else {
+ //
+ // Add pages to page pool
+ //
+ FreePage = (LIST_ENTRY*)AllocatePageTableMemory (PAGE_TABLE_PAGES);
+ ASSERT (FreePage != NULL);
+ for (Index = 0; Index < PAGE_TABLE_PAGES; Index++) {
+ InsertTailList (&mPagePool, FreePage);
+ FreePage += EFI_PAGE_SIZE / sizeof (*FreePage);
+ }
}
if (FeaturePcdGet (PcdCpuSmmProfileEnable)) {
@@ -561,7 +716,7 @@ SmiDefaultPFHandler (
break;
case SmmPageSize1G:
if (!m1GPageTableSupport) {
- DEBUG ((EFI_D_ERROR, "1-GByte pages is not supported!"));
+ DEBUG ((DEBUG_ERROR, "1-GByte pages is not supported!"));
ASSERT (FALSE);
}
//
@@ -612,8 +767,8 @@ SmiDefaultPFHandler (
// Check if the entry has already existed, this issue may occur when the different
// size page entries created under the same entry
//
- DEBUG ((EFI_D_ERROR, "PageTable = %lx, PTIndex = %x, PageTable[PTIndex] = %lx\n", PageTable, PTIndex, PageTable[PTIndex]));
- DEBUG ((EFI_D_ERROR, "New page table overlapped with old page table!\n"));
+ DEBUG ((DEBUG_ERROR, "PageTable = %lx, PTIndex = %x, PageTable[PTIndex] = %lx\n", PageTable, PTIndex, PageTable[PTIndex]));
+ DEBUG ((DEBUG_ERROR, "New page table overlapped with old page table!\n"));
ASSERT (FALSE);
}
//
@@ -654,13 +809,18 @@ SmiPFHandler (
PFAddress = AsmReadCr2 ();
+ if (mCpuSmmStaticPageTable && (PFAddress >= LShiftU64 (1, (mPhysicalAddressBits - 1)))) {
+ DEBUG ((DEBUG_ERROR, "Address 0x%lx is not supported by the processor!\n", PFAddress));
+ CpuDeadLoop ();
+ }
+
//
// If a page fault occurs in SMRAM range, it should be in a SMM stack guard page.
//
if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
(PFAddress >= mCpuHotPlugData.SmrrBase) &&
(PFAddress < (mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize))) {
- DEBUG ((EFI_D_ERROR, "SMM stack overflow!\n"));
+ DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
CpuDeadLoop ();
}
@@ -670,7 +830,7 @@ SmiPFHandler (
if ((PFAddress < mCpuHotPlugData.SmrrBase) ||
(PFAddress >= mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize)) {
if ((SystemContext.SystemContextX64->ExceptionData & IA32_PF_EC_ID) != 0) {
- DEBUG ((EFI_D_ERROR, "Code executed on IP(0x%lx) out of SMM range after SMM is locked!\n", PFAddress));
+ DEBUG ((DEBUG_ERROR, "Code executed on IP(0x%lx) out of SMM range after SMM is locked!\n", PFAddress));
DEBUG_CODE (
DumpModuleInfoByIp (*(UINTN *)(UINTN)SystemContext.SystemContextX64->Rsp);
);
@@ -689,3 +849,87 @@ SmiPFHandler (
ReleaseSpinLock (mPFLock);
}
+
+/**
+ This function sets memory attribute for page table.
+**/
+VOID
+SetPageTableAttributes (
+ VOID
+ )
+{
+ UINTN Index2;
+ UINTN Index3;
+ UINTN Index4;
+ UINT64 *L1PageTable;
+ UINT64 *L2PageTable;
+ UINT64 *L3PageTable;
+ UINT64 *L4PageTable;
+ BOOLEAN IsSplitted;
+ BOOLEAN PageTableSplitted;
+
+ if (!mCpuSmmStaticPageTable) {
+ return ;
+ }
+
+ DEBUG ((DEBUG_INFO, "SetPageTableAttributes\n"));
+
+ //
+ // Disable write protection, because we need to mark the page table as write protected.
+ // We need to *write* to page table memory in order to mark it *read only*.
+ //
+ AsmWriteCr0 (AsmReadCr0() & ~CR0_WP);
+
+ do {
+ DEBUG ((DEBUG_INFO, "Start...\n"));
+ PageTableSplitted = FALSE;
+
+ L4PageTable = (UINT64 *)GetPageTableBase ();
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L4PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+ for (Index4 = 0; Index4 < SIZE_4KB/sizeof(UINT64); Index4++) {
+ L3PageTable = (UINT64 *)(UINTN)(L4PageTable[Index4] & PAGING_4K_ADDRESS_MASK_64);
+ if (L3PageTable == NULL) {
+ continue;
+ }
+
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L3PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+ for (Index3 = 0; Index3 < SIZE_4KB/sizeof(UINT64); Index3++) {
+ if ((L3PageTable[Index3] & IA32_PG_PS) != 0) {
+ // 1G
+ continue;
+ }
+ L2PageTable = (UINT64 *)(UINTN)(L3PageTable[Index3] & PAGING_4K_ADDRESS_MASK_64);
+ if (L2PageTable == NULL) {
+ continue;
+ }
+
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L2PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+ for (Index2 = 0; Index2 < SIZE_4KB/sizeof(UINT64); Index2++) {
+ if ((L2PageTable[Index2] & IA32_PG_PS) != 0) {
+ // 2M
+ continue;
+ }
+ L1PageTable = (UINT64 *)(UINTN)(L2PageTable[Index2] & PAGING_4K_ADDRESS_MASK_64);
+ if (L1PageTable == NULL) {
+ continue;
+ }
+ SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L1PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+ PageTableSplitted = (PageTableSplitted || IsSplitted);
+ }
+ }
+ }
+ } while (PageTableSplitted);
+
+ //
+ // Enable write protection, after page table updated.
+ //
+ AsmWriteCr0 (AsmReadCr0() | CR0_WP);
+
+ return ;
+}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S
index 7e9ac58..a425830 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S
@@ -1,6 +1,6 @@
#------------------------------------------------------------------------------
#
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
# This program and the accompanying materials
# are licensed and made available under the terms and conditions of the BSD License
# which accompanies this distribution. The full text of the license may be found at
@@ -24,8 +24,13 @@ ASM_GLOBAL ASM_PFX(gcSmiHandlerSize)
ASM_GLOBAL ASM_PFX(gSmiCr3)
ASM_GLOBAL ASM_PFX(gSmiStack)
ASM_GLOBAL ASM_PFX(gSmbase)
+ASM_GLOBAL ASM_PFX(mXdSupported)
ASM_GLOBAL ASM_PFX(gSmiHandlerIdtr)
+.equ MSR_IA32_MISC_ENABLE, 0x1A0
+.equ MSR_EFER, 0xc0000080
+.equ MSR_EFER_XD, 0x800
+
#
# Constants relating to PROCESSOR_SMM_DESCRIPTOR
#
@@ -132,6 +137,29 @@ ASM_PFX(gSmiCr3): .space 4
movl $TSS_SEGMENT, %eax
ltr %ax
+# enable NXE if supported
+ .byte 0xb0 # mov al, imm8
+ASM_PFX(mXdSupported): .byte 1
+ cmpb $0, %al
+ jz NxeDone
+#
+# Check XD disable bit
+#
+ movl $MSR_IA32_MISC_ENABLE, %ecx
+ rdmsr
+ subl $4, %esp
+ pushq %rdx # save MSR_IA32_MISC_ENABLE[63-32]
+ testl $BIT2, %edx # MSR_IA32_MISC_ENABLE[34]
+ jz L13
+ andw $0x0FFFB, %dx # clear XD Disable bit if it is set
+ wrmsr
+L13:
+ movl $MSR_EFER, %ecx
+ rdmsr
+ orw $MSR_EFER_XD,%ax # enable NXE
+ wrmsr
+NxeDone:
+
#
# Switch to LongMode
#
@@ -139,12 +167,13 @@ ASM_PFX(gSmiCr3): .space 4
call Base # push return address for retf later
Base:
addl $(LongMode - Base), (%rsp) # offset for far retf, seg is the 1st arg
- movl $0xc0000080, %ecx
+
+ movl $MSR_EFER, %ecx
rdmsr
- orb $1,%ah
+ orb $1,%ah # enable LME
wrmsr
movq %cr0, %rbx
- orl $0x080010000, %ebx # enable paging + WP
+ orl $0x080010023, %ebx # enable paging + WP + NE + MP + PE
movq %rbx, %cr0
retf
LongMode: # long mode (64-bit code) starts here
@@ -162,10 +191,10 @@ LongMode: # long mode (64-bit code) starts here
# jmp _SmiHandler ; instruction is not needed
_SmiHandler:
- movq (%rsp), %rbx
+ movq 8(%rsp), %rbx
# Save FP registers
- subq $0x208, %rsp
+ subq $0x200, %rsp
.byte 0x48 # FXSAVE64
fxsave (%rsp)
@@ -191,6 +220,16 @@ _SmiHandler:
.byte 0x48 # FXRSTOR64
fxrstor (%rsp)
+ addq $0x200, %rsp
+ popq %rdx # get saved MSR_IA32_MISC_ENABLE[63-32]
+ testl $BIT2, %edx
+ jz L16
+ movl $MSR_IA32_MISC_ENABLE, %ecx
+ rdmsr
+ orw $BIT2, %dx # set XD Disable bit if it was set before entering into SMM
+ wrmsr
+
+L16:
rsm
ASM_PFX(gcSmiHandlerSize): .word . - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm
index 094cf2c..74d320e 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm
@@ -1,5 +1,5 @@
;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
; This program and the accompanying materials
; are licensed and made available under the terms and conditions of the BSD License
; which accompanies this distribution. The full text of the license may be found at
@@ -29,8 +29,12 @@ EXTERNDEF gcSmiHandlerSize:WORD
EXTERNDEF gSmiCr3:DWORD
EXTERNDEF gSmiStack:DWORD
EXTERNDEF gSmbase:DWORD
+EXTERNDEF mXdSupported:BYTE
EXTERNDEF gSmiHandlerIdtr:FWORD
+MSR_IA32_MISC_ENABLE EQU 1A0h
+MSR_EFER EQU 0c0000080h
+MSR_EFER_XD EQU 0800h
;
; Constants relating to PROCESSOR_SMM_DESCRIPTOR
@@ -130,17 +134,41 @@ gSmiCr3 DD ?
mov eax, TSS_SEGMENT
ltr ax
+; enable NXE if supported
+ DB 0b0h ; mov al, imm8
+mXdSupported DB 1
+ cmp al, 0
+ jz @SkipXd
+;
+; Check XD disable bit
+;
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ sub esp, 4
+ push rdx ; save MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2 ; MSR_IA32_MISC_ENABLE[34]
+ jz @f
+ and dx, 0FFFBh ; clear XD Disable bit if it is set
+ wrmsr
+@@:
+ mov ecx, MSR_EFER
+ rdmsr
+ or ax, MSR_EFER_XD ; enable NXE
+ wrmsr
+@SkipXd:
+
; Switch into @LongMode
push LONG_MODE_CS ; push cs hardcore here
call Base ; push return address for retf later
Base:
add dword ptr [rsp], @LongMode - Base; offset for far retf, seg is the 1st arg
- mov ecx, 0c0000080h
+
+ mov ecx, MSR_EFER
rdmsr
- or ah, 1
+ or ah, 1 ; enable LME
wrmsr
mov rbx, cr0
- or ebx, 080010000h ; enable paging + WP
+ or ebx, 080010023h ; enable paging + WP + NE + MP + PE
mov cr0, rbx
retf
@LongMode: ; long mode (64-bit code) starts here
@@ -163,7 +191,7 @@ _SmiHandler:
;
; Save FP registers
;
- sub rsp, 208h
+ sub rsp, 200h
DB 48h ; FXSAVE64
fxsave [rsp]
@@ -172,15 +200,15 @@ _SmiHandler:
mov rcx, rbx
mov rax, CpuSmmDebugEntry
call rax
-
+
mov rcx, rbx
mov rax, SmiRendezvous ; rax <- absolute addr of SmiRedezvous
call rax
-
+
mov rcx, rbx
mov rax, CpuSmmDebugExit
call rax
-
+
add rsp, 20h
;
@@ -189,6 +217,16 @@ _SmiHandler:
DB 48h ; FXRSTOR64
fxrstor [rsp]
+ add rsp, 200h
+ pop rdx ; get saved MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2
+ jz @f
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ or dx, BIT2 ; set XD Disable bit if it was set before entering into SMM
+ wrmsr
+
+@@:
rsm
gcSmiHandlerSize DW $ - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm
index b717cda..5eb5cc6 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm
@@ -22,6 +22,10 @@
; Variables referrenced by C code
;
+%define MSR_IA32_MISC_ENABLE 0x1A0
+%define MSR_EFER 0xc0000080
+%define MSR_EFER_XD 0x800
+
;
; Constants relating to PROCESSOR_SMM_DESCRIPTOR
;
@@ -50,6 +54,7 @@ extern ASM_PFX(CpuSmmDebugEntry)
extern ASM_PFX(CpuSmmDebugExit)
global ASM_PFX(gSmbase)
+global ASM_PFX(mXdSupported)
global ASM_PFX(gSmiStack)
global ASM_PFX(gSmiCr3)
global ASM_PFX(gcSmiHandlerTemplate)
@@ -69,7 +74,7 @@ _SmiEntryPoint:
mov [cs:bx + 2], eax
o32 lgdt [cs:bx] ; lgdt fword ptr cs:[bx]
mov ax, PROTECT_MODE_CS
- mov [cs:bx-0x2],ax
+ mov [cs:bx-0x2],ax
DB 0x66, 0xbf ; mov edi, SMBASE
ASM_PFX(gSmbase): DD 0
lea eax, [edi + (@ProtectedMode - _SmiEntryPoint) + 0x8000]
@@ -79,7 +84,7 @@ ASM_PFX(gSmbase): DD 0
or ebx, 0x23
mov cr0, ebx
jmp dword 0x0:0x0
-_GdtDesc:
+_GdtDesc:
DW 0
DD 0
@@ -112,17 +117,41 @@ ASM_PFX(gSmiCr3): DD 0
mov eax, TSS_SEGMENT
ltr ax
+; enable NXE if supported
+ DB 0xb0 ; mov al, imm8
+ASM_PFX(mXdSupported): DB 1
+ cmp al, 0
+ jz @SkipXd
+;
+; Check XD disable bit
+;
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ sub esp, 4
+ push rdx ; save MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2 ; MSR_IA32_MISC_ENABLE[34]
+ jz .0
+ and dx, 0xFFFB ; clear XD Disable bit if it is set
+ wrmsr
+.0:
+ mov ecx, MSR_EFER
+ rdmsr
+ or ax, MSR_EFER_XD ; enable NXE
+ wrmsr
+@SkipXd:
+
; Switch into @LongMode
push LONG_MODE_CS ; push cs hardcore here
- call Base ; push reture address for retf later
+ call Base ; push return address for retf later
Base:
add dword [rsp], @LongMode - Base; offset for far retf, seg is the 1st arg
- mov ecx, 0xc0000080
+
+ mov ecx, MSR_EFER
rdmsr
- or ah, 1
+ or ah, 1 ; enable LME
wrmsr
mov rbx, cr0
- or ebx, 080010000h ; enable paging + WP
+ or ebx, 0x80010023 ; enable paging + WP + NE + MP + PE
mov cr0, rbx
retf
@LongMode: ; long mode (64-bit code) starts here
@@ -140,12 +169,12 @@ Base:
; jmp _SmiHandler ; instruction is not needed
_SmiHandler:
- mov rbx, [rsp] ; rbx <- CpuIndex
+ mov rbx, [rsp + 0x8] ; rbx <- CpuIndex
;
; Save FP registers
;
- sub rsp, 0x208
+ sub rsp, 0x200
DB 0x48 ; FXSAVE64
fxsave [rsp]
@@ -154,15 +183,15 @@ _SmiHandler:
mov rcx, rbx
mov rax, CpuSmmDebugEntry
call rax
-
+
mov rcx, rbx
mov rax, SmiRendezvous ; rax <- absolute addr of SmiRedezvous
call rax
-
+
mov rcx, rbx
mov rax, CpuSmmDebugExit
call rax
-
+
add rsp, 0x20
;
@@ -171,6 +200,16 @@ _SmiHandler:
DB 0x48 ; FXRSTOR64
fxrstor [rsp]
+ add rsp, 0x200
+ pop rdx ; get saved MSR_IA32_MISC_ENABLE[63-32]
+ test edx, BIT2
+ jz .1
+ mov ecx, MSR_IA32_MISC_ENABLE
+ rdmsr
+ or dx, BIT2 ; set XD Disable bit if it was set before entering into SMM
+ wrmsr
+
+.1:
rsm
gcSmiHandlerSize DW $ - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S
index 2ae6f2c..2e2792d 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S
@@ -1,6 +1,6 @@
#------------------------------------------------------------------------------
#
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
# This program and the accompanying materials
# are licensed and made available under the terms and conditions of the BSD License
# which accompanies this distribution. The full text of the license may be found at
@@ -128,244 +128,8 @@ ASM_PFX(gcSmiGdtr):
.quad NullSeg
ASM_PFX(gcSmiIdtr):
- .word IDT_SIZE - 1
- .quad _SmiIDT
-
-
-#
-# Here is the IDT. There are 32 (not 255) entries in it since only processor
-# generated exceptions will be handled.
-#
-_SmiIDT:
-# The following segment repeats 32 times:
-# No. 1
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 2
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 3
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 4
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 5
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 6
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 7
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 8
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 9
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 10
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 11
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 12
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 13
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 14
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 15
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 16
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 17
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 18
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 19
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 20
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 21
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 22
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 23
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 24
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 25
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 26
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 27
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 28
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 29
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 30
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 31
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-# No. 32
- .word 0 # Offset 0:15
- .word CODE_SEL
- .byte 0 # Unused
- .byte 0x8e # Interrupt Gate, Present
- .word 0 # Offset 16:31
- .quad 0 # Offset 32:63
-
-_SmiIDTEnd:
-
-.equ IDT_SIZE, (_SmiIDTEnd - _SmiIDT)
+ .word 0
+ .quad 0
.text
@@ -600,11 +364,3 @@ L5:
addq $16, %rsp # skip INT# & ErrCode
iretq
-ASM_GLOBAL ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
-# If SMM Stack Guard feature is enabled, set the IST field of
-# the interrupt gate for Page Fault Exception to be 1
-#
- movabsq $_SmiIDT + 14 * 16, %rax
- movb $1, 4(%rax)
- ret
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm
index ab71645..f55ba72 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm
@@ -1,5 +1,5 @@
;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
; This program and the accompanying materials
; are licensed and made available under the terms and conditions of the BSD License
; which accompanies this distribution. The full text of the license may be found at
@@ -144,27 +144,8 @@ gcSmiGdtr LABEL FWORD
DQ offset NullSeg
gcSmiIdtr LABEL FWORD
- DW IDT_SIZE - 1
- DQ offset _SmiIDT
-
- .data
-
-;
-; Here is the IDT. There are 32 (not 255) entries in it since only processor
-; generated exceptions will be handled.
-;
-_SmiIDT:
-REPEAT 32
- DW 0 ; Offset 0:15
- DW CODE_SEL ; Segment selector
- DB 0 ; Unused
- DB 8eh ; Interrupt Gate, Present
- DW 0 ; Offset 16:31
- DQ 0 ; Offset 32:63
- ENDM
-_SmiIDTEnd:
-
-IDT_SIZE = (offset _SmiIDTEnd - offset _SmiIDT)
+ DW 0
+ DQ 0
.code
@@ -400,14 +381,4 @@ PageFaultIdtHandlerSmmProfile PROC
iretq
PageFaultIdtHandlerSmmProfile ENDP
-InitializeIDTSmmStackGuard PROC
-;
-; If SMM Stack Guard feature is enabled, set the IST field of
-; the interrupt gate for Page Fault Exception to be 1
-;
- lea rax, _SmiIDT + 14 * 16
- mov byte ptr [rax + 4], 1
- ret
-InitializeIDTSmmStackGuard ENDP
-
END
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm
index 821ee18..bc8d95d 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm
@@ -145,25 +145,8 @@ ASM_PFX(gcSmiGdtr):
DQ NullSeg
ASM_PFX(gcSmiIdtr):
- DW IDT_SIZE - 1
- DQ _SmiIDT
-
-;
-; Here is the IDT. There are 32 (not 255) entries in it since only processor
-; generated exceptions will be handled.
-;
-_SmiIDT:
-%rep 32
- DW 0 ; 0:15
- DW CODE_SEL ; Segment selector
- DB 0 ; Unused
- DB 0x8e ; Interrupt Gate, Present
- DW 0 ; 16:31
- DQ 0 ; 32:63
-%endrep
-_SmiIDTEnd:
-
-IDT_SIZE equ _SmiIDTEnd - _SmiIDT
+ DW 0
+ DQ 0
DEFAULT REL
SECTION .text
@@ -400,13 +383,3 @@ ASM_PFX(PageFaultIdtHandlerSmmProfile):
add rsp, 16 ; skip INT# & ErrCode
iretq
-global ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
-;
-; If SMM Stack Guard feature is enabled, set the IST field of
-; the interrupt gate for Page Fault Exception to be 1
-;
- lea rax, [_SmiIDT + 14 * 16]
- mov byte [rax + 4], 1
- ret
-
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index b53aa45..e2eca73 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -1,7 +1,7 @@
/** @file
SMM CPU misc functions for x64 arch specific.
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2016, Intel Corporation. All rights reserved.<BR>
This program and the accompanying materials
are licensed and made available under the terms and conditions of the BSD License
which accompanies this distribution. The full text of the license may be found at
@@ -14,6 +14,30 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#include "PiSmmCpuDxeSmm.h"
+EFI_PHYSICAL_ADDRESS mGdtBuffer;
+UINTN mGdtBufferSize;
+
+/**
+ Initialize IDT for SMM Stack Guard.
+
+**/
+VOID
+EFIAPI
+InitializeIDTSmmStackGuard (
+ VOID
+ )
+{
+ IA32_IDT_GATE_DESCRIPTOR *IdtGate;
+
+ //
+ // If SMM Stack Guard feature is enabled, set the IST field of
+ // the interrupt gate for Page Fault Exception to be 1
+ //
+ IdtGate = (IA32_IDT_GATE_DESCRIPTOR *)gcSmiIdtr.Base;
+ IdtGate += EXCEPT_IA32_PAGE_FAULT;
+ IdtGate->Bits.Reserved_0 = 1;
+}
+
/**
Initialize Gdt for all processors.
@@ -41,8 +65,10 @@ InitGdt (
// on each SMI entry.
//
GdtTssTableSize = (gcSmiGdtr.Limit + 1 + TSS_SIZE + 7) & ~7; // 8 bytes aligned
- GdtTssTables = (UINT8*)AllocatePages (EFI_SIZE_TO_PAGES (GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+ mGdtBufferSize = GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus;
+ GdtTssTables = (UINT8*)AllocateCodePages (EFI_SIZE_TO_PAGES (mGdtBufferSize));
ASSERT (GdtTssTables != NULL);
+ mGdtBuffer = (UINTN)GdtTssTables;
GdtTableStepSize = GdtTssTableSize;
for (Index = 0; Index < gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus; Index++) {
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c
index 065fb2c..cc393dc 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c
@@ -1,7 +1,7 @@
/** @file
X64 processor specific functions to enable SMM profile.
-Copyright (c) 2012 - 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2012 - 2016, Intel Corporation. All rights reserved.<BR>
This program and the accompanying materials
are licensed and made available under the terms and conditions of the BSD License
which accompanies this distribution. The full text of the license may be found at
@@ -45,12 +45,13 @@ InitSmmS3Cr3 (
//
// Generate PAE page table for the first 4GB memory space
//
- Pages = Gen4GPageTable (1, FALSE);
+ Pages = Gen4GPageTable (FALSE);
//
// Fill Page-Table-Level4 (PML4) entry
//
- PTEntry = (UINT64*)(UINTN)(Pages - EFI_PAGES_TO_SIZE (1));
+ PTEntry = (UINT64*)AllocatePageTableMemory (1);
+ ASSERT (PTEntry != NULL);
*PTEntry = Pages | PAGE_ATTRIBUTE_BITS;
ZeroMem (PTEntry + 1, EFI_PAGE_SIZE - sizeof (*PTEntry));
--
2.7.4.windows.1
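
Note on the InitializeIDTSmmStackGuard() conversion in the X64 hunks above: the removed assembly wrote the value 1 at byte offset 4 of IDT entry 14 (_SmiIDT + 14 * 16), while the C replacement indexes the table by EXCEPT_IA32_PAGE_FAULT and sets Bits.Reserved_0. The following standalone sketch checks that both forms address the same byte. It is only an illustration under stated assumptions: IDT_GATE_SKETCH and its field names are hypothetical stand-ins, not the edk2 IA32_IDT_GATE_DESCRIPTOR definition, and the layout simply follows the .word/.byte/.quad sequence of the removed table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one 16-byte x64 interrupt gate, following the
   field order of the removed _SmiIDT entries. Not the edk2 type. */
typedef struct {
  uint16_t OffsetLow;    /* Offset 0:15 */
  uint16_t Selector;     /* Code segment selector (CODE_SEL in the patch) */
  uint8_t  Ist;          /* Byte 4: "Unused" in the old table, holds the IST index */
  uint8_t  GateType;     /* 0x8e = Interrupt Gate, Present */
  uint16_t OffsetHigh;   /* Offset 16:31 */
  uint32_t OffsetUpper;  /* Offset 32:63 */
  uint32_t Reserved;     /* Upper half of the old ".quad 0" */
} IDT_GATE_SKETCH;

/* Vector number of the Page Fault exception (14 in the removed assembly). */
#define PAGE_FAULT_VECTOR 14

static_assert (sizeof (IDT_GATE_SKETCH) == 16, "x64 gate is 16 bytes");
static_assert (offsetof (IDT_GATE_SKETCH, Ist) == 4, "IST byte sits at offset 4");

/* C form: index the table by vector number and set the IST byte to 1. */
static void SetPageFaultIst (IDT_GATE_SKETCH *IdtBase)
{
  IdtBase[PAGE_FAULT_VECTOR].Ist = 1;
}

/* Byte offset the removed assembly wrote to: entry 14 * 16 bytes, plus 4. */
static size_t LegacyAsmByteOffset (void)
{
  return (size_t)PAGE_FAULT_VECTOR * 16 + 4;
}
```

Calling SetPageFaultIst() on a zeroed 32-entry table and then reading the byte at LegacyAsmByteOffset() should yield 1, which is the equivalence the patch relies on.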
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm paging protection.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
` (4 preceding siblings ...)
2016-11-04 9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
@ 2016-11-04 9:30 ` Jiewen Yao
2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
2016-11-08 1:22 ` Laszlo Ersek
7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04 9:30 UTC (permalink / raw)
To: edk2-devel
Cc: Michael D Kinney, Kelly Steele, Jeff Fan, Feng Tian, Star Zeng,
Laszlo Ersek
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Kelly Steele <kelly.steele@intel.com>
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
QuarkPlatformPkg/Quark.dsc | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/QuarkPlatformPkg/Quark.dsc b/QuarkPlatformPkg/Quark.dsc
index d5988da..9804b70 100644
--- a/QuarkPlatformPkg/Quark.dsc
+++ b/QuarkPlatformPkg/Quark.dsc
@@ -891,3 +891,9 @@
[BuildOptions.common.EDKII.DXE_RUNTIME_DRIVER]
MSFT:*_*_*_DLINK_FLAGS = /ALIGN:4096
+
+# Force PE/COFF sections to be aligned at 4KB boundaries to support page level protection of DXE_SMM_DRIVER/SMM_CORE modules
+[BuildOptions.common.EDKII.DXE_SMM_DRIVER, BuildOptions.common.EDKII.SMM_CORE]
+ MSFT:*_*_*_DLINK_FLAGS = /ALIGN:4096
+ GCC:*_*_*_DLINK_FLAGS = -z common-page-size=0x1000
+
--
2.7.4.windows.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
` (5 preceding siblings ...)
2016-11-04 9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
@ 2016-11-04 22:40 ` Laszlo Ersek
2016-11-04 22:46 ` Yao, Jiewen
2016-11-08 1:22 ` Laszlo Ersek
7 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-04 22:40 UTC (permalink / raw)
To: Jiewen Yao, edk2-devel; +Cc: Michael D Kinney, Feng Tian, Jeff Fan, Star Zeng
On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
Jiewen, can you please push this series to a new branch in your repo?
I see a branch called "SmmProtection_V2", but it seems to end with an
incomplete patch (26f482d8b611d0fcb07d3ffbf3f4468fd249767b, subject
"pismmcpu"), so I figured I'd ask explicitly.
Thanks
Laszlo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
@ 2016-11-04 22:46 ` Yao, Jiewen
2016-11-04 23:08 ` Laszlo Ersek
0 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-04 22:46 UTC (permalink / raw)
To: Laszlo Ersek, edk2-devel@ml01.01.org
Cc: Kinney, Michael D, Tian, Feng, Fan, Jeff, Zeng, Star
Ah, yes. Laszlo. You are right.
I forgot to push the last update yesterday. Thank you for reminding me.
Now it is synced.
Thank you
Yao Jiewen
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-04 22:46 ` Yao, Jiewen
@ 2016-11-04 23:08 ` Laszlo Ersek
0 siblings, 0 replies; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-04 23:08 UTC (permalink / raw)
To: Yao, Jiewen, edk2-devel@ml01.01.org
Cc: Kinney, Michael D, Tian, Feng, Fan, Jeff, Zeng, Star
On 11/04/16 23:46, Yao, Jiewen wrote:
> Ah, yes. Laszlo. You are right.
>
> I forgot to push the last update yesterday. Thank you for reminding me.
> Now it is synced.
Thanks! The commit message updates and the v1->v2 differences look
good/reasonable to me (I diffed the code-level end results of the two
versions, plus I compared the commit messages pairwise). I hope to test
v2 sometime next week, and I intend to look into the S3 instability too
(I took note of Paolo's advice with the "info tlb" QEMU monitor command).
Going through the (now documented) SMRAM impact again, I realize the
platform can elect to set PcdCpuSmmStaticPageTable dynamically as well.
I'm sort of guessing that we might want to set the PCD in OVMF's
PlatformPei, based on the guest-phys address width (which we also
calculate in PlatformPei), in combination with availability of 1G
paging. The case we should likely avoid is
> A) If the system only supports 2M paging,
> When the whole memory/MMIO is 48bit, we need 1+256+256*256 pages
> (~ 257M)
Anyway, I don't want to be too clever about this until we see a problem
(out-of-SMRAM) in practice.
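
To make the quoted footprint concrete, here is a small sketch of the arithmetic, plus one possible shape for the PlatformPei decision mentioned above. The page count expression is taken verbatim from the quoted commit message. Everything else is an assumption for illustration: WantStaticPageTable() and its MaxBitsWith2M threshold are hypothetical, and nothing like them exists in OVMF or in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

/* Page count quoted in the thread for the "2M paging only, 48-bit" case. */
static uint64_t QuotedPageCount2M (void)
{
  return 1ULL + 256ULL + 256ULL * 256ULL;
}

/* Convert a number of 4 KiB pages to whole MiB (rounded down). */
static uint64_t PagesToMiB (uint64_t Pages)
{
  return (Pages * PAGE_SIZE_4K) / (1024ULL * 1024ULL);
}

/* Hypothetical policy: allow the static page table when 1G paging is
   available, or when the address width is small enough that 2M paging
   stays affordable. MaxBitsWith2M is an assumed tunable, not a real PCD. */
static bool WantStaticPageTable (
  bool     Has1GPaging,
  unsigned PhysBits,
  unsigned MaxBitsWith2M
  )
{
  return Has1GPaging || PhysBits <= MaxBitsWith2M;
}
```

With these definitions the quoted expression evaluates to 65793 pages, which at 4 KiB per page is about 257 MiB, matching the "~ 257M" in the commit message. A real implementation would presumably feed the result into the PCD from PlatformPei; that part is deliberately left out here.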
Thanks!
Laszlo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
` (6 preceding siblings ...)
2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
@ 2016-11-08 1:22 ` Laszlo Ersek
2016-11-08 12:59 ` Yao, Jiewen
` (2 more replies)
7 siblings, 3 replies; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-08 1:22 UTC (permalink / raw)
To: Jiewen Yao
Cc: edk2-devel, Michael D Kinney, Feng Tian, Jeff Fan, Star Zeng,
Paolo Bonzini
On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>
> ==== below is V1 description ====
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2 X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
I have new test results. Let's start with the table again:
Legend:
- "untested" means the test was not executed because the same test
failed or proved unreliable in a less demanding configuration already,
- "n/a" means a setting or test case was impossible,
- "fail" and "unreliable" (lower case) are outside the scope of this
series; they either capture the pre-series status, or are expected
even with the series applied due to the pre-series status,
- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
series.
In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).
    series   OVMF                                                                 VCPU      boot        S3 resume
 #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
--  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
 1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
 2  no       Ia32      255                              n/a                       52x2x2    pass        untested
 3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
 4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
 5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
 6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
 7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
 8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
 9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
* Case 8: this test case failed with v2 as well, but this time with
different symptoms:
> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
I didn't try to narrow this down.
* Case 13 (the "unreliable S3 resume" case): Here the news is both bad
and good. The good news is for Jiewen: this patch series does not
cause the unreliability, it "only" amplifies it severely. The bad news
is correspondingly for everyone else: S3 resume is actually unreliable
even in case 4, that is, without this series applied; it's just that
the failure rate is much, much lower.
Namely, in my new testing, in case 13, S3 resume failed 8 times out of
21 tries. (I stopped testing at the 8th failure.)
Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
#12 that failed; I continued testing and aborted the test after the
55th try.)
So, while the series hugely amplifies the failure rate, the failure
does exist without the series. Which is why I modified the case 4
results in the table, and also lower-cased the word "unreliable" in
case 13.
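To put the two tallies above side by side, here is a minimal sketch in C. The helper name is hypothetical, and because the case 13 run was stopped at the 8th failure, the resulting figures are only rough estimates rather than measured rates.

```c
/* Observed failure rate as a percentage of attempts.
 * Hypothetical helper for illustration; 'tries' must be non-zero. */
static double failure_rate_percent(unsigned failures, unsigned tries)
{
    return 100.0 * (double)failures / (double)tries;
}
```

Calling it with (8, 21) for case 13 and (1, 55) for case 4 gives roughly 38% versus roughly 1.8%, which is the amplification described above.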
Below I will return to this problem separately; let's go over the rest
of the table first.
* Case 17: I guess this is not a real failure, I'm just including it for
completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
additional SMRAM demand (see the commit message on patch V2 4/6). This
case fails with
> SmmLockBox Command - 4
> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
> SmmLockBox SmmLockBoxHandler Exit
> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
which is an SMRAM allocation failure. If I lower the VCPU count to
50x2x2, then the guest boots fine.
----*----
Before I get to the S3 resume problem (which, again, reproduces without
this series, although much less frequently), I'd like to comment on the
removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
function, on the return value of SmmBlockingStartupThisAp(). This change
allows v2 to proceed past that point; however, I'm seeing a whole lot of
> !mSmmMpSyncData->CpuData[1].Present
> !mSmmMpSyncData->CpuData[2].Present
> !mSmmMpSyncData->CpuData[3].Present
> ...
messages in the OVMF boot log, interspersed with
> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
style messages. (That is, one error message for each AP, per
ConvertPageEntryAttribute() message.)
Is this okay / intentional? The number of these messages can go up to
several thousands and that sort of drowns out everything else in the
log.
It's also not easy to mask the message, because it's logged on the
DEBUG_ERROR level.
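For readers of the log, the ConvertPageEntryAttribute message quoted above can be decoded with a small sketch in C. It assumes the two values are the same page table entry before and after the conversion; the macro and function names are illustrative and are not taken from PiSmmCpuDxeSmm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low attribute bits of an x86 page table entry. */
#define PTE_P   (1ull << 0)   /* Present */
#define PTE_RW  (1ull << 1)   /* Read/Write: 0 means read-only */
#define PTE_US  (1ull << 2)   /* User/Supervisor */
#define PTE_A   (1ull << 5)   /* Accessed */
#define PTE_D   (1ull << 6)   /* Dirty */

/* Bits that differ between the old and the new entry value. */
static uint64_t pte_changed_bits(uint64_t before, uint64_t after)
{
    return before ^ after;
}

/* True if the entry allows writes. */
static bool pte_is_writable(uint64_t entry)
{
    return (entry & PTE_RW) != 0;
}
```

For 0x7F92B067 -> 0x7F92B065 the only changed bit is Read/Write, so under the stated assumption the message records a page being made read-only.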
----*----
* Okay, so the S3 problem. Last time I suspected that the failure point
(RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
9A1D0, according to the OVMF log). In order to test this idea, I
exercised this series with S3 against a Windows 8.1 guest (--> case 13
again). The failure reproduced on the second S3 resume, with identical
RIP, despite the Windows wakeup vector being located elsewhere (at
0x1000).
Quoting the OVMF log leading up to the resume:
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 1000
QEMU log (same as before):
> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 000000007f294000 00000047
> IDT= 000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 000000007f294000 00000047
> IDT= 000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
So, we can exclude the suspicion that the problem is guest OS
dependent.
* Then I looked for the base address of the page containing the
RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
some firmware component might have allocated that area actually. Here
we go:
> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
That is, the failure hits (when it hits -- not always) in the area
where the CpuMpPei driver *borrows* memory for the startup vector of
the APs, for the purposes of the MP service PPI. ("Wakeup" is an
overloaded word here; the "wakeup buffer" has nothing to do with S3
resume, it just serves for booting the APs temporarily in PEI, for
implementing the MP service PPI.)
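The address check described above amounts to the following sketch in C. The function names are hypothetical; the constants in the usage note are the ones quoted from the logs.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000ull

/* Base address of the 4 KiB page that contains 'addr'. */
static uint64_t page_base(uint64_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1);
}

/* True if 'addr' lies inside [start, start + size). */
static bool in_buffer(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr >= start && addr - start < size;
}
```

With RIP=0x9f0fd, WakeupBufferStart=0x9F000 and WakeupBufferSize=0x1000, page_base() returns 0x9F000 and in_buffer() is true, whereas the two OS waking vectors mentioned earlier (0x9A1D0 and 0x1000) fall outside the buffer.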
When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
the original contents of this area. This occurs just before
transferring control to the guest OS wakeup vector: see the
"EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
quoted from the OVMF log.
I documented (parts of) this logic in OVMF commit
https://github.com/tianocore/edk2/commit/e3e3090a959a0
(see the code comments as well).
* At that time, I thought I had identified a memory management bug in
CpuMpPei; see the following discussion and bug report for details:
https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
https://bugzilla.tianocore.org/show_bug.cgi?id=67
However, with the extraction / introduction of MpInitLib, this issue
has been fixed: GetWakeupBuffer() now calls
CheckOverlapWithAllocatedBuffer(), so that "memory management bug" no
longer exists; we shouldn't be looking there for the root cause.
* Either way, I don't understand why anything would want to execute code
in the one page that happens to host the MP services PPI startup
buffer for APs during PEI.
Not understanding the "why", I looked at the "what", and resorted to
tracing KVM. Because the problem readily reproduces with this series
applied (case 13), it wasn't hard to start the tracing while the guest
was suspended, and capture just the actions that led from the
KVM-level wakeup to the failure.
The QEMU state dumps are visible above in the email. I've also
uploaded the compressed OVMF log and the textual KVM trace here:
http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
I sincerely hope that Paolo will have a field day with the KVM trace
:) I managed to identify the following curiosities (remember this is
all on the S3 resume path):
* First, the VCPUs (there are four of them) enter and leave SMM in a
really funky pattern:
vcpu#0  vcpu#1  vcpu#2  vcpu#3
------  ------  ------  ------
        enter
          |
        leave
                enter
                  |
                leave
                        enter
                          |
                        leave
enter
  |
leave
        enter           enter
enter     |     enter     |
  |       |       |       |
leave     |       |       |
          |       |       |
enter     |       |       |
  |       |       |       |
leave   leave   leave   leave
That is, first we have each VCPU enter and leave SMM in complete
isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
temporarily (it comes back in later), while the other three remain
in SMM. Finally all four of them leave SMM together.
After which the problem occurs.
* Second, the instruction that causes things to blow up is <0f aa>,
i.e., RSM. I have absolutely no clue why RSM is executed:
(a) in the area that used to host the AP startup routine for the MP
services PPI -- note that we also have "Transfer to 16bit OS waking
vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
area completely! --,
(b) and why *after* all four VCPUs have just left SMM, together.
* The RSM instruction is handled successfully elsewhere, for example
when all four VCPUs leave SMM, at the bottom of the diagram above:
> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
* The guest-phys address 7ff7f000 that we see just before the error:
> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
can be found higher up in the trace; namely, it is written to CR3
several times. It's the root of the page tables.
* The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
* I also tried the "info tlb" monitor command, via "virsh
qemu-monitor-command --hmp", while the guest was auto-paused after the
crash.
I cannot provide results: QEMU appeared to return a message that would
be longer than 16MB after encoding by libvirt, and libvirt rejected
that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
Anyway, the KVM trace, and the QEMU register dump, look consistent
with what Paolo said about "Code=?? ?? ??...":
The question marks usually mean that the page tables do not map a
page at that address.
CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
(SMM=0). We can't translate *any* guest-virtual address, as we can't
even begin walking the page tables.
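The condition described in the last paragraph can be written down as a simplified model in C. It assumes SMRAM is inaccessible outside SMM and uses the SMRAM range stated earlier in this mail; the function and macro names are hypothetical, and the model ignores everything else that influences address translation.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG       (1u << 31)       /* CR0.PG: paging enabled */
#define SMRAM_FIRST  0x7F801000ull    /* 7F80_1000, as stated in this mail */
#define SMRAM_LAST   0x7FFFFFFFull    /* 7FFF_FFFF, inclusive */

/* True if the guest-physical address lies in the SMRAM range above. */
static bool in_smram(uint64_t gpa)
{
    return gpa >= SMRAM_FIRST && gpa <= SMRAM_LAST;
}

/* True if paging is on, the CPU is outside SMM, and the page table
 * root taken from CR3 lies in SMRAM, so the walk cannot even start. */
static bool page_walk_blocked(uint32_t cr0, uint64_t cr3, bool in_smm)
{
    uint64_t root = cr3 & ~0xFFFull;  /* drop the low flag bits */

    return (cr0 & CR0_PG) != 0 && !in_smm && in_smram(root);
}
```

With the values from the register dump (CR0=e0000011, CR3=000000007ff7f000, SMM=0) the model returns true; with the same registers inside SMM it returns false.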
Thanks
Laszlo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-08 1:22 ` Laszlo Ersek
@ 2016-11-08 12:59 ` Yao, Jiewen
2016-11-08 13:22 ` Laszlo Ersek
2016-11-09 6:25 ` Yao, Jiewen
2016-11-09 11:23 ` Paolo Bonzini
2 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-08 12:59 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
Hi Laszlo
Thanks for the detailed test results.
Quick comments on the debug messages:
1) For "ConvertPageEntryAttribute 0x7F92B067->0x7F92B065", I agree to change it to DEBUG_VERBOSE, because it is purely for debugging.
2) For "!mSmmMpSyncData->CpuData[1].Present", I think people are interested in knowing the reason for a startup failure. I would prefer to keep the current DEBUG_ERROR.
At the same time, I understand your OVMF concern about too many debug messages in FlushTlb. So I plan to resolve the problem in another way.
I will check "mSmmMpSyncData->CpuData[1].Present" before calling SmmBlockingStartupThisAp(). So you will not see any debug message in FlushTlb(). :)
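A rough sketch of that plan, in C. This is not the PiSmmCpuDxeSmm code: CPU_DATA is a reduced stand-in for the per-CPU record, start_ap is a hypothetical placeholder for the SmmBlockingStartupThisAp() call, and treating index 0 as the BSP is an assumption made only to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the per-CPU record; only the field discussed here. */
typedef struct {
    bool Present;   /* the CPU is currently in SMM */
} CPU_DATA;

/* Ask every present AP to flush its TLB and skip absent ones silently.
 * Index 0 is assumed to be the BSP. Returns the number of APs started. */
static size_t flush_tlb_on_present_aps(const CPU_DATA *cpus, size_t count,
                                       void (*start_ap)(size_t index))
{
    size_t started = 0;

    for (size_t i = 1; i < count; i++) {
        if (!cpus[i].Present) {
            continue;           /* not in SMM: nothing to start, nothing to log */
        }
        if (start_ap != NULL) {
            start_ap(i);        /* placeholder for the blocking AP startup */
        }
        started++;
    }
    return started;
}
```

Because absent APs are filtered out before the startup call, the "!...Present" message would never be reached from the flush path in this model.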
What do you think?
Thank you
Yao Jiewen
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-08 12:59 ` Yao, Jiewen
@ 2016-11-08 13:22 ` Laszlo Ersek
2016-11-08 13:41 ` Yao, Jiewen
0 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-08 13:22 UTC (permalink / raw)
To: Yao, Jiewen
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
On 11/08/16 13:59, Yao, Jiewen wrote:
> HI Laszlo
>
> Thanks for the detail test result.
>
>
>
> Quick comment for the debug message:
>
> 1) For “ConvertPageEntryAttribute 0x7F92B067->0x7F92B065”, I agree
> to change to DEBUG_VERBOSE, because it pure debug purpose.
>
>
>
> 2) For “!mSmmMpSyncData->CpuData[1].Present”, I think people has
> interest to know startup failure reason. I would prefer to keep current
> DEBUG_ERROR.
I agree that DEBUG_ERROR is appropriate for messages that can directly
relate to startup failures.
However, does this condition unavoidably imply startup failure? Because,
as demonstrated by QEMU + OVMF, a platform where an SMI does not pull
all processors into SMM at once can still work with PiSmmCpuDxeSmm,
assuming the appropriate PCD settings.
Therefore, can we make this error message conditional on
(mSmmMpSyncData->EffectiveSyncMode == SmmCpuSyncModeTradition)
? Because, "not present" is an error for the traditional sync mode, but
for the relaxed / directed mode, "not present" is expected. Isn't it?
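To make the suggestion concrete, here is a minimal sketch in C of choosing the log level from the sync mode. The enum names echo the identifiers used in this thread, but they are local stand-ins, not the real EDK2 definitions, and LOG_LEVEL is a hypothetical replacement for the DEBUG_* levels.

```c
#include <stdbool.h>

/* Local stand-ins modeled on the names used in this thread. */
typedef enum {
    SmmCpuSyncModeTradition,
    SmmCpuSyncModeRelaxedAp
} SYNC_MODE;

/* Hypothetical substitute for the DEBUG_* print levels. */
typedef enum {
    LOG_NONE,
    LOG_VERBOSE,
    LOG_ERROR
} LOG_LEVEL;

/* Level at which a "CPU not present" condition would be reported. */
static LOG_LEVEL ap_absent_log_level(SYNC_MODE mode, bool present)
{
    if (present) {
        return LOG_NONE;        /* nothing to report */
    }
    /* Absence is an error only when all CPUs are expected in SMM. */
    return (mode == SmmCpuSyncModeTradition) ? LOG_ERROR : LOG_VERBOSE;
}
```

Under this model a platform running in the relaxed mode would see the message only with verbose logging enabled, while the traditional mode keeps the error-level report.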
> At same time, I understand your OVMF concern on too many debug message
> in FlushTlb. So I plan to resolve problem in another way.
>
> I will check “mSmmMpSyncData->CpuData[1].Present” before calling
> SmmBlockingStartupThisAp(). So you will not see any debug message in
> FlushTlb(). :)
>
>
>
> What about your idea?
If we cannot omit (or downgrade) the message for
SmmCpuSyncModeRelaxedAp, then decreasing its frequency would be appreciated.
Thanks
Laszlo
>
>
> *From:*edk2-devel [mailto:edk2-devel-bounces@lists.01.org] *On Behalf Of
> *Laszlo Ersek
> *Sent:* Tuesday, November 8, 2016 9:22 AM
> *To:* Yao, Jiewen <jiewen.yao@intel.com>
> *Cc:* Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney,
> Michael D <michael.d.kinney@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star
> <star.zeng@intel.com>
> *Subject:* Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
>
>
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2 X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> I have new test results. Let's start with the table again:
>
> Legend:
>
> - "untested" means the test was not executed because the same test
> failed or proved unreliable in a less demanding configuration already,
>
> - "n/a" means a setting or test case was impossible,
>
> - "fail" and "unreliable" (lower case) are outside the scope of this
> series; they either capture the pre-series status, or are expected
> even with the series applied due to the pre-series status,
>
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
> series.
>
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
>
>     series   OVMF                                                                 VCPU      boot        S3 resume
>  #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>
> * Case 8: this test case failed with v2 as well, but this time with
> different symptoms:
>
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>
> I didn't try to narrow this down.
>
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
> and good. The good news is for Jiewen: this patch series does not
> cause the unreliability, it "only" amplifies it severely. The bad news
> is correspondingly for everyone else: S3 resume is actually unreliable
> even in case 4, that is, without this series applied; it's just that
> the failure rate is much, much lower.
>
> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
> 21 tries. (I stopped testing at the 8th failure.)
>
> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
> #12 that failed; I continued testing and aborted the test after the
> 55th try.)
>
> So, while the series hugely amplifies the failure rate, the failure
> does exist without the series. Which is why I modified the case 4
> results in the table, and also lower-cased the word "unreliable" in
> case 13.
>
> Below I will return to this problem separately; let's go over the rest
> of the table first.
>
> * Case 17: I guess this is not a real failure, I'm just including it for
> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
> additional SMRAM demand (see the commit message on patch V2 4/6). This
> case fails with
>
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>
> which is an SMRAM allocation failure. If I lower the VCPU count to
> 50x2x2, then the guest boots fine.
>
> ----*----
>
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
>
> messages in the OVMF boot log, interspersed with
>
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
>
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
>
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
>
> ----*----
>
> * Okay, so the S3 problem. Last time I suspected that the failure point
> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
> 9A1D0, according to the OVMF log). In order to test this idea, I
> exercised this series with S3 against a Windows 8.1 guest (--> case 13
> again). The failure reproduced on the second S3 resume, with identical
> RIP, despite the Windows wakeup vector being located elsewhere (at
> 0x1000).
>
> Quoting the OVMF log leading up to the resume:
>
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
>
> QEMU log (same as before):
>
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
> So, we can exclude the suspicion that the problem is guest OS
> dependent.
>
> * Then I looked for the base address of the page containing the
> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
> some firmware component might have allocated that area actually. Here
> we go:
>
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>
> That is, the failure hits (when it hits -- not always) in the area
> where the CpuMpPei driver *borrows* memory for the startup vector of
> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
> overloaded word here; the "wakeup buffer" has nothing to do with S3
> resume, it just serves for booting the APs temporarily in PEI, for
> implementing the MP service PPI.)
>
> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
> the original contents of this area. This occurs just before
> transferring control to the guest OS wakeup vector: see the
> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
> quoted from the OVMF log.
>
> I documented (parts of) this logic in OVMF commit
>
> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>
> (see the code comments as well).
>
> * At that time, I thought to have identified a memory management bug in
> CpuMpPei; see the following discussion and bug report for details:
>
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>
> However, with the extraction / introduction of MpInitLib, this issue
> has been fixed: GetWakeupBuffer() now calls
> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
> no longer; we shouldn't be looking there for the root cause.
>
> * Either way, I don't understand why anything would want to execute code
> in the one page that happens to host the MP services PPI startup
> buffer for APs during PEI.
>
> Not understanding the "why", I looked at the "what", and resorted to
> tracing KVM. Because the problem readily reproduces with this series
> applied (case 13), it wasn't hard to start the tracing while the guest
> was suspended, and capture just the actions that led from the
> KVM-level wakeup to the failure.
>
> The QEMU state dumps are visible above in the email. I've also
> uploaded the compressed OVMF log and the textual KVM trace here:
>
> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>
> I sincerely hope that Paolo will have a field day with the KVM trace
> :) I managed to identify the following curiosities (remember this is
> all on the S3 resume path):
>
> * First, the VCPUs (there are four of them) enter and leave SMM in a
> really funky pattern:
>
>     vcpu#0    vcpu#1    vcpu#2    vcpu#3
>     ------    ------    ------    ------
>               enter
>               |
>               leave
>
>                         enter
>                         |
>                         leave
>
>                                   enter
>                                   |
>                                   leave
>
>     enter
>     |
>     leave
>
>               enter               enter
>     enter     |         enter     |
>     |         |         |         |
>     leave     |         |         |
>               |         |         |
>     enter     |         |         |
>     |         |         |         |
>     leave     leave     leave     leave
>
> That is, first we have each VCPU enter and leave SMM in complete
> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
> temporarily (it comes back in later), while the other three remain
> in SMM. Finally all four of them leave SMM together.
>
> After which the problem occurs.
>
> * Second, the instruction that causes things to blow up is <0f aa>,
> i.e., RSM. I have absolutely no clue why RSM is executed:
>
> (a) in the area that used to host the AP startup routine for the MP
> services PPI -- note that we also have "Transfer to 16bit OS waking
> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
> area completely! --,
>
> (b) and why *after* all four VCPUs have just left SMM, together.
>
> * The RSM instruction is handled successfully elsewhere, for example
> when all four VCPUs leave SMM, at the bottom of the diagram above:
>
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>
> * The guest-phys address 7ff7f000 that we see just before the error:
>
>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>
> can be found higher up in the trace; namely, it is written to CR3
> several times. It's the root of the page tables.
>
> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
> qemu-monitor-command --hmp", while the guest was auto-paused after the
> crash.
>
> I cannot provide results: QEMU appeared to return a message that would
> be longer than 16MB after encoding by libvirt, and libvirt rejected
> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
> Anyway, the KVM trace, and the QEMU register dump, look consistent
> with what Paolo said about "Code=?? ?? ??...":
>
> The question marks usually mean that the page tables do not map a
> page at that address.
>
> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
> (SMM=0). We can't translate *any* guest-virtual address, as we can't
> even begin walking the page tables.
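
The two address observations above can be double-checked mechanically. Below
is a minimal sketch in C, assuming the values quoted from the logs (wakeup
buffer at 9F000 with size 1000, failing RIP=9f0fd, CR3=7ff7f000, SMRAM at
7F80_1000..7FFF_FFFF). The macro and helper names are hypothetical and exist
only for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the OVMF and QEMU logs quoted in this thread. */
#define WAKEUP_BUFFER_START  0x9F000ULL
#define WAKEUP_BUFFER_SIZE   0x1000ULL
#define FAILING_RIP          0x9F0FDULL
#define FAILING_CR3          0x7FF7F000ULL
#define SMRAM_FIRST          0x7F801000ULL
#define SMRAM_LAST           0x7FFFFFFFULL

/* True when Address lies in the half-open range [Base, Base + Size). */
static bool InRange(uint64_t Address, uint64_t Base, uint64_t Size)
{
  return Address >= Base && Address - Base < Size;
}

/* The failing RIP falls inside the page borrowed for the AP startup code. */
static bool RipInWakeupBuffer(void)
{
  return InRange(FAILING_RIP, WAKEUP_BUFFER_START, WAKEUP_BUFFER_SIZE);
}

/* The page table root in CR3 lies inside SMRAM, although SMM=0. */
static bool Cr3InSmram(void)
{
  return InRange(FAILING_CR3, SMRAM_FIRST, SMRAM_LAST - SMRAM_FIRST + 1);
}
```

Both helpers evaluate to true for the logged values, which matches the
picture of a VCPU executing outside SMM with a page table root that it can
no longer read.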
>
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-08 13:22 ` Laszlo Ersek
@ 2016-11-08 13:41 ` Yao, Jiewen
0 siblings, 0 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-08 13:41 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
Yes, it is a good idea to check "(mSmmMpSyncData->EffectiveSyncMode == SmmCpuSyncModeTradition)".
I agree.
Thank you
Yao Jiewen
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Tuesday, November 8, 2016 9:23 PM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 11/08/16 13:59, Yao, Jiewen wrote:
> Hi Laszlo
>
> Thanks for the detailed test results.
>
>
>
> Quick comment for the debug message:
>
> 1) For "ConvertPageEntryAttribute 0x7F92B067->0x7F92B065", I agree
> to change to DEBUG_VERBOSE, because it is purely for debugging.
>
>
>
> 2) For "!mSmmMpSyncData->CpuData[1].Present", I think people have an
> interest in knowing the startup failure reason. I would prefer to keep
> the current DEBUG_ERROR.
I agree that DEBUG_ERROR is appropriate for messages that can directly
relate to startup failures.
However, does this condition unavoidably imply startup failure? Because,
as demonstrated by QEMU + OVMF, a platform where an SMI does not pull
all processors into SMM at once can still work with PiSmmCpuDxeSmm,
assuming the appropriate PCD settings.
Therefore, can we make this error message conditional on
(mSmmMpSyncData->EffectiveSyncMode == SmmCpuSyncModeTradition)
? Because, "not present" is an error for the traditional sync mode, but
for the relaxed / directed mode, "not present" is expected. Isn't it?
> At the same time, I understand your OVMF concern about too many debug
> messages in FlushTlb. So I plan to resolve the problem in another way.
>
> I will check "mSmmMpSyncData->CpuData[1].Present" before calling
> SmmBlockingStartupThisAp(). So you will not see any debug message in
> FlushTlb(). :)
>
>
>
> What about your idea?
If we cannot omit (or downgrade) the message for
SmmCpuSyncModeRelaxedAp, then decreasing its frequency would be appreciated.
Thanks
Laszlo
>
>
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Tuesday, November 8, 2016 9:22 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney,
> Michael D <michael.d.kinney@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star
> <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
>
>
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2 X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> I have new test results. Let's start with the table again:
>
> Legend:
>
> - "untested" means the test was not executed because the same test
> failed or proved unreliable in a less demanding configuration already,
>
> - "n/a" means a setting or test case was impossible,
>
> - "fail" and "unreliable" (lower case) are outside the scope of this
> series; they either capture the pre-series status, or are expected
> even with the series applied due to the pre-series status,
>
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
> series.
>
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
>
>     series   OVMF                                                                 VCPU      boot        S3 resume
>  #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>
> * Case 8: this test case failed with v2 as well, but this time with
> different symptoms:
>
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>
> I didn't try to narrow this down.
>
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
> and good. The good news is for Jiewen: this patch series does not
> cause the unreliability, it "only" amplifies it severely. The bad news
> is correspondingly for everyone else: S3 resume is actually unreliable
> even in case 4, that is, without this series applied; it's just that
> the failure rate is much, much lower.
>
> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
> 21 tries. (I stopped testing at the 8th failure.)
>
> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
> #12 that failed; I continued testing and aborted the test after the
> 55th try.)
>
> So, while the series hugely amplifies the failure rate, the failure
> does exist without the series. Which is why I modified the case 4
> results in the table, and also lower-cased the word "unreliable" in
> case 13.
>
> Below I will return to this problem separately; let's go over the rest
> of the table first.
>
> * Case 17: I guess this is not a real failure, I'm just including it for
> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
> additional SMRAM demand (see the commit message on patch V2 4/6). This
> case fails with
>
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>
> which is an SMRAM allocation failure. If I lower the VCPU count to
> 50x2x2, then the guest boots fine.
>
> ----*----
>
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
>
> messages in the OVMF boot log, interspersed with
>
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
>
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
>
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
>
> ----*----
>
> * Okay, so the S3 problem. Last time I suspected that the failure point
> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
> 9A1D0, according to the OVMF log). In order to test this idea, I
> exercised this series with S3 against a Windows 8.1 guest (--> case 13
> again). The failure reproduced on the second S3 resume, with identical
> RIP, despite the Windows wakeup vector being located elsewhere (at
> 0x1000).
>
> Quoting the OVMF log leading up to the resume:
>
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
>
> QEMU log (same as before):
>
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
> So, we can exclude the suspicion that the problem is guest OS
> dependent.
>
> * Then I looked for the base address of the page containing the
> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
> some firmware component might have allocated that area actually. Here
> we go:
>
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>
> That is, the failure hits (when it hits -- not always) in the area
> where the CpuMpPei driver *borrows* memory for the startup vector of
> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
> overloaded word here; the "wakeup buffer" has nothing to do with S3
> resume, it just serves for booting the APs temporarily in PEI, for
> implementing the MP service PPI.)
>
> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
> the original contents of this area. This occurs just before
> transferring control to the guest OS wakeup vector: see the
> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
> quoted from the OVMF log.
>
> I documented (parts of) this logic in OVMF commit
>
> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>
> (see the code comments as well).
>
> * At that time, I thought to have identified a memory management bug in
> CpuMpPei; see the following discussion and bug report for details:
>
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>
> However, with the extraction / introduction of MpInitLib, this issue
> has been fixed: GetWakeupBuffer() now calls
> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
> no longer; we shouldn't be looking there for the root cause.
>
> * Either way, I don't understand why anything would want to execute code
> in the one page that happens to host the MP services PPI startup
> buffer for APs during PEI.
>
> Not understanding the "why", I looked at the "what", and resorted to
> tracing KVM. Because the problem readily reproduces with this series
> applied (case 13), it wasn't hard to start the tracing while the guest
> was suspended, and capture just the actions that led from the
> KVM-level wakeup to the failure.
>
> The QEMU state dumps are visible above in the email. I've also
> uploaded the compressed OVMF log and the textual KVM trace here:
>
> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>
> I sincerely hope that Paolo will have a field day with the KVM trace
> :) I managed to identify the following curiosities (remember this is
> all on the S3 resume path):
>
> * First, the VCPUs (there are four of them) enter and leave SMM in a
> really funky pattern:
>
>     vcpu#0  vcpu#1  vcpu#2  vcpu#3
>     ------  ------  ------  ------
>             enter
>               |
>             leave
>
>                     enter
>                       |
>                     leave
>
>                             enter
>                               |
>                             leave
>
>     enter
>       |
>     leave
>
>             enter           enter
>     enter     |     enter     |
>       |       |       |       |
>     leave     |       |       |
>               |       |       |
>     enter     |       |       |
>       |       |       |       |
>     leave   leave   leave   leave
>
> That is, first we have each VCPU enter and leave SMM in complete
> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
> temporarily (it comes back in later), while the other three remain
> in SMM. Finally all four of them leave SMM together.
>
> After which the problem occurs.
>
> * Second, the instruction that causes things to blow up is <0f aa>,
> i.e., RSM. I have absolutely no clue why RSM is executed:
>
> (a) in the area that used to host the AP startup routine for the MP
> services PPI -- note that we also have "Transfer to 16bit OS waking
> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
> area completely! --,
>
> (b) and why *after* all four VCPUs have just left SMM, together.
>
> * The RSM instruction is handled successfully elsewhere, for example
> when all four VCPUs leave SMM, at the bottom of the diagram above:
>
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>
> * The guest-phys address 7ff7f000 that we see just before the error:
>
>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>
> can be found higher up in the trace; namely, it is written to CR3
> several times. It's the root of the page tables.
>
> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
> qemu-monitor-command --hmp", while the guest was auto-paused after the
> crash.
>
> I cannot provide results: QEMU appeared to return a message that would
> be longer than 16MB after encoding by libvirt, and libvirt rejected
> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
> Anyway, the KVM trace, and the QEMU register dump, look consistent
> with what Paolo said about "Code=?? ?? ??...":
>
> The question marks usually mean that the page tables do not map a
> page at that address.
>
> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
> (SMM=0). We can't translate *any* guest-virtual address, as we can't
> even begin walking the page tables.
>
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-08 1:22 ` Laszlo Ersek
2016-11-08 12:59 ` Yao, Jiewen
@ 2016-11-09 6:25 ` Yao, Jiewen
2016-11-09 11:30 ` Paolo Bonzini
2016-11-09 20:46 ` Laszlo Ersek
2016-11-09 11:23 ` Paolo Bonzini
2 siblings, 2 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-09 6:25 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
Hi Laszlo
I will fix the DEBUG message issue in the V3 patch.
Below are the remaining issues:
* Case 13: S3 fails randomly.
Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
1) We believe the dead CPU is an AP, not the BSP.
The reasons are:
1.1) The BSP has already transferred control to the OS waking vector, so its GDT/IDT/CR3 should have been set by the OS.
1.2) The dead CPU still has its GDT/IDT pointing to BIOS reserved memory, and its CS/DS/SS are the typical BIOS X64 mode settings.
1.3) The dead CPU still has the SMM CR3 (which is obviously wrong).
2) Based upon 1), we reviewed the S3 resume AP flow.
Currently the BSP wakes up the APs in SMRAM, for security considerations. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
Currently the BSP just uses its own context to initialize the APs, so the APs take the BSP's CR3, which is unfortunately the SMM CR3.
After the BSP has initialized the APs, each AP is put into a HALT-LOOP in X64 mode. That is the last straw, because a halt loop in X64 mode still needs paging.
3) The error happens once an AP receives an interrupt (for whatever reason) and starts executing code. At that time the AP might not be in SMM mode, which means the SMM CR3 is not usable. And then we see this failure.
4) I guess the reason we did not see the error before, and the reason this is a RANDOM issue, is that it depends on whether an AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
5) The fix, I think, should be as below:
We should always put the APs into protected mode, so that no paging is needed.
We should put the AP loop in reserved memory above 1M, instead of memory below 1M, because memory below 1M is restored.
Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix this critical issue.
There is no need to do more investigation. Thanks for your great help on that. :)
* Case 17 - I do not think it is a real issue, because SMM is out of resources.
* Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
Here is the code. We do not know why some code needs to call InitializeSpinLock after ExitBootServices.
SPIN_LOCK *
EFIAPI
InitializeSpinLock (
  OUT SPIN_LOCK *SpinLock
  )
{
  ASSERT (SpinLock != NULL);
  _ReadWriteBarrier();
  *SpinLock = SPIN_LOCK_RELEASED;
  _ReadWriteBarrier();
  return SpinLock;
}
If you could quickly check the following, that would be great.
1) Which processor triggers this ASSERT? The BSP or an AP?
2) Which module triggers this ASSERT? Which module contains the current RIP value?
At the same time, all my OS tests are on real platforms. I have not set up an OVMF environment to run an OS yet.
If you could share step-by-step instructions with me, that would be great.
Thank you
Yao Jiewen
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Tuesday, November 8, 2016 9:22 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>
> ==== below is V1 description ====
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2 X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
I have new test results. Let's start with the table again:
Legend:
- "untested" means the test was not executed because the same test
failed or proved unreliable in a less demanding configuration already,
- "n/a" means a setting or test case was impossible,
- "fail" and "unreliable" (lower case) are outside the scope of this
series; they either capture the pre-series status, or are expected
even with the series applied due to the pre-series status,
- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
series.
In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).
    series   OVMF                                                                 VCPU      boot        S3 resume
 #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
--  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
 1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
 2  no       Ia32      255                              n/a                       52x2x2    pass        untested
 3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
 4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
 5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
 6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
 7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
 8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
 9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
* Case 8: this test case failed with v2 as well, but this time with
different symptoms:
> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
I didn't try to narrow this down.
* Case 13 (the "unreliable S3 resume" case): Here the news is both bad
and good. The good news is for Jiewen: this patch series does not
cause the unreliability, it "only" amplifies it severely. The bad news
is correspondingly for everyone else: S3 resume is actually unreliable
even in case 4, that is, without this series applied; it's just that the
failure rate is much, much lower.
Namely, in my new testing, in case 13, S3 resume failed 8 times out of
21 tries. (I stopped testing at the 8th failure.)
Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
#12 that failed; I continued testing and aborted the test after the
55th try.)
So, while the series hugely amplifies the failure rate, the failure
does exist without the series. Which is why I modified the case 4
results in the table, and also lower-cased the word "unreliable" in
case 13.
Below I will return to this problem separately; let's go over the rest
of the table first.
* Case 17: I guess this is not a real failure, I'm just including it for
completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
additional SMRAM demand (see the commit message on patch V2 4/6). This
case fails with
> SmmLockBox Command - 4
> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
> SmmLockBox SmmLockBoxHandler Exit
> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
which is an SMRAM allocation failure. If I lower the VCPU count to
50x2x2, then the guest boots fine.
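
To put numbers on the case 13 versus case 4 comparison above, the failure rates can be computed directly from the counts quoted. The helper name FailureRatePercent exists only in this sketch; note also that the samples are small and the case 13 run was stopped at the 8th failure, so these are rough estimates only.

```c
#include <assert.h>

/* Percentage of S3 resume attempts that failed. */
double FailureRatePercent(unsigned Failures, unsigned Tries)
{
  assert(Tries > 0 && Failures <= Tries);
  return 100.0 * (double)Failures / (double)Tries;
}
```

Case 13 gives 8/21, roughly 38%, and case 4 gives 1/55, roughly 1.8%, so the series raises the observed failure rate by about a factor of 20.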
----*----
Before I get to the S3 resume problem (which, again, reproduces without
this series, although much less frequently), I'd like to comment on the
removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
function, on the return value of SmmBlockingStartupThisAp(). This change
allows v2 to proceed past that point; however, I'm seeing a whole lot of
> !mSmmMpSyncData->CpuData[1].Present
> !mSmmMpSyncData->CpuData[2].Present
> !mSmmMpSyncData->CpuData[3].Present
> ...
messages in the OVMF boot log, interspersed with
> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
style messages. (That is, one error message for each AP, per
ConvertPageEntryAttribute() message.)
Is this okay / intentional? The number of these messages can go up to
several thousands and that sort of drowns out everything else in the
log.
It's also not easy to mask the message, because it's logged on the
DEBUG_ERROR level.
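
For reference, the ConvertPageEntryAttribute line quoted above can be decoded with the architectural x86 page table entry bits. The sketch below assumes the logged values are raw page table entries; the helper names are invented for this example and are not the PiSmmCpuDxeSmm functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page table entry flag bits. */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_RW        (1ULL << 1)   /* 1 = writable, 0 = read-only */
#define PTE_USER      (1ULL << 2)
#define PTE_ACCESSED  (1ULL << 5)
#define PTE_DIRTY     (1ULL << 6)
#define PTE_NX        (1ULL << 63)  /* execute disable (XD) */

/* Bits that differ between the old and the new entry. */
uint64_t PteChangedBits(uint64_t OldEntry, uint64_t NewEntry)
{
  return OldEntry ^ NewEntry;
}

/* True if a present entry maps its page read-only. */
bool PteIsReadOnly(uint64_t Entry)
{
  assert((Entry & PTE_PRESENT) != 0);
  return (Entry & PTE_RW) == 0;
}
```

0x7F92B067 and 0x7F92B065 differ only in bit 1 (R/W), so under the assumption above this particular message records a page being made read-only.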
----*----
* Okay, so the S3 problem. Last time I suspected that the failure point
(RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
9A1D0, according to the OVMF log). In order to test this idea, I
exercised this series with S3 against a Windows 8.1 guest (--> case 13
again). The failure reproduced on the second S3 resume, with identical
RIP, despite the Windows wakeup vector being located elsewhere (at
0x1000).
Quoting the OVMF log leading up to the resume:
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 1000
QEMU log (same as before):
> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 000000007f294000 00000047
> IDT= 000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 000000007f294000 00000047
> IDT= 000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
So, we can exclude the suspicion that the problem is guest OS
dependent.
* Then I looked for the base address of the page containing the
RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
some firmware component might have allocated that area actually. Here
we go:
> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
That is, the failure hits (when it hits -- not always) in the area
where the CpuMpPei driver *borrows* memory for the startup vector of
the APs, for the purposes of the MP service PPI. ("Wakeup" is an
overloaded word here; the "wakeup buffer" has nothing to do with S3
resume, it just serves for booting the APs temporarily in PEI, for
implementing the MP service PPI.)
When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
the original contents of this area. This occurs just before
transferring control to the guest OS wakeup vector: see the
"EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
quoted from the OVMF log.
I documented (parts of) this logic in OVMF commit
https://github.com/tianocore/edk2/commit/e3e3090a959a0
(see the code comments as well).
* At that time, I thought I had identified a memory management bug in
CpuMpPei; see the following discussion and bug report for details:
https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
https://bugzilla.tianocore.org/show_bug.cgi?id=67
However, with the extraction / introduction of MpInitLib, this issue
has been fixed: GetWakeupBuffer() now calls
CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
no longer; we shouldn't be looking there for the root cause.
* Either way, I don't understand why anything would want to execute code
in the one page that happens to host the MP services PPI startup
buffer for APs during PEI.
Not understanding the "why", I looked at the "what", and resorted to
tracing KVM. Because the problem readily reproduces with this series
applied (case 13), it wasn't hard to start the tracing while the guest
was suspended, and capture just the actions that led from the
KVM-level wakeup to the failure.
The QEMU state dumps are visible above in the email. I've also
uploaded the compressed OVMF log and the textual KVM trace here:
http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
I sincerely hope that Paolo will have a field day with the KVM trace
:) I managed to identify the following curiosities (remember this is
all on the S3 resume path):
* First, the VCPUs (there are four of them) enter and leave SMM in a
really funky pattern:
    vcpu#0  vcpu#1  vcpu#2  vcpu#3
    ------  ------  ------  ------
            enter
              |
            leave
                    enter
                      |
                    leave
                            enter
                              |
                            leave
    enter
      |
    leave
            enter           enter
    enter     |     enter     |
      |       |       |       |
    leave     |       |       |
              |       |       |
    enter     |       |       |
      |       |       |       |
    leave   leave   leave   leave
That is, first we have each VCPU enter and leave SMM in complete
isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
temporarily (it comes back in later), while the other three remain
in SMM. Finally all four of them leave SMM together.
After which the problem occurs.
* Second, the instruction that causes things to blow up is <0f aa>,
i.e., RSM. I have absolutely no clue why RSM is executed:
(a) in the area that used to host the AP startup routine for the MP
services PPI -- note that we also have "Transfer to 16bit OS waking
vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
area completely! --,
(b) and why *after* all four VCPUs have just left SMM, together.
* The RSM instruction is handled successfully elsewhere, for example
when all four VCPUs leave SMM, at the bottom of the diagram above:
> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
* The guest-phys address 7ff7f000 that we see just before the error:
> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
can be found higher up in the trace; namely, it is written to CR3
several times. It's the root of the page tables.
* The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
* I also tried the "info tlb" monitor command, via "virsh
qemu-monitor-command --hmp", while the guest was auto-paused after the
crash.
I cannot provide results: QEMU appeared to return a message that would
be longer than 16MB after encoding by libvirt, and libvirt rejected
that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
Anyway, the KVM trace, and the QEMU register dump, look consistent
with what Paolo said about "Code=?? ?? ??...":
The question marks usually mean that the page tables do not map a
page at that address.
CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
(SMM=0). We can't translate *any* guest-virtual address, as we can't
even begin walking the page tables.
Thanks
Laszlo
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-08 1:22 ` Laszlo Ersek
2016-11-08 12:59 ` Yao, Jiewen
2016-11-09 6:25 ` Yao, Jiewen
@ 2016-11-09 11:23 ` Paolo Bonzini
2016-11-09 15:16 ` Yao, Jiewen
2 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 11:23 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Jiewen Yao, edk2-devel, Michael D Kinney, Feng Tian, Jeff Fan,
Star Zeng
> * Second, the instruction that causes things to blow up is <0f aa>,
> i.e., RSM. I have absolutely no clue why RSM is executed:
It's probably not RSM. RSM is probably the last instruction executed
before, and it's still in the buffer because, as you said, there's no
way that you can fetch an instruction while CR3 points into SMM.
My first thought was that the MMU is for some reason out of contact
with reality, but actually the CR3 write is correct:
CPU-24446 [002] 39841.871040: kvm_exit: reason CR_ACCESS rip 0x9f05e info 103 0
CPU-24446 [002] 39841.871040: kvm_cr: cr_write 3 = 0x7ff7f000
and it's coming from the stub as well. So the second thought was that
the wrong CR3 had been put into the wakeup buffer's Cr3 location.
I'm glad I kept looking because it was much more entertaining. Especially
knowing that I (probably) will not have to fix it. :)
The basic idea for debugging was to look for interesting events and
use 0x402 writes to correlate them to the debug log. For example, most
accesses to 0x9f??? are obviously not traced by KVM, but the first ones
are:
31519- CPU-24444 [006] 39841.783344: kvm_exit: reason EPT_VIOLATION rip 0x855d82 info 181 0
31520: CPU-24444 [006] 39841.783344: kvm_page_fault: address 9f000 error_code 181
280224- CPU-24444 [006] 39841.860940: kvm_exit: reason EPT_VIOLATION rip 0x7ffd0d15 info 182 0
280225: CPU-24444 [006] 39841.860940: kvm_page_fault: address 9f000 error_code 182
(The number is just the line number in the trace.) Luckily your machine
didn't have EPT accessed/dirty bits, so KVM trapped both the first read
and the first write.
The read is at
WakeupBufferStart = 9F000, WakeupBufferSize = 1000
but it's not too interesting. The second is a good one to start debugging
because it's from SMRAM (though not from SMM, since the first kvm_enter_smm
happens later at 305930). So it makes sense that it writes an SMRAM CR3.
There is a write to the debug log just before, at 279993, and it writes
"SmmRestoreCpu()". As expected, the write is followed by a flurry of MSR
writes and the APIC programming at 280131, so I am pretty sure that the write to
mExchangeInfo->Cr3 comes from PrepareApStartupVector.
FWIW, I first looked at the call chain up from BackupAndPrepareWakeupBuffer,
but that led me nowhere for an hour. So I was a bit lucky indeed. :)
Anyhow, SmmRestoreCpu is the SmmS3ResumeEntryPoint for S3Resume2Pei, and
indeed, earlier in the log you have this debugging output from S3Resume2Pei:
SMM S3 CR3 = 7FF7F000
Doh, maybe I should have looked at the log before the trace. Who knows.
Anyway, the SMM_S3_RESUME_STATE is initialized by InitSmmS3ResumeState,
so the CR3 is the one that is initialized by InitSmmS3Cr3 in
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c. At this point I
was still thinking that this CR3 was wrong, but by looking at the
places where SMM is entered, and correlating that with debug log writes,
the puzzle was relatively easy to solve:
1) SMBASE relocation, done by SmmRestoreCpu:
305930: CPU-24445 [005] 39841.871264: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000
306000: CPU-24445 [005] 39841.871318: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000
306051: CPU-24446 [002] 39841.871349: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000
306108: CPU-24446 [002] 39841.871390: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000
306161: CPU-24447 [004] 39841.871421: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000
306218: CPU-24447 [004] 39841.871463: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000
306254: CPU-24444 [006] 39841.871473: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000
306311: CPU-24444 [006] 39841.871512: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
2) S3ResumeExecuteBootScript (again, the previous 0x402 write ends
at 334597 and promptly gives us a clue):
334698: CPU-24445 [005] 39841.882706: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000
334699: CPU-24447 [004] 39841.882706: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000
334741: CPU-24444 [006] 39841.882723: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000
334742: CPU-24446 [002] 39841.882724: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000
334875: CPU-24444 [006] 39841.882755: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
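
The smbase values in the two trace excerpts above can be tied to the RSM addresses Laszlo quoted (7ffb9179, 7ffbb179, 7ffbd179, 7ffbf179). Architecturally the SMI handler entry point is at SMBASE + 8000h; the per-CPU stride of 0x2000 is simply read off this trace. The function names below are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural offset of the SMI handler entry point from SMBASE. */
#define SMI_ENTRY_OFFSET  0x8000u

/* Address at which a CPU starts executing when it takes an SMI. */
uint32_t SmiEntryPoint(uint32_t Smbase)
{
  return Smbase + SMI_ENTRY_OFFSET;
}

/* Relocated SMBASE of a CPU, assuming a constant stride between CPUs
   as seen in this trace (0x7ffb1000, 0x7ffb3000, ...). */
uint32_t RelocatedSmbase(uint32_t FirstSmbase, uint32_t Stride, unsigned CpuIndex)
{
  assert(Stride != 0);
  return FirstSmbase + Stride * CpuIndex;
}
```

With FirstSmbase 0x7ffb1000 and Stride 0x2000, the entry points are 0x7ffb9000, 0x7ffbb000, 0x7ffbd000 and 0x7ffbf000, so each successful RSM sits at offset 0x179 into the relocated SMI handler of its VCPU, nowhere near 9f0fd.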
Here is where I think things go awry. The lines after
S3ResumeExecuteBootScript() are
Close all SMRAM regions before executing boot script
Lock all SMRAM regions before executing boot script
and indeed the first is at 334898, immediately after VCPU0 leaves
SMM. But, closing and locking of SMRAM happens while the APs are
still in SMM! The BSP instead goes on merrily and, after the debug
log has "PeiMpInitLib: CpuMpEndOfPeiCallback () invoked" (0x402
write ends at 364869) we have another access to 0x9f000, this time a
write. It's RestoreWakeupBuffer:
364908- CPU-24444 [006] 39841.890320: kvm_exit: reason EPT_VIOLATION rip 0x855d82 info 182 0
364909: CPU-24444 [006] 39841.890320: kvm_page_fault: address 9f000 error_code 182
Again VCPUs 1..3 are still in SMM, but the BSP couldn't care less. :)
We're only 35% through the trace but we're actually close to the end.
At 365704 OVMF says it's transferring control to Linux's wakeup
vector, and Linux takes control real soon:
365805: CPU-24444 [006] 39841.890477: kvm_exit: reason CR_ACCESS rip 0x9aec5 info 4 0
365807: CPU-24444 [006] 39841.890477: kvm_cr: cr_write 4 = 0xb0
365817: CPU-24444 [006] 39841.890479: kvm_entry: vcpu 0
We don't even need to look closer at what happens after this point,
as we can imagine that the APs are just waiting for something to happen.
But if you do look, all you see is reads to the PMTimer, which makes sense.
And a while after, once they are fed up, they bring VCPU 0 back to SMM:
994855 CPU-24446 [000] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200
994856 CPU-24447 [002] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200
994857 CPU-24445 [005] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200
994858 CPU-24446 [000] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst)
994859 CPU-24445 [005] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst)
994860 CPU-24447 [002] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst)
994861 CPU-24446 [000] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge)
994862 CPU-24445 [005] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge)
994863 CPU-24447 [002] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge)
The rendezvous completes, the APs can finally leave SMM but all they can do
is meet their fate and crash horribly:
994869 CPU-24444 [006] 39841.982776: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a9548 info 0 800000fd
...
994880 CPU-24444 [006] 39841.982777: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000
995135: CPU-24444 [006] 39841.982821: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
995136: CPU-24445 [005] 39841.982821: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000
995137: CPU-24446 [000] 39841.982821: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000
995138: CPU-24447 [002] 39841.982821: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000
995148: CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
995152: CPU-24446 [000] 39841.982828: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
I hope you enjoyed it more than the poor APs. :)
Paolo
> (a) in the area that used to host the AP startup routine for the MP
> services PPI -- note that we also have "Transfer to 16bit OS waking
> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
> area completely! --,
>
> (b) and why *after* all four VCPUs have just left SMM, together.
>
> * The RSM instruction is handled successfully elsewhere, for example
> when all four VCPUs leave SMM, at the bottom of the diagram above:
>
> > CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
> > CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
> > CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
> > CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>
> * The guest-phys address 7ff7f000 that we see just before the error:
>
> > CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000
> > error_code 83
> > CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000
> > error_code 83
> > CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
> > CPU-24444 [006] 39841.982827: kvm_exit: reason
> > EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> > CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
> > CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason
> > KVM_EXIT_INTERNAL_ERROR (17)
>
> can be found higher up in the trace; namely, it is written to CR3
> several times. It's the root of the page tables.
>
> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
> qemu-monitor-command --hmp", while the guest was auto-paused after the
> crash.
>
> I cannot provide results: QEMU appeared to return a message that would
> be longer than 16MB after encoding by libvirt, and libvirt rejected
> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
> Anyway, the KVM trace, and the QEMU register dump, look consistent
> with what Paolo said about "Code=?? ?? ??...":
>
> The question marks usually mean that the page tables do not map a
> page at that address.
>
> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
> (SMM=0). We can't translate *any* guest-virtual address, as we can't
> even begin walking the page tables.
>
> Thanks
> Laszlo
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 6:25 ` Yao, Jiewen
@ 2016-11-09 11:30 ` Paolo Bonzini
2016-11-09 15:01 ` Yao, Jiewen
2016-11-09 20:46 ` Laszlo Ersek
1 sibling, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 11:30 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
Zeng, Star
On 09/11/2016 07:25, Yao, Jiewen wrote:
> Currently the BSP just uses its own context to initialize the APs, so
> the APs take the BSP's CR3, which unfortunately is the SMM CR3. After
> the BSP has initialized the APs, each AP is put into a HALT loop in X64
> mode. That is the last straw, because halting in X64 mode still needs
> paging.
>
> 3) The error happens once the AP receives an interrupt (for whatever
> reason) and starts executing code. However, at that time the AP might
> not be in SMM mode, which means the SMM CR3 is not available. And then
> we see this.
>
> 4) I guess we do not always see the error, i.e. this is a RANDOM
> issue, because it depends on whether the AP receives an interrupt
> before the BSP sends INIT-SIPI-SIPI.
>
> 5) The fix, I think, should be as follows: we should always put the AP
> into protected mode, so that no paging is needed. We should put the AP
> in reserved memory above 1M, instead of memory below 1M, because the
> memory below 1M is restored.
For what it's worth, this is not what I observed. What I found is that
the BSP doesn't wait for the AP rendezvous before closing SMRAM.
I'm not sure if the two things are related, but (3) would be a much
worse bug. APs should not be receiving an interrupt. Perhaps an NMI if
the AP is sitting in a CLI;HLT loop, but this is not what is happening.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 11:30 ` Paolo Bonzini
@ 2016-11-09 15:01 ` Yao, Jiewen
2016-11-09 15:54 ` Paolo Bonzini
0 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-09 15:01 UTC (permalink / raw)
To: Paolo Bonzini, Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
Zeng, Star
> What I found is that the BSP doesn't wait for the AP rendezvous before closing SMRAM.
[Jiewen] That is a good catch. Thanks for explaining.
I believe that is more convincing than the AP getting an interrupt. :)
There are several places where the BSP talks to the APs during S3 resume:
1) CpuS3.c - EarlyInitializeCpu()
2) CpuS3.c - SmmRelocateBases()
3) CpuS3.c - InitializeCpu()
4) S3Resume.c - SendSmiIpiAllExcludingSelf()
I believe 1/2/3 are safe, because I found that the BSP checks mNumberToFinish there.
4 is a risk, because there is no check that the APs have finished. If an AP is below 1M with its CR3 in SMRAM, that is trouble.
Once the AP executes RSM and returns to non-SMM mode, the CR3 is no longer valid and the AP crashes immediately. Wow!
The fix, I believe, is the same.
We should make sure that 1) the AP is in reserved memory above 1M, and 2) the AP is in protected mode with paging disabled.
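To make the failure condition concrete, here is a minimal sketch under stated assumptions: the SMRAM range is the 7F80_1000..7FFF_FFFF range quoted elsewhere in this thread for this particular guest, and the function names are hypothetical, not edk2 or KVM APIs. It is a simplified model, not the real hardware rule set.

```c
#include <stdbool.h>
#include <stdint.h>

/* SMRAM range as quoted in this thread for this guest (assumption;
 * other configurations will use a different range). */
#define SMRAM_FIRST 0x7F801000ULL
#define SMRAM_LAST  0x7FFFFFFFULL

/* True if the guest-physical address lies inside the assumed SMRAM range. */
bool gpa_in_smram(uint64_t gpa)
{
    return gpa >= SMRAM_FIRST && gpa <= SMRAM_LAST;
}

/* Simplified model: once SMRAM is closed, a CPU outside SMM cannot read
 * it, so a page table root located there cannot be walked. */
bool page_walk_possible(uint64_t cr3, bool cpu_in_smm, bool smram_closed)
{
    uint64_t root = cr3 & ~0xFFFULL;  /* drop the low flag bits of CR3 */

    if (!gpa_in_smram(root)) {
        return true;
    }
    return cpu_in_smm || !smram_closed;
}
```

With CR3 = 0x7ff7f000, the model says the walk works inside SMM or while SMRAM is still open, and fails for an AP that has executed RSM after SMRAM was closed.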
Thank you
Yao Jiewen
From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Wednesday, November 9, 2016 7:30 PM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 09/11/2016 07:25, Yao, Jiewen wrote:
> Currently the BSP just uses its own context to initialize the APs, so
> the APs take the BSP's CR3, which unfortunately is the SMM CR3. After
> the BSP has initialized the APs, each AP is put into a HALT loop in X64
> mode. That is the last straw, because halting in X64 mode still needs
> paging.
>
> 3) The error happens once the AP receives an interrupt (for whatever
> reason) and starts executing code. However, at that time the AP might
> not be in SMM mode, which means the SMM CR3 is not available. And then
> we see this.
>
> 4) I guess we do not always see the error, i.e. this is a RANDOM
> issue, because it depends on whether the AP receives an interrupt
> before the BSP sends INIT-SIPI-SIPI.
>
> 5) The fix, I think, should be as follows: we should always put the AP
> into protected mode, so that no paging is needed. We should put the AP
> in reserved memory above 1M, instead of memory below 1M, because the
> memory below 1M is restored.
For what it's worth, this is not what I observed. What I found is that
the BSP doesn't wait for the AP rendezvous before closing SMRAM.
I'm not sure if the two things are related, but (3) would be a much
worse bug. APs should not be receiving an interrupt. Perhaps an NMI if
the AP is sitting in a CLI;HLT loop, but this is not what is happening.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 11:23 ` Paolo Bonzini
@ 2016-11-09 15:16 ` Yao, Jiewen
0 siblings, 0 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-09 15:16 UTC (permalink / raw)
To: Paolo Bonzini, Laszlo Ersek
Cc: edk2-devel@ml01.01.org, Kinney, Michael D, Tian, Feng, Fan, Jeff,
Zeng, Star
Great work! I appreciate that.
It seems the slow emulated SMM keeps exposing corner cases in the code. :)
We will fix the bad AP in another patch.
Thank you
Yao Jiewen
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, November 9, 2016 7:24 PM
To: Laszlo Ersek <lersek@redhat.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> * Second, the instruction that causes things to blow up is <0f aa>,
> i.e., RSM. I have absolutely no clue why RSM is executed:
It's probably not RSM. RSM is probably the last instruction executed
before, and it's still in the buffer because, as you said, there's no
way that you can fetch an instruction while CR3 points into SMM.
My first thought was that the MMU is for some reason out of contact
with reality, but actually the CR3 write is correct:
CPU-24446 [002] 39841.871040: kvm_exit: reason CR_ACCESS rip 0x9f05e info 103 0
CPU-24446 [002] 39841.871040: kvm_cr: cr_write 3 = 0x7ff7f000
and it's coming from the stub as well. So the second thought was that
the wrong CR3 had been put into the wakeup buffer's Cr3 location.
I'm glad I kept looking because it was much more entertaining. Especially
knowing that I (probably) will not have to fix it. :)
The basic idea for debugging was to look for interesting events and
use 0x402 writes to correlate them to the debug log. For example, most
accesses to 0x9f??? are obviously not traced by KVM, but the first ones
are:
31519- CPU-24444 [006] 39841.783344: kvm_exit: reason EPT_VIOLATION rip 0x855d82 info 181 0
31520: CPU-24444 [006] 39841.783344: kvm_page_fault: address 9f000 error_code 181
280224- CPU-24444 [006] 39841.860940: kvm_exit: reason EPT_VIOLATION rip 0x7ffd0d15 info 182 0
280225: CPU-24444 [006] 39841.860940: kvm_page_fault: address 9f000 error_code 182
(The number is just the line number in the trace). Luckily your machine
didn't have EPT accessed/dirty bits, so KVM trapped both the first read
and the first write.
The read is at
WakeupBufferStart = 9F000, WakeupBufferSize = 1000
but it's not too interesting. The second is a good one to start debugging
because it's from SMRAM (though not from SMM, since the first kvm_enter_smm
happens later at 305930). So it makes sense that it writes an SMRAM CR3.
There is a write to the debug log just before, at 279993, and it writes
"SmmRestoreCpu()". As expected, the write is followed by a flurry of MSR
writes, the APIC programming at 280131, so I am pretty sure that the write to
mExchangeInfo->Cr3 comes from PrepareApStartupVector.
FWIW, I first looked at the call chain up from BackupAndPrepareWakeupBuffer,
but that led me nowhere for an hour. So I was a bit lucky indeed. :)
Anyhow, SmmRestoreCpu is the SmmS3ResumeEntryPoint for S3Resume2Pei, and
indeed, earlier in the log you have this debugging output from S3Resume2Pei:
SMM S3 CR3 = 7FF7F000
Doh, maybe I should have looked at the log before the trace. Who knows.
Anyway, the SMM_S3_RESUME_STATE is initialized by InitSmmS3ResumeState,
so the CR3 is the one that is initialized by InitSmmS3Cr3 in
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c. At this point I
was still thinking that this CR3 was wrong, but by looking at the
places where SMM is entered, and correlating that with debug log writes,
the puzzle was relatively easy to solve:
1) SMBASE relocation, done by SmmRestoreCpu:
305930: CPU-24445 [005] 39841.871264: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000
306000: CPU-24445 [005] 39841.871318: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000
306051: CPU-24446 [002] 39841.871349: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000
306108: CPU-24446 [002] 39841.871390: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000
306161: CPU-24447 [004] 39841.871421: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000
306218: CPU-24447 [004] 39841.871463: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000
306254: CPU-24444 [006] 39841.871473: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000
306311: CPU-24444 [006] 39841.871512: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
2) S3ResumeExecuteBootScript (again, the previous 0x402 write ends
at 334597 and promptly gives us a clue):
334698: CPU-24445 [005] 39841.882706: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000
334699: CPU-24447 [004] 39841.882706: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000
334741: CPU-24444 [006] 39841.882723: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000
334742: CPU-24446 [002] 39841.882724: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000
334875: CPU-24444 [006] 39841.882755: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
Here is where I think things go awry. The lines after
S3ResumeExecuteBootScript() are
Close all SMRAM regions before executing boot script
Lock all SMRAM regions before executing boot script
and indeed the first is at 334898, immediately after VCPU0 leaves
SMM. But, closing and locking of SMRAM happens while the APs are
still in SMM! The BSP instead goes on merrily and, after the debug
log has "PeiMpInitLib: CpuMpEndOfPeiCallback () invoked" (0x402
write ends at 364869) we have another access to 0x9f000, this time a
write. It's RestoreWakeupBuffer:
364908- CPU-24444 [006] 39841.890320: kvm_exit: reason EPT_VIOLATION rip 0x855d82 info 182 0
364909: CPU-24444 [006] 39841.890320: kvm_page_fault: address 9f000 error_code 182
Again VCPUs 1..3 are still in SMM, but the BSP couldn't care less. :)
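The bookkeeping used above (which VCPUs are inside SMM at a given trace line) can be sketched as follows. This is only an illustration, assuming the kvm_enter_smm line format shown in this trace; smm_mask_update is a hypothetical helper, not part of KVM or trace-cmd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Update a bitmask of VCPUs currently in SMM from one trace line.
 * Lines without a kvm_enter_smm event leave the mask unchanged. */
unsigned smm_mask_update(unsigned mask, const char *line)
{
    const char *ev;
    unsigned vcpu;
    char verb[16];

    assert(line != NULL);
    ev = strstr(line, "kvm_enter_smm: vcpu ");
    if (ev == NULL) {
        return mask;
    }
    /* Expected shape: "kvm_enter_smm: vcpu N: entering SMM, ..." or
     * "kvm_enter_smm: vcpu N: leaving SMM, ...". */
    if (sscanf(ev, "kvm_enter_smm: vcpu %u: %15s", &vcpu, verb) != 2) {
        return mask;
    }
    if (vcpu >= 32) {
        return mask;
    }
    if (strcmp(verb, "entering") == 0) {
        return mask | (1u << vcpu);
    }
    if (strcmp(verb, "leaving") == 0) {
        return mask & ~(1u << vcpu);
    }
    return mask;
}
```

Feeding it the five lines of step 2 (334698 through 334875) leaves bits 1..3 set, i.e. only VCPU 0 has left SMM by the time SMRAM is closed and locked.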
We're only 35% through the trace but we're actually close to the end.
At 365704 OVMF says it's transferring control to Linux's wakeup
vector, and Linux takes control real soon:
365805: CPU-24444 [006] 39841.890477: kvm_exit: reason CR_ACCESS rip 0x9aec5 info 4 0
365807: CPU-24444 [006] 39841.890477: kvm_cr: cr_write 4 = 0xb0
365817: CPU-24444 [006] 39841.890479: kvm_entry: vcpu 0
We don't even need to look closer at what happens after this point,
as we can imagine that the APs are just waiting for something to happen.
But if you do look, all you see is reads to the PMTimer, which makes sense.
And a while after, once they are fed up, they bring VCPU 0 back to SMM:
994855 CPU-24446 [000] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200
994856 CPU-24447 [002] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200
994857 CPU-24445 [005] 39841.982774: kvm_apic: apic_write APIC_ICR = 0x4200
994858 CPU-24446 [000] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst)
994859 CPU-24445 [005] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst)
994860 CPU-24447 [002] 39841.982774: kvm_apic_ipi: dst 0 vec 0 (SMI|physical|assert|edge|dst)
994861 CPU-24446 [000] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge)
994862 CPU-24445 [005] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge)
994863 CPU-24447 [002] 39841.982775: kvm_apic_accept_irq: apicid 0 vec 0 (SMI|edge)
The rendezvous completes, the APs can finally leave SMM but all they can do
is meet their fate and crash horribly:
994869 CPU-24444 [006] 39841.982776: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a9548 info 0 800000fd
...
994880 CPU-24444 [006] 39841.982777: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000
995135: CPU-24444 [006] 39841.982821: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000
995136: CPU-24445 [005] 39841.982821: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000
995137: CPU-24446 [000] 39841.982821: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000
995138: CPU-24447 [002] 39841.982821: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000
995148: CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
995152: CPU-24446 [000] 39841.982828: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
I hope you enjoyed it more than the poor APs. :)
Paolo
> (a) in the area that used to host the AP startup routine for the MP
> services PPI -- note that we also have "Transfer to 16bit OS waking
> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
> area completely! --,
>
> (b) and why *after* all four VCPUs have just left SMM, together.
>
> * The RSM instruction is handled successfully elsewhere, for example
> when all four VCPUs leave SMM, at the bottom of the diagram above:
>
> > CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
> > CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
> > CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
> > CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>
> * The guest-phys address 7ff7f000 that we see just before the error:
>
> > CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000
> > error_code 83
> > CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000
> > error_code 83
> > CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
> > CPU-24444 [006] 39841.982827: kvm_exit: reason
> > EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> > CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
> > CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason
> > KVM_EXIT_INTERNAL_ERROR (17)
>
> can be found higher up in the trace; namely, it is written to CR3
> several times. It's the root of the page tables.
>
> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
> qemu-monitor-command --hmp", while the guest was auto-paused after the
> crash.
>
> I cannot provide results: QEMU appeared to return a message that would
> be longer than 16MB after encoding by libvirt, and libvirt rejected
> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
> Anyway, the KVM trace, and the QEMU register dump, look consistent
> with what Paolo said about "Code=?? ?? ??...":
>
> The question marks usually mean that the page tables do not map a
> page at that address.
>
> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
> (SMM=0). We can't translate *any* guest-virtual address, as we can't
> even begin walking the page tables.
>
> Thanks
> Laszlo
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 15:01 ` Yao, Jiewen
@ 2016-11-09 15:54 ` Paolo Bonzini
2016-11-09 16:06 ` Paolo Bonzini
2016-11-09 22:28 ` Laszlo Ersek
0 siblings, 2 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 15:54 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
Zeng, Star
On 09/11/2016 16:01, Yao, Jiewen wrote:
> 1) CpuS3.c – EarlyInitializeCpu()
> 2) CpuS3.c – SmmRelocateBases()
> 3) CpuS3.c – InitializeCpu()
> 4) S3Resume.c – SendSmiIpiAllExcludingSelf()
>
> I believe 1/2/3 are safe, because I found that the BSP checks
> mNumberToFinish there.
>
> 4 is a risk, because there is no check that the APs have finished. If
> an AP is below 1M with its CR3 in SMRAM, that is trouble.
>
> Once the AP executes RSM and returns to non-SMM mode, the CR3 is no
> longer valid and the AP crashes immediately. Wow!
>
> The fix, I believe, is the same.
>
> We should make sure that 1) the AP is in reserved memory above 1M,
Is this because of the NMI case?
> and 2) the AP is in protected mode with paging disabled.
It is not clear to me what the SIPI done in (4) is there for, and why it is
triggered in S3Resume.c rather than CpuS3.c. And why does it take so
long for APs to complete it?
That said, by the time you close and lock SMRAM, you aren't even sure
that you have reached the cli;hlt loop in the rendezvous funnel. In
practice you will be there, but there is still a theoretical race.
InterlockedDecrement (&mNumberToFinish) should be moved from
EarlyMPRendezvousProcedure/MPRendezvousProcedure to GoToSleep, and
GoToSleep should leave 64-bit mode before doing it. This will fix the
S3 bug as well. It's only needed for 64-bit mode, but it is doable for
the Ia32 version as well.
Perhaps EarlyMPRendezvousProcedure and MPRendezvousProcedure can return
&mNumberToFinish; what do you think?
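A minimal sketch of the ordering proposed here, using C11 atomics in place of edk2's InterlockedDecrement(). All names are hypothetical stand-ins (number_to_finish models mNumberToFinish), and the sketch only models the counter, not the mode switch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mNumberToFinish. */
static atomic_uint number_to_finish;

/* BSP side: arm the counter before waking the APs. */
void bsp_arm(unsigned ap_count)
{
    atomic_store(&number_to_finish, ap_count);
}

/* AP side: in the proposal this is the very last step, performed after
 * the AP has left 64-bit mode and no longer depends on page tables that
 * live in SMRAM. */
void ap_signal_parked(void)
{
    unsigned prev = atomic_fetch_sub(&number_to_finish, 1);

    assert(prev > 0);  /* more signals than armed APs would be a bug */
}

/* BSP side: SMRAM may be closed and locked only once every AP has parked. */
bool bsp_may_close_smram(void)
{
    return atomic_load(&number_to_finish) == 0;
}
```

The point of the design is that the decrement becomes the AP's promise that it will never touch SMRAM-backed state again, so the BSP polling for zero is sufficient before it closes SMRAM.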
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 15:54 ` Paolo Bonzini
@ 2016-11-09 16:06 ` Paolo Bonzini
2016-11-09 22:28 ` Laszlo Ersek
1 sibling, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 16:06 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
Fan, Jeff
On 09/11/2016 16:54, Paolo Bonzini wrote:
>> > and 2) AP is in protected mode with paging disabled.
> It is not clear to me what the (4) SIPI done is there for, and why it is
> triggered in S3Resume.c rather than CpuS3.c. And why does it take so
> much for APs to complete it?
SMI of course, not SIPI.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 6:25 ` Yao, Jiewen
2016-11-09 11:30 ` Paolo Bonzini
@ 2016-11-09 20:46 ` Laszlo Ersek
2016-11-10 10:41 ` Yao, Jiewen
1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-09 20:46 UTC (permalink / raw)
To: Yao, Jiewen
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
On 11/09/16 07:25, Yao, Jiewen wrote:
> Hi Laszlo
> I will fix DEBUG message issue in V3 patch.
>
> Below is rest issues:
>
>
> * Case 13: S3 fails randomly.
> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>
>
> 1) We believe the dead CPU is an AP, not the BSP.
> The reasons are:
>
> 1.1) The BSP has already transferred control to the OS waking vector. Its GDT/IDT/CR3 should have been set by the OS.
>
> 1.2) The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. Its CS/DS/SS are the typical BIOS X64 mode settings.
>
> 1.3) The dead CPU still has the SMM CR3, which is obviously wrong.
>
>
> 2) Based upon 1), we reviewed the S3 resume AP flow.
> Currently the BSP wakes up the APs in SMRAM, for security reasons. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after the SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
> Currently the BSP just uses its own context to initialize the APs, so the APs take the BSP's CR3, which unfortunately is the SMM CR3.
> After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because halting in X64 mode still needs paging.
>
>
> 3) The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>
>
> 4) I guess we do not always see the error, i.e. this is a RANDOM issue, because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>
>
> 5) The fix, I think, should be as follows:
> We should always put the AP into protected mode, so that no paging is needed.
> We should put the AP in reserved memory above 1M, instead of memory below 1M, because the memory below 1M is restored.
>
>
> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix that critical issue.
>
> There is no need to do more investigation. Thanks for your great help on that. :)
Thank you for your help!
I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
BSP exits SMM and closes SMRAM on the S3 resume path before
meeting with AP(s)
I hope the title is mostly right. I didn't add any other details (I
haven't gone through the thread in detail yet, and without that I can't
even write up a semi-reasonable report myself). Instead, I referenced
this message of yours in the report, and I also linked Paolo's analysis
from elsewhere in the thread. I hope this will do for the report.
(Also, thank you Paolo, for the amazing analysis -- I haven't digested
it yet, but I can already tell it's amazing! :))
> * Case 17 - I do not think it is a real issue, because SMM is simply out of resources.
>
>
> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
> Here is the code. We do not know why some code needs to call InitializeSpinLock after ExitBootServices.
> SPIN_LOCK *
> EFIAPI
> InitializeSpinLock (
>   OUT SPIN_LOCK  *SpinLock
>   )
> {
>   ASSERT (SpinLock != NULL);
>
>   _ReadWriteBarrier();
>   *SpinLock = SPIN_LOCK_RELEASED;
>   _ReadWriteBarrier();
>
>   return SpinLock;
> }
>
> If you could quickly check the following, that would be great.
>
> 1) Which processor triggers this ASSERT? BSP or AP.
>
> 2) Which module triggers this ASSERT? Which module contains current RIP value?
First, one additional piece of info I have learned is that the issue
does not always present itself. Sometimes the boot just works fine,
other times the assert fires.
Using the QEMU monitor, I managed to get the following information with
the "info cpus" command:
* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
CPU #3: pc=0x000000007ffd17ca thread_id=7838
VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
point into SMRAM again.
In the OVMF log, I see
Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
PiSmmCpuDxeSmm.efi
So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
entry point, 0x8577, 0x253 bytes less).
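The arithmetic behind those two numbers, as a small sketch. The helper names are invented for illustration; the inputs are the PC reported by "info cpus" and the load address and entry point printed in the OVMF log.

```c
#include <stdint.h>

/* Offset of a runtime address from the image load base. */
uint64_t image_offset(uint64_t pc, uint64_t load_base)
{
    return pc - load_base;
}

/* Offset of a runtime address from the image entry point. */
uint64_t entry_relative_offset(uint64_t pc, uint64_t entry_point)
{
    return pc - entry_point;
}
```

Here image_offset(0x7ffd17ca, 0x7ffc9000) gives 0x87ca, the value to look up in the objdump listing, and entry_relative_offset(0x7ffd17ca, 0x7ffc9253) gives 0x8577.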
Running
objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
first I see confirmation that
start address 0x00000253
and then
000087bd <CpuDeadLoop>:
VOID
EFIAPI
CpuDeadLoop (
  VOID
  )
{
    87bd:       55                      push   %ebp
    87be:       89 e5                   mov    %esp,%ebp
    87c0:       83 ec 10                sub    $0x10,%esp
  volatile UINTN  Index;
  for (Index = 0; Index == 0;);
    87c3:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
    87ca:       8b 45 fc                mov    -0x4(%ebp),%eax   <-- HERE
    87cd:       85 c0                   test   %eax,%eax
    87cf:       74 f9                   je     87ca <CpuDeadLoop+0xd>
}
    87d1:       c9                      leave
    87d2:       c3                      ret
This seems consistent with an assertion failure.
I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
like a possible caller:
//
// The BUSY lock is initialized to Released state. This needs to
// be done early enough to be ready for BSP's SmmStartupThisAp()
// call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
// called immediately after AP's present flag is detected.
//
InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
Just a guess, of course.
> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
> If you could share step-by-step instructions with me, that would be great.
(1) Grab a host computer with a CPU that supports VMX and EPT.
(2) Download and install Fedora 24 (for example):
https://getfedora.org/en/workstation/download/
http://docs.fedoraproject.org/install-guide
(3) Install the "qemu-system-x86" package with DNF
dnf install qemu-system-x86
(4) clone edk2 with git
(5) embed OpenSSL optionally (for secure boot); see
"CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
(6) build OVMF:
source edksetup.sh
make -C "$EDK_TOOLS_PATH"
# Ia32
build \
  -a IA32 \
  -p OvmfPkg/OvmfPkgIa32.dsc \
  -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
  -t GCC5 -b DEBUG
# Ia32X64
build \
  -a IA32 -a X64 \
  -p OvmfPkg/OvmfPkgIa32X64.dsc \
  -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
  -t GCC5 -b DEBUG
(7) Create disk images:
qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
  -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
  -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
(8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
that you downloaded already (the ISO image).
For 32-bit guest OS, this one used to work:
https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
minimally the 20141209 release. Hm... actually, I think the maintainer
of that image has discontinued the downloadable files :(
So, I don't know what 32-bit UEFI OS to recommend for testing.
32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
times, with some help from a Microsoft developer, but we couldn't solve
it), so I can't recommend Windows as an alternative.
Perhaps you can use
https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
as a 32-bit guest OS, I never tried.
(9) Anyway, once you have an installer ISO, set the "ISO" environment
variable to the ISO image's full pathname, and then run QEMU like this:
# Settings for Ia32 only:
ISO=...
DISK=.../disk-ia32.img
FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-32.fd
QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
DEBUG=debug-32.log
# Settings for Ia32X64 only:
ISO=...
DISK=.../disk-ia32x64.img
FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-3264.fd
QEMU_COMMAND=qemu-system-x86_64
DEBUG=debug-3264.log
# Common commands for both target arches:
# create variable store from varstore template
# if the former doesn't exist yet
if ! [ -e "$VARS" ]; then
  cp -- "$TEMPLATE" "$VARS"
fi
$QEMU_COMMAND \
  -machine q35,smm=on,accel=kvm \
  -m 4096 \
  -smp sockets=1,cores=2,threads=2 \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
  -drive if=pflash,format=raw,unit=1,file=${VARS} \
  \
  -chardev file,id=debugfile,path=$DEBUG \
  -device isa-debugcon,iobase=0x402,chardev=debugfile \
  \
  -chardev stdio,id=char0,signal=off,mux=on \
  -mon chardev=char0,mode=readline,default \
  -serial chardev:char0 \
  \
  -drive id=iso,if=none,format=raw,readonly,file=$ISO \
  -drive id=disk,if=none,format=qcow2,file=$DISK \
  \
  -device virtio-scsi-pci,id=scsi0 \
  -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
  -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
  \
  -device VGA
This will capture the OVMF debug output in the $DEBUG file. Also, the
terminal where you run the command can be switched between the guest's
serial console and the QEMU monitor with [Ctrl-A C].
Thanks
Laszlo
>
> Thank you
> Yao Jiewen
>
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Tuesday, November 8, 2016 9:22 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2 X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> I have new test results. Let's start with the table again:
>
> Legend:
>
> - "untested" means the test was not executed because the same test
> failed or proved unreliable in a less demanding configuration already,
>
> - "n/a" means a setting or test case was impossible,
>
> - "fail" and "unreliable" (lower case) are outside the scope of this
> series; they either capture the pre-series status, or are expected
> even with the series applied due to the pre-series status,
>
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
> series.
>
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
>
>     series   OVMF                                                                 VCPU      boot        S3 resume
>  #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>
> * Case 8: this test case failed with v2 as well, but this time with
> different symptoms:
>
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>
> I didn't try to narrow this down.
>
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
> and good. The good news is for Jiewen: this patch series does not
> cause the unreliability, it "only" amplifies it severely. The bad news
> is correspondingly for everyone else: S3 resume is actually unreliable
> even in case 4, that is, without this series applied; it's just that
> the failure rate is much, much lower.
>
> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
> 21 tries. (I stopped testing at the 8th failure.)
>
> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
> #12 that failed; I continued testing and aborted the test after the
> 55th try.)
>
> So, while the series hugely amplifies the failure rate, the failure
> does exist without the series. Which is why I modified the case 4
> results in the table, and also lower-cased the word "unreliable" in
> case 13.
>
> Below I will return to this problem separately; let's go over the rest
> of the table first.
>
> * Case 17: I guess this is not a real failure, I'm just including it for
> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
> additional SMRAM demand (see the commit message on patch V2 4/6). This
> case fails with
>
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>
> which is an SMRAM allocation failure. If I lower the VCPU count to
> 50x2x2, then the guest boots fine.
>
> ----*----
>
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
>
> messages in the OVMF boot log, interspersed with
>
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
>
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
>
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
>
> ----*----
>
> * Okay, so the S3 problem. Last time I suspected that the failure point
> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
> 9A1D0, according to the OVMF log). In order to test this idea, I
> exercised this series with S3 against a Windows 8.1 guest (--> case 13
> again). The failure reproduced on the second S3 resume, with identical
> RIP, despite the Windows wakeup vector being located elsewhere (at
> 0x1000).
>
> Quoting the OVMF log leading up to the resume:
>
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
>
> QEMU log (same as before):
>
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
> So, we can exclude the suspicion that the problem is guest OS
> dependent.
>
> * Then I looked for the base address of the page containing the
> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
> some firmware component might have allocated that area actually. Here
> we go:
>
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>
> That is, the failure hits (when it hits -- not always) in the area
> where the CpuMpPei driver *borrows* memory for the startup vector of
> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
> overloaded word here; the "wakeup buffer" has nothing to do with S3
> resume, it just serves for booting the APs temporarily in PEI, for
> implementing the MP service PPI.)
>
> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
> the original contents of this area. This occurs just before
> transferring control to the guest OS wakeup vector: see the
> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
> quoted from the OVMF log.
>
> I documented (parts of) this logic in OVMF commit
>
> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>
> (see the code comments as well).
>
> * At that time, I thought to have identified a memory management bug in
> CpuMpPei; see the following discussion and bug report for details:
>
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>
> However, with the extraction / introduction of MpInitLib, this issue
> has been fixed: GetWakeupBuffer() now calls
> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
> no longer; we shouldn't be looking there for the root cause.
>
> * Either way, I don't understand why anything would want to execute code
> in the one page that happens to host the MP services PPI startup
> buffer for APs during PEI.
>
> Not understanding the "why", I looked at the "what", and resorted to
> tracing KVM. Because the problem readily reproduces with this series
> applied (case 13), it wasn't hard to start the tracing while the guest
> was suspended, and capture just the actions that led from the
> KVM-level wakeup to the failure.
>
> The QEMU state dumps are visible above in the email. I've also
> uploaded the compressed OVMF log and the textual KVM trace here:
>
> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>
> I sincerely hope that Paolo will have a field day with the KVM trace
> :) I managed to identify the following curiosities (remember this is
> all on the S3 resume path):
>
> * First, the VCPUs (there are four of them) enter and leave SMM in a
> really funky pattern:
>
>   vcpu#0  vcpu#1  vcpu#2  vcpu#3
>   ------  ------  ------  ------
>           enter
>           |
>           leave
>
>                   enter
>                   |
>                   leave
>
>                           enter
>                           |
>                           leave
>
>   enter
>   |
>   leave
>
>           enter           enter
>   enter   |       enter   |
>   |       |       |       |
>   leave   |       |       |
>           |       |       |
>   enter   |       |       |
>   |       |       |       |
>   leave   leave   leave   leave
>
> That is, first we have each VCPU enter and leave SMM in complete
> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
> temporarily (it comes back in later), while the other three remain
> in SMM. Finally all four of them leave SMM together.
>
> After which the problem occurs.
>
> * Second, the instruction that causes things to blow up is <0f aa>,
> i.e., RSM. I have absolutely no clue why RSM is executed:
>
> (a) in the area that used to host the AP startup routine for the MP
> services PPI -- note that we also have "Transfer to 16bit OS waking
> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
> area completely! --,
>
> (b) and why *after* all four VCPUs have just left SMM, together.
>
> * The RSM instruction is handled successfully elsewhere, for example
> when all four VCPUs leave SMM, at the bottom of the diagram above:
>
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>
> * The guest-phys address 7ff7f000 that we see just before the error:
>
>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>
> can be found higher up in the trace; namely, it is written to CR3
> several times. It's the root of the page tables.
>
> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
> qemu-monitor-command --hmp", while the guest was auto-paused after the
> crash.
>
> I cannot provide results: QEMU appeared to return a message that would
> be longer than 16MB after encoding by libvirt, and libvirt rejected
> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
> Anyway, the KVM trace, and the QEMU register dump, look consistent
> with what Paolo said about "Code=?? ?? ??...":
>
> The question marks usually mean that the page tables do not map a
> page at that address.
>
> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
> (SMM=0). We can't translate *any* guest-virtual address, as we can't
> even begin walking the page tables.
>
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 15:54 ` Paolo Bonzini
2016-11-09 16:06 ` Paolo Bonzini
@ 2016-11-09 22:28 ` Laszlo Ersek
2016-11-09 22:59 ` Paolo Bonzini
1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-09 22:28 UTC (permalink / raw)
To: Paolo Bonzini, Yao, Jiewen
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
Fan, Jeff
On 11/09/16 16:54, Paolo Bonzini wrote:
>
>
> On 09/11/2016 16:01, Yao, Jiewen wrote:
>> 1) CpuS3.c – EarlyInitializeCpu()
>> 2) CpuS3.c – SmmRelocateBases()
>> 3) CpuS3.c – InitializeCpu()
>> 4) S3Resume.c – SendSmiIpiAllExcludingSelf()
>>
>> I believe we can guarantee 1/2/3 is good, because I found that the
>> BSP checks mNumberToFinish.
>>
>> 4 is a risk, because there is no AP finish check. If the AP is in below
>> 1M with CR3 in SMRAM, it will be a trouble.
>>
>> Once the AP executes RSM and return to non-SMM, the CR3 is no longer
>> valid and AP must be crashed immediately. WoW!
>>
>> The fix, I believe, is same.
>>
>> We should make 1) AP is in above 1M reserved memory,
>
> Is this because of the NMI case?
>
>> and 2) AP is in protected mode with paging disabled.
>
> It is not clear to me what the (4) SIPI done is there for,
After reading through your great analysis with a keen focus :), I wanted
to ask the exact same thing. I managed to follow / recall the control
flow mostly, but when I saw that SMI, I didn't (and don't) understand
what it was (is) good for.
After all, we're not setting up any request parameters etc. for the
processors to handle in SMM. What's happening there?
Another question I have -- and I feel I should really know it, but I
don't... -- is *why* the APs are executing code from the page at
0x9f000. When the BSP exits SMM, replays the S3 boot script, and finally
finishes off the PEI phase and restores the page at 0x9f000, the APs
seem to be affected -- but why do they care about that page at all? That
page never belonged to PiSmmCpuDxeSmm; it belongs to CpuMpPei.
I do understand that the CR3 registers for the APs point into SMRAM,
while they wait for the BSP in SMM. Thus, the BSP closing/locking down
SMRAM, in S3ResumeExecuteBootScript(), breaks the APs -- that's
understandable.
What I don't get is, again:
(1) why S3ResumeExecuteBootScript() raises SMIs at all, before locking
down SMRAM,
(2) what the AP SMM routine (from PiSmmCpuDxeSmm) has to do with the
Wakeup buffer that is allocated and used *solely* by CpuMpPei.
I could be utterly and inexcusably wrong, but I think that the
RIP=0x9f0fd symptom is a red herring. I wrote,
>   vcpu#0  vcpu#1  vcpu#2  vcpu#3
>   ------  ------  ------  ------
>           enter
>           |
>           leave
>
>                   enter
>                   |
>                   leave
>
>                           enter
>                           |
>                           leave
>
>   enter
>   |
>   leave
>
>           enter           enter
>   enter   |       enter   |
>   |       |       |       |
>   leave   |       |       |
>   <--------------------------- BAD
>   enter   |       |       |
>   |       |       |       |
>   leave   leave   leave   leave
Thanks to Paolo's analysis, we now know where that gap comes from and
what it does (so I marked it with BAD now) -- in the gap, the BSP leaves
SMM alone, closes/locks SMRAM, finishes off the PEI phase, restores the
contents of the borrowed wakeup buffer of CpuMpPei, and even transfers
control to Linux's S3 resume vector.
I don't understand why we don't get horrible faults on the APs
*immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
page tables, executable code, everything, will read as 0xff on QEMU. How
can the APs continue in SMM long enough to
(a) time out and pull the BSP back into SMM,
(b) complete the rendezvous and exit SMM?
... Anyway, I think I do have an idea for question (2). Namely, when the
BSP starts executing S3ResumeExecuteBootScript(), in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" -- for which the cue
is ultimately given by the DXE IPL PEIM, as the last action in PEI --,
CpuMpPei has been dispatched already! And, CpuMpPei has placed all the
APs into their comfy HLT loops, so that the MP services PPI could serve
multiprocessing requests.
Thus, the APs are executing code (the HLT loop) from CpuMpPei's wakeup
buffer on page 0x9f000 as *normal business*. That is where the SMI,
raised by the BSP in S3ResumeExecuteBootScript(), rips them out of. And
that's also where KVM tries to return them to, once they finish in SMM
and execute RSM. Too bad by the time KVM returns them there, the wakeup
page has been restored by the BSP.
In other words, the address RIP=0x9f0fd *is* a red herring, that's
simply where the APs happened to be when the SMI was raised, and where
KVM remembers to return the APs to, once the APs execute RSM.
I think I sort of answered question (2). (Apologies if Paolo and Jiewen
explained the exact same thing before; I had to spell it out for
myself.) That leaves question (1) open. Why enter SMM in
S3ResumeExecuteBootScript() at all?
Anyway, I think if the BSP and the APs are properly synchronized around
the SMI injections in S3ResumeExecuteBootScript(), then this bug is
fixed. In that case, the APs' RSMs will restore the full context for the
APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
buffer -- but the APs will sleep on), and then Linux will bring up the
APs, after taking control.
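
A minimal sketch of that synchronization, as a hypothetical model rather than edk2 code: the names rendezvous_t, ap_park() and bsp_may_lock_smram() are made up for illustration, in the spirit of the mNumberToFinish counter mentioned above. The BSP would hold off closing/locking SMRAM until every AP has reported that it is parked again:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: counts the APs that have not yet returned to their
 * parked (CLI;HLT) state after the SMI. Not an edk2 data structure. */
typedef struct {
    atomic_uint number_to_finish;
} rendezvous_t;

/* BSP side: arm the counter before raising the SMIs. */
static inline void rendezvous_init(rendezvous_t *r, unsigned ap_count)
{
    atomic_init(&r->number_to_finish, ap_count);
}

/* AP side: called as the very last action before the AP parks again. */
static inline void ap_park(rendezvous_t *r)
{
    unsigned previous = atomic_fetch_sub(&r->number_to_finish, 1);
    assert(previous > 0); /* more parks than armed APs would be a bug */
}

/* BSP side: only when this returns true is it safe to close/lock SMRAM. */
static inline bool bsp_may_lock_smram(rendezvous_t *r)
{
    return atomic_load(&r->number_to_finish) == 0;
}
```

With three APs armed, bsp_may_lock_smram() stays false until the third ap_park() call; only then would the BSP proceed to lock SMRAM and leave PEI.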
Thanks
Laszlo
> and why it is
> triggered in S3Resume.c rather than CpuS3.c. And why does it take so
> much for APs to complete it?
>
> That said, by the time you close and lock SMRAM, you aren't even sure
> that you have reached the cli;hlt loop in the rendezvous funnel. In
> practice you will be there, but there is still a theoretical race.
>
> InterlockedDecrement (&mNumberToFinish) should be moved from
> EarlyMPRendezvousProcedure/MPRendezvousProcedure to GoToSleep, and
> GoToSleep should leave 64-bit mode before doing it. This will fix the
> S3 bug as well. It's only needed for 64-bit mode, but it is doable for
> the Ia32 version as well.
>
> Perhaps EarlyMPRendezvousProcedure and MPRendezvousProcedure can return
> &mNumberToFinish; what do you think?
>
> Paolo
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 22:28 ` Laszlo Ersek
@ 2016-11-09 22:59 ` Paolo Bonzini
2016-11-09 23:27 ` Laszlo Ersek
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 22:59 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Jiewen Yao, Michael D Kinney, Feng Tian, edk2-devel, Star Zeng,
Jeff Fan
> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.
This I can answer. :)
The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop. When the AP exits SMM, it is in the JMP instruction.
As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?). After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like
        POP EAX         ; pop return address
        POP EAX         ; pop Context1 which is &mNumberToFinish
        DEC [EAX]
    1:  CLI
        HLT
        JMP 1
> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.
I wouldn't call it a red herring. After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.
What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.
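
To make that concrete, here is a deliberately simplified, hypothetical model of what an SMI saves and what RSM restores. The types vcpu_t and smm_save_state_t and the handler addresses are assumptions for illustration only; the real save state map holds far more, and the handler's own paging setup is folded into smi_enter() here. The point is only that RSM restores whatever was saved, so a CR3 that already pointed into SMRAM before the SMI points there afterwards too:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced CPU state. */
typedef struct {
    uint64_t rip;
    uint64_t cr3;
    bool     in_smm;
} vcpu_t;

/* Hypothetical, heavily reduced SMM save state. */
typedef struct {
    uint64_t rip;
    uint64_t cr3;
} smm_save_state_t;

/* SMI delivery: remember where the CPU was, then switch to the handler. */
static inline smm_save_state_t smi_enter(vcpu_t *cpu, uint64_t handler_rip,
                                         uint64_t handler_cr3)
{
    smm_save_state_t saved = { cpu->rip, cpu->cr3 };
    cpu->rip = handler_rip;
    cpu->cr3 = handler_cr3;
    cpu->in_smm = true;
    return saved;
}

/* RSM: restore exactly what was saved, valid outside SMM or not. */
static inline void rsm(vcpu_t *cpu, const smm_save_state_t *saved)
{
    assert(cpu->in_smm);
    cpu->rip = saved->rip;
    cpu->cr3 = saved->cr3;
    cpu->in_smm = false;
}
```

Starting from rip=0x9f0fd and cr3=0x7ff7f000 (the values in the register dump), an smi_enter() followed by rsm() lands back on the same pair with in_smm false, which matches the dump.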
> >   vcpu#0  vcpu#1  vcpu#2  vcpu#3
> >   ------  ------  ------  ------
> >           enter           enter
> >   enter   |       enter   |
> >   |       |       |       |
> >   leave   |       |       |
> >   <--------------------------- BAD
> >   enter   |       |       |
> >   |       |       |       |
> >   leave   leave   leave   leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?
Because the "0xff" only applies when you're out of SMM. The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the AP is in SMM they are executing in a separate address space
where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
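
A rough sketch of that visibility rule, as a toy model and not QEMU code: SMRAM_BASE, SMRAM_SIZE, cpu_view_t and read_byte() are assumptions for illustration (an 8MB window ending at 2GB, which roughly matches the 7F80_1000..7FFF_FFFF range from the log). The same guest-physical address reads differently depending on whether the accessing CPU is in SMM:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMRAM_BASE 0x7F800000u /* assumed base, for illustration only */
#define SMRAM_SIZE 0x00800000u /* assumed size, for illustration only */

/* What matters for one access: who is reading, and is SMRAM open? */
typedef struct {
    bool in_smm;     /* the accessing CPU is currently in SMM */
    bool smram_open; /* SMRAM is open, i.e. not closed or closed/locked */
} cpu_view_t;

static inline bool in_smram(uint32_t addr)
{
    return addr >= SMRAM_BASE && addr - SMRAM_BASE < SMRAM_SIZE;
}

/* Returns the byte the CPU observes; `backing` is the real RAM content. */
static inline uint8_t read_byte(cpu_view_t cpu, uint32_t addr, uint8_t backing)
{
    if (in_smram(addr) && !cpu.in_smm && !cpu.smram_open) {
        return 0xFF; /* closed SMRAM seen from outside SMM */
    }
    return backing;
}
```

In this model an AP that is still in SMM keeps reading real page table bytes at 0x7ff7f000 after the BSP has closed SMRAM, while the same read from outside SMM yields 0xFF, so the page walk cannot even begin.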
> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 22:59 ` Paolo Bonzini
@ 2016-11-09 23:27 ` Laszlo Ersek
2016-11-10 1:13 ` Yao, Jiewen
2016-11-10 0:49 ` Yao, Jiewen
2016-11-10 0:50 ` Yao, Jiewen
2 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-09 23:27 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Jiewen Yao, Michael D Kinney, Feng Tian, edk2-devel, Star Zeng,
Jeff Fan
On 11/09/16 23:59, Paolo Bonzini wrote:
>
>> Another question I have -- and I feel I should really know it, but I
>> don't... -- is *why* the APs are executing code from the page at
>> 0x9f000.
>
> This I can answer. :)
>
> The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
> loop. When the AP exits SMM, it is in the JMP instruction.
>
> As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
> in the 0-640K area (perhaps it could be in what your doc calls the
> "permanent PEI memory for the S3 resume path"?). After thinking a
> bit more about it, it seems simplest to me if CpuS3.c just uses
> SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
> to jump to a small stub like
>
>         POP EAX         ; pop return address
>         POP EAX         ; pop Context1 which is &mNumberToFinish
>         DEC [EAX]
>     1:  CLI
>         HLT
>         JMP 1
>
>> I could be utterly and inexcusably wrong, but I think that the
>> RIP=0x9f0fd symptom is a red herring.
>
> I wouldn't call it a red herring. After all, CR3 points to SMM
> exactly because the CR3 that was set up for the 0x9f000 stub is
> CpuS3.c's SMRAM page table root.
Hrmpf. The stub at 0x9f000 does not belong to PiSmmCpuDxeSmm. Regardless
of the boot path (normal boot or S3 resume), it belongs to CpuMpPei, and
it partakes in the implementation of the MP services PPI. It is
practically the "parking lot" for the APs when they are not executing
any MP job, submitted by an MP services PPI client.
So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).
(*) For example, OVMF's PlatformPei uses this service to program
MSR_IA32_FEATURE_CONTROL from fw_cfg. On the resume path too, that
occurs before we do the SMBASE relocation.
(I.e., before S3RestoreConfig2() in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" calls
SmmRestoreCpu() in "UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c", via
SmmS3ResumeState->SmmS3ResumeEntryPoint.)
When an AP executes RSM, its CR3 should automatically be restored to the
original (non-SMM) value, should it not? I mean I do remember the CR3
value from the QEMU register dump, but now I don't understand how that's
possible with SMM=0.
Sorry if I'm being dense :)
> What _is_ a red herring is KVM's trace showing a RSM instruction
> at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
> instruction executed _before_ getting to that RIP.
>
>>>   vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>>   ------  ------  ------  ------
>>>           enter           enter
>>>   enter   |       enter   |
>>>   |       |       |       |
>>>   leave   |       |       |
>>>   <--------------------------- BAD
>>>   enter   |       |       |
>>>   |       |       |       |
>>>   leave   leave   leave   leave
>>
>> I don't understand why we don't get horrible faults on the APs
>> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
>> page tables, executable code, everything, will read as 0xff on QEMU. How
>> can the APs continue in SMM long enough to
>>
>> (a) time out and pull the BSP back into SMM,
>> (b) complete the rendezvous and exit SMM?
>
> Because the "0xff" only applies when you're out of SMM. The three
> states (open, closed, closed/locked) only apply when you're not in SMM.
> While the AP is in SMM they are executing in a separate address space
> where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
> struct, smram_address_space in target-i386/kvm.c).
Sigh, in retrospect, this should have been obvious. :) Thanks for
pointing it out!
Laszlo
>> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
>> explained the exact same thing before; I had to spell it out for
>> myself.) That leaves question (1) open. Why enter SMM in
>> S3ResumeExecuteBootScript() at all?
>>
>> Anyway, I think if the BSP and the APs are properly synchronized around
>> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
>> fixed. In that case, the APs' RSMs will restore the full context for the
>> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
>> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
>> buffer -- but the APs will sleep on), and then Linux will bring up the
>> APs, after taking control.
>
> Agreed.
>
> Paolo
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 22:59 ` Paolo Bonzini
2016-11-09 23:27 ` Laszlo Ersek
@ 2016-11-10 0:49 ` Yao, Jiewen
2016-11-10 0:50 ` Yao, Jiewen
2 siblings, 0 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 0:49 UTC (permalink / raw)
To: Paolo Bonzini, Laszlo Ersek
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
Fan, Jeff
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
[Jiewen] I hold a different opinion on that.
If the AP is in a HLT loop in some below-1M memory with CR3 pointing to SMRAM, there are 2 issues.
* The below-1M memory (0x9f0fd) might be reused by the OS for other instructions.
* If the AP starts running code, it will get an exception because CR3 is obviously wrong. The AP cannot fetch any code.
There are at least 2 possible ways to trigger this scenario.
A) Jeff and I have discovered one possible case - the AP may receive an NMI/SMI, such as a periodic SMI, before the OS sends INIT-SIPI-SIPI to wake up the AP.
B) Paolo has found one real case - the AP is still in SMRAM when the BSP is out and about to close SMRAM.
IMHO, letting the BSP/AP sync in S3 resume only resolves B). It does not help with A).
If the system has some special SMI, such as a periodic SMI, it will definitely trigger case A).
So we have to fix the AP state anyway.
Now, if the AP state is fixed, I do not think we need to worry about the BSP/AP out-of-sync issue.
The BSP and AP can be independent. The AP can receive an NMI/SMI at any time and just keep working in the HLT loop.
Thank you
Yao Jiewen
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, November 10, 2016 7:00 AM
To: Laszlo Ersek <lersek@redhat.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Zeng, Star <star.zeng@intel.com>; Fan, Jeff <jeff.fan@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.
This I can answer. :)
The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop. When the AP exits SMM, it is in the JMP instruction.
As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?). After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like
POP EAX ; pop return address
POP EAX ; pop Context1 which is &mNumberToFinish
DEC [EAX]
1: CLI
HLT
JMP 1
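As a rough C-level model of what the stub above is meant to do, here is a minimal sketch. The names `ap_finish_and_park` and `bsp_all_aps_done` are hypothetical (not from edk2), the counter stands in for mNumberToFinish, and the model uses a C11 atomic decrement where the stub uses a plain DEC. A real AP would never return; it would stay in the CLI/HLT loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mNumberToFinish: number of APs still running. */
typedef atomic_uint ap_counter_t;

/* Last thing each AP does: signal completion (the DEC [EAX] step).
 * On real hardware the AP would then loop on "cli; hlt" forever; this
 * model returns instead so it can be exercised on a host.
 * Returns the number of APs that have not finished yet. */
static unsigned ap_finish_and_park(ap_counter_t *number_to_finish)
{
    return atomic_fetch_sub(number_to_finish, 1) - 1;
}

/* BSP side: it may proceed once every AP has signalled completion. */
static bool bsp_all_aps_done(ap_counter_t *number_to_finish)
{
    return atomic_load(number_to_finish) == 0;
}
```

The point of the design is that the decrement happens after the AP has left any code that needs the SMRAM page tables, so the BSP cannot move on while an AP still depends on them.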
> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.
I wouldn't call it a red herring. After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.
What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.
> > vcpu#0 vcpu#1 vcpu#2 vcpu#3
> > ------ ------ ------ ------
> > enter enter
> > enter | enter |
> > | | | |
> > leave | | |
> > <--------------------------- BAD
> > enter | | |
> > | | | |
> > leave leave leave leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?
Because the "0xff" only applies when you're out of SMM. The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the AP is in SMM they are executing in a separate address space
where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
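The rule described above can be modelled in a few lines of C. This is only a sketch of the visibility rule, not QEMU code: `cpu_read`, `SMRAM_BASE` and `SMRAM_SIZE` are hypothetical names and values chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SMRAM window, for illustration only. */
#define SMRAM_BASE 0x7F000000u
#define SMRAM_SIZE 0x01000000u

/* Returns the byte a CPU observes at addr, where `backing` is what is
 * really stored there. Only a CPU outside SMM is affected by the
 * open/closed state; a CPU in SMM always sees the real contents. */
static uint8_t cpu_read(uint32_t addr, uint8_t backing,
                        bool in_smm, bool smram_open)
{
    bool in_smram = addr >= SMRAM_BASE && addr - SMRAM_BASE < SMRAM_SIZE;

    if (in_smram && !in_smm && !smram_open) {
        return 0xFF;  /* closed SMRAM reads as 0xff from outside SMM */
    }
    return backing;
}
```

In this model an AP that is still inside SMM keeps fetching valid code and page tables after the BSP closes SMRAM; the trouble only starts once the AP is outside SMM and still points into SMRAM.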
> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 22:59 ` Paolo Bonzini
2016-11-09 23:27 ` Laszlo Ersek
2016-11-10 0:49 ` Yao, Jiewen
@ 2016-11-10 0:50 ` Yao, Jiewen
2016-11-10 1:02 ` Fan, Jeff
2 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 0:50 UTC (permalink / raw)
To: Paolo Bonzini, Laszlo Ersek
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
Fan, Jeff
Fix a typo.
From: Yao, Jiewen
Sent: Thursday, November 10, 2016 8:49 AM
To: 'Paolo Bonzini' <pbonzini@redhat.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Zeng, Star <star.zeng@intel.com>; Fan, Jeff <jeff.fan@intel.com>
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
[Jiewen] I hold a different opinion on that.
If the AP is in a hlt-loop in memory below 1M with CR3 pointing to SMRAM, there are 2 issues.
* The memory below 1M (0x9f0fd) might be reused by the OS and hold other instructions.
* If the AP starts running the code, it will get an exception because CR3 is obviously wrong. The AP cannot fetch any code.
There are at least 2 possible ways to trigger this scenario.
A) Jeff and I have discovered one possible case - the AP may receive an NMI/SMI, such as a periodic SMI, before the OS sends INIT-SIPI-SIPI to wake up the AP.
B) Paolo has found one real case - the AP is in SMRAM when the BSP is out and about to close SMRAM.
IMHO, letting the BSP/AP sync in S3 resume only resolves B). It does not help with A).
If the system has some special SMI, such as a periodic SMI, it will definitely trigger case A).
So we have to fix the AP state anyway.
Now, if the AP state is fixed, I do not think we need to worry about the BSP/AP out-of-sync issue.
The BSP and AP can be independent. The AP can receive an NMI/SMI at any time and just stay in the HLT-loop.
Thank you
Yao Jiewen
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, November 10, 2016 7:00 AM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.
This I can answer. :)
The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop. When the AP exits SMM, it is in the JMP instruction.
As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?). After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like
POP EAX ; pop return address
POP EAX ; pop Context1 which is &mNumberToFinish
DEC [EAX]
1: CLI
HLT
JMP 1
> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.
I wouldn't call it a red herring. After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.
What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.
> > vcpu#0 vcpu#1 vcpu#2 vcpu#3
> > ------ ------ ------ ------
> > enter enter
> > enter | enter |
> > | | | |
> > leave | | |
> > <--------------------------- BAD
> > enter | | |
> > | | | |
> > leave leave leave leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?
Because the "0xff" only applies when you're out of SMM. The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the AP is in SMM they are executing in a separate address space
where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 0:50 ` Yao, Jiewen
@ 2016-11-10 1:02 ` Fan, Jeff
0 siblings, 0 replies; 38+ messages in thread
From: Fan, Jeff @ 2016-11-10 1:02 UTC (permalink / raw)
To: Yao, Jiewen, Paolo Bonzini, Laszlo Ersek
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star
I think it is necessary to place the AP into a safe state (hlt-loop, no page table required, in reserved memory above 1MB, outside SMM), just like we do in MpInitExitBootServicesCallback() on the normal boot path.
From: Yao, Jiewen
Sent: Thursday, November 10, 2016 8:51 AM
To: Paolo Bonzini; Laszlo Ersek
Cc: Kinney, Michael D; Tian, Feng; edk2-devel@ml01.01.org; Zeng, Star; Fan, Jeff
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
Fix a typo.
From: Yao, Jiewen
Sent: Thursday, November 10, 2016 8:49 AM
To: 'Paolo Bonzini' <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
[Jiewen] I hold a different opinion on that.
If the AP is in a hlt-loop in memory below 1M with CR3 pointing to SMRAM, there are 2 issues.
* The memory below 1M (0x9f0fd) might be reused by the OS and hold other instructions.
* If the AP starts running the code, it will get an exception because CR3 is obviously wrong. The AP cannot fetch any code.
There are at least 2 possible ways to trigger this scenario.
A) Jeff and I have discovered one possible case - the AP may receive an NMI/SMI, such as a periodic SMI, before the OS sends INIT-SIPI-SIPI to wake up the AP.
B) Paolo has found one real case - the AP is in SMRAM when the BSP is out and about to close SMRAM.
IMHO, letting the BSP/AP sync in S3 resume only resolves B). It does not help with A).
If the system has some special SMI, such as a periodic SMI, it will definitely trigger case A).
So we have to fix the AP state anyway.
Now, if the AP state is fixed, I do not think we need to worry about the BSP/AP out-of-sync issue.
The BSP and AP can be independent. The AP can receive an NMI/SMI at any time and just stay in the HLT-loop.
Thank you
Yao Jiewen
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, November 10, 2016 7:00 AM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.
This I can answer. :)
The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop. When the AP exits SMM, it is in the JMP instruction.
As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?). After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like
POP EAX ; pop return address
POP EAX ; pop Context1 which is &mNumberToFinish
DEC [EAX]
1: CLI
HLT
JMP 1
> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.
I wouldn't call it a red herring. After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.
What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.
> > vcpu#0 vcpu#1 vcpu#2 vcpu#3
> > ------ ------ ------ ------
> > enter enter
> > enter | enter |
> > | | | |
> > leave | | |
> > <--------------------------- BAD
> > enter | | |
> > | | | |
> > leave leave leave leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?
Because the "0xff" only applies when you're out of SMM. The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the AP is in SMM they are executing in a separate address space
where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
Paolo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 23:27 ` Laszlo Ersek
@ 2016-11-10 1:13 ` Yao, Jiewen
2016-11-10 6:30 ` Fan, Jeff
0 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 1:13 UTC (permalink / raw)
To: Laszlo Ersek, Paolo Bonzini
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
Fan, Jeff
So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).
[Jiewen] It is very tricky.
First, in normal boot, SMM needs to prepare a CR3 for the SMM page table, which is obvious.
In S3, S3Resume calls AsmWriteCr3(SmmS3ResumeState->SmmS3Cr3) and then jumps to SmmS3ResumeState->SmmS3ResumeEntryPoint. Now the BSP holds SmmS3Cr3 but is in non-SMM mode.
In SmmRestoreCpu(), the BSP calls EarlyInitializeCpu()/PrepareApStartupVector() in non-SMM mode, and mExchangeInfo->Cr3 = (UINT32) (AsmReadCr3 ()); so the AP now holds SmmS3Cr3 in non-SMM mode. That is OK, because SMRAM is OPEN.
When SmmRelocateBases() is called, the AP is woken up and does the rebase. SmmS3Cr3 is used for the AP in SMM. But that does not change the fact that SmmS3Cr3 is also used in non-SMM mode.
Later, in InitializeCpu(), the AP wakeup buffer is placed below 1M, still with SmmS3Cr3.
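The CR3 hand-off described above can be traced with a small host-side model. This is a sketch under stated assumptions, not edk2 code: the types `cpu_t` and `exchange_info_t` and all four function names are hypothetical and only mirror the steps named in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state and BSP-to-AP exchange area. */
typedef struct { uint32_t cr3; bool in_smm; } cpu_t;
typedef struct { uint32_t cr3; } exchange_info_t;

/* S3Resume step: the BSP loads the SMM page table root while still
 * outside SMM. */
static void bsp_s3_resume(cpu_t *bsp, uint32_t smm_s3_cr3)
{
    bsp->cr3 = smm_s3_cr3;
    bsp->in_smm = false;
}

/* PrepareApStartupVector-like step: the BSP publishes its current CR3. */
static void bsp_prepare_ap_vector(const cpu_t *bsp, exchange_info_t *info)
{
    info->cr3 = bsp->cr3;
}

/* AP wakeup: the AP adopts whatever CR3 was published. */
static void ap_wakeup(cpu_t *ap, const exchange_info_t *info)
{
    ap->cr3 = info->cr3;
    ap->in_smm = false;
}

/* The hazard: outside SMM, a CR3 that lives in SMRAM is only usable
 * while SMRAM is still open. */
static bool ap_can_walk_page_tables(const cpu_t *ap, uint32_t smm_s3_cr3,
                                    bool smram_open)
{
    return ap->in_smm || ap->cr3 != smm_s3_cr3 || smram_open;
}
```

Running the three steps in order leaves the AP outside SMM with the SMM CR3, which works while SMRAM is open and stops working as soon as it is closed.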
Thank you
Yao Jiewen
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, November 10, 2016 7:27 AM
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Zeng, Star <star.zeng@intel.com>; Fan, Jeff <jeff.fan@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 11/09/16 23:59, Paolo Bonzini wrote:
>
>> Another question I have -- and I feel I should really know it, but I
>> don't... -- is *why* the APs are executing code from the page at
>> 0x9f000.
>
> This I can answer. :)
>
> The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
> loop. When the AP exits SMM, it is in the JMP instruction.
>
> As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
> in the 0-640K area (perhaps it could be in what your doc calls the
> "permanent PEI memory for the S3 resume path"?). After thinking a
> bit more about it, it seems simplest to me if CpuS3.c just uses
> SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
> to jump to a small stub like
>
> POP EAX ; pop return address
> POP EAX ; pop Context1 which is &mNumberToFinish
> DEC [EAX]
> 1: CLI
> HLT
> JMP 1
>
>> I could be utterly and inexcusably wrong, but I think that the
>> RIP=0x9f0fd symptom is a red herring.
>
> I wouldn't call it a red herring. After all, CR3 points to SMM
> exactly because the CR3 that was set up for the 0x9f000 stub is
> CpuS3.c's SMRAM page table root.
Hrmpf. The stub at 0x9f000 does not belong to PiSmmCpuDxeSmm. Regardless
of the boot path (normal boot or S3 resume), it belongs to CpuMpPei, and
it partakes in the implementation of the MP services PPI. It is
practically the "parking lot" for the APs when they are not executing
any MP job, submitted by an MP services PPI client.
So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).
(*) For example, OVMF's PlatformPei uses this service to program
MSR_IA32_FEATURE_CONTROL from fw_cfg. On the resume path too, that
occurs before we do the SMBASE relocation.
(I.e., before S3RestoreConfig2() in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" calls
SmmRestoreCpu() in "UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c", via
SmmS3ResumeState->SmmS3ResumeEntryPoint.)
When an AP executes RSM, its CR3 should automatically be restored to the
original (non-SMM) value, should it not? I mean I do remember the CR3
value from the QEMU register dump, but now I don't understand how that's
possible with SMM=0.
Sorry if I'm being dense :)
> What _is_ a red herring is KVM's trace showing a RSM instruction
> at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
> instruction executed _before_ getting to that RIP.
>
>>> vcpu#0 vcpu#1 vcpu#2 vcpu#3
>>> ------ ------ ------ ------
>>> enter enter
>>> enter | enter |
>>> | | | |
>>> leave | | |
>>> <--------------------------- BAD
>>> enter | | |
>>> | | | |
>>> leave leave leave leave
>>
>> I don't understand why we don't get horrible faults on the APs
>> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
>> page tables, executable code, everything, will read as 0xff on QEMU. How
>> can the APs continue in SMM long enough to
>>
>> (a) time out and pull the BSP back into SMM,
>> (b) complete the rendezvous and exit SMM?
>
> Because the "0xff" only applies when you're out of SMM. The three
> states (open, closed, closed/locked) only apply when you're not in SMM.
> While the AP is in SMM they are executing in a separate address space
> where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
> struct, smram_address_space in target-i386/kvm.c).
Sigh, in retrospect, this should have been obvious. :) Thanks for
pointing it out!
Laszlo
>> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
>> explained the exact same thing before; I had to spell it out for
>> myself.) That leaves question (1) open. Why enter SMM in
>> S3ResumeExecuteBootScript() at all?
>>
>> Anyway, I think if the BSP and the APs are properly synchronized around
>> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
>> fixed. In that case, the APs' RSMs will restore the full context for the
>> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
>> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
>> buffer -- but the APs will sleep on), and then Linux will bring up the
>> APs, after taking control.
>
> Agreed.
>
> Paolo
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 1:13 ` Yao, Jiewen
@ 2016-11-10 6:30 ` Fan, Jeff
0 siblings, 0 replies; 38+ messages in thread
From: Fan, Jeff @ 2016-11-10 6:30 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek, Paolo Bonzini
Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star
Laszlo,
I just sent a patch to place the APs into safe hlt-loop code (in the NVS range above 1MB, in 32-bit protected mode).
Could you check whether it solves the S3 instability on OVMF?
Thanks!
Jeff
From: Yao, Jiewen
Sent: Thursday, November 10, 2016 9:13 AM
To: Laszlo Ersek; Paolo Bonzini
Cc: Kinney, Michael D; Tian, Feng; edk2-devel@ml01.01.org; Zeng, Star; Fan, Jeff
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).
[Jiewen] It is very tricky.
First, in normal boot, SMM needs to prepare a CR3 for the SMM page table, which is obvious.
In S3, S3Resume calls AsmWriteCr3(SmmS3ResumeState->SmmS3Cr3) and then jumps to SmmS3ResumeState->SmmS3ResumeEntryPoint. Now the BSP holds SmmS3Cr3 but is in non-SMM mode.
In SmmRestoreCpu(), the BSP calls EarlyInitializeCpu()/PrepareApStartupVector() in non-SMM mode, and mExchangeInfo->Cr3 = (UINT32) (AsmReadCr3 ()); so the AP now holds SmmS3Cr3 in non-SMM mode. That is OK, because SMRAM is OPEN.
When SmmRelocateBases() is called, the AP is woken up and does the rebase. SmmS3Cr3 is used for the AP in SMM. But that does not change the fact that SmmS3Cr3 is also used in non-SMM mode.
Later, in InitializeCpu(), the AP wakeup buffer is placed below 1M, still with SmmS3Cr3.
Thank you
Yao Jiewen
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, November 10, 2016 7:27 AM
To: Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>
Cc: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 11/09/16 23:59, Paolo Bonzini wrote:
>
>> Another question I have -- and I feel I should really know it, but I
>> don't... -- is *why* the APs are executing code from the page at
>> 0x9f000.
>
> This I can answer. :)
>
> The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
> loop. When the AP exits SMM, it is in the JMP instruction.
>
> As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
> in the 0-640K area (perhaps it could be in what your doc calls the
> "permanent PEI memory for the S3 resume path"?). After thinking a
> bit more about it, it seems simplest to me if CpuS3.c just uses
> SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
> to jump to a small stub like
>
> POP EAX ; pop return address
> POP EAX ; pop Context1 which is &mNumberToFinish
> DEC [EAX]
> 1: CLI
> HLT
> JMP 1
>
>> I could be utterly and inexcusably wrong, but I think that the
>> RIP=0x9f0fd symptom is a red herring.
>
> I wouldn't call it a red herring. After all, CR3 points to SMM
> exactly because the CR3 that was set up for the 0x9f000 stub is
> CpuS3.c's SMRAM page table root.
Hrmpf. The stub at 0x9f000 does not belong to PiSmmCpuDxeSmm. Regardless
of the boot path (normal boot or S3 resume), it belongs to CpuMpPei, and
it partakes in the implementation of the MP services PPI. It is
practically the "parking lot" for the APs when they are not executing
any MP job, submitted by an MP services PPI client.
So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).
(*) For example, OVMF's PlatformPei uses this service to program
MSR_IA32_FEATURE_CONTROL from fw_cfg. On the resume path too, that
occurs before we do the SMBASE relocation.
(I.e., before S3RestoreConfig2() in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" calls
SmmRestoreCpu() in "UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c", via
SmmS3ResumeState->SmmS3ResumeEntryPoint.)
When an AP executes RSM, its CR3 should automatically be restored to the
original (non-SMM) value, should it not? I mean I do remember the CR3
value from the QEMU register dump, but now I don't understand how that's
possible with SMM=0.
Sorry if I'm being dense :)
> What _is_ a red herring is KVM's trace showing a RSM instruction
> at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
> instruction executed _before_ getting to that RIP.
>
>>> vcpu#0 vcpu#1 vcpu#2 vcpu#3
>>> ------ ------ ------ ------
>>> enter enter
>>> enter | enter |
>>> | | | |
>>> leave | | |
>>> <--------------------------- BAD
>>> enter | | |
>>> | | | |
>>> leave leave leave leave
>>
>> I don't understand why we don't get horrible faults on the APs
>> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
>> page tables, executable code, everything, will read as 0xff on QEMU. How
>> can the APs continue in SMM long enough to
>>
>> (a) time out and pull the BSP back into SMM,
>> (b) complete the rendezvous and exit SMM?
>
> Because the "0xff" only applies when you're out of SMM. The three
> states (open, closed, closed/locked) only apply when you're not in SMM.
> While the AP is in SMM they are executing in a separate address space
> where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
> struct, smram_address_space in target-i386/kvm.c).
Sigh, in retrospect, this should have been obvious. :) Thanks for
pointing it out!
Laszlo
>> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
>> explained the exact same thing before; I had to spell it out for
>> myself.) That leaves question (1) open. Why enter SMM in
>> S3ResumeExecuteBootScript() at all?
>>
>> Anyway, I think if the BSP and the APs are properly synchronized around
>> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
>> fixed. In that case, the APs' RSMs will restore the full context for the
>> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
>> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
>> buffer -- but the APs will sleep on), and then Linux will bring up the
>> APs, after taking control.
>
> Agreed.
>
> Paolo
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-09 20:46 ` Laszlo Ersek
@ 2016-11-10 10:41 ` Yao, Jiewen
2016-11-10 12:01 ` Laszlo Ersek
2016-11-10 12:27 ` Paolo Bonzini
0 siblings, 2 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 10:41 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
Thanks for reporting the case 3 issue on Bugzilla.
Let's focus on Case 8.
It seems to be another random failure.
I did more testing.
1) I tested a UEFI32 OS boot on some of our other internal real platforms. I cannot reproduce the ASSERT.
2) I wrote a small test app to call ExitBootServices and send an SMI. I ran it on my current Windows QEMU but I still cannot reproduce the ASSERT.
It seems your environment is the only way to reproduce the issue. I am trying to follow your steps to install the OS on QEMU/KVM. I haven't finished everything yet, because of some network proxy issue. :(
Your information and analysis are great. They give us some clues.
I guessed the same thing as you and checked: InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
This address is initialized in InitializeMpSyncData(), with gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus, which is obtained from MpServices->GetNumberOfProcessors().
I do not know why this address is zero.
I also do not quite understand the log below.
* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
CPU #3: pc=0x000000007ffd17ca thread_id=7838
As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM mode.
So why can #3 enter SMM mode? Is that expected behavior or not?
If this is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?
I will see if I can finish the QEMU/KVM installation tomorrow.
If you have some idea of why and how #3 enters SMM, please let us know.
Thank you
Yao Jiewen
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, November 10, 2016 4:46 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 11/09/16 07:25, Yao, Jiewen wrote:
> Hi Laszlo
> I will fix DEBUG message issue in V3 patch.
>
> Below is rest issues:
>
>
> * Case 13: S3 fails randomly.
> A good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here is detail.
>
>
> 1) We believe the dead CPU is AP. Not BSP.
> The reason is that:
>
> 1.1) The BSP already transfer control to OS waking vector. The GDT/IDT/CR3 should be set by OS.
>
> 1.2) The current dead CPU still has GDT/IDT point to a BIOS reserved memory. The CS/DS/SS is typical BIOS X64 mode setting.
>
> 1.3) The current dead CPU still has CR3 in SMM. (Which is obvious wrong)
>
>
> 2) Based upon the 1), we reviewed S3 resume AP flow.
> Current BSP will wake up AP in SMRAM, for security consideration. At that time, we are using SMM mode CR3. It is OK for BSP because BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in first SMM rebase.
> Current BSP just uses its own context to initialize AP. So that AP takes BSP CR3, which is SMM CR3, unfortunately.
> After BSP initialized APs, the AP is put to HALT-LOOP in X64 mode. It is the last straw, because X64 mode halt still need paging.
>
>
> 3) The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>
>
> 4) I guess we did not see the error, or this is RANDOM issue, because it depends on if AP receives an interrupt before BSP send INIT-SIPI-SIPI.
>
>
> 5) The fix, I think, should be below:
> We should always put AP to protected mode, so that no paging is needed.
> We should put AP in above 1M reserved memory, instead of <1M memory, because <1M memory is restored.
>
>
> Would you please file a bugzillar? I think we need assign CPU owner to fix that critical issue.
>
> There is no need to do more investigation. Thanks for your great help on that. :)
Thank you for your help!
I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
BSP exits SMM and closes SMRAM on the S3 resume path before
meeting with AP(s)
I hope the title is mostly right. I didn't add any other details (I
haven't gone through the thread in detail yet, and without that I can't
even write up a semi-reasonable report myself). Instead, I referenced
this message of yours in the report, and I also linked Paolo's analysis
from elsewhere in the thread. I hope this will do for the report.
(Also, thank you Paolo, for the amazing analysis -- I haven't digested
it yet, but I can already tell it's amazing! :))
> * Case 17 - I do not think it is a real issue, because SMM is out of resources.
>
>
> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
> Here is the code. We do not know why some code needs InitializeSpinLock after ExitBootServices.
> SPIN_LOCK *
> EFIAPI
> InitializeSpinLock (
>   OUT SPIN_LOCK  *SpinLock
>   )
> {
>   ASSERT (SpinLock != NULL);
>
>   _ReadWriteBarrier();
>   *SpinLock = SPIN_LOCK_RELEASED;
>   _ReadWriteBarrier();
>
>   return SpinLock;
> }
>
> If you can have a quick check on below, that would be great.
>
> 1) Which processor triggers this ASSERT? BSP or AP.
>
> 2) Which module triggers this ASSERT? Which module contains current RIP value?
First, one additional piece of info I have learned is that the issue
does not always present itself. Sometimes the boot just works fine,
other times the assert fires.
Using the QEMU monitor, I managed to get the following information with
the "info cpus" command:
* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
CPU #3: pc=0x000000007ffd17ca thread_id=7838
VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
point into SMRAM again.
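The same check on the "info cpus" output could be done mechanically. The
following is only a minimal sketch in Python: the helper names are
hypothetical, and the SMRAM range (7F80_1000..7FFF_FFFF) is an assumption
taken from the S3 analysis quoted elsewhere in this thread, so it may not
hold for other configurations.

```python
import re

# Assumption: SMRAM range as quoted elsewhere in this thread; adjust as needed.
SMRAM_START = 0x7F801000
SMRAM_END = 0x7FFFFFFF

# Matches lines such as:
#   "* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835"
CPU_LINE = re.compile(
    r"CPU #(?P<index>\d+):\s+pc=(?P<pc>0x[0-9a-fA-F]+)"
    r"(?P<halted>\s+\(halted\))?"
)


def parse_info_cpus(text):
    """Return a list of (index, pc, halted) tuples from 'info cpus' output."""
    cpus = []
    for line in text.splitlines():
        match = CPU_LINE.search(line)
        if match:
            cpus.append(
                (int(match["index"]), int(match["pc"], 16),
                 match["halted"] is not None)
            )
    return cpus


def running_in_smram(cpus):
    """Return indices of CPUs that are not halted and whose pc is in SMRAM."""
    return [
        index
        for index, pc, halted in cpus
        if not halted and SMRAM_START <= pc <= SMRAM_END
    ]


SAMPLE = """\
* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
  CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
  CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
  CPU #3: pc=0x000000007ffd17ca thread_id=7838
"""

if __name__ == "__main__":
    print(running_in_smram(parse_info_cpus(SAMPLE)))  # [3]
```

With the sample above, only VCPU#3 is reported, which matches the manual
reading of the monitor output.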
In the OVMF log, I see
Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
PiSmmCpuDxeSmm.efi
So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
entry point, 0x8577, 0x253 bytes less).
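The offset arithmetic can be double-checked with a few lines of Python. This
is just a sketch; the function name is hypothetical, and the three inputs are
the PC from "info cpus" and the load address / entry point from the OVMF log
line above.

```python
def image_offsets(pc, load_base, entry_point):
    """Return (offset into the image, offset relative to the entry point)."""
    image_offset = pc - load_base
    entry_rva = entry_point - load_base
    return image_offset, image_offset - entry_rva


if __name__ == "__main__":
    offset, from_entry = image_offsets(0x7FFD17CA, 0x7FFC9000, 0x7FFC9253)
    print(hex(offset), hex(from_entry))  # 0x87ca 0x8577
```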
Running
objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
first I see confirmation that
start address 0x00000253
and then
000087bd <CpuDeadLoop>:
VOID
EFIAPI
CpuDeadLoop (
  VOID
  )
{
    87bd:  55                    push   %ebp
    87be:  89 e5                 mov    %esp,%ebp
    87c0:  83 ec 10              sub    $0x10,%esp
  volatile UINTN  Index;
  for (Index = 0; Index == 0;);
    87c3:  c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%ebp)
    87ca:  8b 45 fc              mov    -0x4(%ebp),%eax    <-- HERE
    87cd:  85 c0                 test   %eax,%eax
    87cf:  74 f9                 je     87ca <CpuDeadLoop+0xd>
}
    87d1:  c9                    leave
    87d2:  c3                    ret
This seems consistent with an assertion failure.
I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
like a possible caller:
//
// The BUSY lock is initialized to Released state. This needs to
// be done early enough to be ready for BSP's SmmStartupThisAp()
// call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
// called immediately after AP's present flag is detected.
//
InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
Just a guess, of course.
> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
> If you can share step-by-step instructions with me, that would be great.
(1) Grab a host computer with a CPU that supports VMX and EPT.
(2) Download and install Fedora 24 (for example):
https://getfedora.org/en/workstation/download/
http://docs.fedoraproject.org/install-guide
(3) Install the "qemu-system-x86" package with DNF
dnf install qemu-system-x86
(4) clone edk2 with git
(5) embed OpenSSL optionally (for secure boot); see
"CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
(6) build OVMF:
source edksetup.sh
make -C "$EDK_TOOLS_PATH"
# Ia32
build \
-a IA32 \
-p OvmfPkg/OvmfPkgIa32.dsc \
-D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
-t GCC5 -b DEBUG
# Ia32X64
build \
-a IA32 -a X64 \
-p OvmfPkg/OvmfPkgIa32X64.dsc \
-D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
-t GCC5 -b DEBUG
(7) Create disk images:
qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
-o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
-o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
(8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
that you downloaded already (the ISO image).
For 32-bit guest OS, this one used to work:
https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
minimally the 20141209 release. Hm... actually, I think the maintainer
of that image has discontinued the downloadable files :(
So, I don't know what 32-bit UEFI OS to recommend for testing.
32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
times, with some help from a Microsoft developer, but we couldn't solve
it), so I can't recommend Windows as an alternative.
Perhaps you can use
https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
as a 32-bit guest OS, I never tried.
(9) Anyway, once you have an installer ISO, set the "ISO" environment
variable to the ISO image's full pathname, and then run QEMU like this:
# Settings for Ia32 only:
ISO=...
DISK=.../disk-ia32.img
FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-32.fd
QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
DEBUG=debug-32.log
# Settings for Ia32X64 only:
ISO=...
DISK=.../disk-ia32x64.img
FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-3264.fd
QEMU_COMMAND=qemu-system-x86_64
DEBUG=debug-3264.log
# Common commands for both target arches:
# create variable store from varstore template
# if the former doesn't exist yet
if ! [ -e "$VARS" ]; then
cp -- "$TEMPLATE" "$VARS"
fi
$QEMU_COMMAND \
-machine q35,smm=on,accel=kvm \
-m 4096 \
-smp sockets=1,cores=2,threads=2 \
-global driver=cfi.pflash01,property=secure,value=on \
-drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
-drive if=pflash,format=raw,unit=1,file=${VARS} \
\
-chardev file,id=debugfile,path=$DEBUG \
-device isa-debugcon,iobase=0x402,chardev=debugfile \
\
-chardev stdio,id=char0,signal=off,mux=on \
-mon chardev=char0,mode=readline,default \
-serial chardev:char0 \
\
-drive id=iso,if=none,format=raw,readonly,file=$ISO \
-drive id=disk,if=none,format=qcow2,file=$DISK \
\
-device virtio-scsi-pci,id=scsi0 \
-device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
-device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
\
-device VGA
This will capture the OVMF debug output in the $DEBUG file. Also, the
terminal where you run the command can be switched between the guest's
serial console and the QEMU monitor with [Ctrl-A C].
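Once the $DEBUG file exists, assertion failures can be pulled out of it with a
short script. The following Python sketch is only an illustration; the helper
name is hypothetical, and the pattern assumes the "ASSERT file(line):
expression" format seen in the log excerpts in this thread.

```python
import re

# Assumed format, as seen in this thread:
#   ASSERT <file>(<line>): <expression>
ASSERT_LINE = re.compile(
    r"^ASSERT (?P<file>\S+)\((?P<line>\d+)\): (?P<expr>.*)$"
)


def find_asserts(log_text):
    """Return (file, line, expression) for every ASSERT line in the log."""
    results = []
    for raw in log_text.splitlines():
        match = ASSERT_LINE.match(raw.strip())
        if match:
            results.append(
                (match["file"], int(match["line"]), match["expr"])
            )
    return results


SAMPLE = """\
MpInitExitBootServicesCallback() done!
ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
"""

if __name__ == "__main__":
    for file_name, line_number, expression in find_asserts(SAMPLE):
        print(file_name, line_number, expression)
```

To run it against a real log, read the file named by $DEBUG and pass its
contents to find_asserts().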
Thanks
Laszlo
>
> Thank you
> Yao Jiewen
>
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Tuesday, November 8, 2016 9:22 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2 X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> I have new test results. Let's start with the table again:
>
> Legend:
>
> - "untested" means the test was not executed because the same test
> failed or proved unreliable in a less demanding configuration already,
>
> - "n/a" means a setting or test case was impossible,
>
> - "fail" and "unreliable" (lower case) are outside the scope of this
> series; they either capture the pre-series status, or are expected
> even with the series applied due to the pre-series status,
>
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
> series.
>
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
>
>     series   OVMF                                                                 VCPU      boot        S3 resume
>  #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>
> * Case 8: this test case failed with v2 as well, but this time with
> different symptoms:
>
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>
> I didn't try to narrow this down.
>
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
> and good. The good news is for Jiewen: this patch series does not
> cause the unreliability, it "only" amplifies it severely. The bad news
> is correspondingly for everyone else: S3 resume is actually unreliable
> even in case 4, that is, without this series applied; it's just that
> the failure rate is much, much lower.
>
> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
> 21 tries. (I stopped testing at the 8th failure.)
>
> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
> #12 that failed; I continued testing and aborted the test after the
> 55th try.)
>
> So, while the series hugely amplifies the failure rate, the failure
> does exist without the series. Which is why I modified the case 4
> results in the table, and also lower-cased the word "unreliable" in
> case 13.
>
> Below I will return to this problem separately; let's go over the rest
> of the table first.
>
> * Case 17: I guess this is not a real failure, I'm just including it for
> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
> additional SMRAM demand (see the commit message on patch V2 4/6). This
> case fails with
>
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>
> which is an SMRAM allocation failure. If I lower the VCPU count to
> 50x2x2, then the guest boots fine.
>
> ----*----
>
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
>
> messages in the OVMF boot log, interspersed with
>
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
>
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
>
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
>
> ----*----
>
> * Okay, so the S3 problem. Last time I suspected that the failure point
> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
> 9A1D0, according to the OVMF log). In order to test this idea, I
> exercised this series with S3 against a Windows 8.1 guest (--> case 13
> again). The failure reproduced on the second S3 resume, with identical
> RIP, despite the Windows wakeup vector being located elsewhere (at
> 0x1000).
>
> Quoting the OVMF log leading up to the resume:
>
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
>
> QEMU log (same as before):
>
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT= 000000007f294000 00000047
>> IDT= 000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
> So, we can exclude the suspicion that the problem is guest OS
> dependent.
>
> * Then I looked for the base address of the page containing the
> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
> some firmware component might have allocated that area actually. Here
> we go:
>
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>
> That is, the failure hits (when it hits -- not always) in the area
> where the CpuMpPei driver *borrows* memory for the startup vector of
> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
> overloaded word here; the "wakeup buffer" has nothing to do with S3
> resume, it just serves for booting the APs temporarily in PEI, for
> implementing the MP service PPI.)
>
> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
> the original contents of this area. This occurs just before
> transferring control to the guest OS wakeup vector: see the
> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
> quoted from the OVMF log.
>
> I documented (parts of) this logic in OVMF commit
>
> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>
> (see the code comments as well).
>
> * At that time, I thought to have identified a memory management bug in
> CpuMpPei; see the following discussion and bug report for details:
>
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>
> However, with the extraction / introduction of MpInitLib, this issue
> has been fixed: GetWakeupBuffer() now calls
> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
> no longer; we shouldn't be looking there for the root cause.
>
> * Either way, I don't understand why anything would want to execute code
> in the one page that happens to host the MP services PPI startup
> buffer for APs during PEI.
>
> Not understanding the "why", I looked at the "what", and resorted to
> tracing KVM. Because the problem readily reproduces with this series
> applied (case 13), it wasn't hard to start the tracing while the guest
> was suspended, and capture just the actions that led from the
> KVM-level wakeup to the failure.
>
> The QEMU state dumps are visible above in the email. I've also
> uploaded the compressed OVMF log and the textual KVM trace here:
>
> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>
> I sincerely hope that Paolo will have a field day with the KVM trace
> :) I managed to identify the following curiosities (remember this is
> all on the S3 resume path):
>
> * First, the VCPUs (there are four of them) enter and leave SMM in a
> really funky pattern:
>
>     vcpu#0  vcpu#1  vcpu#2  vcpu#3
>     ------  ------  ------  ------
>             enter
>             |
>             leave
>
>                     enter
>                     |
>                     leave
>
>                             enter
>                             |
>                             leave
>
>     enter
>     |
>     leave
>
>             enter           enter
>     enter   |       enter   |
>     |       |       |       |
>     leave   |       |       |
>             |       |       |
>     enter   |       |       |
>     |       |       |       |
>     leave   leave   leave   leave
>
> That is, first we have each VCPU enter and leave SMM in complete
> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
> temporarily (it comes back in later), while the other three remain
> in SMM. Finally all four of them leave SMM together.
>
> After which the problem occurs.
>
> * Second, the instruction that causes things to blow up is <0f aa>,
> i.e., RSM. I have absolutely no clue why RSM is executed:
>
> (a) in the area that used to host the AP startup routine for the MP
> services PPI -- note that we also have "Transfer to 16bit OS waking
> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
> area completely! --,
>
> (b) and why *after* all four VCPUs have just left SMM, together.
>
> * The RSM instruction is handled successfully elsewhere, for example
> when all four VCPUs leave SMM, at the bottom of the diagram above:
>
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>
> * The guest-phys address 7ff7f000 that we see just before the error:
>
>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>
> can be found higher up in the trace; namely, it is written to CR3
> several times. It's the root of the page tables.
>
> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
> qemu-monitor-command --hmp", while the guest was auto-paused after the
> crash.
>
> I cannot provide results: QEMU appeared to return a message that would
> be longer than 16MB after encoding by libvirt, and libvirt rejected
> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
> Anyway, the KVM trace, and the QEMU register dump, look consistent
> with what Paolo said about "Code=?? ?? ??...":
>
> The question marks usually mean that the page tables do not map a
> page at that address.
>
> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
> (SMM=0). We can't translate *any* guest-virtual address, as we can't
> even begin walking the page tables.
>
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 10:41 ` Yao, Jiewen
@ 2016-11-10 12:01 ` Laszlo Ersek
2016-11-10 14:48 ` Yao, Jiewen
2016-11-10 12:27 ` Paolo Bonzini
1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-10 12:01 UTC (permalink / raw)
To: Yao, Jiewen
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
On 11/10/16 11:41, Yao, Jiewen wrote:
> Thanks for reporting the case 3 issue on Bugzilla.
>
> Let's focus on Case 8.
> It seems to be another random failure.
>
> I did more testing.
>
> 1) I tested UEFI32 OS boot on some of our other internal real platforms. I cannot reproduce the ASSERT.
>
> 2) I wrote a small test app that calls ExitBootServices and sends an SMI. I ran it on my current Windows QEMU, but I still cannot reproduce the ASSERT.
>
> It seems your environment is the only way to reproduce the issue. I am trying to follow your step-by-step instructions to install an OS on QEMU/KVM. I haven't finished everything yet, because of a network proxy issue. :(
Right, when you run a guest on TCG (QEMU's emulator) vs. on KVM (the virtualizer / accelerator in the host Linux kernel), you get very-very different timing behavior and interleaving of actions. For one, with KVM, the VCPUs really execute in parallel -- they are represented by host OS threads, and the host OS schedules them to separate "physical logical CPUs".
>
> Your information and analysis is great. It gives us some clue.
>
> I had the same guess as you, and checked: InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> This address is initialized in InitializeMpSyncData(), with gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus, which is obtained from MpServices->GetNumberOfProcessors().
> I do not know why this address is zero.
>
> I also did not quite understand the log below.
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
> CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
> CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
> CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM mode.
> So why can #3 enter SMM mode? Is that expected behavior, or unexpected behavior?
> If this is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?
My theory is that the OS is calling a runtime variable service during boot. That is supposed to pull in all APs into SMM, one way or another.
Also, during boot, the OS may call the runtime variable services genuinely on VCPU#3.
>
> I will see if I can finish QEMU/KVM installation tomorrow.
Thanks! Once you can test with KVM on your side, that should speed up debugging considerably, I think!
> If you have some idea about why and how #3 enters SMM, please let us know.
Well, I captured a KVM trace for this as well (fresh boot, up to the failure). Grepping the trace for entering / leaving SMM, we see:
(1) the initial SMBASE relocation:
CPU-6948 [004] 11545.040294: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000
CPU-6948 [004] 11545.040335: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb5000
CPU-6949 [000] 11545.040363: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000
CPU-6949 [000] 11545.040389: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb7000
CPU-6950 [002] 11545.040417: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000
CPU-6950 [002] 11545.040443: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb9000
CPU-6947 [007] 11545.040453: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000
CPU-6947 [007] 11545.040474: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb3000
(2) a long stretch of VCPU#0 entering and leaving SMM, while the firmware uses variable services and such:
CPU-6947 [007] 11545.053169: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb3000
CPU-6947 [007] 11545.061272: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb3000
...
(3) a write to ioport 0xB2 from VCPU#3, then VCPU#3 entering SMM, then hitting the assert very-very soon:
CPU-6950 [005] 11550.521195: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521196: kvm_exit: reason IO_INSTRUCTION rip 0xf7c937b6 info b20000 0
CPU-6950 [005] 11550.521196: kvm_pio: pio_write at 0xb2 size 1 count 1 val 0x0
CPU-6950 [005] 11550.521196: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6947 [003] 11550.521196: kvm_inj_virq: irq 253
CPU-6950 [005] 11550.521196: kvm_fpu: unload
CPU-6947 [003] 11550.521197: kvm_fpu: load
CPU-6947 [003] 11550.521197: kvm_entry: vcpu 0
CPU-6950 [005] 11550.521200: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb9000
CPU-6947 [003] 11550.521207: kvm_eoi: apicid 0 vector 253
CPU-6950 [005] 11550.521207: kvm_fpu: load
CPU-6947 [003] 11550.521207: kvm_pv_eoi: apicid 0 vector 253
CPU-6950 [005] 11550.521207: kvm_entry: vcpu 3
CPU-6947 [003] 11550.521207: kvm_exit: reason HLT rip 0xc1844554 info 0 0
CPU-6950 [005] 11550.521209: kvm_exit: reason CR_ACCESS rip 0x8045 info 300 0
CPU-6950 [005] 11550.521209: kvm_cr: cr_write 0 = 0x33
CPU-6950 [005] 11550.521212: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521213: kvm_exit: reason CR_ACCESS rip 0x7ffc107d info 3 0
CPU-6950 [005] 11550.521213: kvm_cr: cr_write 3 = 0x7ff9a000
CPU-6950 [005] 11550.521214: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521214: kvm_exit: reason CPUID rip 0x7ffc1085 info 0 0
CPU-6950 [005] 11550.521214: kvm_cpuid: func 1 rax 6e8 rbx 3040800 rcx 80200001 rdx 1f89fbff
CPU-6950 [005] 11550.521215: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521215: kvm_exit: reason CR_ACCESS rip 0x7ffc10c4 info 4 0
CPU-6950 [005] 11550.521215: kvm_cr: cr_write 4 = 0x668
CPU-6950 [005] 11550.521217: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521218: kvm_exit: reason CR_ACCESS rip 0x7ffc110e info 300 0
CPU-6950 [005] 11550.521218: kvm_cr: cr_write 0 = 0x80010033
CPU-6950 [005] 11550.521220: kvm_entry: vcpu 3
CPU-6947 [003] 11550.521220: kvm_fpu: unload
CPU-6950 [005] 11550.521222: kvm_exit: reason EPT_VIOLATION rip 0x7ffcbe46 info 181 0
CPU-6950 [005] 11550.521223: kvm_page_fault: address 22004ebc error_code 181
CPU-6950 [005] 11550.521231: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521236: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521236: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x41 <----------------- "A"
CPU-6950 [005] 11550.521237: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521237: kvm_fpu: unload
CPU-6950 [005] 11550.521253: kvm_fpu: load
CPU-6950 [005] 11550.521253: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521254: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521254: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x53 <----------------- "S"
CPU-6950 [005] 11550.521254: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521254: kvm_fpu: unload
CPU-6950 [005] 11550.521257: kvm_fpu: load
CPU-6950 [005] 11550.521257: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521258: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521258: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x53 <----------------- "S"
CPU-6950 [005] 11550.521258: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521258: kvm_fpu: unload
CPU-6950 [005] 11550.521260: kvm_fpu: load
CPU-6950 [005] 11550.521260: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521261: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521261: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x45 <----------------- "E"
CPU-6950 [005] 11550.521261: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521262: kvm_fpu: unload
CPU-6950 [005] 11550.521264: kvm_fpu: load
CPU-6950 [005] 11550.521264: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521264: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521264: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x52 <----------------- "R"
CPU-6950 [005] 11550.521264: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521265: kvm_fpu: unload
CPU-6950 [005] 11550.521267: kvm_fpu: load
CPU-6950 [005] 11550.521267: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521267: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521267: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x54 <----------------- "T"
CPU-6950 [005] 11550.521268: kvm_userspace_exit: reason KVM_EXIT_IO (2)
This seems to be consistent with the OS calling a variable service on VCPU#3.
Also, as far as I can see, the above trace matches the assembly code in "UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm".
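The single-byte writes to I/O port 0x402 in the trace above can be collected mechanically instead of by eye. Below is a minimal Python sketch, not part of the original analysis: the helper name `decode_debug_port` and the regular expression are illustrative assumptions based only on the `kvm_pio` line format shown above, and 0x402 is assumed to be the OVMF debug console port.

```python
import re

# Matches kvm_pio write lines as they appear in the trace above, e.g.:
#   kvm_pio: pio_write at 0x402 size 1 count 1 val 0x41
# (Assumed format; only single-byte, single-count writes are handled.)
PIO_WRITE = re.compile(
    r"kvm_pio: pio_write at (0x[0-9a-fA-F]+) size 1 count 1 val (0x[0-9a-fA-F]+)"
)


def decode_debug_port(trace_lines, port=0x402):
    """Concatenate the bytes written to `port` into a text string."""
    chars = []
    for line in trace_lines:
        match = PIO_WRITE.search(line)
        if match and int(match.group(1), 16) == port:
            chars.append(chr(int(match.group(2), 16) & 0xFF))
    return "".join(chars)


# The six pio_write lines copied from the trace above.
sample = [
    "CPU-6950 [005] 11550.521236: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x41",
    "CPU-6950 [005] 11550.521254: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x53",
    "CPU-6950 [005] 11550.521258: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x53",
    "CPU-6950 [005] 11550.521261: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x45",
    "CPU-6950 [005] 11550.521264: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x52",
    "CPU-6950 [005] 11550.521267: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x54",
]

print(decode_debug_port(sample))
```

Lines for other ports, and trace events other than `kvm_pio`, are simply skipped, so the whole trace file can be fed in unfiltered.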
Is perhaps CpuIndex out of bounds?... Hmm, with the following debug patch:
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index d0092d2f145a..29f6e783c58f 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1143,6 +1143,9 @@ SmiRendezvous (
>      // E.g., with Relaxed AP flow, SmmStartupThisAp() may be called immediately
>      // after AP's present flag is detected.
>      //
> +    if (CpuIndex >= 4) {
> +      DEBUG ((EFI_D_ERROR, "CpuIndex=%u\n", (UINT32)CpuIndex));
> +    }
>      InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>    }
>
>
I get the following debug output (note that my SMP configuration is 1x2x2):
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> CpuIndex=780161211
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
Ehm... what? :)
SmiRendezvous() is EFIAPI, is the calling convention followed in "Ia32/SmiEntry.nasm"?
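To put the logged value in perspective, here is a small Python sketch (purely illustrative; the helper `describe_cpu_index` is a hypothetical name, not anything in edk2) that renders the reported CpuIndex in hexadecimal and checks it against the four logical processors of the 1x2x2 topology mentioned above.

```python
# Number of logical processors in the 1x2x2 (sockets x cores x threads) setup.
NUM_CPUS = 1 * 2 * 2


def describe_cpu_index(cpu_index, num_cpus=NUM_CPUS):
    """Return the index in hex and whether it is a valid CpuData[] subscript."""
    return {
        "hex": f"0x{cpu_index:08x}",
        "in_bounds": 0 <= cpu_index < num_cpus,
    }


# Value taken from the "CpuIndex=780161211" debug line above.
info = describe_cpu_index(780161211)
print(info)
```

Any index at or above 4 would subscript past the end of CpuData[] in this configuration, which would fit the NULL Busy pointer seen in the ASSERT, although that link is only a guess.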
Thanks,
Laszlo
> Thank you
> Yao Jiewen
>
>
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 10, 2016 4:46 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/09/16 07:25, Yao, Jiewen wrote:
>> Hi Laszlo
>> I will fix the DEBUG message issue in the V3 patch.
>>
>> Below are the remaining issues:
>>
>>
>> * Case 13: S3 fails randomly.
>> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>>
>>
>> 1) We believe the dead CPU is an AP, not the BSP.
>> The reasons are:
>>
>> 1.1) The BSP has already transferred control to the OS waking vector. The GDT/IDT/CR3 should be set by the OS.
>>
>> 1.2) The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are the typical BIOS X64 mode settings.
>>
>> 1.3) The dead CPU still has the SMM CR3 (which is obviously wrong).
>>
>>
>> 2) Based on 1), we reviewed the S3 resume AP flow.
>> Currently the BSP wakes up the APs in SMRAM, for security considerations. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
>> Currently the BSP just uses its own context to initialize the APs, so the APs take the BSP's CR3, which is unfortunately the SMM CR3.
>> After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because an X64 mode halt still needs paging.
>>
>>
>> 3) The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>>
>>
>> 4) I guess we did not see the error before, or it looks RANDOM, because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>>
>>
>> 5) The fix, I think, should be as follows:
>> We should always put the AP into protected mode, so that no paging is needed.
>> We should put the AP in reserved memory above 1M, instead of memory below 1M, because the memory below 1M is restored.
>>
>>
>> Would you please file a Bugzilla? I think we need to assign the CPU owner to fix this critical issue.
>>
>> There is no need to do more investigation. Thanks for your great help on that. :)
>
> Thank you for your help!
>
> I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
>
> BSP exits SMM and closes SMRAM on the S3 resume path before
> meeting with AP(s)
>
> I hope the title is mostly right. I didn't add any other details (I
> haven't gone through the thread in detail yet, and without that I can't
> even write up a semi-reasonable report myself). Instead, I referenced
> this message of yours in the report, and I also linked Paolo's analysis
> from elsewhere in the thread. I hope this will do for the report.
>
> (Also, thank you Paolo, for the amazing analysis -- I haven't digested
> it yet, but I can already tell it's amazing! :))
>
>> * Case 17 - I do not think it is a real issue; SMM has simply run out of resources.
>>
>>
>> * Case 8 - that is a very weird issue. I talked with Jeff again. We do not have a clear clue yet.
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>> Here is the code. We do not know why some code needs InitializeSpinLock after ExitBootServices.
>> SPIN_LOCK *
>> EFIAPI
>> InitializeSpinLock (
>>   OUT SPIN_LOCK  *SpinLock
>>   )
>> {
>>   ASSERT (SpinLock != NULL);
>>
>>   _ReadWriteBarrier();
>>   *SpinLock = SPIN_LOCK_RELEASED;
>>   _ReadWriteBarrier();
>>
>>   return SpinLock;
>> }
>>
>> If you could quickly check the following, that would be great.
>>
>> 1) Which processor triggers this ASSERT? The BSP or an AP?
>>
>> 2) Which module triggers this ASSERT? Which module contains the current RIP value?
>
> First, one additional piece of info I have learned is that the issue
> does not always present itself. Sometimes the boot just works fine,
> other times the assert fires.
>
> Using the QEMU monitor, I managed to get the following information with
> the "info cpus" command:
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
> CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
> CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
> CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
> point into SMRAM again.
>
> In the OVMF log, I see
>
> Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
> PiSmmCpuDxeSmm.efi
>
> So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
> entry point, 0x8577, 0x253 bytes less).
>
> Running
>
> objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
>
> first I see confirmation that
>
> start address 0x00000253
>
> and then
>
> 000087bd <CpuDeadLoop>:
> VOID
> EFIAPI
> CpuDeadLoop (
> VOID
> )
> {
> 87bd: 55 push %ebp
> 87be: 89 e5 mov %esp,%ebp
> 87c0: 83 ec 10 sub $0x10,%esp
> volatile UINTN Index;
>
> for (Index = 0; Index == 0;);
> 87c3: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
> 87ca: 8b 45 fc mov -0x4(%ebp),%eax <-- HERE
> 87cd: 85 c0 test %eax,%eax
> 87cf: 74 f9 je 87ca <CpuDeadLoop+0xd>
> }
> 87d1: c9 leave
> 87d2: c3 ret
>
> This seems consistent with an assertion failure.
>
> I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
> SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
> like a possible caller:
>
> //
> // The BUSY lock is initialized to Released state. This needs to
> // be done early enough to be ready for BSP's SmmStartupThisAp()
> // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
> // called immediately after AP's present flag is detected.
> //
> InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> Just a guess, of course.
>
>> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
>> If you could share step-by-step instructions with me, that would be great.
>
> (1) Grab a host computer with a CPU that supports VMX and EPT.
>
> (2) Download and install Fedora 24 (for example):
>
> https://getfedora.org/en/workstation/download/
> http://docs.fedoraproject.org/install-guide
>
> (3) Install the "qemu-system-x86" package with DNF
>
> dnf install qemu-system-x86
>
> (4) clone edk2 with git
>
> (5) embed OpenSSL optionally (for secure boot); see
> "CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
>
> (6) build OVMF:
>
> source edksetup.sh
> make -C "$EDK_TOOLS_PATH"
>
> # Ia32
> build \
> -a IA32 \
> -p OvmfPkg/OvmfPkgIa32.dsc \
> -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
> -t GCC5 -b DEBUG
>
> # Ia32X64
> build \
> -a IA32 -a X64 \
> -p OvmfPkg/OvmfPkgIa32X64.dsc \
> -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
> -t GCC5 -b DEBUG
>
> (7) Create disk images:
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
> -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
> -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
>
> (8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
> that you downloaded already (the ISO image).
>
> For 32-bit guest OS, this one used to work:
>
> https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
>
> minimally the 20141209 release. Hm... actually, I think the maintainer
> of that image has discontinued the downloadable files :(
>
> So, I don't know what 32-bit UEFI OS to recommend for testing.
>
> 32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
> times, with some help from a Microsoft developer, but we couldn't solve
> it), so I can't recommend Windows as an alternative.
>
> Perhaps you can use
>
> https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
>
> as a 32-bit guest OS; I have never tried it.
>
> (9) Anyway, once you have an installer ISO, set the "ISO" environment
> variable to the ISO image's full pathname, and then run QEMU like this:
>
> # Settings for Ia32 only:
>
> ISO=...
> DISK=.../disk-ia32.img
> FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-32.fd
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> DEBUG=debug-32.log
>
> # Settings for Ia32X64 only:
>
> ISO=...
> DISK=.../disk-ia32x64.img
> FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-3264.fd
> QEMU_COMMAND=qemu-system-x86_64
> DEBUG=debug-3264.log
>
> # Common commands for both target arches:
>
> # create variable store from varstore template
> # if the former doesn't exist yet
> if ! [ -e "$VARS" ]; then
> cp -- "$TEMPLATE" "$VARS"
> fi
>
> $QEMU_COMMAND \
> -machine q35,smm=on,accel=kvm \
> -m 4096 \
> -smp sockets=1,cores=2,threads=2 \
> -global driver=cfi.pflash01,property=secure,value=on \
> -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
> -drive if=pflash,format=raw,unit=1,file=${VARS} \
> \
> -chardev file,id=debugfile,path=$DEBUG \
> -device isa-debugcon,iobase=0x402,chardev=debugfile \
> \
> -chardev stdio,id=char0,signal=off,mux=on \
> -mon chardev=char0,mode=readline,default \
> -serial chardev:char0 \
> \
> -drive id=iso,if=none,format=raw,readonly,file=$ISO \
> -drive id=disk,if=none,format=qcow2,file=$DISK \
> \
> -device virtio-scsi-pci,id=scsi0 \
> -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
> -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
> \
> -device VGA
>
> This will capture the OVMF debug output in the $DEBUG file. Also, the
> terminal where you run the command can be switched between the guest's
> serial console and the QEMU monitor with [Ctrl-A C].
>
> Thanks
> Laszlo
>
>>
>> Thank you
>> Yao Jiewen
>>
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Tuesday, November 8, 2016 9:22 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>
>> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
>> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>>
>> On 11/04/16 10:30, Jiewen Yao wrote:
>>> ==== below is V2 description ====
>>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>>> 4) PiSmmCpu: Add protection detail in commit message.
>>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>>
>>> ==== below is V1 description ====
>>> This series patch enables SMM page level protection.
>>> Features are:
>>> 1) PiSmmCore reports SMM PE image code/data information
>>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>>> and set XD for data page and RO for code page.
>>> 3) PiSmmCpu enables Static Paging for X64 according to
>>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>>> is used as long as it is supported.
>>> 4) PiSmmCpu sets importance data structure to be read only,
>>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>>
>>> tested platform:
>>> 1) Intel internal platform (X64).
>>> 2) EDKII Quark IA32
>>> 3) EDKII Vlv2 X64
>>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>>
>>> Cc: Jeff Fan <jeff.fan@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Star Zeng <star.zeng@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> I have new test results. Let's start with the table again:
>>
>> Legend:
>>
>> - "untested" means the test was not executed because the same test
>> failed or proved unreliable in a less demanding configuration already,
>>
>> - "n/a" means a setting or test case was impossible,
>>
>> - "fail" and "unreliable" (lower case) are outside the scope of this
>> series; they either capture the pre-series status, or are expected
>> even with the series applied due to the pre-series status,
>>
>> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>> series.
>>
>> In all cases, 36 bits were used as address width in the CPU HOB (--> up
>> to 64GB guest-phys address space).
>>
>>     series   OVMF                                                                 VCPU      boot        S3 resume
>> #   applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
>> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
>> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
>> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
>> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
>> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
>> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
>> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
>> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
>> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
>> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>>
>> * Case 8: this test case failed with v2 as well, but this time with
>> different symptoms:
>>
>>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitExitBootServicesCallback() done!
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>>
>> I didn't try to narrow this down.
>>
>> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>> and good. The good news is for Jiewen: this patch series does not
>> cause the unreliability, it "only" amplifies it severely. The bad news
>> is correspondingly for everyone else: S3 resume is actually unreliable
>> even in case 4, that is, without this series applied; it's just that
>> the failure rate is much, much lower.
>>
>> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>> 21 tries. (I stopped testing at the 8th failure.)
>>
>> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>> #12 that failed; I continued testing and aborted the test after the
>> 55th try.)
>>
>> So, while the series hugely amplifies the failure rate, the failure
>> does exist without the series. Which is why I modified the case 4
>> results in the table, and also lower-cased the word "unreliable" in
>> case 13.
>>
>> Below I will return to this problem separately; let's go over the rest
>> of the table first.
>>
>> * Case 17: I guess this is not a real failure, I'm just including it for
>> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>> additional SMRAM demand (see the commit message on patch V2 4/6). This
>> case fails with
>>
>>> SmmLockBox Command - 4
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>>> SmmLockBox SmmLockBoxHandler Exit
>>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>>
>> which is an SMRAM allocation failure. If I lower the VCPU count to
>> 50x2x2, then the guest boots fine.
>>
>> ----*----
>>
>> Before I get to the S3 resume problem (which, again, reproduces without
>> this series, although much less frequently), I'd like to comment on the
>> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
>> function, on the return value of SmmBlockingStartupThisAp(). This change
>> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>>
>>> !mSmmMpSyncData->CpuData[1].Present
>>> !mSmmMpSyncData->CpuData[2].Present
>>> !mSmmMpSyncData->CpuData[3].Present
>>> ...
>>
>> messages in the OVMF boot log, interspersed with
>>
>>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>>
>> style messages. (That is, one error message for each AP, per
>> ConvertPageEntryAttribute() message.)
>>
>> Is this okay / intentional? The number of these messages can go up to
>> several thousands and that sort of drowns out everything else in the
>> log.
>>
>> It's also not easy to mask the message, because it's logged on the
>> DEBUG_ERROR level.
>>
>> ----*----
>>
>> * Okay, so the S3 problem. Last time I suspected that the failure point
>> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>> 9A1D0, according to the OVMF log). In order to test this idea, I
>> exercised this series with S3 against a Windows 8.1 guest (--> case 13
>> again). The failure reproduced on the second S3 resume, with identical
>> RIP, despite the Windows wakeup vector being located elsewhere (at
>> 0x1000).
>>
>> Quoting the OVMF log leading up to the resume:
>>
>>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>>> Install PPI: [PeiPostScriptTablePpi]
>>> Install PPI: [EfiEndOfPeiSignalPpi]
>>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>>> Transfer to 16bit OS waking vector - 1000
>>
>> QEMU log (same as before):
>>
>>> KVM internal error. Suberror: 1
>>> KVM internal error. Suberror: 1
>>> emulation failure
>>> emulation failure
>>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT= 000000007f294000 00000047
>>> IDT= 000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT= 000000007f294000 00000047
>>> IDT= 000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>> So, we can exclude the suspicion that the problem is guest OS
>> dependent.
>>
>> * Then I looked for the base address of the page containing the
>> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>> some firmware component might have allocated that area actually. Here
>> we go:
>>
>>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>>> AP Loop Mode is 1
>>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>>
>> That is, the failure hits (when it hits -- not always) in the area
>> where the CpuMpPei driver *borrows* memory for the startup vector of
>> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>> overloaded word here; the "wakeup buffer" has nothing to do with S3
>> resume, it just serves for booting the APs temporarily in PEI, for
>> implementing the MP service PPI.)
>>
>> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>> the original contents of this area. This occurs just before
>> transferring control to the guest OS wakeup vector: see the
>> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>> quoted from the OVMF log.
>>
>> I documented (parts of) this logic in OVMF commit
>>
>> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>>
>> (see the code comments as well).
>>
>> * At that time, I thought I had identified a memory management bug in
>> CpuMpPei; see the following discussion and bug report for details:
>>
>> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>>
>> However, with the extraction / introduction of MpInitLib, this issue
>> has been fixed: GetWakeupBuffer() now calls
>> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" no
>> longer exists; we shouldn't be looking there for the root cause.
>>
>> * Either way, I don't understand why anything would want to execute code
>> in the one page that happens to host the MP services PPI startup
>> buffer for APs during PEI.
>>
>> Not understanding the "why", I looked at the "what", and resorted to
>> tracing KVM. Because the problem readily reproduces with this series
>> applied (case 13), it wasn't hard to start the tracing while the guest
>> was suspended, and capture just the actions that led from the
>> KVM-level wakeup to the failure.
>>
>> The QEMU state dumps are visible above in the email. I've also
>> uploaded the compressed OVMF log and the textual KVM trace here:
>>
>> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>>
>> I sincerely hope that Paolo will have a field day with the KVM trace
>> :) I managed to identify the following curiosities (remember this is
>> all on the S3 resume path):
>>
>> * First, the VCPUs (there are four of them) enter and leave SMM in a
>> really funky pattern:
>>
>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>       ------  ------  ------  ------
>>               enter
>>               |
>>               leave
>>
>>                       enter
>>                       |
>>                       leave
>>
>>                               enter
>>                               |
>>                               leave
>>
>>       enter
>>       |
>>       leave
>>
>>               enter           enter
>>       enter   |       enter   |
>>       |       |       |       |
>>       leave   |       |       |
>>               |       |       |
>>       enter   |       |       |
>>       |       |       |       |
>>       leave   leave   leave   leave
>>
>> That is, first we have each VCPU enter and leave SMM in complete
>> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>> temporarily (it comes back in later), while the other three remain
>> in SMM. Finally all four of them leave SMM together.
>>
>> After which the problem occurs.
>>
>> * Second, the instruction that causes things to blow up is <0f aa>,
>> i.e., RSM. I have absolutely no clue why RSM is executed:
>>
>> (a) in the area that used to host the AP startup routine for the MP
>> services PPI -- note that we also have "Transfer to 16bit OS waking
>> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>> area completely! --,
>>
>> (b) and why *after* all four VCPUs have just left SMM, together.
>>
>> * The RSM instruction is handled successfully elsewhere, for example
>> when all four VCPUs leave SMM, at the bottom of the diagram above:
>>
>>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>>
>> * The guest-phys address 7ff7f000 that we see just before the error:
>>
>>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>>
>> can be found higher up in the trace; namely, it is written to CR3
>> several times. It's the root of the page tables.
>>
>> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>>
>> * I also tried the "info tlb" monitor command, via "virsh
>> qemu-monitor-command --hmp", while the guest was auto-paused after the
>> crash.
>>
>> I cannot provide results: QEMU appeared to return a message that would
>> be longer than 16MB after encoding by libvirt, and libvirt rejected
>> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>>
>> Anyway, the KVM trace, and the QEMU register dump, look consistent
>> with what Paolo said about "Code=?? ?? ??...":
>>
>> The question marks usually mean that the page tables do not map a
>> page at that address.
>>
>> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>> (SMM=0). We can't translate *any* guest-virtual address, as we can't
>> even begin walking the page tables.
>>
>> Thanks
>> Laszlo
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 10:41 ` Yao, Jiewen
2016-11-10 12:01 ` Laszlo Ersek
@ 2016-11-10 12:27 ` Paolo Bonzini
1 sibling, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-10 12:27 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
Zeng, Star
On 10/11/2016 11:41, Yao, Jiewen wrote:
> I also did not quite understand the log below.
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
> CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
> CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
> CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM mode.
It's not BSP that enters SMM, it's the currently executing processor.
So this means that CPU #3 has written to B2.
Thanks,
Paolo
> So why can #3 enter SMM mode? Is that expected behavior, or unexpected?
> If it is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?
>
> I will see if I can finish the QEMU/KVM installation tomorrow.
>
> If you have any idea why and how #3 enters SMM, please let us know.
>
>
> Thank you
> Yao Jiewen
>
>
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 10, 2016 4:46 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/09/16 07:25, Yao, Jiewen wrote:
>> Hi Laszlo
>> I will fix DEBUG message issue in V3 patch.
>>
>> Below is rest issues:
>>
>>
>> * Case 13: S3 fails randomly.
>> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>>
>>
>> 1) We believe the dead CPU is an AP, not the BSP.
>> The reasons are:
>>
>> 1.1) The BSP has already transferred control to the OS waking vector. Its GDT/IDT/CR3 should have been set by the OS.
>>
>> 1.2) The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are typical BIOS X64 mode settings.
>>
>> 1.3) The dead CPU still has the SMM CR3. (Which is obviously wrong.)
>>
>>
>> 2) Based on 1), we reviewed the S3 resume flow for APs.
>> Currently the BSP wakes up the APs in SMRAM, for security reasons. At that time, we are using the SMM-mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after the SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
>> Currently the BSP just uses its own context to initialize the APs, so each AP takes the BSP's CR3, which is unfortunately the SMM CR3.
>> After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because a halt loop in X64 mode still needs paging.
>>
>>
>> 3) The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>>
>>
>> 4) I guess we did not see the error before, or it appears as a RANDOM issue, because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>>
>>
>> 5) The fix, I think, should be as follows:
>> We should always put the APs into protected mode, so that no paging is needed.
>> We should put the APs in reserved memory above 1M, instead of memory below 1M, because memory below 1M is restored.
>>
>>
>> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix that critical issue.
>>
>> There is no need to do more investigation. Thanks for your great help on that. :)
>
> Thank you for your help!
>
> I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
>
> BSP exits SMM and closes SMRAM on the S3 resume path before
> meeting with AP(s)
>
> I hope the title is mostly right. I didn't add any other details (I
> haven't gone through the thread in detail yet, and without that I can't
> even write up a semi-reasonable report myself). Instead, I referenced
> this message of yours in the report, and I also linked Paolo's analysis
> from elsewhere in the thread. I hope this will do for the report.
>
> (Also, thank you Paolo, for the amazing analysis -- I haven't digested
> it yet, but I can already tell it's amazing! :))
>
>> * Case 17 - I do not think it is a real issue, because SMM is out of resources.
>>
>>
>> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>> Here is the code. We do not know why some code needs to call InitializeSpinLock after ExitBootServices.
>> SPIN_LOCK *
>> EFIAPI
>> InitializeSpinLock (
>>   OUT SPIN_LOCK  *SpinLock
>>   )
>> {
>>   ASSERT (SpinLock != NULL);
>>
>>   _ReadWriteBarrier();
>>   *SpinLock = SPIN_LOCK_RELEASED;
>>   _ReadWriteBarrier();
>>
>>   return SpinLock;
>> }
>>
>> If you can have a quick check on below, that would be great.
>>
>> 1) Which processor triggers this ASSERT? BSP or AP.
>>
>> 2) Which module triggers this ASSERT? Which module contains current RIP value?
>
> First, one additional piece of info I have learned is that the issue
> does not always present itself. Sometimes the boot just works fine,
> other times the assert fires.
>
> Using the QEMU monitor, I managed to get the following information with
> the "info cpus" command:
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
> CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
> CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
> CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
> point into SMRAM again.
>
> In the OVMF log, I see
>
> Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
> PiSmmCpuDxeSmm.efi
>
> So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
> entry point, 0x8577, 0x253 bytes less).
>
> Running
>
> objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
>
> first I see confirmation that
>
> start address 0x00000253
>
> and then
>
> 000087bd <CpuDeadLoop>:
> VOID
> EFIAPI
> CpuDeadLoop (
> VOID
> )
> {
> 87bd: 55 push %ebp
> 87be: 89 e5 mov %esp,%ebp
> 87c0: 83 ec 10 sub $0x10,%esp
> volatile UINTN Index;
>
> for (Index = 0; Index == 0;);
> 87c3: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
> 87ca: 8b 45 fc mov -0x4(%ebp),%eax <-- HERE
> 87cd: 85 c0 test %eax,%eax
> 87cf: 74 f9 je 87ca <CpuDeadLoop+0xd>
> }
> 87d1: c9 leave
> 87d2: c3 ret
>
> This seems consistent with an assertion failure.
>
> I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
> SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
> like a possible caller:
>
> //
> // The BUSY lock is initialized to Released state. This needs to
> // be done early enough to be ready for BSP's SmmStartupThisAp()
> // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
> // called immediately after AP's present flag is detected.
> //
> InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> Just a guess, of course.
>
>> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
>> If you can share step-by-step instructions with me, that would be great.
>
> (1) Grab a host computer with a CPU that supports VMX and EPT.
>
> (2) Download and install Fedora 24 (for example):
>
> https://getfedora.org/en/workstation/download/
> http://docs.fedoraproject.org/install-guide
>
> (3) Install the "qemu-system-x86" package with DNF
>
> dnf install qemu-system-x86
>
> (4) clone edk2 with git
>
> (5) embed OpenSSL optionally (for secure boot); see
> "CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
>
> (6) build OVMF:
>
> source edksetup.sh
> make -C "$EDK_TOOLS_PATH"
>
> # Ia32
> build \
> -a IA32 \
> -p OvmfPkg/OvmfPkgIa32.dsc \
> -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
> -t GCC5 -b DEBUG
>
> # Ia32X64
> build \
> -a IA32 -a X64 \
> -p OvmfPkg/OvmfPkgIa32X64.dsc \
> -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
> -t GCC5 -b DEBUG
>
> (7) Create disk images:
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
> -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
> -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
>
> (8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
> that you downloaded already (the ISO image).
>
> For 32-bit guest OS, this one used to work:
>
> https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
>
> minimally the 20141209 release. Hm... actually, I think the maintainer
> of that image has discontinued the downloadable files :(
>
> So, I don't know what 32-bit UEFI OS to recommend for testing.
>
> 32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
> times, with some help from a Microsoft developer, but we couldn't solve
> it), so I can't recommend Windows as an alternative.
>
> Perhaps you can use
>
> https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
>
> as a 32-bit guest OS, I never tried.
>
> (9) Anyway, once you have an installer ISO, set the "ISO" environment
> variable to the ISO image's full pathname, and then run QEMU like this:
>
> # Settings for Ia32 only:
>
> ISO=...
> DISK=.../disk-ia32.img
> FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-32.fd
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> DEBUG=debug-32.log
>
> # Settings for Ia32X64 only:
>
> ISO=...
> DISK=.../disk-ia32x64.img
> FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-3264.fd
> QEMU_COMMAND=qemu-system-x86_64
> DEBUG=debug-3264.log
>
> # Common commands for both target arches:
>
> # create variable store from varstore template
> # if the former doesn't exist yet
> if ! [ -e "$VARS" ]; then
> cp -- "$TEMPLATE" "$VARS"
> fi
>
> $QEMU_COMMAND \
> -machine q35,smm=on,accel=kvm \
> -m 4096 \
> -smp sockets=1,cores=2,threads=2 \
> -global driver=cfi.pflash01,property=secure,value=on \
> -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
> -drive if=pflash,format=raw,unit=1,file=${VARS} \
> \
> -chardev file,id=debugfile,path=$DEBUG \
> -device isa-debugcon,iobase=0x402,chardev=debugfile \
> \
> -chardev stdio,id=char0,signal=off,mux=on \
> -mon chardev=char0,mode=readline,default \
> -serial chardev:char0 \
> \
> -drive id=iso,if=none,format=raw,readonly,file=$ISO \
> -drive id=disk,if=none,format=qcow2,file=$DISK \
> \
> -device virtio-scsi-pci,id=scsi0 \
> -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
> -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
> \
> -device VGA
>
> This will capture the OVMF debug output in the $DEBUG file. Also, the
> terminal where you run the command can be switched between the guest's
> serial console and the QEMU monitor with [Ctrl-A C].
>
> Thanks
> Laszlo
>
>>
>> Thank you
>> Yao Jiewen
>>
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Tuesday, November 8, 2016 9:22 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>
>> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
>> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>>
>> On 11/04/16 10:30, Jiewen Yao wrote:
>>> ==== below is V2 description ====
>>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>>> 4) PiSmmCpu: Add protection detail in commit message.
>>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>>
>>> ==== below is V1 description ====
>>> This series patch enables SMM page level protection.
>>> Features are:
>>> 1) PiSmmCore reports SMM PE image code/data information
>>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>>> and set XD for data page and RO for code page.
>>> 3) PiSmmCpu enables Static Paging for X64 according to
>>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>>> is used as long as it is supported.
>>> 4) PiSmmCpu sets importance data structure to be read only,
>>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>>
>>> tested platform:
>>> 1) Intel internal platform (X64).
>>> 2) EDKII Quark IA32
>>> 3) EDKII Vlv2 X64
>>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>>
>>> Cc: Jeff Fan <jeff.fan@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Star Zeng <star.zeng@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> I have new test results. Let's start with the table again:
>>
>> Legend:
>>
>> - "untested" means the test was not executed because the same test
>> failed or proved unreliable in a less demanding configuration already,
>>
>> - "n/a" means a setting or test case was impossible,
>>
>> - "fail" and "unreliable" (lower case) are outside the scope of this
>> series; they either capture the pre-series status, or are expected
>> even with the series applied due to the pre-series status,
>>
>> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>> series.
>>
>> In all cases, 36 bits were used as address width in the CPU HOB (--> up
>> to 64GB guest-phys address space).
>>
>>     series   OVMF                                                                 VCPU      boot        S3 resume
>>  #  applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
>> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
>> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
>> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
>> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
>> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
>> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
>> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
>> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
>> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
>> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>>
>> * Case 8: this test case failed with v2 as well, but this time with
>> different symptoms:
>>
>>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitExitBootServicesCallback() done!
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>>
>> I didn't try to narrow this down.
>>
>> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>> and good. The good news is for Jiewen: this patch series does not
>> cause the unreliability, it "only" amplifies it severely. The bad news
>> is correspondingly for everyone else: S3 resume is actually unreliable
>> even in case 4, that is, without this series applied, it's just the
>> failure rate is much-much lower.
>>
>> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>> 21 tries. (I stopped testing at the 8th failure.)
>>
>> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>> #12 that failed; I continued testing and aborted the test after the
>> 55th try.)
>>
>> So, while the series hugely amplifies the failure rate, the failure
>> does exist without the series. Which is why I modified the case 4
>> results in the table, and also lower-cased the word "unreliable" in
>> case 13.
>>
>> Below I will return to this problem separately; let's go over the rest
>> of the table first.
>>
>> * Case 17: I guess this is not a real failure, I'm just including it for
>> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>> additional SMRAM demand (see the commit message on patch V2 4/6). This
>> case fails with
>>
>>> SmmLockBox Command - 4
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>>> SmmLockBox SmmLockBoxHandler Exit
>>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>>
>> which is an SMRAM allocation failure. If I lower the VCPU count to
>> 50x2x2, then the guest boots fine.
>>
>> ----*----
>>
>> Before I get to the S3 resume problem (which, again, reproduces without
>> this series, although much less frequently), I'd like to comment on the
>> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
>> function, on the return value of SmmBlockingStartupThisAp(). This change
>> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>>
>>> !mSmmMpSyncData->CpuData[1].Present
>>> !mSmmMpSyncData->CpuData[2].Present
>>> !mSmmMpSyncData->CpuData[3].Present
>>> ...
>>
>> messages in the OVMF boot log, interspersed with
>>
>>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>>
>> style messages. (That is, one error message for each AP, per
>> ConvertPageEntryAttribute() message.)
>>
>> Is this okay / intentional? The number of these messages can go up to
>> several thousands and that sort of drowns out everything else in the
>> log.
>>
>> It's also not easy to mask the message, because it's logged on the
>> DEBUG_ERROR level.
>>
>> ----*----
>>
>> * Okay, so the S3 problem. Last time I suspected that the failure point
>> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>> 9A1D0, according to the OVMF log). In order to test this idea, I
>> exercised this series with S3 against a Windows 8.1 guest (--> case 13
>> again). The failure reproduced on the second S3 resume, with identical
>> RIP, despite the Windows wakeup vector being located elsewhere (at
>> 0x1000).
>>
>> Quoting the OVMF log leading up to the resume:
>>
>>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>>> Install PPI: [PeiPostScriptTablePpi]
>>> Install PPI: [EfiEndOfPeiSignalPpi]
>>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>>> Transfer to 16bit OS waking vector - 1000
>>
>> QEMU log (same as before):
>>
>>> KVM internal error. Suberror: 1
>>> KVM internal error. Suberror: 1
>>> emulation failure
>>> emulation failure
>>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT= 000000007f294000 00000047
>>> IDT= 000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT= 000000007f294000 00000047
>>> IDT= 000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>> So, we can exclude the suspicion that the problem is guest OS
>> dependent.
>>
>> * Then I looked for the base address of the page containing the
>> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>> some firmware component might have allocated that area actually. Here
>> we go:
>>
>>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>>> AP Loop Mode is 1
>>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>>
>> That is, the failure hits (when it hits -- not always) in the area
>> where the CpuMpPei driver *borrows* memory for the startup vector of
>> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>> overloaded word here; the "wakeup buffer" has nothing to do with S3
>> resume, it just serves for booting the APs temporarily in PEI, for
>> implementing the MP service PPI.)
>>
>> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>> the original contents of this area. This occurs just before
>> transferring control to the guest OS wakeup vector: see the
>> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>> quoted from the OVMF log.
>>
>> I documented (parts of) this logic in OVMF commit
>>
>> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>>
>> (see the code comments as well).
>>
>> * At that time, I thought I had identified a memory management bug in
>> CpuMpPei; see the following discussion and bug report for details:
>>
>> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>>
>> However, with the extraction / introduction of MpInitLib, this issue
>> has been fixed: GetWakeupBuffer() now calls
>> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>> no longer; we shouldn't be looking there for the root cause.
>>
>> * Either way, I don't understand why anything would want to execute code
>> in the one page that happens to host the MP services PPI startup
>> buffer for APs during PEI.
>>
>> Not understanding the "why", I looked at the "what", and resorted to
>> tracing KVM. Because the problem readily reproduces with this series
>> applied (case 13), it wasn't hard to start the tracing while the guest
>> was suspended, and capture just the actions that led from the
>> KVM-level wakeup to the failure.
>>
>> The QEMU state dumps are visible above in the email. I've also
>> uploaded the compressed OVMF log and the textual KVM trace here:
>>
>> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>>
>> I sincerely hope that Paolo will have a field day with the KVM trace
>> :) I managed to identify the following curiosities (remember this is
>> all on the S3 resume path):
>>
>> * First, the VCPUs (there are four of them) enter and leave SMM in a
>> really funky pattern:
>>
>> vcpu#0  vcpu#1  vcpu#2  vcpu#3
>> ------  ------  ------  ------
>>         enter
>>           |
>>         leave
>>
>>                 enter
>>                   |
>>                 leave
>>
>>                         enter
>>                           |
>>                         leave
>>
>> enter
>>   |
>> leave
>>
>>         enter           enter
>> enter     |     enter     |
>>   |       |       |       |
>> leave     |       |       |
>>           |       |       |
>> enter     |       |       |
>>   |       |       |       |
>> leave   leave   leave   leave
>>
>> That is, first we have each VCPU enter and leave SMM in complete
>> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>> temporarily (it comes back in later), while the other three remain
>> in SMM. Finally all four of them leave SMM together.
>>
>> After which the problem occurs.
>>
>> * Second, the instruction that causes things to blow up is <0f aa>,
>> i.e., RSM. I have absolutely no clue why RSM is executed:
>>
>> (a) in the area that used to host the AP startup routine for the MP
>> services PPI -- note that we also have "Transfer to 16bit OS waking
>> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>> area completely! --,
>>
>> (b) and why *after* all four VCPUs have just left SMM, together.
>>
>> * The RSM instruction is handled successfully elsewhere, for example
>> when all four VCPUs leave SMM, at the bottom of the diagram above:
>>
>>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>>
>> * The guest-phys address 7ff7f000 that we see just before the error:
>>
>>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>>
>> can be found higher up in the trace; namely, it is written to CR3
>> several times. It's the root of the page tables.
>>
>> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>>
>> * I also tried the "info tlb" monitor command, via "virsh
>> qemu-monitor-command --hmp", while the guest was auto-paused after the
>> crash.
>>
>> I cannot provide results: QEMU appeared to return a message that would
>> be longer than 16MB after encoding by libvirt, and libvirt rejected
>> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>>
>> Anyway, the KVM trace, and the QEMU register dump, look consistent
>> with what Paolo said about "Code=?? ?? ??...":
>>
>> The question marks usually mean that the page tables do not map a
>> page at that address.
>>
>> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>> (SMM=0). We can't translate *any* guest-virtual address, as we can't
>> even begin walking the page tables.
>>
>> Thanks
>> Laszlo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 12:01 ` Laszlo Ersek
@ 2016-11-10 14:48 ` Yao, Jiewen
2016-11-10 14:53 ` Paolo Bonzini
2016-11-10 16:25 ` Laszlo Ersek
0 siblings, 2 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 14:48 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
Nice shot!!!!
After reviewing the SMI entry code again, I know the root cause.
I made a huge mistake there.
We always save CpuIndex on the top of the stack.
However, during SMI entry, the code below *conditionally* pushes EDX.
; enable NXE if supported
    DB      0b0h                        ; mov al, imm8
ASM_PFX(mXdSupported):     DB      0
    cmp     al, 0
    jz      @SkipXd
;
; Check XD disable bit
;
    mov     ecx, MSR_IA32_MISC_ENABLE
    rdmsr
    push    edx                        ; save MSR_IA32_MISC_ENABLE[63-32]
Then later, the code below *unconditionally* reads CpuIndex from the slot above the pushed EDX, whether or not EDX was actually pushed.
    mov     ebx, [esp + 4]                  ; CPU Index
I could not reproduce it before, because all my real hardware supports XD. My Windows QEMU also supports XD (to my surprise).
Now I have reproduced it, after hardcoding XD to be disabled.
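To make the offset problem concrete, here is a minimal sketch in Python. It is not the PiSmmCpuDxeSmm code; it only models the SMI stack as a list of 32-bit slots with CpuIndex on top, and shows what [esp + 4] yields with and without the conditional push. All constants and the function name are made up for illustration, and the reserve_slot option is just one hypothetical way to keep the offset constant, not necessarily what the real fix will look like.

```python
CPU_INDEX = 3                # hypothetical CpuIndex saved on top of the SMI stack
UNRELATED_WORD = 0xDEADBEEF  # hypothetical: whatever happens to sit above CpuIndex
MSR_HIGH = 0x00000004        # hypothetical MSR_IA32_MISC_ENABLE[63:32] value


def read_cpu_index(xd_supported, reserve_slot=False):
    """Return the value that 'mov ebx, [esp + 4]' would load."""
    # Element 0 models [esp], element 1 models [esp + 4].
    stack = [CPU_INDEX, UNRELATED_WORD]
    if xd_supported:
        stack.insert(0, MSR_HIGH)  # push edx (only on the XD path)
    elif reserve_slot:
        stack.insert(0, 0)         # hypothetical remedy: reserve the slot anyway
    return stack[1]                # mov ebx, [esp + 4]


if __name__ == "__main__":
    print(hex(read_cpu_index(xd_supported=True)))
    print(hex(read_cpu_index(xd_supported=False)))
    print(hex(read_cpu_index(xd_supported=False, reserve_slot=True)))
```

In this model the read returns CpuIndex only when EDX was pushed. With XD unsupported it returns the unrelated word instead, unless a slot is reserved on the skip path as well.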
Laszlo, your analysis will save me a day of installing the Linux QEMU. :)
Thank you
Yao Jiewen
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Thursday, November 10, 2016 8:02 PM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
On 11/10/16 11:41, Yao, Jiewen wrote:
> Thanks for reporting the case 3 issue on Bugzilla.
>
> Let's focus on Case 8.
> It seems to be another random failure issue.
>
> I did more tests.
>
> 1) I tested some of our other internal real platforms for UEFI32 OS boot. I cannot reproduce the ASSERT.
>
> 2) I wrote a small test app to call ExitBootServices and send an SMI. I ran it on my current Windows QEMU but I still cannot reproduce the ASSERT.
>
> It seems your env is the only way to repro the issue. I am trying to follow your step-by-step instructions to install an OS on QEMU/KVM. I haven't finished everything yet, because of some network proxy issue. :(
Right, when you run a guest on TCG (QEMU's emulator) vs. on KVM (the virtualizer / accelerator in the host Linux kernel), you get very-very different timing behavior and interleaving of actions. For one, with KVM, the VCPUs really execute in parallel -- they are represented by host OS threads, and the host OS schedules them to separate "physical logical CPUs".
>
> Your information and analysis are great. They give us some clues.
>
> I guessed the same thing as you and checked: InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> This address is initialized in InitializeMpSyncData(), with gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus, which is obtained from MpServices->GetNumberOfProcessors().
> I do not know why this address is zero.
>
> I also did not quite understand below log.
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
> CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
> CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
> CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM mode.
> So why can #3 enter SMM mode? Is that expected or unexpected behavior?
> If this is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?
My theory is that the OS is calling a runtime variable service during boot. That is supposed to pull in all APs into SMM, one way or another.
Also, during boot, the OS may call the runtime variable services genuinely on VCPU#3.
>
> I will see if I can finish QEMU/KVM installation tomorrow.
Thanks! Once you can test with KVM on your side, that should speed up debugging considerably, I think!
> If you have some idea why and how #3 enters SMM, please let us know.
Well, I captured a KVM trace for this as well (fresh boot, up to the failure). Grepping the trace for entering / leaving SMM, we see:
(1) the initial SMBASE relocation:
CPU-6948 [004] 11545.040294: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000
CPU-6948 [004] 11545.040335: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb5000
CPU-6949 [000] 11545.040363: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000
CPU-6949 [000] 11545.040389: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb7000
CPU-6950 [002] 11545.040417: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000
CPU-6950 [002] 11545.040443: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb9000
CPU-6947 [007] 11545.040453: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000
CPU-6947 [007] 11545.040474: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb3000
(2) a long stretch of VCPU#0 entering and leaving SMM, while the firmware uses variable services and such:
CPU-6947 [007] 11545.053169: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb3000
CPU-6947 [007] 11545.061272: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb3000
...
(3) a write to ioport 0xB2 from VCPU#3, then VCPU#3 entering SMM, then hitting the assert very-very soon:
CPU-6950 [005] 11550.521195: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521196: kvm_exit: reason IO_INSTRUCTION rip 0xf7c937b6 info b20000 0
CPU-6950 [005] 11550.521196: kvm_pio: pio_write at 0xb2 size 1 count 1 val 0x0
CPU-6950 [005] 11550.521196: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6947 [003] 11550.521196: kvm_inj_virq: irq 253
CPU-6950 [005] 11550.521196: kvm_fpu: unload
CPU-6947 [003] 11550.521197: kvm_fpu: load
CPU-6947 [003] 11550.521197: kvm_entry: vcpu 0
CPU-6950 [005] 11550.521200: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb9000
CPU-6947 [003] 11550.521207: kvm_eoi: apicid 0 vector 253
CPU-6950 [005] 11550.521207: kvm_fpu: load
CPU-6947 [003] 11550.521207: kvm_pv_eoi: apicid 0 vector 253
CPU-6950 [005] 11550.521207: kvm_entry: vcpu 3
CPU-6947 [003] 11550.521207: kvm_exit: reason HLT rip 0xc1844554 info 0 0
CPU-6950 [005] 11550.521209: kvm_exit: reason CR_ACCESS rip 0x8045 info 300 0
CPU-6950 [005] 11550.521209: kvm_cr: cr_write 0 = 0x33
CPU-6950 [005] 11550.521212: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521213: kvm_exit: reason CR_ACCESS rip 0x7ffc107d info 3 0
CPU-6950 [005] 11550.521213: kvm_cr: cr_write 3 = 0x7ff9a000
CPU-6950 [005] 11550.521214: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521214: kvm_exit: reason CPUID rip 0x7ffc1085 info 0 0
CPU-6950 [005] 11550.521214: kvm_cpuid: func 1 rax 6e8 rbx 3040800 rcx 80200001 rdx 1f89fbff
CPU-6950 [005] 11550.521215: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521215: kvm_exit: reason CR_ACCESS rip 0x7ffc10c4 info 4 0
CPU-6950 [005] 11550.521215: kvm_cr: cr_write 4 = 0x668
CPU-6950 [005] 11550.521217: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521218: kvm_exit: reason CR_ACCESS rip 0x7ffc110e info 300 0
CPU-6950 [005] 11550.521218: kvm_cr: cr_write 0 = 0x80010033
CPU-6950 [005] 11550.521220: kvm_entry: vcpu 3
CPU-6947 [003] 11550.521220: kvm_fpu: unload
CPU-6950 [005] 11550.521222: kvm_exit: reason EPT_VIOLATION rip 0x7ffcbe46 info 181 0
CPU-6950 [005] 11550.521223: kvm_page_fault: address 22004ebc error_code 181
CPU-6950 [005] 11550.521231: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521236: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521236: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x41 <----------------- "A"
CPU-6950 [005] 11550.521237: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521237: kvm_fpu: unload
CPU-6950 [005] 11550.521253: kvm_fpu: load
CPU-6950 [005] 11550.521253: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521254: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521254: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x53 <----------------- "S"
CPU-6950 [005] 11550.521254: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521254: kvm_fpu: unload
CPU-6950 [005] 11550.521257: kvm_fpu: load
CPU-6950 [005] 11550.521257: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521258: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521258: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x53 <----------------- "S"
CPU-6950 [005] 11550.521258: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521258: kvm_fpu: unload
CPU-6950 [005] 11550.521260: kvm_fpu: load
CPU-6950 [005] 11550.521260: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521261: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521261: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x45 <----------------- "E"
CPU-6950 [005] 11550.521261: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521262: kvm_fpu: unload
CPU-6950 [005] 11550.521264: kvm_fpu: load
CPU-6950 [005] 11550.521264: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521264: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521264: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x52 <----------------- "R"
CPU-6950 [005] 11550.521264: kvm_userspace_exit: reason KVM_EXIT_IO (2)
CPU-6950 [005] 11550.521265: kvm_fpu: unload
CPU-6950 [005] 11550.521267: kvm_fpu: load
CPU-6950 [005] 11550.521267: kvm_entry: vcpu 3
CPU-6950 [005] 11550.521267: kvm_exit: reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
CPU-6950 [005] 11550.521267: kvm_pio: pio_write at 0x402 size 1 count 1 val 0x54 <----------------- "T"
CPU-6950 [005] 11550.521268: kvm_userspace_exit: reason KVM_EXIT_IO (2)
This seems to be consistent with the OS calling a variable service on VCPU#3.
Also, as far as I can see, the above trace matches the assembly code in "UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm".
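The byte values written to ioport 0x402 at the end of the trace can be turned back into text mechanically. Below is a minimal sketch, assuming the trace lines have the exact "kvm_pio: pio_write at 0x402 ... val 0x.." shape shown above; the function names are made up for illustration and are not part of any tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the byte written to the debug console port from one line of the
 * KVM trace. Returns the byte value (0..255), or -1 if the line is not a
 * "pio_write at 0x402" event. Port 0x402 is the isa-debugcon iobase used
 * in the QEMU command line quoted in this thread.
 */
static int debugcon_byte(const char *line)
{
    const char *event = strstr(line, "kvm_pio: pio_write at 0x402 ");
    const char *val;
    unsigned int byte;

    if (event == NULL)
        return -1;
    val = strstr(event, " val 0x");
    if (val == NULL || sscanf(val, " val 0x%x", &byte) != 1)
        return -1;
    return (int)(byte & 0xff);
}

/*
 * Concatenate the console bytes found in `lines` into `out`, which is
 * always NUL-terminated. Returns the number of characters stored.
 */
static size_t decode_debugcon(const char *const *lines, size_t count,
                              char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    assert(out != NULL && out_size > 0);
    for (i = 0; i < count && used + 1 < out_size; i++) {
        int byte = debugcon_byte(lines[i]);

        if (byte >= 0)
            out[used++] = (char)byte;
    }
    out[used] = '\0';
    return used;
}
```

Fed the six pio_write lines from section (3), this yields the string "ASSERT"; the write to 0xb2 is skipped because it targets a different port.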
Is perhaps CpuIndex out of bounds?... Hmm, with the following debug patch:
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index d0092d2f145a..29f6e783c58f 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1143,6 +1143,9 @@ SmiRendezvous (
> // E.g., with Relaxed AP flow, SmmStartupThisAp() may be called immediately
> // after AP's present flag is detected.
> //
> + if (CpuIndex >= 4) {
> + DEBUG ((EFI_D_ERROR, "CpuIndex=%u\n", (UINT32)CpuIndex));
> + }
> InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
> }
>
>
I get the following debug output (note that my SMP configuration is 1x2x2):
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> CpuIndex=780161211
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
Ehm... what? :)
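The debug patch hard-codes 4 because that is my VCPU count. A more general guard would compare the index against the number of CPUs before touching the array. Here is a minimal sketch of that idea; the struct and function names are hypothetical stand-ins, not the real PiSmmCpuDxeSmm types, and it does not explain where the bogus index comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the firmware's per-CPU bookkeeping. */
typedef volatile uintptr_t spin_lock_t;

struct cpu_data {
    spin_lock_t *busy;          /* plays the role of CpuData[].Busy */
};

struct mp_sync_data {
    size_t number_of_cpus;      /* number of valid cpu_data elements */
    struct cpu_data *cpu_data;
};

/*
 * Return the Busy lock for cpu_index, or NULL if cpu_index does not name
 * an element of the array. Checking before indexing turns a wild read
 * into an explicit, reportable error.
 */
static spin_lock_t *busy_lock_for(const struct mp_sync_data *sync,
                                  size_t cpu_index)
{
    assert(sync != NULL);
    if (cpu_index >= sync->number_of_cpus)
        return NULL;
    return sync->cpu_data[cpu_index].busy;
}
```

With four CPUs, index 3 returns a valid lock pointer, while an index like 780161211 is rejected up front instead of reading whatever memory happens to sit far beyond the array.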
SmiRendezvous() is EFIAPI, is the calling convention followed in "Ia32/SmiEntry.nasm"?
Thanks,
Laszlo
> Thank you
> Yao Jiewen
>
>
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 10, 2016 4:46 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/09/16 07:25, Yao, Jiewen wrote:
>> Hi Laszlo
>> I will fix the DEBUG message issue in the V3 patch.
>>
>> Below are the remaining issues:
>>
>>
>> * Case 13: S3 fails randomly.
>> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>>
>>
>> 1) We believe the dead CPU is an AP, not the BSP.
>> The reasons are:
>>
>> 1.1) The BSP has already transferred control to the OS waking vector. Its GDT/IDT/CR3 should have been set by the OS.
>>
>> 1.2) The current dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are typical BIOS X64 mode settings.
>>
>> 1.3) The current dead CPU still has the SMM CR3. (Which is obviously wrong.)
>>
>>
>> 2) Based upon 1), we reviewed the S3 resume AP flow.
>> Currently the BSP wakes up the APs in SMRAM, for security considerations. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
>> Currently the BSP just uses its own context to initialize the APs, so each AP takes the BSP's CR3, which is the SMM CR3, unfortunately.
>> After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because a halt loop in X64 mode still needs paging.
>>
>>
>> 3) The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>>
>>
>> 4) I guess we did not see the error, or it is a RANDOM issue, because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>>
>>
>> 5) The fix, I think, should be as follows:
>> We should always put the APs in protected mode, so that no paging is needed.
>> We should put the APs in reserved memory above 1M, instead of memory below 1M, because memory below 1M is restored.
>>
>>
>> Would you please file a Bugzilla ticket? I think we need to assign a CPU owner to fix that critical issue.
>>
>> There is no need to do more investigation. Thanks for your great help on that. :)
>
> Thank you for your help!
>
> I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
>
> BSP exits SMM and closes SMRAM on the S3 resume path before
> meeting with AP(s)
>
> I hope the title is mostly right. I didn't add any other details (I
> haven't gone through the thread in detail yet, and without that I can't
> even write up a semi-reasonable report myself). Instead, I referenced
> this message of yours in the report, and I also linked Paolo's analysis
> from elsewhere in the thread. I hope this will do for the report.
>
> (Also, thank you Paolo, for the amazing analysis -- I haven't digested
> it yet, but I can already tell it's amazing! :))
>
>> * Case 17 - I do not think it is a real issue, because SMM simply runs out of resources.
>>
>>
>> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>> Here is the code. We do not know why some code needs InitializeSpinLock after ExitBootServices.
>> SPIN_LOCK *
>> EFIAPI
>> InitializeSpinLock (
>> OUT SPIN_LOCK *SpinLock
>> )
>> {
>> ASSERT (SpinLock != NULL);
>>
>> _ReadWriteBarrier();
>> *SpinLock = SPIN_LOCK_RELEASED;
>> _ReadWriteBarrier();
>>
>> return SpinLock;
>> }
>>
>> If you could quickly check the following, that would be great.
>>
>> 1) Which processor triggers this ASSERT? BSP or AP.
>>
>> 2) Which module triggers this ASSERT? Which module contains current RIP value?
>
> First, one additional piece of info I have learned is that the issue
> does not always present itself. Sometimes the boot just works fine,
> other times the assert fires.
>
> Using the QEMU monitor, I managed to get the following information with
> the "info cpus" command:
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
> CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
> CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
> CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
> point into SMRAM again.
>
> In the OVMF log, I see
>
> Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
> PiSmmCpuDxeSmm.efi
>
> So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
> entry point, 0x8577, 0x253 bytes less).
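The offsets above are plain subtractions from the load address and the entry point printed in the OVMF log. A minimal sketch of that arithmetic follows; the helper name is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of `address` from a reference point inside a loaded image, such
 * as the image load address or the image entry point. The caller must
 * pass an address at or above the reference point.
 */
static uint64_t image_offset(uint64_t address, uint64_t reference)
{
    assert(address >= reference);
    return address - reference;
}
```

For VCPU#3's pc=0x7ffd17ca, a load address of 0x7FFC9000 gives offset 0x87CA, and the entry point 0x7FFC9253 gives 0x8577. The first value is the one to look up in the objdump listing, since objdump addresses are relative to the image start.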
>
> Running
>
> objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
>
> first I see confirmation that
>
> start address 0x00000253
>
> and then
>
> 000087bd <CpuDeadLoop>:
> VOID
> EFIAPI
> CpuDeadLoop (
> VOID
> )
> {
> 87bd: 55 push %ebp
> 87be: 89 e5 mov %esp,%ebp
> 87c0: 83 ec 10 sub $0x10,%esp
> volatile UINTN Index;
>
> for (Index = 0; Index == 0;);
> 87c3: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
> 87ca: 8b 45 fc mov -0x4(%ebp),%eax <-- HERE
> 87cd: 85 c0 test %eax,%eax
> 87cf: 74 f9 je 87ca <CpuDeadLoop+0xd>
> }
> 87d1: c9 leave
> 87d2: c3 ret
>
> This seems consistent with an assertion failure.
>
> I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
> SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
> like a possible caller:
>
> //
> // The BUSY lock is initialized to Released state. This needs to
> // be done early enough to be ready for BSP's SmmStartupThisAp()
> // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
> // called immediately after AP's present flag is detected.
> //
> InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> Just a guess, of course.
>
>> At the same time, all my OS testing is on real platforms. I have not set up an OVMF env to run an OS yet.
>> If you can share step-by-step instructions with me, that would be great.
>
> (1) Grab a host computer with a CPU that supports VMX and EPT.
>
> (2) Download and install Fedora 24 (for example):
>
> https://getfedora.org/en/workstation/download/
> http://docs.fedoraproject.org/install-guide
>
> (3) Install the "qemu-system-x86" package with DNF
>
> dnf install qemu-system-x86
>
> (4) clone edk2 with git
>
> (5) embed OpenSSL optionally (for secure boot); see
> "CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
>
> (6) build OVMF:
>
> source edksetup.sh
> make -C "$EDK_TOOLS_PATH"
>
> # Ia32
> build \
> -a IA32 \
> -p OvmfPkg/OvmfPkgIa32.dsc \
> -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
> -t GCC5 -b DEBUG
>
> # Ia32X64
> build \
> -a IA32 -a X64 \
> -p OvmfPkg/OvmfPkgIa32X64.dsc \
> -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
> -t GCC5 -b DEBUG
>
> (7) Create disk images:
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
> -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
> -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
>
> (8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
> that you downloaded already (the ISO image).
>
> For 32-bit guest OS, this one used to work:
>
> https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
>
> minimally the 20141209 release. Hm... actually, I think the maintainer
> of that image has discontinued the downloadable files :(
>
> So, I don't know what 32-bit UEFI OS to recommend for testing.
>
> 32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
> times, with some help from a Microsoft developer, but we couldn't solve
> it), so I can't recommend Windows as an alternative.
>
> Perhaps you can use
>
> https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
>
> as a 32-bit guest OS, I never tried.
>
> (9) Anyway, once you have an installer ISO, set the "ISO" environment
> variable to the ISO image's full pathname, and then run QEMU like this:
>
> # Settings for Ia32 only:
>
> ISO=...
> DISK=.../disk-ia32.img
> FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-32.fd
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> DEBUG=debug-32.log
>
> # Settings for Ia32X64 only:
>
> ISO=...
> DISK=.../disk-ia32x64.img
> FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-3264.fd
> QEMU_COMMAND=qemu-system-x86_64
> DEBUG=debug-3264.log
>
> # Common commands for both target arches:
>
> # create variable store from varstore template
> # if the former doesn't exist yet
> if ! [ -e "$VARS" ]; then
> cp -- "$TEMPLATE" "$VARS"
> fi
>
> $QEMU_COMMAND \
> -machine q35,smm=on,accel=kvm \
> -m 4096 \
> -smp sockets=1,cores=2,threads=2 \
> -global driver=cfi.pflash01,property=secure,value=on \
> -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
> -drive if=pflash,format=raw,unit=1,file=${VARS} \
> \
> -chardev file,id=debugfile,path=$DEBUG \
> -device isa-debugcon,iobase=0x402,chardev=debugfile \
> \
> -chardev stdio,id=char0,signal=off,mux=on \
> -mon chardev=char0,mode=readline,default \
> -serial chardev:char0 \
> \
> -drive id=iso,if=none,format=raw,readonly,file=$ISO \
> -drive id=disk,if=none,format=qcow2,file=$DISK \
> \
> -device virtio-scsi-pci,id=scsi0 \
> -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
> -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
> \
> -device VGA
>
> This will capture the OVMF debug output in the $DEBUG file. Also, the
> terminal where you run the command can be switched between the guest's
> serial console and the QEMU monitor with [Ctrl-A C].
>
> Thanks
> Laszlo
>
>>
>> Thank you
>> Yao Jiewen
>>
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Tuesday, November 8, 2016 9:22 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>
>> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
>> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>>
>> On 11/04/16 10:30, Jiewen Yao wrote:
>>> ==== below is V2 description ====
>>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>>> 4) PiSmmCpu: Add protection detail in commit message.
>>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>>
>>> ==== below is V1 description ====
>>> This series patch enables SMM page level protection.
>>> Features are:
>>> 1) PiSmmCore reports SMM PE image code/data information
>>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>>> and set XD for data page and RO for code page.
>>> 3) PiSmmCpu enables Static Paging for X64 according to
>>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>>> is used as long as it is supported.
>>> 4) PiSmmCpu sets importance data structure to be read only,
>>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>>
>>> tested platform:
>>> 1) Intel internal platform (X64).
>>> 2) EDKII Quark IA32
>>> 3) EDKII Vlv2 X64
>>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>>
>>> Cc: Jeff Fan <jeff.fan@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Star Zeng <star.zeng@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> I have new test results. Let's start with the table again:
>>
>> Legend:
>>
>> - "untested" means the test was not executed because the same test
>> failed or proved unreliable in a less demanding configuration already,
>>
>> - "n/a" means a setting or test case was impossible,
>>
>> - "fail" and "unreliable" (lower case) are outside the scope of this
>> series; they either capture the pre-series status, or are expected
>> even with the series applied due to the pre-series status,
>>
>> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>> series.
>>
>> In all cases, 36 bits were used as address width in the CPU HOB (--> up
>> to 64GB guest-phys address space).
>>
>>     series   OVMF                                                                 VCPU      boot        S3 resume
>> #   applied  platform  PcdCpuMaxLogicalProcessorNumber  PcdCpuSmmStaticPageTable  topology  result      result
>> --  -------  --------  -------------------------------  ------------------------  --------  ----------  ----------
>>  1  no       Ia32      64                               n/a                       1x2x2     pass        unreliable
>>  2  no       Ia32      255                              n/a                       52x2x2    pass        untested
>>  3  no       Ia32      255                              n/a                       53x2x2    unreliable  untested
>>  4  no       Ia32X64   64                               n/a                       1x2x2     pass        unreliable
>>  5  no       Ia32X64   255                              n/a                       52x2x2    pass        untested
>>  6  no       Ia32X64   255                              n/a                       54x2x2    fail        n/a
>>  7  v2       Ia32      64                               FALSE                     1x2x2     pass        untested
>>  8  v2       Ia32      64                               TRUE                      1x2x2     FAIL        untested
>>  9  v2       Ia32      255                              FALSE                     52x2x2    pass        untested
>> 10  v2       Ia32      255                              FALSE                     53x2x2    untested    untested
>> 11  v2       Ia32      255                              TRUE                      52x2x2    untested    untested
>> 12  v2       Ia32      255                              TRUE                      53x2x2    untested    untested
>> 13  v2       Ia32X64   64                               FALSE                     1x2x2     pass        unreliable
>> 14  v2       Ia32X64   64                               TRUE                      1x2x2     pass        untested
>> 15  v2       Ia32X64   255                              FALSE                     52x2x2    pass        untested
>> 16  v2       Ia32X64   255                              FALSE                     54x2x2    untested    untested
>> 17  v2       Ia32X64   255                              TRUE                      52x2x2    FAIL        untested
>> 18  v2       Ia32X64   255                              TRUE                      54x2x2    untested    untested
>>
>> * Case 8: this test case failed with v2 as well, but this time with
>> different symptoms:
>>
>>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitExitBootServicesCallback() done!
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>>
>> I didn't try to narrow this down.
>>
>> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>> and good. The good news is for Jiewen: this patch series does not
>> cause the unreliability, it "only" amplifies it severely. The bad news
>> is correspondingly for everyone else: S3 resume is actually unreliable
>> even in case 4, that is, without this series applied, it's just the
>> failure rate is much-much lower.
>>
>> Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>> 21 tries. (I stopped testing at the 8th failure.)
>>
>> Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>> exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>> #12 that failed; I continued testing and aborted the test after the
>> 55th try.)
>>
>> So, while the series hugely amplifies the failure rate, the failure
>> does exist without the series. Which is why I modified the case 4
>> results in the table, and also lower-cased the word "unreliable" in
>> case 13.
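To put numbers on "hugely amplifies": the counts quoted above work out to roughly 38% failures in case 13 versus under 2% in case 4, a factor of about 21. A minimal sketch of that arithmetic follows; the helper name is invented, and with samples this small (and the case 13 run stopped at its 8th failure) the ratio is only a rough indication.

```c
#include <assert.h>

/* Fraction of failed attempts; `tries` must be positive. */
static double failure_rate(unsigned int failures, unsigned int tries)
{
    assert(tries > 0 && failures <= tries);
    return (double)failures / (double)tries;
}
```

Here failure_rate(8, 21) is about 0.381 and failure_rate(1, 55) is about 0.018.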
>>
>> Below I will return to this problem separately; let's go over the rest
>> of the table first.
>>
>> * Case 17: I guess this is not a real failure, I'm just including it for
>> completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>> additional SMRAM demand (see the commit message on patch V2 4/6). This
>> case fails with
>>
>>> SmmLockBox Command - 4
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>>> SmmLockBox SmmLockBoxHandler Exit
>>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>>
>> which is an SMRAM allocation failure. If I lower the VCPU count to
>> 50x2x2, then the guest boots fine.
>>
>> ----*----
>>
>> Before I get to the S3 resume problem (which, again, reproduces without
>> this series, although much less frequently), I'd like to comment on the
>> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
>> function, on the return value of SmmBlockingStartupThisAp(). This change
>> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>>
>>> !mSmmMpSyncData->CpuData[1].Present
>>> !mSmmMpSyncData->CpuData[2].Present
>>> !mSmmMpSyncData->CpuData[3].Present
>>> ...
>>
>> messages in the OVMF boot log, interspersed with
>>
>>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>>
>> style messages. (That is, one error message for each AP, per
>> ConvertPageEntryAttribute() message.)
>>
>> Is this okay / intentional? The number of these messages can go up to
>> several thousands and that sort of drowns out everything else in the
>> log.
>>
>> It's also not easy to mask the message, because it's logged on the
>> DEBUG_ERROR level.
>>
>> ----*----
>>
>> * Okay, so the S3 problem. Last time I suspected that the failure point
>> (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>> 9A1D0, according to the OVMF log). In order to test this idea, I
>> exercised this series with S3 against a Windows 8.1 guest (--> case 13
>> again). The failure reproduced on the second S3 resume, with identical
>> RIP, despite the Windows wakeup vector being located elsewhere (at
>> 0x1000).
>>
>> Quoting the OVMF log leading up to the resume:
>>
>>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>>> Install PPI: [PeiPostScriptTablePpi]
>>> Install PPI: [EfiEndOfPeiSignalPpi]
>>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>>> Transfer to 16bit OS waking vector - 1000
>>
>> QEMU log (same as before):
>>
>>> KVM internal error. Suberror: 1
>>> KVM internal error. Suberror: 1
>>> emulation failure
>>> emulation failure
>>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT= 000000007f294000 00000047
>>> IDT= 000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT= 000000007f294000 00000047
>>> IDT= 000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>> So, we can exclude the suspicion that the problem is guest OS
>> dependent.
>>
>> * Then I looked for the base address of the page containing the
>> RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>> some firmware component might have allocated that area actually. Here
>> we go:
>>
>>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>>> AP Loop Mode is 1
>>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>>
>> That is, the failure hits (when it hits -- not always) in the area
>> where the CpuMpPei driver *borrows* memory for the startup vector of
>> the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>> overloaded word here; the "wakeup buffer" has nothing to do with S3
>> resume, it just serves for booting the APs temporarily in PEI, for
>> implementing the MP service PPI.)
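The claim that the failing RIP lies inside the wakeup buffer is a simple range check against the WakeupBufferStart and WakeupBufferSize values from the log. A minimal sketch, with an invented helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if `address` lies in the half-open range [start, start + size).
 * The subtraction form avoids computing start + size directly.
 */
static bool in_range(uint64_t address, uint64_t start, uint64_t size)
{
    assert(size <= UINT64_MAX - start);
    return address >= start && address - start < size;
}
```

With start 0x9F000 and size 0x1000, RIP 0x9f0fd is inside the buffer, while neither the Linux wakeup vector at 0x9A1D0 nor the Windows one at 0x1000 is.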
>>
>> When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>> the original contents of this area. This occurs just before
>> transferring control to the guest OS wakeup vector: see the
>> "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>> quoted from the OVMF log.
>>
>> I documented (parts of) this logic in OVMF commit
>>
>> https://github.com/tianocore/edk2/commit/e3e3090a959a0
>>
>> (see the code comments as well).
>>
>> * At that time, I thought I had identified a memory management bug in
>> CpuMpPei; see the following discussion and bug report for details:
>>
>> https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>> https://bugzilla.tianocore.org/show_bug.cgi?id=67
>>
>> However, with the extraction / introduction of MpInitLib, this issue
>> has been fixed: GetWakeupBuffer() now calls
>> CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>> no longer; we shouldn't be looking there for the root cause.
>>
>> * Either way, I don't understand why anything would want to execute code
>> in the one page that happens to host the MP services PPI startup
>> buffer for APs during PEI.
>>
>> Not understanding the "why", I looked at the "what", and resorted to
>> tracing KVM. Because the problem readily reproduces with this series
>> applied (case 13), it wasn't hard to start the tracing while the guest
>> was suspended, and capture just the actions that led from the
>> KVM-level wakeup to the failure.
>>
>> The QEMU state dumps are visible above in the email. I've also
>> uploaded the compressed OVMF log and the textual KVM trace here:
>>
>> http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>>
>> I sincerely hope that Paolo will have a field day with the KVM trace
>> :) I managed to identify the following curiosities (remember this is
>> all on the S3 resume path):
>>
>> * First, the VCPUs (there are four of them) enter and leave SMM in a
>> really funky pattern:
>>
>> vcpu#0 vcpu#1 vcpu#2 vcpu#3
>> ------ ------ ------ ------
>> enter
>> |
>> leave
>>
>> enter
>> |
>> leave
>>
>> enter
>> |
>> leave
>>
>> enter
>> |
>> leave
>>
>> enter enter
>> enter | enter |
>> | | | |
>> leave | | |
>> | | |
>> enter | | |
>> | | | |
>> leave leave leave leave
>>
>> That is, first we have each VCPU enter and leave SMM in complete
>> isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>> followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>> temporarily (it comes back in later), while the other three remain
>> in SMM. Finally all four of them leave SMM together.
>>
>> After which the problem occurs.
>>
>> * Second, the instruction that causes things to blow up is <0f aa>,
>> i.e., RSM. I have absolutely no clue why RSM is executed:
>>
>> (a) in the area that used to host the AP startup routine for the MP
>> services PPI -- note that we also have "Transfer to 16bit OS waking
>> vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>> area completely! --,
>>
>> (b) and why *after* all four VCPUs have just left SMM, together.
>>
>> * The RSM instruction is handled successfully elsewhere, for example
>> when all four VCPUs leave SMM, at the bottom of the diagram above:
>>
>>> CPU-24447 [002] 39841.982810: kvm_emulate_insn: 0:7ffbf179: 0f aa
>>> CPU-24446 [000] 39841.982810: kvm_emulate_insn: 0:7ffbd179: 0f aa
>>> CPU-24445 [005] 39841.982810: kvm_emulate_insn: 0:7ffbb179: 0f aa
>>> CPU-24444 [006] 39841.982811: kvm_emulate_insn: 0:7ffb9179: 0f aa
>>
>> * The guest-phys address 7ff7f000 that we see just before the error:
>>
>>> CPU-24447 [002] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>>> CPU-24446 [000] 39841.982825: kvm_page_fault: address 7ff7f000 error_code 83
>>> CPU-24447 [002] 39841.982826: kvm_emulate_insn: 0:9f0fd: 0f aa
>>> CPU-24444 [006] 39841.982827: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>>> CPU-24447 [002] 39841.982827: kvm_emulate_insn: 0:9f0fd: 0f aa FAIL
>>> CPU-24447 [002] 39841.982827: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>>
>> can be found higher up in the trace; namely, it is written to CR3
>> several times. It's the root of the page tables.
>>
>> * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>>
>> * I also tried the "info tlb" monitor command, via "virsh
>> qemu-monitor-command --hmp", while the guest was auto-paused after the
>> crash.
>>
>> I cannot provide results: QEMU appeared to return a message that would
>> be longer than 16MB after encoding by libvirt, and libvirt rejected
>> that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>>
>> Anyway, the KVM trace, and the QEMU register dump, look consistent
>> with what Paolo said about "Code=?? ?? ??...":
>>
>> The question marks usually mean that the page tables do not map a
>> page at that address.
>>
>> CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>> (SMM=0). We can't translate *any* guest-virtual address, as we can't
>> even begin walking the page tables.
>>
>> Thanks
>> Laszlo
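The closing observation in the quoted analysis (CR3 points into SMRAM while SMM=0, so no page walk can even start) can be sketched as a small C check. This is only an illustration: the SMRAM bounds are the guest-physical range quoted for this particular guest and are assumed to be inclusive, and the helper names are hypothetical, not taken from KVM, QEMU, or edk2.

```c
#include <stdbool.h>
#include <stdint.h>

/* SMRAM range quoted in this thread for this guest (assumed inclusive). */
#define SMRAM_BASE 0x7F801000ULL
#define SMRAM_LAST 0x7FFFFFFFULL

/* Hypothetical helper: does a guest-physical address fall inside SMRAM? */
static bool in_smram(uint64_t gpa)
{
    return gpa >= SMRAM_BASE && gpa <= SMRAM_LAST;
}

/*
 * Hypothetical helper: the page-table root can be read only if it lies
 * outside SMRAM, or if the CPU is currently in SMM.
 */
static bool page_table_root_reachable(uint64_t cr3, bool in_smm)
{
    uint64_t root = cr3 & ~0xFFFULL;  /* drop the low 12 bits of CR3 */
    return in_smm || !in_smram(root);
}
```

With the CR3 value from the register dump, `page_table_root_reachable(0x7FF7F000ULL, false)` is false, which matches the "Code=?? ?? ??" symptom; the same call with `in_smm` set to true succeeds.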
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 14:48 ` Yao, Jiewen
@ 2016-11-10 14:53 ` Paolo Bonzini
2016-11-10 16:22 ` Laszlo Ersek
2016-11-10 16:25 ` Laszlo Ersek
1 sibling, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-10 14:53 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
Zeng, Star
On 10/11/2016 15:48, Yao, Jiewen wrote:
> I could not reproduce it before, because all my real hardware supports XD.
> My Windows QEMU also supports XD (to my surprise.)
QEMU can be configured to support XD or not. Possibly Laszlo was using
some different default, or testing both cases.
Paolo
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 14:53 ` Paolo Bonzini
@ 2016-11-10 16:22 ` Laszlo Ersek
2016-11-10 16:39 ` Paolo Bonzini
0 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-10 16:22 UTC (permalink / raw)
To: Paolo Bonzini, Yao, Jiewen
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
Zeng, Star
On 11/10/16 15:53, Paolo Bonzini wrote:
>
>
> On 10/11/2016 15:48, Yao, Jiewen wrote:
>> I could not reproduce it before, because all my real hardware supports XD.
>> My Windows QEMU also supports XD (to my surprise.)
>
> QEMU can be configured to support XD or not. Possibly Laszlo was using
> some different default, or testing both cases.
When QEMU emulates an Ia32 (32-bit) target, the SMM state save area has
no room for recording whether NX is set or clear. This is an
issue that dates back to the inception of OVMF's SMM support. The
explanation was given by Paolo, actually :)
https://www.mail-archive.com/edk2-devel@lists.01.org/msg00970.html
We adjusted the OvmfPkg/README file accordingly:
> * QEMU binary and options specific to 32-bit guests:
>
> $ qemu-system-i386 -cpu coreduo,-nx \
>
> or
>
> $ qemu-system-x86_64 -cpu <MODEL>,-lm,-nx \
>
Note the "-nx" bit.
And, in my recent KVM / QEMU usage instructions for Jiewen:
https://www.mail-archive.com/edk2-devel@lists.01.org/msg19446.html
I provided the following settings:
> # Settings for Ia32 only:
> [...]
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
>
> # Settings for Ia32X64 only:
> [...]
> QEMU_COMMAND=qemu-system-x86_64
I guess the "-nx" bit can be left off with TCG, but AFAIR it is required
for KVM.
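For readers unfamiliar with what the "-nx" and "-lm" flags hide from the guest, here is a minimal C sketch. It assumes that these QEMU feature flags correspond to the NX and LM bits in CPUID.80000001H:EDX (bits 20 and 29); the helper names are hypothetical, and the code only decodes a register value passed in, it does not execute CPUID itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.80000001H:EDX. */
#define CPUID_EXT_EDX_NX (1u << 20)  /* execute-disable (NX / XD) */
#define CPUID_EXT_EDX_LM (1u << 29)  /* long mode (64-bit) */

/* Hypothetical helpers: decode a raw EDX value. */
static bool cpuid_has_nx(uint32_t ext_edx)
{
    return (ext_edx & CPUID_EXT_EDX_NX) != 0;
}

static bool cpuid_has_lm(uint32_t ext_edx)
{
    return (ext_edx & CPUID_EXT_EDX_LM) != 0;
}

/* Hypothetical helper: model "-cpu MODEL,-nx,-lm" as clearing feature bits. */
static uint32_t clear_features(uint32_t ext_edx, uint32_t mask)
{
    return ext_edx & ~mask;
}
```

For example, `clear_features(edx, CPUID_EXT_EDX_NX | CPUID_EXT_EDX_LM)` models the "-lm,-nx" setting from the README excerpt: the guest firmware then sees neither feature and should not try to use XD.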
Thanks!
Laszlo
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 14:48 ` Yao, Jiewen
2016-11-10 14:53 ` Paolo Bonzini
@ 2016-11-10 16:25 ` Laszlo Ersek
1 sibling, 0 replies; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-10 16:25 UTC (permalink / raw)
To: Yao, Jiewen
Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
Paolo Bonzini, Fan, Jeff, Zeng, Star
On 11/10/16 15:48, Yao, Jiewen wrote:
> Laszlo, your analysis will save me one day to install the Linux QEMU. :)
Perfect; I can't wait till you guys adopt QEMU/KVM as a test platform! :)
Cheers
Laszlo
* Re: [PATCH V2 0/6] Enable SMM page level protection.
2016-11-10 16:22 ` Laszlo Ersek
@ 2016-11-10 16:39 ` Paolo Bonzini
0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-10 16:39 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Jiewen Yao, Feng Tian, edk2-devel, Michael D Kinney, Jeff Fan,
Star Zeng
> And, in my recent KVM / QEMU usage instructions for Jiewen:
>
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg19446.html
>
> I provided the following settings:
>
> > # Settings for Ia32 only:
> > [...]
> > QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> >
> > # Settings for Ia32X64 only:
> > [...]
> > QEMU_COMMAND=qemu-system-x86_64
>
> I guess the "-nx" bit can be left off with TCG, but AFAIR it is required
> for KVM.
Oh right, now I remember. The same problem exists: EFER is not saved in the
32-bit state save map. AFAIK all processors with XD also have long mode.
That said, qemu-system-x86_64 with no -cpu option should work even with Ia32
PEI/DXE/SMM. In that case you could use XD.
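A short C sketch of why a lost EFER matters once a series like this one sets XD on data pages. It relies on the architectural rule that bit 63 of a PAE or 4-level paging entry is reserved while IA32_EFER.NXE (bit 11) is clear, so a present entry with XD set faults on translation. The helper name is hypothetical, and this is not code from QEMU, KVM, or PiSmmCpuDxeSmm.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_NXE    (1ULL << 11)  /* IA32_EFER.NXE: enables execute-disable */
#define PTE_PRESENT (1ULL << 0)   /* paging entry is present */
#define PTE_XD      (1ULL << 63)  /* execute-disable bit in the entry */

/*
 * Hypothetical helper: would this paging entry trip the reserved-bit
 * check because of XD? Non-present entries are not checked.
 */
static bool xd_bit_is_legal(uint64_t entry, uint64_t efer)
{
    if ((entry & PTE_PRESENT) == 0) {
        return true;              /* not present: bit 63 is ignored */
    }
    return (entry & PTE_XD) == 0 || (efer & EFER_NXE) != 0;
}
```

If EFER.NXE is not preserved across SMM entry and exit, an entry built as `PTE_PRESENT | PTE_XD` is legal with NXE set and illegal without it, which is the kind of mismatch the "-nx" workaround avoids.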
Now if only Intel made the *full* format of the state save map public, we
could emulate everything more accurately... I'm told it's in the BIOS
writer's guide.
Paolo
Thread overview: 38+ messages
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
2016-11-04 22:46 ` Yao, Jiewen
2016-11-04 23:08 ` Laszlo Ersek
2016-11-08 1:22 ` Laszlo Ersek
2016-11-08 12:59 ` Yao, Jiewen
2016-11-08 13:22 ` Laszlo Ersek
2016-11-08 13:41 ` Yao, Jiewen
2016-11-09 6:25 ` Yao, Jiewen
2016-11-09 11:30 ` Paolo Bonzini
2016-11-09 15:01 ` Yao, Jiewen
2016-11-09 15:54 ` Paolo Bonzini
2016-11-09 16:06 ` Paolo Bonzini
2016-11-09 22:28 ` Laszlo Ersek
2016-11-09 22:59 ` Paolo Bonzini
2016-11-09 23:27 ` Laszlo Ersek
2016-11-10 1:13 ` Yao, Jiewen
2016-11-10 6:30 ` Fan, Jeff
2016-11-10 0:49 ` Yao, Jiewen
2016-11-10 0:50 ` Yao, Jiewen
2016-11-10 1:02 ` Fan, Jeff
2016-11-09 20:46 ` Laszlo Ersek
2016-11-10 10:41 ` Yao, Jiewen
2016-11-10 12:01 ` Laszlo Ersek
2016-11-10 14:48 ` Yao, Jiewen
2016-11-10 14:53 ` Paolo Bonzini
2016-11-10 16:22 ` Laszlo Ersek
2016-11-10 16:39 ` Paolo Bonzini
2016-11-10 16:25 ` Laszlo Ersek
2016-11-10 12:27 ` Paolo Bonzini
2016-11-09 11:23 ` Paolo Bonzini
2016-11-09 15:16 ` Yao, Jiewen