public inbox for devel@edk2.groups.io
* [PATCH V2 0/6] Enable SMM page level protection.
@ 2016-11-04  9:30 Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
                   ` (7 more replies)
  0 siblings, 8 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

==== below is V2 description ====
1) PiSmmCpu: Resolve the OVMF multi-processor boot hang.
2) PiSmmCpu: Add debug output when StartupAp() fails.
3) PiSmmCpu: Add ASSERT for AllocatePages().
4) PiSmmCpu: Add protection details to the commit message.
5) UefiCpuPkg.dsc: Add page table footprint information to the commit message.

==== below is V1 description ====
This patch series enables SMM page-level protection.
The features are:
1) PiSmmCore reports SMM PE image code/data information
in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
and sets XD for data pages and RO for code pages.
3) PiSmmCpu enables static paging for X64 according to
PcdCpuSmmStaticPageTable. If it is TRUE, 1 GiB pages are used for
memory above 4 GiB, as long as the processor supports them.
4) PiSmmCpu sets important data structures to be read-only,
such as Gdt, Idt, SmmEntrypoint, and the PageTable itself.
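
As a rough illustration of feature 2), the sketch below shows how the
EFI_MEMORY_RO and EFI_MEMORY_XP attributes could map onto x86 page table
entry bits. It is not the code from this series: ApplyAttributesToPte()
and the PTE_* names are made up for the example, and it assumes that
CR0.WP and IA32_EFER.NXE are set so that the R/W and XD bits are actually
enforced for supervisor-mode code.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute bits with the values defined by the UEFI specification. */
#define EFI_MEMORY_XP  0x0000000000004000ULL
#define EFI_MEMORY_RO  0x0000000000020000ULL

/* x86 page table entry bits (names are hypothetical, positions are architectural). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_XD       (1ULL << 63)

/*
  Hypothetical helper: return a copy of Pte with the protection implied by
  Attributes applied. RO clears the writable bit, XP sets execute-disable.
*/
uint64_t
ApplyAttributesToPte (
  uint64_t Pte,
  uint64_t Attributes
  )
{
  assert ((Pte & PTE_PRESENT) != 0);

  if ((Attributes & EFI_MEMORY_RO) != 0) {
    Pte &= ~PTE_RW;
  }
  if ((Attributes & EFI_MEMORY_XP) != 0) {
    Pte |= PTE_XD;
  }
  return Pte;
}
```

A code page would be passed EFI_MEMORY_RO and a data page EFI_MEMORY_XP;
a real implementation also has to split large pages and flush the TLB,
which the sketch leaves out.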

Tested platforms:
1) Intel internal platform (X64)
2) EDKII Quark IA32
3) EDKII Vlv2 X64
4) EDKII OVMF IA32 and IA32X64 (with -smp 8)

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>

Jiewen Yao (6):
  MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
  MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
  MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
  UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
  UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
  QuarkPlatformPkg/dsc: enable Smm paging protection.

 MdeModulePkg/Core/PiSmmCore/Dispatcher.c               |   66 +
 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c    | 1509 ++++++++++++++++++++
 MdeModulePkg/Core/PiSmmCore/Page.c                     |  775 +++++++++-
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.c                |   40 +
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.h                |   91 ++
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf              |    2 +
 MdeModulePkg/Core/PiSmmCore/Pool.c                     |   16 +
 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h |   51 +
 MdeModulePkg/MdeModulePkg.dec                          |    3 +
 QuarkPlatformPkg/Quark.dsc                             |    6 +
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c               |   71 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S              |   67 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm            |   68 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm           |   70 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S          |  226 +--
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm        |   36 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm       |   36 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c          |   37 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c        |    4 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c                  |  127 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c             |  142 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h             |  156 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf           |    5 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c     |  871 +++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c                 |   39 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h                 |   15 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c                |  274 +++-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S               |   51 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm             |   54 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm            |   61 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S           |  250 +---
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm         |   35 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm        |   31 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c           |   30 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c         |    7 +-
 UefiCpuPkg/UefiCpuPkg.dec                              |    8 +
 36 files changed, 4529 insertions(+), 801 deletions(-)
 create mode 100644 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
 create mode 100644 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
 create mode 100644 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c

-- 
2.7.4.windows.1




* [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
@ 2016-11-04  9:30 ` Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

This table describes the SMM memory attributes.

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h | 51 ++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h b/MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
new file mode 100644
index 0000000..317eae1
--- /dev/null
+++ b/MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
@@ -0,0 +1,51 @@
+/** @file
+  Define the GUID of the EDKII PI SMM memory attribute table, which
+  is published by PI SMM Core.
+
+Copyright (c) 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials are licensed and made available under
+the terms and conditions of the BSD License that accompanies this distribution.
+The full text of the license may be found at
+http://opensource.org/licenses/bsd-license.php.
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+
+**/
+
+#ifndef _PI_SMM_MEMORY_ATTRIBUTES_TABLE_H_
+#define _PI_SMM_MEMORY_ATTRIBUTES_TABLE_H_
+
+#define EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE_GUID {\
+  0x6b9fd3f7, 0x16df, 0x45e8, {0xbd, 0x39, 0xb9, 0x4a, 0x66, 0x54, 0x1a, 0x5d} \
+}
+
+//
+// The PI SMM memory attribute table contains the SMM memory map for SMM images.
+//
+// This table is installed to SMST as SMM configuration table.
+//
+// This table is published at gEfiSmmEndOfDxeProtocolGuid notification, because
+// no more SMM drivers should be loaded after that. The EfiRuntimeServicesCode
+// region should not change any more.
+//
+// This table is published if and only if all SMM PE/COFF images have aligned sections
+// as specified in UEFI specification Section 2.3. For example, IA32/X64 alignment is 4KiB.
+//
+// If this table is published, the EfiRuntimeServicesCode contains code only
+// and it is EFI_MEMORY_RO; the EfiRuntimeServicesData contains data only
+// and it is EFI_MEMORY_XP.
+//
+typedef struct {
+  UINT32                Version;
+  UINT32                NumberOfEntries;
+  UINT32                DescriptorSize;
+  UINT32                Reserved;
+//EFI_MEMORY_DESCRIPTOR Entry[1];
+} EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE;
+
+#define EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE_VERSION  0x00000001
+
+extern EFI_GUID gEdkiiPiSmmMemoryAttributesTableGuid;
+
+#endif
-- 
2.7.4.windows.1




* [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
@ 2016-11-04  9:30 ` Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

Add the GUID of the table that describes the SMM memory attributes.

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
 MdeModulePkg/MdeModulePkg.dec | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/MdeModulePkg/MdeModulePkg.dec b/MdeModulePkg/MdeModulePkg.dec
index 74b8700..99a028f 100644
--- a/MdeModulePkg/MdeModulePkg.dec
+++ b/MdeModulePkg/MdeModulePkg.dec
@@ -355,6 +355,9 @@
   ## Include/Guid/PiSmmCommunicationRegionTable.h
   gEdkiiPiSmmCommunicationRegionTableGuid = { 0x4e28ca50, 0xd582, 0x44ac, {0xa1, 0x1f, 0xe3, 0xd5, 0x65, 0x26, 0xdb, 0x34}}
 
+  ## Include/Guid/PiSmmMemoryAttributesTable.h
+  gEdkiiPiSmmMemoryAttributesTableGuid = { 0x6b9fd3f7, 0x16df, 0x45e8, {0xbd, 0x39, 0xb9, 0x4a, 0x66, 0x54, 0x1a, 0x5d}}
+
 [Ppis]
   ## Include/Ppi/AtaController.h
   gPeiAtaControllerPpiGuid       = { 0xa45e60d1, 0xc719, 0x44aa, { 0xb0, 0x7a, 0xaa, 0x77, 0x7f, 0x85, 0x90, 0x6d }}
-- 
2.7.4.windows.1




* [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
@ 2016-11-04  9:30 ` Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

This patch installs the LoadedImage protocol into the SMM
protocol database, so that the SMM image information can
easily be retrieved to construct the PiSmmMemoryAttributes table.

The table is produced at the SmmEndOfDxe event, so that
the consumer (PiSmmCpu) can consult it to set memory
attributes in the page table.
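
For illustration only, a consumer could walk the table along the lines
of the sketch below. The types are simplified stand-ins for
EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE and EFI_MEMORY_DESCRIPTOR, and
SumPagesWithAttribute() is a hypothetical helper, not a function from
this series. The point it shows is that entries start right after the
header and must be stepped by DescriptorSize, not by sizeof.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEMORY_XP  0x0000000000004000ULL  /* same value as EFI_MEMORY_XP */
#define MEMORY_RO  0x0000000000020000ULL  /* same value as EFI_MEMORY_RO */

/* Stand-in for EFI_MEMORY_DESCRIPTOR. */
typedef struct {
  uint32_t Type;
  uint64_t PhysicalStart;
  uint64_t VirtualStart;
  uint64_t NumberOfPages;
  uint64_t Attribute;
} MEMORY_DESCRIPTOR;

/* Stand-in for the table header; the descriptors follow it in memory. */
typedef struct {
  uint32_t Version;
  uint32_t NumberOfEntries;
  uint32_t DescriptorSize;
  uint32_t Reserved;
} ATTRIBUTES_TABLE;

/* Hypothetical helper: total pages of all entries that carry any bit of AttributeMask. */
uint64_t
SumPagesWithAttribute (
  const ATTRIBUTES_TABLE *Table,
  uint64_t               AttributeMask
  )
{
  const uint8_t     *Entry;
  MEMORY_DESCRIPTOR Desc;
  uint64_t          Pages;
  uint32_t          Index;

  /* The stride may be larger than the structure, never smaller. */
  assert (Table->DescriptorSize >= sizeof (MEMORY_DESCRIPTOR));

  Entry = (const uint8_t *)(Table + 1);
  Pages = 0;
  for (Index = 0; Index < Table->NumberOfEntries; Index++) {
    /* Copy out the entry so that an odd stride cannot cause a misaligned access. */
    memcpy (&Desc, Entry, sizeof (Desc));
    if ((Desc.Attribute & AttributeMask) != 0) {
      Pages += Desc.NumberOfPages;
    }
    Entry += Table->DescriptorSize;
  }
  return Pages;
}
```

A real consumer would apply the attributes of each entry to the page
table for the range instead of just counting pages.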

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
 MdeModulePkg/Core/PiSmmCore/Dispatcher.c            |   66 +
 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c | 1509 ++++++++++++++++++++
 MdeModulePkg/Core/PiSmmCore/Page.c                  |  775 +++++++++-
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.c             |   40 +
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.h             |   91 ++
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf           |    2 +
 MdeModulePkg/Core/PiSmmCore/Pool.c                  |   16 +
 7 files changed, 2473 insertions(+), 26 deletions(-)

diff --git a/MdeModulePkg/Core/PiSmmCore/Dispatcher.c b/MdeModulePkg/Core/PiSmmCore/Dispatcher.c
index 87f4617..1bddaf1 100644
--- a/MdeModulePkg/Core/PiSmmCore/Dispatcher.c
+++ b/MdeModulePkg/Core/PiSmmCore/Dispatcher.c
@@ -580,6 +580,11 @@ SmmLoadImage (
   DriverEntry->LoadedImage->SystemTable   = gST;
   DriverEntry->LoadedImage->DeviceHandle  = DeviceHandle;
 
+  DriverEntry->SmmLoadedImage.Revision     = EFI_LOADED_IMAGE_PROTOCOL_REVISION;
+  DriverEntry->SmmLoadedImage.ParentHandle = gSmmCorePrivate->SmmIplImageHandle;
+  DriverEntry->SmmLoadedImage.SystemTable  = gST;
+  DriverEntry->SmmLoadedImage.DeviceHandle = DeviceHandle;
+
   //
   // Make an EfiBootServicesData buffer copy of FilePath
   //
@@ -599,6 +604,25 @@ SmmLoadImage (
   DriverEntry->LoadedImage->ImageDataType = EfiRuntimeServicesData;
 
   //
+  // Make a buffer copy of FilePath
+  //
+  Status = SmmAllocatePool (EfiRuntimeServicesData, GetDevicePathSize(FilePath), (VOID **)&DriverEntry->SmmLoadedImage.FilePath);
+  if (EFI_ERROR (Status)) {
+    if (Buffer != NULL) {
+      gBS->FreePool (Buffer);
+    }
+    gBS->FreePool (DriverEntry->LoadedImage->FilePath);
+    SmmFreePages (DstBuffer, PageCount);
+    return Status;
+  }
+  CopyMem (DriverEntry->SmmLoadedImage.FilePath, FilePath, GetDevicePathSize(FilePath));
+
+  DriverEntry->SmmLoadedImage.ImageBase = (VOID *)(UINTN)DriverEntry->ImageBuffer;
+  DriverEntry->SmmLoadedImage.ImageSize = ImageContext.ImageSize;
+  DriverEntry->SmmLoadedImage.ImageCodeType = EfiRuntimeServicesCode;
+  DriverEntry->SmmLoadedImage.ImageDataType = EfiRuntimeServicesData;
+
+  //
   // Create a new image handle in the UEFI handle database for the SMM Driver
   //
   DriverEntry->ImageHandle = NULL;
@@ -608,6 +632,17 @@ SmmLoadImage (
                   NULL
                   );
 
+  //
+  // Create a new image handle in the SMM handle database for the SMM Driver
+  //
+  DriverEntry->SmmImageHandle = NULL;
+  Status = SmmInstallProtocolInterface (
+             &DriverEntry->SmmImageHandle,
+             &gEfiLoadedImageProtocolGuid,
+             EFI_NATIVE_INTERFACE,
+             &DriverEntry->SmmLoadedImage
+             );
+
   PERF_START (DriverEntry->ImageHandle, "LoadImage:", NULL, Tick);
   PERF_END (DriverEntry->ImageHandle, "LoadImage:", NULL, 0);
 
@@ -896,6 +931,16 @@ SmmDispatcher (
           }
           gBS->FreePool (DriverEntry->LoadedImage);
         }
+        Status = SmmUninstallProtocolInterface (
+                   DriverEntry->SmmImageHandle,
+                   &gEfiLoadedImageProtocolGuid,
+                   &DriverEntry->SmmLoadedImage
+                   );
+        if (!EFI_ERROR(Status)) {
+          if (DriverEntry->SmmLoadedImage.FilePath != NULL) {
+            SmmFreePool (DriverEntry->SmmLoadedImage.FilePath);
+          }
+        }
       }
 
       REPORT_STATUS_CODE_WITH_EXTENDED_DATA (
@@ -1327,6 +1372,27 @@ SmmDriverDispatchHandler (
 
               mSmmCoreLoadedImage->DeviceHandle = FvHandle;
             }
+            if (mSmmCoreDriverEntry->SmmLoadedImage.FilePath == NULL) {
+              //
+              // Maybe one special FV contains only one SMM_CORE module, so its device path must
+              // be initialized completely.
+              //
+              EfiInitializeFwVolDevicepathNode (&mFvDevicePath.File, &NameGuid);
+              SetDevicePathEndNode (&mFvDevicePath.End);
+
+              //
+              // Make a buffer copy of FilePath
+              //
+              Status = SmmAllocatePool (
+                         EfiRuntimeServicesData,
+                         GetDevicePathSize ((EFI_DEVICE_PATH_PROTOCOL *)&mFvDevicePath),
+                         (VOID **)&mSmmCoreDriverEntry->SmmLoadedImage.FilePath
+                         );
+              ASSERT_EFI_ERROR (Status);
+              CopyMem (mSmmCoreDriverEntry->SmmLoadedImage.FilePath, &mFvDevicePath, GetDevicePathSize((EFI_DEVICE_PATH_PROTOCOL *)&mFvDevicePath));
+
+              mSmmCoreDriverEntry->SmmLoadedImage.DeviceHandle = FvHandle;
+            }
           } else {
             SmmAddToDriverList (Fv, FvHandle, &NameGuid);
           }
diff --git a/MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c b/MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
new file mode 100644
index 0000000..3a5a2c8
--- /dev/null
+++ b/MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
@@ -0,0 +1,1509 @@
+/** @file
+  PI SMM MemoryAttributes support
+
+Copyright (c) 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials
+are licensed and made available under the terms and conditions of the BSD License
+which accompanies this distribution.  The full text of the license may be found at
+http://opensource.org/licenses/bsd-license.php
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+
+**/
+
+#include <PiDxe.h>
+#include <Library/BaseLib.h>
+#include <Library/BaseMemoryLib.h>
+#include <Library/MemoryAllocationLib.h>
+#include <Library/UefiBootServicesTableLib.h>
+#include <Library/SmmServicesTableLib.h>
+#include <Library/DebugLib.h>
+#include <Library/PcdLib.h>
+
+#include <Library/PeCoffLib.h>
+#include <Library/PeCoffGetEntryPointLib.h>
+
+#include <Guid/PiSmmMemoryAttributesTable.h>
+
+#include "PiSmmCore.h"
+
+#define PREVIOUS_MEMORY_DESCRIPTOR(MemoryDescriptor, Size) \
+  ((EFI_MEMORY_DESCRIPTOR *)((UINT8 *)(MemoryDescriptor) - (Size)))
+
+#define IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE SIGNATURE_32 ('I','P','R','C')
+
+typedef struct {
+  UINT32                 Signature;
+  LIST_ENTRY             Link;
+  EFI_PHYSICAL_ADDRESS   CodeSegmentBase;
+  UINT64                 CodeSegmentSize;
+} IMAGE_PROPERTIES_RECORD_CODE_SECTION;
+
+#define IMAGE_PROPERTIES_RECORD_SIGNATURE SIGNATURE_32 ('I','P','R','D')
+
+typedef struct {
+  UINT32                 Signature;
+  LIST_ENTRY             Link;
+  EFI_PHYSICAL_ADDRESS   ImageBase;
+  UINT64                 ImageSize;
+  UINTN                  CodeSegmentCount;
+  LIST_ENTRY             CodeSegmentList;
+} IMAGE_PROPERTIES_RECORD;
+
+#define IMAGE_PROPERTIES_PRIVATE_DATA_SIGNATURE SIGNATURE_32 ('I','P','P','D')
+
+typedef struct {
+  UINT32                 Signature;
+  UINTN                  ImageRecordCount;
+  UINTN                  CodeSegmentCountMax;
+  LIST_ENTRY             ImageRecordList;
+} IMAGE_PROPERTIES_PRIVATE_DATA;
+
+IMAGE_PROPERTIES_PRIVATE_DATA  mImagePropertiesPrivateData = {
+  IMAGE_PROPERTIES_PRIVATE_DATA_SIGNATURE,
+  0,
+  0,
+  INITIALIZE_LIST_HEAD_VARIABLE (mImagePropertiesPrivateData.ImageRecordList)
+};
+
+#define EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA  BIT0
+
+UINT64 mMemoryProtectionAttribute = EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA;
+
+//
+// Below functions are for MemoryMap
+//
+
+/**
+  Converts a number of EFI_PAGEs to a size in bytes.
+
+  NOTE: Do not use EFI_PAGES_TO_SIZE because it handles UINTN only.
+
+  @param[in]  Pages     The number of EFI_PAGES.
+
+  @return  The number of bytes associated with the number of EFI_PAGEs specified
+           by Pages.
+**/
+STATIC
+UINT64
+EfiPagesToSize (
+  IN UINT64 Pages
+  )
+{
+  return LShiftU64 (Pages, EFI_PAGE_SHIFT);
+}
+
+/**
+  Converts a size, in bytes, to a number of EFI_PAGEs.
+
+  NOTE: Do not use EFI_SIZE_TO_PAGES because it handles UINTN only.
+
+  @param[in]  Size      A size in bytes.
+
+  @return  The number of EFI_PAGEs associated with the number of bytes specified
+           by Size.
+
+**/
+STATIC
+UINT64
+EfiSizeToPages (
+  IN UINT64 Size
+  )
+{
+  return RShiftU64 (Size, EFI_PAGE_SHIFT) + ((((UINTN)Size) & EFI_PAGE_MASK) ? 1 : 0);
+}
+
+/**
+  Check the consistency of Smm memory attributes table.
+
+  @param[in] MemoryAttributesTable  PI SMM memory attributes table
+**/
+VOID
+SmmMemoryAttributesTableConsistencyCheck (
+  IN EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE *MemoryAttributesTable
+  )
+{
+  EFI_MEMORY_DESCRIPTOR                     *MemoryMap;
+  UINTN                                     MemoryMapEntryCount;
+  UINTN                                     DescriptorSize;
+  UINTN                                     Index;
+  UINT64                                    Address;
+
+  Address = 0;
+  MemoryMapEntryCount = MemoryAttributesTable->NumberOfEntries;
+  DescriptorSize = MemoryAttributesTable->DescriptorSize;
+  MemoryMap = (EFI_MEMORY_DESCRIPTOR *)(MemoryAttributesTable + 1);
+  for (Index = 0; Index < MemoryMapEntryCount; Index++) {
+    if (Address != 0) {
+      ASSERT (Address == MemoryMap->PhysicalStart);
+    }
+    Address = MemoryMap->PhysicalStart + EFI_PAGES_TO_SIZE(MemoryMap->NumberOfPages);
+    MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+  }
+}
+
+/**
+  Sort memory map entries based upon PhysicalStart, from low to high.
+
+  @param[in]  MemoryMap              A pointer to the buffer in which firmware places
+                                 the current memory map.
+  @param[in]  MemoryMapSize          Size, in bytes, of the MemoryMap buffer.
+  @param[in]  DescriptorSize         Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+SortMemoryMap (
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  IN UINTN                      MemoryMapSize,
+  IN UINTN                      DescriptorSize
+  )
+{
+  EFI_MEMORY_DESCRIPTOR       *MemoryMapEntry;
+  EFI_MEMORY_DESCRIPTOR       *NextMemoryMapEntry;
+  EFI_MEMORY_DESCRIPTOR       *MemoryMapEnd;
+  EFI_MEMORY_DESCRIPTOR       TempMemoryMap;
+
+  MemoryMapEntry = MemoryMap;
+  NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+  MemoryMapEnd = (EFI_MEMORY_DESCRIPTOR *) ((UINT8 *) MemoryMap + MemoryMapSize);
+  while (MemoryMapEntry < MemoryMapEnd) {
+    while (NextMemoryMapEntry < MemoryMapEnd) {
+      if (MemoryMapEntry->PhysicalStart > NextMemoryMapEntry->PhysicalStart) {
+        CopyMem (&TempMemoryMap, MemoryMapEntry, sizeof(EFI_MEMORY_DESCRIPTOR));
+        CopyMem (MemoryMapEntry, NextMemoryMapEntry, sizeof(EFI_MEMORY_DESCRIPTOR));
+        CopyMem (NextMemoryMapEntry, &TempMemoryMap, sizeof(EFI_MEMORY_DESCRIPTOR));
+      }
+
+      NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (NextMemoryMapEntry, DescriptorSize);
+    }
+
+    MemoryMapEntry      = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+    NextMemoryMapEntry  = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+  }
+
+  return ;
+}
+
+/**
+  Merge contiguous memory map entries that have the same attributes.
+
+  @param[in, out]  MemoryMap              A pointer to the buffer in which firmware places
+                                          the current memory map.
+  @param[in, out]  MemoryMapSize          A pointer to the size, in bytes, of the
+                                          MemoryMap buffer. On input, this is the size of
+                                          the current memory map.  On output,
+                                          it is the size of new memory map after merge.
+  @param[in]       DescriptorSize         Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+MergeMemoryMap (
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  IN OUT UINTN                  *MemoryMapSize,
+  IN UINTN                      DescriptorSize
+  )
+{
+  EFI_MEMORY_DESCRIPTOR       *MemoryMapEntry;
+  EFI_MEMORY_DESCRIPTOR       *MemoryMapEnd;
+  UINT64                      MemoryBlockLength;
+  EFI_MEMORY_DESCRIPTOR       *NewMemoryMapEntry;
+  EFI_MEMORY_DESCRIPTOR       *NextMemoryMapEntry;
+
+  MemoryMapEntry = MemoryMap;
+  NewMemoryMapEntry = MemoryMap;
+  MemoryMapEnd = (EFI_MEMORY_DESCRIPTOR *) ((UINT8 *) MemoryMap + *MemoryMapSize);
+  while ((UINTN)MemoryMapEntry < (UINTN)MemoryMapEnd) {
+    CopyMem (NewMemoryMapEntry, MemoryMapEntry, sizeof(EFI_MEMORY_DESCRIPTOR));
+    NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+
+    do {
+      MemoryBlockLength = (UINT64) (EfiPagesToSize (MemoryMapEntry->NumberOfPages));
+      if (((UINTN)NextMemoryMapEntry < (UINTN)MemoryMapEnd) &&
+          (MemoryMapEntry->Type == NextMemoryMapEntry->Type) &&
+          (MemoryMapEntry->Attribute == NextMemoryMapEntry->Attribute) &&
+          ((MemoryMapEntry->PhysicalStart + MemoryBlockLength) == NextMemoryMapEntry->PhysicalStart)) {
+        MemoryMapEntry->NumberOfPages += NextMemoryMapEntry->NumberOfPages;
+        if (NewMemoryMapEntry != MemoryMapEntry) {
+          NewMemoryMapEntry->NumberOfPages += NextMemoryMapEntry->NumberOfPages;
+        }
+
+        NextMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (NextMemoryMapEntry, DescriptorSize);
+        continue;
+      } else {
+        MemoryMapEntry = PREVIOUS_MEMORY_DESCRIPTOR (NextMemoryMapEntry, DescriptorSize);
+        break;
+      }
+    } while (TRUE);
+
+    MemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+    NewMemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (NewMemoryMapEntry, DescriptorSize);
+  }
+
+  *MemoryMapSize = (UINTN)NewMemoryMapEntry - (UINTN)MemoryMap;
+
+  return ;
+}
+
+/**
+  Enforce memory map attributes.
+  This function sets EfiRuntimeServicesCode to be EFI_MEMORY_RO and EfiRuntimeServicesData to be EFI_MEMORY_XP.
+
+  @param[in, out]  MemoryMap              A pointer to the buffer in which firmware places
+                                          the current memory map.
+  @param[in]       MemoryMapSize          Size, in bytes, of the MemoryMap buffer.
+  @param[in]       DescriptorSize         Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+EnforceMemoryMapAttribute (
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  IN UINTN                      MemoryMapSize,
+  IN UINTN                      DescriptorSize
+  )
+{
+  EFI_MEMORY_DESCRIPTOR       *MemoryMapEntry;
+  EFI_MEMORY_DESCRIPTOR       *MemoryMapEnd;
+
+  MemoryMapEntry = MemoryMap;
+  MemoryMapEnd   = (EFI_MEMORY_DESCRIPTOR *) ((UINT8 *) MemoryMap + MemoryMapSize);
+  while ((UINTN)MemoryMapEntry < (UINTN)MemoryMapEnd) {
+    switch (MemoryMapEntry->Type) {
+    case EfiRuntimeServicesCode:
+      MemoryMapEntry->Attribute |= EFI_MEMORY_RO;
+      break;
+    case EfiRuntimeServicesData:
+      MemoryMapEntry->Attribute |= EFI_MEMORY_XP;
+      break;
+    }
+
+    MemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
+  }
+
+  return ;
+}
+
+/**
+  Return the first image record whose [ImageBase, ImageSize] is covered by [Buffer, Length].
+
+  @param[in] Buffer  Start Address
+  @param[in] Length  Address length
+
+  @return first image record covered by [buffer, length]
+**/
+STATIC
+IMAGE_PROPERTIES_RECORD *
+GetImageRecordByAddress (
+  IN EFI_PHYSICAL_ADDRESS  Buffer,
+  IN UINT64                Length
+  )
+{
+  IMAGE_PROPERTIES_RECORD    *ImageRecord;
+  LIST_ENTRY                 *ImageRecordLink;
+  LIST_ENTRY                 *ImageRecordList;
+
+  ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+  for (ImageRecordLink = ImageRecordList->ForwardLink;
+       ImageRecordLink != ImageRecordList;
+       ImageRecordLink = ImageRecordLink->ForwardLink) {
+    ImageRecord = CR (
+                    ImageRecordLink,
+                    IMAGE_PROPERTIES_RECORD,
+                    Link,
+                    IMAGE_PROPERTIES_RECORD_SIGNATURE
+                    );
+
+    if ((Buffer <= ImageRecord->ImageBase) &&
+        (Buffer + Length >= ImageRecord->ImageBase + ImageRecord->ImageSize)) {
+      return ImageRecord;
+    }
+  }
+
+  return NULL;
+}
+
+/**
+  Set the memory map to new entries, according to one old entry,
+  based upon PE code section and data section in image record
+
+  @param[in]       ImageRecord            An image record whose [ImageBase, ImageSize] is covered
+                                          by the old memory map entry.
+  @param[in, out]  NewRecord              A pointer to several new memory map entries.
+                                          The caller guarantees that the buffer size is 1 +
+                                          (SplitRecordCount * DescriptorSize) calculated
+                                          below.
+  @param[in]       OldRecord              A pointer to one old memory map entry.
+  @param[in]       DescriptorSize         Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+UINTN
+SetNewRecord (
+  IN IMAGE_PROPERTIES_RECORD       *ImageRecord,
+  IN OUT EFI_MEMORY_DESCRIPTOR     *NewRecord,
+  IN EFI_MEMORY_DESCRIPTOR         *OldRecord,
+  IN UINTN                         DescriptorSize
+  )
+{
+  EFI_MEMORY_DESCRIPTOR                     TempRecord;
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION      *ImageRecordCodeSection;
+  LIST_ENTRY                                *ImageRecordCodeSectionLink;
+  LIST_ENTRY                                *ImageRecordCodeSectionEndLink;
+  LIST_ENTRY                                *ImageRecordCodeSectionList;
+  UINTN                                     NewRecordCount;
+  UINT64                                    PhysicalEnd;
+  UINT64                                    ImageEnd;
+
+  CopyMem (&TempRecord, OldRecord, sizeof(EFI_MEMORY_DESCRIPTOR));
+  PhysicalEnd = TempRecord.PhysicalStart + EfiPagesToSize(TempRecord.NumberOfPages);
+  NewRecordCount = 0;
+
+  ImageRecordCodeSectionList = &ImageRecord->CodeSegmentList;
+
+  ImageRecordCodeSectionLink = ImageRecordCodeSectionList->ForwardLink;
+  ImageRecordCodeSectionEndLink = ImageRecordCodeSectionList;
+  while (ImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+    ImageRecordCodeSection = CR (
+                               ImageRecordCodeSectionLink,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+                               Link,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+                               );
+    ImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+
+    if (TempRecord.PhysicalStart <= ImageRecordCodeSection->CodeSegmentBase) {
+      //
+      // DATA
+      //
+      NewRecord->Type = EfiRuntimeServicesData;
+      NewRecord->PhysicalStart = TempRecord.PhysicalStart;
+      NewRecord->VirtualStart  = 0;
+      NewRecord->NumberOfPages = EfiSizeToPages(ImageRecordCodeSection->CodeSegmentBase - NewRecord->PhysicalStart);
+      NewRecord->Attribute     = TempRecord.Attribute | EFI_MEMORY_XP;
+      if (NewRecord->NumberOfPages != 0) {
+        NewRecord = NEXT_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+        NewRecordCount ++;
+      }
+
+      //
+      // CODE
+      //
+      NewRecord->Type = EfiRuntimeServicesCode;
+      NewRecord->PhysicalStart = ImageRecordCodeSection->CodeSegmentBase;
+      NewRecord->VirtualStart  = 0;
+      NewRecord->NumberOfPages = EfiSizeToPages(ImageRecordCodeSection->CodeSegmentSize);
+      NewRecord->Attribute     = (TempRecord.Attribute & (~EFI_MEMORY_XP)) | EFI_MEMORY_RO;
+      if (NewRecord->NumberOfPages != 0) {
+        NewRecord = NEXT_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+        NewRecordCount ++;
+      }
+
+      TempRecord.PhysicalStart = ImageRecordCodeSection->CodeSegmentBase + EfiPagesToSize (EfiSizeToPages(ImageRecordCodeSection->CodeSegmentSize));
+      TempRecord.NumberOfPages = EfiSizeToPages(PhysicalEnd - TempRecord.PhysicalStart);
+      if (TempRecord.NumberOfPages == 0) {
+        break;
+      }
+    }
+  }
+
+  ImageEnd = ImageRecord->ImageBase + ImageRecord->ImageSize;
+
+  //
+  // Final DATA
+  //
+  if (TempRecord.PhysicalStart < ImageEnd) {
+    NewRecord->Type = EfiRuntimeServicesData;
+    NewRecord->PhysicalStart = TempRecord.PhysicalStart;
+    NewRecord->VirtualStart  = 0;
+    NewRecord->NumberOfPages = EfiSizeToPages (ImageEnd - TempRecord.PhysicalStart);
+    NewRecord->Attribute     = TempRecord.Attribute | EFI_MEMORY_XP;
+    NewRecordCount ++;
+  }
+
+  return NewRecordCount;
+}
+
+/**
+  Return the max number of new splitted entries, according to one old entry,
+  based upon PE code section and data section.
+
+  @param[in]  OldRecord              A pointer to one old memory map entry.
+
+  @retval  0 no entry needs to be split.
+  @return  the max number of new split entries
+**/
+STATIC
+UINTN
+GetMaxSplitRecordCount (
+  IN EFI_MEMORY_DESCRIPTOR *OldRecord
+  )
+{
+  IMAGE_PROPERTIES_RECORD *ImageRecord;
+  UINTN                   SplitRecordCount;
+  UINT64                  PhysicalStart;
+  UINT64                  PhysicalEnd;
+
+  SplitRecordCount = 0;
+  PhysicalStart = OldRecord->PhysicalStart;
+  PhysicalEnd = OldRecord->PhysicalStart + EfiPagesToSize(OldRecord->NumberOfPages);
+
+  do {
+    ImageRecord = GetImageRecordByAddress (PhysicalStart, PhysicalEnd - PhysicalStart);
+    if (ImageRecord == NULL) {
+      break;
+    }
+    SplitRecordCount += (2 * ImageRecord->CodeSegmentCount + 1);
+    PhysicalStart = ImageRecord->ImageBase + ImageRecord->ImageSize;
+  } while ((ImageRecord != NULL) && (PhysicalStart < PhysicalEnd));
+
+  if (SplitRecordCount != 0) {
+    SplitRecordCount--;
+  }
+
+  return SplitRecordCount;
+}
+
+/**
+  Split the memory map to new entries, according to one old entry,
+  based upon PE code section and data section.
+
+  @param[in]       OldRecord              A pointer to one old memory map entry.
+  @param[in, out]  NewRecord              A pointer to several new memory map entries.
+                                          The caller guarantees the buffer size to be 1 +
+                                          (SplitRecordCount * DescriptorSize) calculated
+                                          below.
+  @param[in]       MaxSplitRecordCount    The max number of split entries
+  @param[in]       DescriptorSize         Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+
+  @retval  0 no entry is split.
+  @return  the real number of split records.
+**/
+STATIC
+UINTN
+SplitRecord (
+  IN EFI_MEMORY_DESCRIPTOR     *OldRecord,
+  IN OUT EFI_MEMORY_DESCRIPTOR *NewRecord,
+  IN UINTN                     MaxSplitRecordCount,
+  IN UINTN                     DescriptorSize
+  )
+{
+  EFI_MEMORY_DESCRIPTOR   TempRecord;
+  IMAGE_PROPERTIES_RECORD *ImageRecord;
+  IMAGE_PROPERTIES_RECORD *NewImageRecord;
+  UINT64                  PhysicalStart;
+  UINT64                  PhysicalEnd;
+  UINTN                   NewRecordCount;
+  UINTN                   TotalNewRecordCount;
+
+  if (MaxSplitRecordCount == 0) {
+    CopyMem (NewRecord, OldRecord, DescriptorSize);
+    return 0;
+  }
+
+  TotalNewRecordCount = 0;
+
+  //
+  // Override previous record
+  //
+  CopyMem (&TempRecord, OldRecord, sizeof(EFI_MEMORY_DESCRIPTOR));
+  PhysicalStart = TempRecord.PhysicalStart;
+  PhysicalEnd = TempRecord.PhysicalStart + EfiPagesToSize(TempRecord.NumberOfPages);
+
+  ImageRecord = NULL;
+  do {
+    NewImageRecord = GetImageRecordByAddress (PhysicalStart, PhysicalEnd - PhysicalStart);
+    if (NewImageRecord == NULL) {
+      //
+      // No more image covered by this range, stop
+      //
+      if ((PhysicalEnd > PhysicalStart) && (ImageRecord != NULL)) {
+        //
+        // If there is still address range left in this record, it needs a record.
+        //
+        NewRecord = PREVIOUS_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+        if (NewRecord->Type == EfiRuntimeServicesData) {
+          //
+          // Last record is DATA, just merge it.
+          //
+          NewRecord->NumberOfPages = EfiSizeToPages(PhysicalEnd - NewRecord->PhysicalStart);
+        } else {
+          //
+          // Last record is CODE, create a new DATA entry.
+          //
+          NewRecord = NEXT_MEMORY_DESCRIPTOR (NewRecord, DescriptorSize);
+          NewRecord->Type = EfiRuntimeServicesData;
+          NewRecord->PhysicalStart = TempRecord.PhysicalStart;
+          NewRecord->VirtualStart  = 0;
+          NewRecord->NumberOfPages = TempRecord.NumberOfPages;
+          NewRecord->Attribute     = TempRecord.Attribute | EFI_MEMORY_XP;
+          TotalNewRecordCount ++;
+        }
+      }
+      break;
+    }
+    ImageRecord = NewImageRecord;
+
+    //
+    // Set new record
+    //
+    NewRecordCount = SetNewRecord (ImageRecord, NewRecord, &TempRecord, DescriptorSize);
+    TotalNewRecordCount += NewRecordCount;
+    NewRecord = (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)NewRecord + NewRecordCount * DescriptorSize);
+
+    //
+    // Update PhysicalStart, in order to exclude the image buffer already split.
+    //
+    PhysicalStart = ImageRecord->ImageBase + ImageRecord->ImageSize;
+    TempRecord.PhysicalStart = PhysicalStart;
+    TempRecord.NumberOfPages = EfiSizeToPages (PhysicalEnd - PhysicalStart);
+  } while ((ImageRecord != NULL) && (PhysicalStart < PhysicalEnd));
+
+  return TotalNewRecordCount - 1;
+}
+
+/**
+  Split the original memory map, and add more entries to describe PE code section and data section.
+  This function will set EfiRuntimeServicesData to be EFI_MEMORY_XP.
+  This function will merge entries with same attributes finally.
+
+  NOTE: It assumes PE code/data section are page aligned.
+  NOTE: It assumes enough entries are prepared for the new memory map.
+
+  Split table:
+   +---------------+
+   | Record X      |
+   +---------------+
+   | Record RtCode |
+   +---------------+
+   | Record Y      |
+   +---------------+
+   ==>
+   +---------------+
+   | Record X      |
+   +---------------+ ----
+   | Record RtData |     |
+   +---------------+     |
+   | Record RtCode |     |-> PE/COFF1
+   +---------------+     |
+   | Record RtData |     |
+   +---------------+ ----
+   | Record RtData |     |
+   +---------------+     |
+   | Record RtCode |     |-> PE/COFF2
+   +---------------+     |
+   | Record RtData |     |
+   +---------------+ ----
+   | Record Y      |
+   +---------------+
+
+  @param[in, out]  MemoryMapSize          A pointer to the size, in bytes, of the
+                                          MemoryMap buffer. On input, this is the size of
+                                          old MemoryMap before split. The actual buffer
+                                          size of MemoryMap is MemoryMapSize +
+                                          (AdditionalRecordCount * DescriptorSize) calculated
+                                          below. On output, it is the size of new MemoryMap
+                                          after split.
+  @param[in, out]  MemoryMap              A pointer to the buffer in which firmware places
+                                          the current memory map.
+  @param[in]       DescriptorSize         Size, in bytes, of an individual EFI_MEMORY_DESCRIPTOR.
+**/
+STATIC
+VOID
+SplitTable (
+  IN OUT UINTN                  *MemoryMapSize,
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  IN UINTN                      DescriptorSize
+  )
+{
+  INTN        IndexOld;
+  INTN        IndexNew;
+  UINTN       MaxSplitRecordCount;
+  UINTN       RealSplitRecordCount;
+  UINTN       TotalSplitRecordCount;
+  UINTN       AdditionalRecordCount;
+
+  AdditionalRecordCount = (2 * mImagePropertiesPrivateData.CodeSegmentCountMax + 1) * mImagePropertiesPrivateData.ImageRecordCount;
+
+  TotalSplitRecordCount = 0;
+  //
+  // Let old record point to end of valid MemoryMap buffer.
+  //
+  IndexOld = ((*MemoryMapSize) / DescriptorSize) - 1;
+  //
+  // Let new record point to end of full MemoryMap buffer.
+  //
+  IndexNew = ((*MemoryMapSize) / DescriptorSize) - 1 + AdditionalRecordCount;
+  for (; IndexOld >= 0; IndexOld--) {
+    MaxSplitRecordCount = GetMaxSplitRecordCount ((EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + IndexOld * DescriptorSize));
+    //
+    // Split this MemoryMap record
+    //
+    IndexNew -= MaxSplitRecordCount;
+    RealSplitRecordCount = SplitRecord (
+                             (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + IndexOld * DescriptorSize),
+                             (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + IndexNew * DescriptorSize),
+                             MaxSplitRecordCount,
+                             DescriptorSize
+                             );
+    //
+    // Adjust IndexNew according to real split.
+    //
+    CopyMem (
+      ((UINT8 *)MemoryMap + (IndexNew + MaxSplitRecordCount - RealSplitRecordCount) * DescriptorSize),
+      ((UINT8 *)MemoryMap + IndexNew * DescriptorSize),
+      RealSplitRecordCount * DescriptorSize
+      );
+    IndexNew = IndexNew + MaxSplitRecordCount - RealSplitRecordCount;
+    TotalSplitRecordCount += RealSplitRecordCount;
+    IndexNew --;
+  }
+  //
+  // Move all records to the beginning.
+  //
+  CopyMem (
+    MemoryMap,
+    (UINT8 *)MemoryMap + (AdditionalRecordCount - TotalSplitRecordCount) * DescriptorSize,
+    (*MemoryMapSize) + TotalSplitRecordCount * DescriptorSize
+    );
+
+  *MemoryMapSize = (*MemoryMapSize) + DescriptorSize * TotalSplitRecordCount;
+
+  //
+  // Sort from low to high (Just in case)
+  //
+  SortMemoryMap (MemoryMap, *MemoryMapSize, DescriptorSize);
+
+  //
+  // Set RuntimeData to XP
+  //
+  EnforceMemoryMapAttribute (MemoryMap, *MemoryMapSize, DescriptorSize);
+
+  //
+  // Merge same type to save entry size
+  //
+  MergeMemoryMap (MemoryMap, MemoryMapSize, DescriptorSize);
+
+  return ;
+}
+
+/**
+  This function is for GetMemoryMap() with memory attributes table.
+
+  It calls original GetMemoryMap() to get the original memory map information. Then
+  it adds the additional memory map entries for PE Code/Data separation.
+
+  @param[in, out]  MemoryMapSize          A pointer to the size, in bytes, of the
+                                          MemoryMap buffer. On input, this is the size of
+                                          the buffer allocated by the caller.  On output,
+                                          it is the size of the buffer returned by the
+                                          firmware  if the buffer was large enough, or the
+                                          size of the buffer needed  to contain the map if
+                                          the buffer was too small.
+  @param[in, out]  MemoryMap              A pointer to the buffer in which firmware places
+                                          the current memory map.
+  @param[out]      MapKey                 A pointer to the location in which firmware
+                                          returns the key for the current memory map.
+  @param[out]      DescriptorSize         A pointer to the location in which firmware
+                                          returns the size, in bytes, of an individual
+                                          EFI_MEMORY_DESCRIPTOR.
+  @param[out]      DescriptorVersion      A pointer to the location in which firmware
+                                          returns the version number associated with the
+                                          EFI_MEMORY_DESCRIPTOR.
+
+  @retval EFI_SUCCESS            The memory map was returned in the MemoryMap
+                                 buffer.
+  @retval EFI_BUFFER_TOO_SMALL   The MemoryMap buffer was too small. The current
+                                 buffer size needed to hold the memory map is
+                                 returned in MemoryMapSize.
+  @retval EFI_INVALID_PARAMETER  One of the parameters has an invalid value.
+
+**/
+STATIC
+EFI_STATUS
+EFIAPI
+SmmCoreGetMemoryMapMemoryAttributesTable (
+  IN OUT UINTN                  *MemoryMapSize,
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  OUT UINTN                     *MapKey,
+  OUT UINTN                     *DescriptorSize,
+  OUT UINT32                    *DescriptorVersion
+  )
+{
+  EFI_STATUS  Status;
+  UINTN       OldMemoryMapSize;
+  UINTN       AdditionalRecordCount;
+
+  //
+  // If PE code/data is not aligned, just return the original memory map.
+  //
+  if ((mMemoryProtectionAttribute & EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA) == 0) {
+    return SmmCoreGetMemoryMap (MemoryMapSize, MemoryMap, MapKey, DescriptorSize, DescriptorVersion);
+  }
+
+  if (MemoryMapSize == NULL) {
+    return EFI_INVALID_PARAMETER;
+  }
+
+  AdditionalRecordCount = (2 * mImagePropertiesPrivateData.CodeSegmentCountMax + 1) * mImagePropertiesPrivateData.ImageRecordCount;
+
+  OldMemoryMapSize = *MemoryMapSize;
+  Status = SmmCoreGetMemoryMap (MemoryMapSize, MemoryMap, MapKey, DescriptorSize, DescriptorVersion);
+  if (Status == EFI_BUFFER_TOO_SMALL) {
+    *MemoryMapSize = *MemoryMapSize + (*DescriptorSize) * AdditionalRecordCount;
+  } else if (Status == EFI_SUCCESS) {
+    if (OldMemoryMapSize - *MemoryMapSize < (*DescriptorSize) * AdditionalRecordCount) {
+      *MemoryMapSize = *MemoryMapSize + (*DescriptorSize) * AdditionalRecordCount;
+      //
+      // Need to update status to buffer too small
+      //
+      Status = EFI_BUFFER_TOO_SMALL;
+    } else {
+      //
+      // Split PE code/data
+      //
+      ASSERT(MemoryMap != NULL);
+      SplitTable (MemoryMapSize, MemoryMap, *DescriptorSize);
+    }
+  }
+
+  return Status;
+}
+
+//
+// Below functions are for ImageRecord
+//
+
+/**
+  Set MemoryProtectionAttribute according to PE/COFF image section alignment.
+
+  @param[in]  SectionAlignment    PE/COFF section alignment
+**/
+STATIC
+VOID
+SetMemoryAttributesTableSectionAlignment (
+  IN UINT32  SectionAlignment
+  )
+{
+  if (((SectionAlignment & (EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT - 1)) != 0) &&
+      ((mMemoryProtectionAttribute & EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA) != 0)) {
+    DEBUG ((DEBUG_VERBOSE, "SMM SetMemoryAttributesTableSectionAlignment - Clear\n"));
+    mMemoryProtectionAttribute &= ~((UINT64)EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA);
+  }
+}
+
+/**
+  Swap two code sections in image record.
+
+  @param[in]  FirstImageRecordCodeSection    first code section in image record
+  @param[in]  SecondImageRecordCodeSection   second code section in image record
+**/
+STATIC
+VOID
+SwapImageRecordCodeSection (
+  IN IMAGE_PROPERTIES_RECORD_CODE_SECTION      *FirstImageRecordCodeSection,
+  IN IMAGE_PROPERTIES_RECORD_CODE_SECTION      *SecondImageRecordCodeSection
+  )
+{
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION      TempImageRecordCodeSection;
+
+  TempImageRecordCodeSection.CodeSegmentBase = FirstImageRecordCodeSection->CodeSegmentBase;
+  TempImageRecordCodeSection.CodeSegmentSize = FirstImageRecordCodeSection->CodeSegmentSize;
+
+  FirstImageRecordCodeSection->CodeSegmentBase = SecondImageRecordCodeSection->CodeSegmentBase;
+  FirstImageRecordCodeSection->CodeSegmentSize = SecondImageRecordCodeSection->CodeSegmentSize;
+
+  SecondImageRecordCodeSection->CodeSegmentBase = TempImageRecordCodeSection.CodeSegmentBase;
+  SecondImageRecordCodeSection->CodeSegmentSize = TempImageRecordCodeSection.CodeSegmentSize;
+}
+
+/**
+  Sort code section in image record, based upon CodeSegmentBase from low to high.
+
+  @param[in]  ImageRecord    image record to be sorted
+**/
+STATIC
+VOID
+SortImageRecordCodeSection (
+  IN IMAGE_PROPERTIES_RECORD              *ImageRecord
+  )
+{
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION      *ImageRecordCodeSection;
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION      *NextImageRecordCodeSection;
+  LIST_ENTRY                                *ImageRecordCodeSectionLink;
+  LIST_ENTRY                                *NextImageRecordCodeSectionLink;
+  LIST_ENTRY                                *ImageRecordCodeSectionEndLink;
+  LIST_ENTRY                                *ImageRecordCodeSectionList;
+
+  ImageRecordCodeSectionList = &ImageRecord->CodeSegmentList;
+
+  ImageRecordCodeSectionLink = ImageRecordCodeSectionList->ForwardLink;
+  NextImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+  ImageRecordCodeSectionEndLink = ImageRecordCodeSectionList;
+  while (ImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+    ImageRecordCodeSection = CR (
+                               ImageRecordCodeSectionLink,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+                               Link,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+                               );
+    while (NextImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+      NextImageRecordCodeSection = CR (
+                                     NextImageRecordCodeSectionLink,
+                                     IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+                                     Link,
+                                     IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+                                     );
+      if (ImageRecordCodeSection->CodeSegmentBase > NextImageRecordCodeSection->CodeSegmentBase) {
+        SwapImageRecordCodeSection (ImageRecordCodeSection, NextImageRecordCodeSection);
+      }
+      NextImageRecordCodeSectionLink = NextImageRecordCodeSectionLink->ForwardLink;
+    }
+
+    ImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+    NextImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+  }
+}
+
+/**
+  Check if code section in image record is valid.
+
+  @param[in]  ImageRecord    image record to be checked
+
+  @retval TRUE  image record is valid
+  @retval FALSE image record is invalid
+**/
+STATIC
+BOOLEAN
+IsImageRecordCodeSectionValid (
+  IN IMAGE_PROPERTIES_RECORD              *ImageRecord
+  )
+{
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION      *ImageRecordCodeSection;
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION      *LastImageRecordCodeSection;
+  LIST_ENTRY                                *ImageRecordCodeSectionLink;
+  LIST_ENTRY                                *ImageRecordCodeSectionEndLink;
+  LIST_ENTRY                                *ImageRecordCodeSectionList;
+
+  DEBUG ((DEBUG_VERBOSE, "SMM ImageCode SegmentCount - 0x%x\n", ImageRecord->CodeSegmentCount));
+
+  ImageRecordCodeSectionList = &ImageRecord->CodeSegmentList;
+
+  ImageRecordCodeSectionLink = ImageRecordCodeSectionList->ForwardLink;
+  ImageRecordCodeSectionEndLink = ImageRecordCodeSectionList;
+  LastImageRecordCodeSection = NULL;
+  while (ImageRecordCodeSectionLink != ImageRecordCodeSectionEndLink) {
+    ImageRecordCodeSection = CR (
+                               ImageRecordCodeSectionLink,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+                               Link,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+                               );
+    if (ImageRecordCodeSection->CodeSegmentSize == 0) {
+      return FALSE;
+    }
+    if (ImageRecordCodeSection->CodeSegmentBase < ImageRecord->ImageBase) {
+      return FALSE;
+    }
+    if (ImageRecordCodeSection->CodeSegmentBase >= MAX_ADDRESS - ImageRecordCodeSection->CodeSegmentSize) {
+      return FALSE;
+    }
+    if ((ImageRecordCodeSection->CodeSegmentBase + ImageRecordCodeSection->CodeSegmentSize) > (ImageRecord->ImageBase + ImageRecord->ImageSize)) {
+      return FALSE;
+    }
+    if (LastImageRecordCodeSection != NULL) {
+      if ((LastImageRecordCodeSection->CodeSegmentBase + LastImageRecordCodeSection->CodeSegmentSize) > ImageRecordCodeSection->CodeSegmentBase) {
+        return FALSE;
+      }
+    }
+
+    LastImageRecordCodeSection = ImageRecordCodeSection;
+    ImageRecordCodeSectionLink = ImageRecordCodeSectionLink->ForwardLink;
+  }
+
+  return TRUE;
+}
+
+/**
+  Swap two image records.
+
+  @param[in]  FirstImageRecord   first image record.
+  @param[in]  SecondImageRecord  second image record.
+**/
+STATIC
+VOID
+SwapImageRecord (
+  IN IMAGE_PROPERTIES_RECORD      *FirstImageRecord,
+  IN IMAGE_PROPERTIES_RECORD      *SecondImageRecord
+  )
+{
+  IMAGE_PROPERTIES_RECORD      TempImageRecord;
+
+  TempImageRecord.ImageBase = FirstImageRecord->ImageBase;
+  TempImageRecord.ImageSize = FirstImageRecord->ImageSize;
+  TempImageRecord.CodeSegmentCount = FirstImageRecord->CodeSegmentCount;
+
+  FirstImageRecord->ImageBase = SecondImageRecord->ImageBase;
+  FirstImageRecord->ImageSize = SecondImageRecord->ImageSize;
+  FirstImageRecord->CodeSegmentCount = SecondImageRecord->CodeSegmentCount;
+
+  SecondImageRecord->ImageBase = TempImageRecord.ImageBase;
+  SecondImageRecord->ImageSize = TempImageRecord.ImageSize;
+  SecondImageRecord->CodeSegmentCount = TempImageRecord.CodeSegmentCount;
+
+  SwapListEntries (&FirstImageRecord->CodeSegmentList, &SecondImageRecord->CodeSegmentList);
+}
+
+/**
+  Sort image record based upon the ImageBase from low to high.
+**/
+STATIC
+VOID
+SortImageRecord (
+  VOID
+  )
+{
+  IMAGE_PROPERTIES_RECORD      *ImageRecord;
+  IMAGE_PROPERTIES_RECORD      *NextImageRecord;
+  LIST_ENTRY                   *ImageRecordLink;
+  LIST_ENTRY                   *NextImageRecordLink;
+  LIST_ENTRY                   *ImageRecordEndLink;
+  LIST_ENTRY                   *ImageRecordList;
+
+  ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+  ImageRecordLink = ImageRecordList->ForwardLink;
+  NextImageRecordLink = ImageRecordLink->ForwardLink;
+  ImageRecordEndLink = ImageRecordList;
+  while (ImageRecordLink != ImageRecordEndLink) {
+    ImageRecord = CR (
+                    ImageRecordLink,
+                    IMAGE_PROPERTIES_RECORD,
+                    Link,
+                    IMAGE_PROPERTIES_RECORD_SIGNATURE
+                    );
+    while (NextImageRecordLink != ImageRecordEndLink) {
+      NextImageRecord = CR (
+                          NextImageRecordLink,
+                          IMAGE_PROPERTIES_RECORD,
+                          Link,
+                          IMAGE_PROPERTIES_RECORD_SIGNATURE
+                          );
+      if (ImageRecord->ImageBase > NextImageRecord->ImageBase) {
+        SwapImageRecord (ImageRecord, NextImageRecord);
+      }
+      NextImageRecordLink = NextImageRecordLink->ForwardLink;
+    }
+
+    ImageRecordLink = ImageRecordLink->ForwardLink;
+    NextImageRecordLink = ImageRecordLink->ForwardLink;
+  }
+}
+
+/**
+  Dump image record.
+**/
+STATIC
+VOID
+DumpImageRecord (
+  VOID
+  )
+{
+  IMAGE_PROPERTIES_RECORD      *ImageRecord;
+  LIST_ENTRY                   *ImageRecordLink;
+  LIST_ENTRY                   *ImageRecordList;
+  UINTN                        Index;
+
+  ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+  for (ImageRecordLink = ImageRecordList->ForwardLink, Index = 0;
+       ImageRecordLink != ImageRecordList;
+       ImageRecordLink = ImageRecordLink->ForwardLink, Index++) {
+    ImageRecord = CR (
+                    ImageRecordLink,
+                    IMAGE_PROPERTIES_RECORD,
+                    Link,
+                    IMAGE_PROPERTIES_RECORD_SIGNATURE
+                    );
+    DEBUG ((DEBUG_VERBOSE, "SMM  Image[%d]: 0x%016lx - 0x%016lx\n", Index, ImageRecord->ImageBase, ImageRecord->ImageSize));
+  }
+}
+
+/**
+  Insert image record.
+
+  @param[in]  DriverEntry    Driver information
+**/
+VOID
+SmmInsertImageRecord (
+  IN EFI_SMM_DRIVER_ENTRY  *DriverEntry
+  )
+{
+  VOID                                 *ImageAddress;
+  EFI_IMAGE_DOS_HEADER                 *DosHdr;
+  UINT32                               PeCoffHeaderOffset;
+  UINT32                               SectionAlignment;
+  EFI_IMAGE_SECTION_HEADER             *Section;
+  EFI_IMAGE_OPTIONAL_HEADER_PTR_UNION  Hdr;
+  UINT8                                *Name;
+  UINTN                                Index;
+  IMAGE_PROPERTIES_RECORD              *ImageRecord;
+  CHAR8                                *PdbPointer;
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+  UINT16                               Magic;
+
+  DEBUG ((DEBUG_VERBOSE, "SMM InsertImageRecord - 0x%x\n", DriverEntry));
+  DEBUG ((DEBUG_VERBOSE, "SMM InsertImageRecord - 0x%016lx - 0x%08x\n", DriverEntry->ImageBuffer, DriverEntry->NumberOfPage));
+
+  ImageRecord = AllocatePool (sizeof(*ImageRecord));
+  if (ImageRecord == NULL) {
+    return ;
+  }
+  ImageRecord->Signature = IMAGE_PROPERTIES_RECORD_SIGNATURE;
+
+  DEBUG ((DEBUG_VERBOSE, "SMM ImageRecordCount - 0x%x\n", mImagePropertiesPrivateData.ImageRecordCount));
+
+  //
+  // Step 1: record whole region
+  //
+  ImageRecord->ImageBase = DriverEntry->ImageBuffer;
+  ImageRecord->ImageSize = EFI_PAGES_TO_SIZE(DriverEntry->NumberOfPage);
+
+  ImageAddress = (VOID *)(UINTN)DriverEntry->ImageBuffer;
+
+  PdbPointer = PeCoffLoaderGetPdbPointer ((VOID*) (UINTN) ImageAddress);
+  if (PdbPointer != NULL) {
+    DEBUG ((DEBUG_VERBOSE, "SMM   Image - %a\n", PdbPointer));
+  }
+
+  //
+  // Check PE/COFF image
+  //
+  DosHdr = (EFI_IMAGE_DOS_HEADER *) (UINTN) ImageAddress;
+  PeCoffHeaderOffset = 0;
+  if (DosHdr->e_magic == EFI_IMAGE_DOS_SIGNATURE) {
+    PeCoffHeaderOffset = DosHdr->e_lfanew;
+  }
+
+  Hdr.Pe32 = (EFI_IMAGE_NT_HEADERS32 *)((UINT8 *) (UINTN) ImageAddress + PeCoffHeaderOffset);
+  if (Hdr.Pe32->Signature != EFI_IMAGE_NT_SIGNATURE) {
+    DEBUG ((DEBUG_VERBOSE, "SMM Hdr.Pe32->Signature invalid - 0x%x\n", Hdr.Pe32->Signature));
+    goto Finish;
+  }
+
+  //
+  // Get SectionAlignment
+  //
+  if (Hdr.Pe32->FileHeader.Machine == IMAGE_FILE_MACHINE_IA64 && Hdr.Pe32->OptionalHeader.Magic == EFI_IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
+    //
+    // NOTE: Some versions of Linux ELILO for Itanium have an incorrect magic value
+    //       in the PE/COFF Header. If the MachineType is Itanium(IA64) and the
+    //       Magic value in the OptionalHeader is EFI_IMAGE_NT_OPTIONAL_HDR32_MAGIC
+    //       then override the magic value to EFI_IMAGE_NT_OPTIONAL_HDR64_MAGIC
+    //
+    Magic = EFI_IMAGE_NT_OPTIONAL_HDR64_MAGIC;
+  } else {
+    //
+    // Get the magic value from the PE/COFF Optional Header
+    //
+    Magic = Hdr.Pe32->OptionalHeader.Magic;
+  }
+  if (Magic == EFI_IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
+    SectionAlignment  = Hdr.Pe32->OptionalHeader.SectionAlignment;
+  } else {
+    SectionAlignment  = Hdr.Pe32Plus->OptionalHeader.SectionAlignment;
+  }
+
+  SetMemoryAttributesTableSectionAlignment (SectionAlignment);
+  if ((SectionAlignment & (EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT - 1)) != 0) {
+    DEBUG ((DEBUG_ERROR, "SMM !!!!!!!!  InsertImageRecord - Section Alignment(0x%x) is not %dK  !!!!!!!!\n",
+      SectionAlignment, EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT >> 10));
+    PdbPointer = PeCoffLoaderGetPdbPointer ((VOID*) (UINTN) ImageAddress);
+    if (PdbPointer != NULL) {
+      DEBUG ((DEBUG_ERROR, "SMM !!!!!!!!  Image - %a  !!!!!!!!\n", PdbPointer));
+    }
+    goto Finish;
+  }
+
+  Section = (EFI_IMAGE_SECTION_HEADER *) (
+               (UINT8 *) (UINTN) ImageAddress +
+               PeCoffHeaderOffset +
+               sizeof(UINT32) +
+               sizeof(EFI_IMAGE_FILE_HEADER) +
+               Hdr.Pe32->FileHeader.SizeOfOptionalHeader
+               );
+  ImageRecord->CodeSegmentCount = 0;
+  InitializeListHead (&ImageRecord->CodeSegmentList);
+  for (Index = 0; Index < Hdr.Pe32->FileHeader.NumberOfSections; Index++) {
+    Name = Section[Index].Name;
+    DEBUG ((
+      DEBUG_VERBOSE,
+      "SMM   Section - '%c%c%c%c%c%c%c%c'\n",
+      Name[0],
+      Name[1],
+      Name[2],
+      Name[3],
+      Name[4],
+      Name[5],
+      Name[6],
+      Name[7]
+      ));
+
+    if ((Section[Index].Characteristics & EFI_IMAGE_SCN_CNT_CODE) != 0) {
+      DEBUG ((DEBUG_VERBOSE, "SMM   VirtualSize          - 0x%08x\n", Section[Index].Misc.VirtualSize));
+      DEBUG ((DEBUG_VERBOSE, "SMM   VirtualAddress       - 0x%08x\n", Section[Index].VirtualAddress));
+      DEBUG ((DEBUG_VERBOSE, "SMM   SizeOfRawData        - 0x%08x\n", Section[Index].SizeOfRawData));
+      DEBUG ((DEBUG_VERBOSE, "SMM   PointerToRawData     - 0x%08x\n", Section[Index].PointerToRawData));
+      DEBUG ((DEBUG_VERBOSE, "SMM   PointerToRelocations - 0x%08x\n", Section[Index].PointerToRelocations));
+      DEBUG ((DEBUG_VERBOSE, "SMM   PointerToLinenumbers - 0x%08x\n", Section[Index].PointerToLinenumbers));
+      DEBUG ((DEBUG_VERBOSE, "SMM   NumberOfRelocations  - 0x%08x\n", Section[Index].NumberOfRelocations));
+      DEBUG ((DEBUG_VERBOSE, "SMM   NumberOfLinenumbers  - 0x%08x\n", Section[Index].NumberOfLinenumbers));
+      DEBUG ((DEBUG_VERBOSE, "SMM   Characteristics      - 0x%08x\n", Section[Index].Characteristics));
+
+      //
+      // Step 2: record code section
+      //
+      ImageRecordCodeSection = AllocatePool (sizeof(*ImageRecordCodeSection));
+      if (ImageRecordCodeSection == NULL) {
+        return ;
+      }
+      ImageRecordCodeSection->Signature = IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE;
+
+      ImageRecordCodeSection->CodeSegmentBase = (UINTN)ImageAddress + Section[Index].VirtualAddress;
+      ImageRecordCodeSection->CodeSegmentSize = Section[Index].SizeOfRawData;
+
+      DEBUG ((DEBUG_VERBOSE, "SMM ImageCode: 0x%016lx - 0x%016lx\n", ImageRecordCodeSection->CodeSegmentBase, ImageRecordCodeSection->CodeSegmentSize));
+
+      InsertTailList (&ImageRecord->CodeSegmentList, &ImageRecordCodeSection->Link);
+      ImageRecord->CodeSegmentCount++;
+    }
+  }
+
+  if (ImageRecord->CodeSegmentCount == 0) {
+    SetMemoryAttributesTableSectionAlignment (1);
+    DEBUG ((DEBUG_ERROR, "SMM !!!!!!!!  InsertImageRecord - CodeSegmentCount is 0  !!!!!!!!\n"));
+    PdbPointer = PeCoffLoaderGetPdbPointer ((VOID*) (UINTN) ImageAddress);
+    if (PdbPointer != NULL) {
+      DEBUG ((DEBUG_ERROR, "SMM !!!!!!!!  Image - %a  !!!!!!!!\n", PdbPointer));
+    }
+    goto Finish;
+  }
+
+  //
+  // Final
+  //
+  SortImageRecordCodeSection (ImageRecord);
+  //
+  // Check that all code sections lie within ImageBase/Size and do not overlap
+  //
+  if (!IsImageRecordCodeSectionValid (ImageRecord)) {
+    DEBUG ((DEBUG_ERROR, "SMM IsImageRecordCodeSectionValid - FAIL\n"));
+    goto Finish;
+  }
+
+  InsertTailList (&mImagePropertiesPrivateData.ImageRecordList, &ImageRecord->Link);
+  mImagePropertiesPrivateData.ImageRecordCount++;
+
+  SortImageRecord ();
+
+  if (mImagePropertiesPrivateData.CodeSegmentCountMax < ImageRecord->CodeSegmentCount) {
+    mImagePropertiesPrivateData.CodeSegmentCountMax = ImageRecord->CodeSegmentCount;
+  }
+
+Finish:
+  return ;
+}
+
+/**
+  Find image record according to image base and size.
+
+  @param[in]  ImageBase    Base of PE image
+  @param[in]  ImageSize    Size of PE image
+
+  @return image record
+**/
+STATIC
+IMAGE_PROPERTIES_RECORD *
+FindImageRecord (
+  IN EFI_PHYSICAL_ADDRESS  ImageBase,
+  IN UINT64                ImageSize
+  )
+{
+  IMAGE_PROPERTIES_RECORD    *ImageRecord;
+  LIST_ENTRY                 *ImageRecordLink;
+  LIST_ENTRY                 *ImageRecordList;
+
+  ImageRecordList = &mImagePropertiesPrivateData.ImageRecordList;
+
+  for (ImageRecordLink = ImageRecordList->ForwardLink;
+       ImageRecordLink != ImageRecordList;
+       ImageRecordLink = ImageRecordLink->ForwardLink) {
+    ImageRecord = CR (
+                    ImageRecordLink,
+                    IMAGE_PROPERTIES_RECORD,
+                    Link,
+                    IMAGE_PROPERTIES_RECORD_SIGNATURE
+                    );
+
+    if ((ImageBase == ImageRecord->ImageBase) &&
+        (ImageSize == ImageRecord->ImageSize)) {
+      return ImageRecord;
+    }
+  }
+
+  return NULL;
+}
+
+/**
+  Remove Image record.
+
+  @param[in]  DriverEntry    Driver information
+**/
+VOID
+SmmRemoveImageRecord (
+  IN EFI_SMM_DRIVER_ENTRY  *DriverEntry
+  )
+{
+  IMAGE_PROPERTIES_RECORD              *ImageRecord;
+  LIST_ENTRY                           *CodeSegmentListHead;
+  IMAGE_PROPERTIES_RECORD_CODE_SECTION *ImageRecordCodeSection;
+
+  DEBUG ((DEBUG_VERBOSE, "SMM RemoveImageRecord - 0x%x\n", DriverEntry));
+  DEBUG ((DEBUG_VERBOSE, "SMM RemoveImageRecord - 0x%016lx - 0x%016lx\n", DriverEntry->ImageBuffer, DriverEntry->NumberOfPage));
+
+  ImageRecord = FindImageRecord (DriverEntry->ImageBuffer, EFI_PAGES_TO_SIZE(DriverEntry->NumberOfPage));
+  if (ImageRecord == NULL) {
+    DEBUG ((DEBUG_ERROR, "SMM !!!!!!!! ImageRecord not found !!!!!!!!\n"));
+    return ;
+  }
+
+  CodeSegmentListHead = &ImageRecord->CodeSegmentList;
+  while (!IsListEmpty (CodeSegmentListHead)) {
+    ImageRecordCodeSection = CR (
+                               CodeSegmentListHead->ForwardLink,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION,
+                               Link,
+                               IMAGE_PROPERTIES_RECORD_CODE_SECTION_SIGNATURE
+                               );
+    RemoveEntryList (&ImageRecordCodeSection->Link);
+    FreePool (ImageRecordCodeSection);
+  }
+
+  RemoveEntryList (&ImageRecord->Link);
+  FreePool (ImageRecord);
+  mImagePropertiesPrivateData.ImageRecordCount--;
+}
+
+/**
+  Publish MemoryAttributesTable to SMM configuration table.
+**/
+VOID
+PublishMemoryAttributesTable (
+  VOID
+  )
+{
+  UINTN                                MemoryMapSize;
+  EFI_MEMORY_DESCRIPTOR                *MemoryMap;
+  UINTN                                MapKey;
+  UINTN                                DescriptorSize;
+  UINT32                               DescriptorVersion;
+  UINTN                                Index;
+  EFI_STATUS                           Status;
+  UINTN                                RuntimeEntryCount;
+  EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE *MemoryAttributesTable;
+  EFI_MEMORY_DESCRIPTOR                *MemoryAttributesEntry;
+  UINTN                                MemoryAttributesTableSize;
+
+  MemoryMapSize = 0;
+  MemoryMap = NULL;
+  Status = SmmCoreGetMemoryMapMemoryAttributesTable (
+             &MemoryMapSize,
+             MemoryMap,
+             &MapKey,
+             &DescriptorSize,
+             &DescriptorVersion
+             );
+  ASSERT (Status == EFI_BUFFER_TOO_SMALL);
+
+  do {
+    DEBUG ((DEBUG_INFO, "MemoryMapSize - 0x%x\n", MemoryMapSize));
+    MemoryMap = AllocatePool (MemoryMapSize);
+    ASSERT (MemoryMap != NULL);
+    DEBUG ((DEBUG_INFO, "MemoryMap - 0x%x\n", MemoryMap));
+
+    Status = SmmCoreGetMemoryMapMemoryAttributesTable (
+               &MemoryMapSize,
+               MemoryMap,
+               &MapKey,
+               &DescriptorSize,
+               &DescriptorVersion
+               );
+    if (EFI_ERROR (Status)) {
+      FreePool (MemoryMap);
+    }
+  } while (Status == EFI_BUFFER_TOO_SMALL);
+
+  //
+  // Allocate MemoryAttributesTable
+  //
+  RuntimeEntryCount = MemoryMapSize/DescriptorSize;
+  MemoryAttributesTableSize = sizeof(EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE) + DescriptorSize * RuntimeEntryCount;
+  MemoryAttributesTable = AllocatePool (MemoryAttributesTableSize);
+  ASSERT (MemoryAttributesTable != NULL);
+  MemoryAttributesTable->Version         = EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE_VERSION;
+  MemoryAttributesTable->NumberOfEntries = (UINT32)RuntimeEntryCount;
+  MemoryAttributesTable->DescriptorSize  = (UINT32)DescriptorSize;
+  MemoryAttributesTable->Reserved        = 0;
+  DEBUG ((DEBUG_INFO, "MemoryAttributesTable:\n"));
+  DEBUG ((DEBUG_INFO, "  Version              - 0x%08x\n", MemoryAttributesTable->Version));
+  DEBUG ((DEBUG_INFO, "  NumberOfEntries      - 0x%08x\n", MemoryAttributesTable->NumberOfEntries));
+  DEBUG ((DEBUG_INFO, "  DescriptorSize       - 0x%08x\n", MemoryAttributesTable->DescriptorSize));
+  MemoryAttributesEntry = (EFI_MEMORY_DESCRIPTOR *)(MemoryAttributesTable + 1);
+  for (Index = 0; Index < RuntimeEntryCount; Index++) {
+    CopyMem (MemoryAttributesEntry, MemoryMap, DescriptorSize);
+    DEBUG ((DEBUG_INFO, "Entry (0x%x)\n", MemoryAttributesEntry));
+    DEBUG ((DEBUG_INFO, "  Type              - 0x%x\n", MemoryAttributesEntry->Type));
+    DEBUG ((DEBUG_INFO, "  PhysicalStart     - 0x%016lx\n", MemoryAttributesEntry->PhysicalStart));
+    DEBUG ((DEBUG_INFO, "  VirtualStart      - 0x%016lx\n", MemoryAttributesEntry->VirtualStart));
+    DEBUG ((DEBUG_INFO, "  NumberOfPages     - 0x%016lx\n", MemoryAttributesEntry->NumberOfPages));
+    DEBUG ((DEBUG_INFO, "  Attribute         - 0x%016lx\n", MemoryAttributesEntry->Attribute));
+    MemoryAttributesEntry = NEXT_MEMORY_DESCRIPTOR(MemoryAttributesEntry, DescriptorSize);
+
+    MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+  }
+
+  Status = gSmst->SmmInstallConfigurationTable (gSmst, &gEdkiiPiSmmMemoryAttributesTableGuid, MemoryAttributesTable, MemoryAttributesTableSize);
+  ASSERT_EFI_ERROR (Status);
+}
+
+/**
+  This function returns whether the image is inside SMRAM.
+
+  @param[in] LoadedImage LoadedImage protocol instance for an image.
+
+  @retval TRUE  the image is inside SMRAM.
+  @retval FALSE the image is outside SMRAM.
+**/
+BOOLEAN
+IsImageInsideSmram (
+  IN EFI_LOADED_IMAGE_PROTOCOL   *LoadedImage
+  )
+{
+  UINTN  Index;
+
+  for (Index = 0; Index < mFullSmramRangeCount; Index++) {
+    if ((mFullSmramRanges[Index].PhysicalStart <= (UINTN)LoadedImage->ImageBase) &&
+        (mFullSmramRanges[Index].PhysicalStart + mFullSmramRanges[Index].PhysicalSize >= (UINTN)LoadedImage->ImageBase + LoadedImage->ImageSize)) {
+      return TRUE;
+    }
+  }
+
+  return FALSE;
+}
+
+/**
+  This function installs all SMM image record information.
+**/
+VOID
+SmmInstallImageRecord (
+  VOID
+  )
+{
+  EFI_STATUS                  Status;
+  UINTN                       NoHandles;
+  EFI_HANDLE                  *HandleBuffer;
+  EFI_LOADED_IMAGE_PROTOCOL   *LoadedImage;
+  UINTN                       Index;
+  EFI_SMM_DRIVER_ENTRY        DriverEntry;
+
+  Status = SmmLocateHandleBuffer (
+             ByProtocol,
+             &gEfiLoadedImageProtocolGuid,
+             NULL,
+             &NoHandles,
+             &HandleBuffer
+             );
+  if (EFI_ERROR (Status)) {
+    return ;
+  }
+
+  for (Index = 0; Index < NoHandles; Index++) {
+    Status = gSmst->SmmHandleProtocol (
+                      HandleBuffer[Index],
+                      &gEfiLoadedImageProtocolGuid,
+                      (VOID **)&LoadedImage
+                      );
+    if (EFI_ERROR (Status)) {
+      continue;
+    }
+    DEBUG ((DEBUG_VERBOSE, "LoadedImage - 0x%x 0x%x ", LoadedImage->ImageBase, LoadedImage->ImageSize));
+    {
+      VOID *PdbPointer;
+      PdbPointer = PeCoffLoaderGetPdbPointer (LoadedImage->ImageBase);
+      if (PdbPointer != NULL) {
+        DEBUG ((DEBUG_VERBOSE, "(%a) ", PdbPointer));
+      }
+    }
+    DEBUG ((DEBUG_VERBOSE, "\n"));
+    ZeroMem (&DriverEntry, sizeof(DriverEntry));
+    DriverEntry.ImageBuffer  = (UINTN)LoadedImage->ImageBase;
+    DriverEntry.NumberOfPage = EFI_SIZE_TO_PAGES((UINTN)LoadedImage->ImageSize);
+    SmmInsertImageRecord (&DriverEntry);
+  }
+
+  FreePool (HandleBuffer);
+}
+
+/**
+  Install MemoryAttributesTable.
+
+  @param[in] Protocol   Points to the protocol's unique identifier.
+  @param[in] Interface  Points to the interface instance.
+  @param[in] Handle     The handle on which the interface was installed.
+
+  @retval EFI_SUCCESS   Notification runs successfully.
+**/
+EFI_STATUS
+EFIAPI
+SmmInstallMemoryAttributesTable (
+  IN CONST EFI_GUID  *Protocol,
+  IN VOID            *Interface,
+  IN EFI_HANDLE      Handle
+  )
+{
+  SmmInstallImageRecord ();
+
+  DEBUG ((DEBUG_INFO, "SMM MemoryProtectionAttribute - 0x%016lx\n", mMemoryProtectionAttribute));
+  if ((mMemoryProtectionAttribute & EFI_MEMORY_ATTRIBUTES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA) == 0) {
+    return EFI_SUCCESS;
+  }
+
+  DEBUG ((DEBUG_VERBOSE, "SMM Total Image Count - 0x%x\n", mImagePropertiesPrivateData.ImageRecordCount));
+  DEBUG ((DEBUG_VERBOSE, "SMM Dump ImageRecord:\n"));
+  DumpImageRecord ();
+
+  PublishMemoryAttributesTable ();
+
+  return EFI_SUCCESS;
+}
+
+/**
+  Initialize MemoryAttributesTable support.
+**/
+VOID
+EFIAPI
+SmmCoreInitializeMemoryAttributesTable (
+  VOID
+  )
+{
+  EFI_STATUS                        Status;
+  VOID                              *Registration;
+
+  Status = gSmst->SmmRegisterProtocolNotify (
+                    &gEfiSmmEndOfDxeProtocolGuid,
+                    SmmInstallMemoryAttributesTable,
+                    &Registration
+                    );
+  ASSERT_EFI_ERROR (Status);
+
+  return ;
+}
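Note for readers (not part of the patch): the table published by
PublishMemoryAttributesTable() is a fixed header followed by NumberOfEntries
descriptors spaced DescriptorSize bytes apart. A consumer would presumably walk
it by that stride, the same way NEXT_MEMORY_DESCRIPTOR is used above. The
standalone sketch below illustrates that walk under stated assumptions: the
types MemDescriptor and AttrTableHeader and the function
CountPagesWithAttribute are hypothetical stand-ins written for this example,
with field names taken from the assignments in the code above; they are not
the EDK II definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for EFI_MEMORY_DESCRIPTOR (layout is an assumption). */
typedef struct {
  uint32_t Type;
  uint64_t PhysicalStart;
  uint64_t VirtualStart;
  uint64_t NumberOfPages;
  uint64_t Attribute;
} MemDescriptor;

/* Hypothetical stand-in for the table header; the descriptors follow it. */
typedef struct {
  uint32_t Version;
  uint32_t NumberOfEntries;
  uint32_t DescriptorSize;
  uint32_t Reserved;
} AttrTableHeader;

/* Sum NumberOfPages over all entries whose Attribute has every Mask bit set. */
uint64_t
CountPagesWithAttribute (const void *TableBuffer, uint64_t Mask)
{
  AttrTableHeader Header;
  MemDescriptor   Entry;
  const uint8_t   *Cursor;
  uint64_t        Pages;
  uint32_t        Index;

  assert (TableBuffer != NULL);
  memcpy (&Header, TableBuffer, sizeof (Header));
  /* The stride may be larger than the descriptor, but never smaller. */
  assert (Header.DescriptorSize >= sizeof (MemDescriptor));

  Pages  = 0;
  Cursor = (const uint8_t *)TableBuffer + sizeof (Header);
  for (Index = 0; Index < Header.NumberOfEntries; Index++) {
    /* Copy out of the byte buffer to avoid unaligned access. */
    memcpy (&Entry, Cursor, sizeof (Entry));
    if ((Entry.Attribute & Mask) == Mask) {
      Pages += Entry.NumberOfPages;
    }
    /* Step by the published stride, not by sizeof (Entry). */
    Cursor += Header.DescriptorSize;
  }
  return Pages;
}
```

Called with a mask of 0x4000 (the UEFI EFI_MEMORY_XP bit), the function totals
the pages marked non-executable; a mask of 0 totals every page in the table.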
diff --git a/MdeModulePkg/Core/PiSmmCore/Page.c b/MdeModulePkg/Core/PiSmmCore/Page.c
index 5c04e8c..5f19d7e 100644
--- a/MdeModulePkg/Core/PiSmmCore/Page.c
+++ b/MdeModulePkg/Core/PiSmmCore/Page.c
@@ -2,22 +2,572 @@
   SMM Memory page management functions.
 
   Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
-  This program and the accompanying materials are licensed and made available 
-  under the terms and conditions of the BSD License which accompanies this 
-  distribution.  The full text of the license may be found at        
-  http://opensource.org/licenses/bsd-license.php                                            
+  This program and the accompanying materials are licensed and made available
+  under the terms and conditions of the BSD License which accompanies this
+  distribution.  The full text of the license may be found at
+  http://opensource.org/licenses/bsd-license.php
 
-  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,                     
-  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.             
+  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 
 **/
 
 #include "PiSmmCore.h"
+#include <Library/SmmServicesTableLib.h>
 
 #define TRUNCATE_TO_PAGES(a)  ((a) >> EFI_PAGE_SHIFT)
 
 LIST_ENTRY  mSmmMemoryMap = INITIALIZE_LIST_HEAD_VARIABLE (mSmmMemoryMap);
 
+//
+// For GetMemoryMap()
+//
+
+#define MEMORY_MAP_SIGNATURE   SIGNATURE_32('m','m','a','p')
+typedef struct {
+  UINTN           Signature;
+  LIST_ENTRY      Link;
+
+  BOOLEAN         FromStack;
+  EFI_MEMORY_TYPE Type;
+  UINT64          Start;
+  UINT64          End;
+
+} MEMORY_MAP;
+
+LIST_ENTRY        gMemoryMap  = INITIALIZE_LIST_HEAD_VARIABLE (gMemoryMap);
+
+
+#define MAX_MAP_DEPTH 6
+
+///
+/// mMapDepth - depth of new descriptor stack
+///
+UINTN         mMapDepth = 0;
+///
+/// mMapStack - space to use as temp storage to build new map descriptors
+///
+MEMORY_MAP    mMapStack[MAX_MAP_DEPTH];
+UINTN         mFreeMapStack = 0;
+///
+/// This list maintains the free memory map entries
+///
+LIST_ENTRY   mFreeMemoryMapEntryList = INITIALIZE_LIST_HEAD_VARIABLE (mFreeMemoryMapEntryList);
+
+/**
+  Allocates pages from the memory map.
+
+  @param[in]   Type                   The type of allocation to perform.
+  @param[in]   MemoryType             The type of memory to turn the allocated pages
+                                      into.
+  @param[in]   NumberOfPages          The number of pages to allocate.
+  @param[out]  Memory                 A pointer to receive the base allocated memory
+                                      address.
+  @param[in]   AddRegion              TRUE if this memory is a newly added region.
+
+  @retval EFI_INVALID_PARAMETER  Parameters violate checking rules defined in spec.
+  @retval EFI_NOT_FOUND          Could not allocate pages matching the requirement.
+  @retval EFI_OUT_OF_RESOURCES   Not enough pages to allocate.
+  @retval EFI_SUCCESS            Pages successfully allocated.
+
+**/
+EFI_STATUS
+SmmInternalAllocatePagesEx (
+  IN  EFI_ALLOCATE_TYPE     Type,
+  IN  EFI_MEMORY_TYPE       MemoryType,
+  IN  UINTN                 NumberOfPages,
+  OUT EFI_PHYSICAL_ADDRESS  *Memory,
+  IN  BOOLEAN               AddRegion
+  );
+
+/**
+  Internal function.  Dequeue a descriptor entry from the mFreeMemoryMapEntryList.
+  If the list is empty, then allocate a new page to refill the list.
+  Note that with this algorithm the memory allocated for memory map descriptors
+  only ever grows; it is never actually freed.
+
+  @return The memory map descriptor dequeued from the mFreeMemoryMapEntryList
+
+**/
+MEMORY_MAP *
+AllocateMemoryMapEntry (
+  VOID
+  )
+{
+  EFI_PHYSICAL_ADDRESS   Mem;
+  EFI_STATUS             Status;
+  MEMORY_MAP*            FreeDescriptorEntries;
+  MEMORY_MAP*            Entry;
+  UINTN                  Index;
+
+  //DEBUG((DEBUG_INFO, "AllocateMemoryMapEntry\n"));
+
+  if (IsListEmpty (&mFreeMemoryMapEntryList)) {
+    //DEBUG((DEBUG_INFO, "mFreeMemoryMapEntryList is empty\n"));
+    //
+    // The list is empty, so allocate one page to refill the list
+    //
+    Status = SmmInternalAllocatePagesEx (
+               AllocateAnyPages,
+               EfiRuntimeServicesData,
+               EFI_SIZE_TO_PAGES(DEFAULT_PAGE_ALLOCATION),
+               &Mem,
+               TRUE
+               );
+    ASSERT_EFI_ERROR (Status);
+    if (!EFI_ERROR (Status)) {
+      FreeDescriptorEntries = (MEMORY_MAP *)(UINTN)Mem;
+      //DEBUG((DEBUG_INFO, "New FreeDescriptorEntries - 0x%x\n", FreeDescriptorEntries));
+      //
+      // Enqueue the free memory map entries into the list
+      //
+      for (Index = 0; Index < DEFAULT_PAGE_ALLOCATION / sizeof(MEMORY_MAP); Index++) {
+        FreeDescriptorEntries[Index].Signature = MEMORY_MAP_SIGNATURE;
+        InsertTailList (&mFreeMemoryMapEntryList, &FreeDescriptorEntries[Index].Link);
+      }
+    } else {
+      return NULL;
+    }
+  }
+  //
+  // dequeue the first descriptor from the list
+  //
+  Entry = CR (mFreeMemoryMapEntryList.ForwardLink, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+  RemoveEntryList (&Entry->Link);
+
+  return Entry;
+}
+
+
+/**
+  Internal function.  Moves any memory descriptors that are on the
+  temporary descriptor stack to heap.
+
+**/
+VOID
+CoreFreeMemoryMapStack (
+  VOID
+  )
+{
+  MEMORY_MAP      *Entry;
+
+  //
+  // If already freeing the map stack, then return
+  //
+  if (mFreeMapStack != 0) {
+    ASSERT (FALSE);
+    return ;
+  }
+
+  //
+  // Move the temporary memory descriptor stack into pool
+  //
+  mFreeMapStack += 1;
+
+  while (mMapDepth != 0) {
+    //
+    // Dequeue a memory map entry from mFreeMemoryMapEntryList
+    //
+    Entry = AllocateMemoryMapEntry ();
+    ASSERT (Entry != NULL);
+
+    //
+    // Update to proper entry
+    //
+    mMapDepth -= 1;
+
+    if (mMapStack[mMapDepth].Link.ForwardLink != NULL) {
+
+      CopyMem (Entry , &mMapStack[mMapDepth], sizeof (MEMORY_MAP));
+      Entry->FromStack = FALSE;
+
+      //
+      // Move this entry to general memory
+      //
+      InsertTailList (&mMapStack[mMapDepth].Link, &Entry->Link);
+      RemoveEntryList (&mMapStack[mMapDepth].Link);
+      mMapStack[mMapDepth].Link.ForwardLink = NULL;
+    }
+  }
+
+  mFreeMapStack -= 1;
+}
+
+/**
+  Insert new entry into memory map.
+
+  @param[in]  Link       The old memory map entry to be linked.
+  @param[in]  Start      The start address of new memory map entry.
+  @param[in]  End        The end address of new memory map entry.
+  @param[in]  Type       The type of new memory map entry.
+  @param[in]  Next       TRUE to insert the new entry after the old entry, FALSE to insert it before.
+  @param[in]  AddRegion  TRUE if this memory is a newly added region.
+**/
+VOID
+InsertNewEntry (
+  IN LIST_ENTRY      *Link,
+  IN UINT64          Start,
+  IN UINT64          End,
+  IN EFI_MEMORY_TYPE Type,
+  IN BOOLEAN         Next,
+  IN BOOLEAN         AddRegion
+  )
+{
+  MEMORY_MAP  *Entry;
+
+  Entry = &mMapStack[mMapDepth];
+  mMapDepth += 1;
+  ASSERT (mMapDepth < MAX_MAP_DEPTH);
+  Entry->FromStack = TRUE;
+
+  Entry->Signature = MEMORY_MAP_SIGNATURE;
+  Entry->Type = Type;
+  Entry->Start = Start;
+  Entry->End = End;
+  if (Next) {
+    InsertHeadList (Link, &Entry->Link);
+  } else {
+    InsertTailList (Link, &Entry->Link);
+  }
+}
+
+/**
+  Remove old entry from memory map.
+
+  @param[in] Entry Memory map entry to be removed.
+**/
+VOID
+RemoveOldEntry (
+  IN MEMORY_MAP  *Entry
+  )
+{
+  RemoveEntryList (&Entry->Link);
+  if (!Entry->FromStack) {
+    InsertTailList (&mFreeMemoryMapEntryList, &Entry->Link);
+  }
+}
+
+/**
+  Update SMM memory map entry.
+
+  @param[in]  Type                   The type of allocation to perform.
+  @param[in]  Memory                 The base of memory address.
+  @param[in]  NumberOfPages          The number of pages to allocate.
+  @param[in]  AddRegion              TRUE if this memory is a newly added region.
+**/
+VOID
+ConvertSmmMemoryMapEntry (
+  IN EFI_MEMORY_TYPE       Type,
+  IN EFI_PHYSICAL_ADDRESS  Memory,
+  IN UINTN                 NumberOfPages,
+  IN BOOLEAN               AddRegion
+  )
+{
+  LIST_ENTRY               *Link;
+  MEMORY_MAP               *Entry;
+  MEMORY_MAP               *NextEntry;
+  LIST_ENTRY               *NextLink;
+  MEMORY_MAP               *PreviousEntry;
+  LIST_ENTRY               *PreviousLink;
+  EFI_PHYSICAL_ADDRESS     Start;
+  EFI_PHYSICAL_ADDRESS     End;
+
+  Start = Memory;
+  End = Memory + EFI_PAGES_TO_SIZE(NumberOfPages) - 1;
+
+  //
+  // Exclude memory region
+  //
+  Link = gMemoryMap.ForwardLink;
+  while (Link != &gMemoryMap) {
+    Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+    Link  = Link->ForwardLink;
+
+    //
+    // ---------------------------------------------------
+    // |  +----------+   +------+   +------+   +------+  |
+    // ---|gMemoryMap|---|Entry1|---|Entry2|---|Entry3|---
+    //    +----------+ ^ +------+   +------+   +------+
+    //                 |
+    //              +------+
+    //              |EntryX|
+    //              +------+
+    //
+    if (Entry->Start > End) {
+      if ((Entry->Start == End + 1) && (Entry->Type == Type)) {
+        Entry->Start = Start;
+        return ;
+      }
+      InsertNewEntry (
+        &Entry->Link,
+        Start,
+        End,
+        Type,
+        FALSE,
+        AddRegion
+        );
+      return ;
+    }
+
+    if ((Entry->Start <= Start) && (Entry->End >= End)) {
+      if (Entry->Type != Type) {
+        if (Entry->Start < Start) {
+          //
+          // ---------------------------------------------------
+          // |  +----------+   +------+   +------+   +------+  |
+          // ---|gMemoryMap|---|Entry1|---|EntryX|---|Entry3|---
+          //    +----------+   +------+ ^ +------+   +------+
+          //                            |
+          //                         +------+
+          //                         |EntryA|
+          //                         +------+
+          //
+          InsertNewEntry (
+            &Entry->Link,
+            Entry->Start,
+            Start - 1,
+            Entry->Type,
+            FALSE,
+            AddRegion
+            );
+        }
+        if (Entry->End > End) {
+          //
+          // ---------------------------------------------------
+          // |  +----------+   +------+   +------+   +------+  |
+          // ---|gMemoryMap|---|Entry1|---|EntryX|---|Entry3|---
+          //    +----------+   +------+   +------+ ^ +------+
+          //                                       |
+          //                                    +------+
+          //                                    |EntryZ|
+          //                                    +------+
+          //
+          InsertNewEntry (
+            &Entry->Link,
+            End + 1,
+            Entry->End,
+            Entry->Type,
+            TRUE,
+            AddRegion
+            );
+        }
+        //
+        // Update this node
+        //
+        Entry->Start = Start;
+        Entry->End = End;
+        Entry->Type = Type;
+
+        //
+        // Check adjacent
+        //
+        NextLink = Entry->Link.ForwardLink;
+        if (NextLink != &gMemoryMap) {
+          NextEntry = CR (NextLink, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+          //
+          // ---------------------------------------------------
+          // |  +----------+   +------+   +-----------------+  |
+          // ---|gMemoryMap|---|Entry1|---|EntryX     Entry3|---
+          //    +----------+   +------+   +-----------------+
+          //
+          if ((Entry->Type == NextEntry->Type) && (Entry->End + 1 == NextEntry->Start)) {
+            Entry->End = NextEntry->End;
+            RemoveOldEntry (NextEntry);
+          }
+        }
+        PreviousLink = Entry->Link.BackLink;
+        if (PreviousLink != &gMemoryMap) {
+          PreviousEntry = CR (PreviousLink, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+          //
+          // ---------------------------------------------------
+          // |  +----------+   +-----------------+   +------+  |
+          // ---|gMemoryMap|---|Entry1     EntryX|---|Entry3|---
+          //    +----------+   +-----------------+   +------+
+          //
+          if ((PreviousEntry->Type == Entry->Type) && (PreviousEntry->End + 1 == Entry->Start)) {
+            PreviousEntry->End = Entry->End;
+            RemoveOldEntry (Entry);
+          }
+        }
+      }
+      return ;
+    }
+  }
+
+  //
+  // ---------------------------------------------------
+  // |  +----------+   +------+   +------+   +------+  |
+  // ---|gMemoryMap|---|Entry1|---|Entry2|---|Entry3|---
+  //    +----------+   +------+   +------+   +------+ ^
+  //                                                  |
+  //                                               +------+
+  //                                               |EntryX|
+  //                                               +------+
+  //
+  Link = gMemoryMap.BackLink;
+  if (Link != &gMemoryMap) {
+    Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+    if ((Entry->End + 1 == Start) && (Entry->Type == Type)) {
+      Entry->End = End;
+      return ;
+    }
+  }
+  InsertNewEntry (
+    &gMemoryMap,
+    Start,
+    End,
+    Type,
+    FALSE,
+    AddRegion
+    );
+  return ;
+}
+
+/**
+  Return the number of SMM memory map entries.
+
+  @return The number of SMM memory map entries.
+**/
+UINTN
+GetSmmMemoryMapEntryCount (
+  VOID
+  )
+{
+  LIST_ENTRY               *Link;
+  UINTN                    Count;
+
+  Count = 0;
+  Link = gMemoryMap.ForwardLink;
+  while (Link != &gMemoryMap) {
+    Link  = Link->ForwardLink;
+    Count++;
+  }
+  return Count;
+}
+
+/**
+  Dump all SMM memory map entries.
+**/
+VOID
+DumpSmmMemoryMapEntry (
+  VOID
+  )
+{
+  LIST_ENTRY               *Link;
+  MEMORY_MAP               *Entry;
+  EFI_PHYSICAL_ADDRESS     Last;
+
+  Last = 0;
+  DEBUG ((DEBUG_INFO, "DumpSmmMemoryMapEntry:\n"));
+  Link = gMemoryMap.ForwardLink;
+  while (Link != &gMemoryMap) {
+    Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+    Link  = Link->ForwardLink;
+
+    if ((Last != 0) && (Last != (UINT64)-1)) {
+      if (Last + 1 != Entry->Start) {
+        Last = (UINT64)-1;
+      } else {
+        Last = Entry->End;
+      }
+    } else if (Last == 0) {
+      Last = Entry->End;
+    }
+
+    DEBUG ((DEBUG_INFO, "Entry (Link - 0x%x)\n", &Entry->Link));
+    DEBUG ((DEBUG_INFO, "  Signature         - 0x%x\n", Entry->Signature));
+    DEBUG ((DEBUG_INFO, "  Link.ForwardLink  - 0x%x\n", Entry->Link.ForwardLink));
+    DEBUG ((DEBUG_INFO, "  Link.BackLink     - 0x%x\n", Entry->Link.BackLink));
+    DEBUG ((DEBUG_INFO, "  Type              - 0x%x\n", Entry->Type));
+    DEBUG ((DEBUG_INFO, "  Start             - 0x%016lx\n", Entry->Start));
+    DEBUG ((DEBUG_INFO, "  End               - 0x%016lx\n", Entry->End));
+  }
+
+  ASSERT (Last != (UINT64)-1);
+}
+
+/**
+  Dump Smm memory map.
+**/
+VOID
+DumpSmmMemoryMap (
+  VOID
+  )
+{
+  LIST_ENTRY      *Node;
+  FREE_PAGE_LIST  *Pages;
+
+  DEBUG ((DEBUG_INFO, "DumpSmmMemoryMap\n"));
+
+  Pages = NULL;
+  Node = mSmmMemoryMap.ForwardLink;
+  while (Node != &mSmmMemoryMap) {
+    Pages = BASE_CR (Node, FREE_PAGE_LIST, Link);
+    DEBUG ((DEBUG_INFO, "Pages - 0x%x\n", Pages));
+    DEBUG ((DEBUG_INFO, "Pages->NumberOfPages - 0x%x\n", Pages->NumberOfPages));
+    Node = Node->ForwardLink;
+  }
+}
+
+/**
+  Check if the SMM range [Base, Base + Length - 1] matches a free
+
+  @param[in] Base   The base address of SMM memory to be checked.
+  @param[in] Length The length of SMM memory to be checked.
+
+  @retval TRUE  The range matches an EfiConventionalMemory entry in the SMM memory map.
+  @retval FALSE The range does not match any EfiConventionalMemory entry in the SMM memory map.
+**/
+BOOLEAN
+SmmMemoryMapConsistencyCheckRange (
+  IN EFI_PHYSICAL_ADDRESS Base,
+  IN UINTN                Length
+  )
+{
+  LIST_ENTRY               *Link;
+  MEMORY_MAP               *Entry;
+  BOOLEAN                  Result;
+
+  Result = FALSE;
+  Link = gMemoryMap.ForwardLink;
+  while (Link != &gMemoryMap) {
+    Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+    Link  = Link->ForwardLink;
+
+    if (Entry->Type != EfiConventionalMemory) {
+      continue;
+    }
+    if (Entry->Start == Base && Entry->End == Base + Length - 1) {
+      Result = TRUE;
+      break;
+    }
+  }
+
+  return Result;
+}
+
+/**
+  Check the consistency of Smm memory map.
+**/
+VOID
+SmmMemoryMapConsistencyCheck (
+  VOID
+  )
+{
+  LIST_ENTRY      *Node;
+  FREE_PAGE_LIST  *Pages;
+  BOOLEAN         Result;
+
+  Pages = NULL;
+  Node = mSmmMemoryMap.ForwardLink;
+  while (Node != &mSmmMemoryMap) {
+    Pages = BASE_CR (Node, FREE_PAGE_LIST, Link);
+    Result = SmmMemoryMapConsistencyCheckRange ((EFI_PHYSICAL_ADDRESS)(UINTN)Pages, (UINTN)EFI_PAGES_TO_SIZE(Pages->NumberOfPages));
+    ASSERT (Result);
+    Node = Node->ForwardLink;
+  }
+}
+
 /**
   Internal Function. Allocate n pages from given free page node.
 
@@ -131,12 +681,13 @@ InternalAllocAddress (
 /**
   Allocates pages from the memory map.
 
-  @param  Type                   The type of allocation to perform.
-  @param  MemoryType             The type of memory to turn the allocated pages
-                                 into.
-  @param  NumberOfPages          The number of pages to allocate.
-  @param  Memory                 A pointer to receive the base allocated memory
-                                 address.
+  @param[in]   Type                   The type of allocation to perform.
+  @param[in]   MemoryType             The type of memory to turn the allocated pages
+                                      into.
+  @param[in]   NumberOfPages          The number of pages to allocate.
+  @param[out]  Memory                 A pointer to receive the base allocated memory
+                                      address.
+  @param[in]   AddRegion              TRUE if this memory is a newly added region.
 
   @retval EFI_INVALID_PARAMETER  Parameters violate checking rules defined in spec.
   @retval EFI_NOT_FOUND          Could not allocate pages match the requirement.
@@ -145,12 +696,12 @@ InternalAllocAddress (
 
 **/
 EFI_STATUS
-EFIAPI
-SmmInternalAllocatePages (
+SmmInternalAllocatePagesEx (
   IN  EFI_ALLOCATE_TYPE     Type,
   IN  EFI_MEMORY_TYPE       MemoryType,
   IN  UINTN                 NumberOfPages,
-  OUT EFI_PHYSICAL_ADDRESS  *Memory
+  OUT EFI_PHYSICAL_ADDRESS  *Memory,
+  IN  BOOLEAN               AddRegion
   )
 {
   UINTN  RequestedAddress;
@@ -179,7 +730,7 @@ SmmInternalAllocatePages (
                   );
       if (*Memory == (UINTN)-1) {
         return EFI_OUT_OF_RESOURCES;
-      } 
+      }
       break;
     case AllocateAddress:
       *Memory = InternalAllocAddress (
@@ -194,12 +745,49 @@ SmmInternalAllocatePages (
     default:
       return EFI_INVALID_PARAMETER;
   }
+
+  //
+  // Update SmmMemoryMap here.
+  //
+  ConvertSmmMemoryMapEntry (MemoryType, *Memory, NumberOfPages, AddRegion);
+  if (!AddRegion) {
+    CoreFreeMemoryMapStack ();
+  }
+
   return EFI_SUCCESS;
 }
 
 /**
   Allocates pages from the memory map.
 
+  @param[in]   Type                   The type of allocation to perform.
+  @param[in]   MemoryType             The type of memory to turn the allocated pages
+                                      into.
+  @param[in]   NumberOfPages          The number of pages to allocate.
+  @param[out]  Memory                 A pointer to receive the base allocated memory
+                                      address.
+
+  @retval EFI_INVALID_PARAMETER  Parameters violate checking rules defined in spec.
+  @retval EFI_NOT_FOUND          Could not allocate pages matching the requirement.
+  @retval EFI_OUT_OF_RESOURCES   Not enough pages to allocate.
+  @retval EFI_SUCCESS            Pages successfully allocated.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmInternalAllocatePages (
+  IN  EFI_ALLOCATE_TYPE     Type,
+  IN  EFI_MEMORY_TYPE       MemoryType,
+  IN  UINTN                 NumberOfPages,
+  OUT EFI_PHYSICAL_ADDRESS  *Memory
+  )
+{
+  return SmmInternalAllocatePagesEx (Type, MemoryType, NumberOfPages, Memory, FALSE);
+}
+
+/**
+  Allocates pages from the memory map.
+
   @param  Type                   The type of allocation to perform.
   @param  MemoryType             The type of memory to turn the allocated pages
                                  into.
@@ -268,8 +856,9 @@ InternalMergeNodes (
 /**
   Frees previous allocated pages.
 
-  @param  Memory                 Base address of memory being freed.
-  @param  NumberOfPages          The number of pages to free.
+  @param[in]  Memory                 Base address of memory being freed.
+  @param[in]  NumberOfPages          The number of pages to free.
+  @param[in]  AddRegion              TRUE if this memory is a newly added region.
 
   @retval EFI_NOT_FOUND          Could not find the entry that covers the range.
   @retval EFI_INVALID_PARAMETER  Address not aligned.
@@ -277,10 +866,10 @@ InternalMergeNodes (
 
 **/
 EFI_STATUS
-EFIAPI
-SmmInternalFreePages (
+SmmInternalFreePagesEx (
   IN EFI_PHYSICAL_ADDRESS  Memory,
-  IN UINTN                 NumberOfPages
+  IN UINTN                 NumberOfPages,
+  IN BOOLEAN               AddRegion
   )
 {
   LIST_ENTRY      *Node;
@@ -326,12 +915,41 @@ SmmInternalFreePages (
     InternalMergeNodes (Pages);
   }
 
+  //
+  // Update SmmMemoryMap here.
+  //
+  ConvertSmmMemoryMapEntry (EfiConventionalMemory, Memory, NumberOfPages, AddRegion);
+  if (!AddRegion) {
+    CoreFreeMemoryMapStack ();
+  }
+
   return EFI_SUCCESS;
 }
 
 /**
   Frees previous allocated pages.
 
+  @param[in]  Memory                 Base address of memory being freed.
+  @param[in]  NumberOfPages          The number of pages to free.
+
+  @retval EFI_NOT_FOUND          Could not find the entry that covers the range.
+  @retval EFI_INVALID_PARAMETER  Address not aligned.
+  @retval EFI_SUCCESS            Pages successfully freed.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmInternalFreePages (
+  IN EFI_PHYSICAL_ADDRESS  Memory,
+  IN UINTN                 NumberOfPages
+  )
+{
+  return SmmInternalFreePagesEx (Memory, NumberOfPages, FALSE);
+}
+
+/**
+  Frees previous allocated pages.
+
   @param  Memory                 Base address of memory being freed.
   @param  NumberOfPages          The number of pages to free.
 
@@ -383,16 +1001,121 @@ SmmAddMemoryRegion (
   UINTN  AlignedMemBase;
 
   //
-  // Do not add memory regions that is already allocated, needs testing, or needs ECC initialization
+  // Use EfiRuntimeServicesData for memory regions that are already allocated, need testing, or need ECC initialization
   //
   if ((Attributes & (EFI_ALLOCATED | EFI_NEEDS_TESTING | EFI_NEEDS_ECC_INITIALIZATION)) != 0) {
-    return;
+    Type = EfiRuntimeServicesData;
+  } else {
+    Type = EfiConventionalMemory;
   }
-  
+
+  DEBUG ((DEBUG_INFO, "SmmAddMemoryRegion\n"));
+  DEBUG ((DEBUG_INFO, "  MemBase    - 0x%lx\n", MemBase));
+  DEBUG ((DEBUG_INFO, "  MemLength  - 0x%lx\n", MemLength));
+  DEBUG ((DEBUG_INFO, "  Type       - 0x%x\n", Type));
+  DEBUG ((DEBUG_INFO, "  Attributes - 0x%lx\n", Attributes));
+
   //
   // Align range on an EFI_PAGE_SIZE boundary
-  //  
+  //
   AlignedMemBase = (UINTN)(MemBase + EFI_PAGE_MASK) & ~EFI_PAGE_MASK;
   MemLength -= AlignedMemBase - MemBase;
-  SmmFreePages (AlignedMemBase, TRUNCATE_TO_PAGES ((UINTN)MemLength));
+  if (Type == EfiConventionalMemory) {
+    SmmInternalFreePagesEx (AlignedMemBase, TRUNCATE_TO_PAGES ((UINTN)MemLength), TRUE);
+  } else {
+    ConvertSmmMemoryMapEntry (EfiRuntimeServicesData, AlignedMemBase, TRUNCATE_TO_PAGES ((UINTN)MemLength), TRUE);
+  }
+
+  CoreFreeMemoryMapStack ();
+}
+
+/**
+  This function returns a copy of the current memory map. The map is an array of
+  memory descriptors, each of which describes a contiguous block of memory.
+
+  @param[in, out]  MemoryMapSize          A pointer to the size, in bytes, of the
+                                          MemoryMap buffer. On input, this is the size of
+                                          the buffer allocated by the caller.  On output,
+                                          it is the size of the buffer returned by the
+                                          firmware  if the buffer was large enough, or the
+                                          size of the buffer needed  to contain the map if
+                                          the buffer was too small.
+  @param[in, out]  MemoryMap              A pointer to the buffer in which firmware places
+                                          the current memory map.
+  @param[out]      MapKey                 A pointer to the location in which firmware
+                                          returns the key for the current memory map.
+  @param[out]      DescriptorSize         A pointer to the location in which firmware
+                                          returns the size, in bytes, of an individual
+                                          EFI_MEMORY_DESCRIPTOR.
+  @param[out]      DescriptorVersion      A pointer to the location in which firmware
+                                          returns the version number associated with the
+                                          EFI_MEMORY_DESCRIPTOR.
+
+  @retval EFI_SUCCESS            The memory map was returned in the MemoryMap
+                                 buffer.
+  @retval EFI_BUFFER_TOO_SMALL   The MemoryMap buffer was too small. The current
+                                 buffer size needed to hold the memory map is
+                                 returned in MemoryMapSize.
+  @retval EFI_INVALID_PARAMETER  One of the parameters has an invalid value.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmCoreGetMemoryMap (
+  IN OUT UINTN                  *MemoryMapSize,
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  OUT UINTN                     *MapKey,
+  OUT UINTN                     *DescriptorSize,
+  OUT UINT32                    *DescriptorVersion
+  )
+{
+  UINTN                    Count;
+  LIST_ENTRY               *Link;
+  MEMORY_MAP               *Entry;
+  UINTN                    Size;
+  UINTN                    BufferSize;
+
+  Size = sizeof (EFI_MEMORY_DESCRIPTOR);
+
+  //
+  // Make sure Size != sizeof(EFI_MEMORY_DESCRIPTOR). This
+  // prevents callers from having pointer math bugs in their code:
+  // they have to use *DescriptorSize to step through the map.
+  //
+  Size += sizeof(UINT64) - (Size % sizeof (UINT64));
+
+  if (DescriptorSize != NULL) {
+    *DescriptorSize = Size;
+  }
+
+  if (DescriptorVersion != NULL) {
+    *DescriptorVersion = EFI_MEMORY_DESCRIPTOR_VERSION;
+  }
+
+  Count = GetSmmMemoryMapEntryCount ();
+  BufferSize = Size * Count;
+  if (*MemoryMapSize < BufferSize) {
+    *MemoryMapSize = BufferSize;
+    return EFI_BUFFER_TOO_SMALL;
+  }
+
+  *MemoryMapSize = BufferSize;
+  if (MemoryMap == NULL) {
+    return EFI_INVALID_PARAMETER;
+  }
+
+  ZeroMem (MemoryMap, BufferSize);
+  Link = gMemoryMap.ForwardLink;
+  while (Link != &gMemoryMap) {
+    Entry = CR (Link, MEMORY_MAP, Link, MEMORY_MAP_SIGNATURE);
+    Link  = Link->ForwardLink;
+
+    MemoryMap->Type           = Entry->Type;
+    MemoryMap->PhysicalStart  = Entry->Start;
+    MemoryMap->NumberOfPages  = RShiftU64 (Entry->End - Entry->Start + 1, EFI_PAGE_SHIFT);
+
+    MemoryMap = NEXT_MEMORY_DESCRIPTOR (MemoryMap, Size);
+  }
+
+  return EFI_SUCCESS;
 }
diff --git a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c
index 2bdb19c..b877a33 100644
--- a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c
+++ b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.c
@@ -87,6 +87,8 @@ SMM_CORE_SMI_HANDLERS  mSmmCoreSmiHandlers[] = {
 UINTN                           mFullSmramRangeCount;
 EFI_SMRAM_DESCRIPTOR            *mFullSmramRanges;
 
+EFI_SMM_DRIVER_ENTRY            *mSmmCoreDriverEntry;
+
 EFI_LOADED_IMAGE_PROTOCOL       *mSmmCoreLoadedImage;
 
 /**
@@ -564,6 +566,42 @@ SmmCoreInstallLoadedImage (
                   );
   ASSERT_EFI_ERROR (Status);
 
+  //
+  // Allocate an SMM driver entry, which embeds a Loaded Image Protocol, for the SMM Core
+  //
+  Status = SmmAllocatePool (EfiRuntimeServicesData, sizeof(EFI_SMM_DRIVER_ENTRY), (VOID **)&mSmmCoreDriverEntry);
+  ASSERT_EFI_ERROR(Status);
+
+  ZeroMem (mSmmCoreDriverEntry, sizeof(EFI_SMM_DRIVER_ENTRY));
+  //
+  // Fill in the remaining fields of the Loaded Image Protocol instance.
+  //
+  mSmmCoreDriverEntry->Signature = EFI_SMM_DRIVER_ENTRY_SIGNATURE;
+  mSmmCoreDriverEntry->SmmLoadedImage.Revision = EFI_LOADED_IMAGE_PROTOCOL_REVISION;
+  mSmmCoreDriverEntry->SmmLoadedImage.ParentHandle = gSmmCorePrivate->SmmIplImageHandle;
+  mSmmCoreDriverEntry->SmmLoadedImage.SystemTable = gST;
+
+  mSmmCoreDriverEntry->SmmLoadedImage.ImageBase = (VOID *)(UINTN)gSmmCorePrivate->PiSmmCoreImageBase;
+  mSmmCoreDriverEntry->SmmLoadedImage.ImageSize = gSmmCorePrivate->PiSmmCoreImageSize;
+  mSmmCoreDriverEntry->SmmLoadedImage.ImageCodeType = EfiRuntimeServicesCode;
+  mSmmCoreDriverEntry->SmmLoadedImage.ImageDataType = EfiRuntimeServicesData;
+
+  mSmmCoreDriverEntry->ImageEntryPoint = gSmmCorePrivate->PiSmmCoreEntryPoint;
+  mSmmCoreDriverEntry->ImageBuffer     = gSmmCorePrivate->PiSmmCoreImageBase;
+  mSmmCoreDriverEntry->NumberOfPage    = EFI_SIZE_TO_PAGES((UINTN)gSmmCorePrivate->PiSmmCoreImageSize);
+
+  //
+  // Create a new image handle in the SMM handle database for the SMM Driver
+  //
+  mSmmCoreDriverEntry->SmmImageHandle = NULL;
+  Status = SmmInstallProtocolInterface (
+             &mSmmCoreDriverEntry->SmmImageHandle,
+             &gEfiLoadedImageProtocolGuid,
+             EFI_NATIVE_INTERFACE,
+             &mSmmCoreDriverEntry->SmmLoadedImage
+             );
+  ASSERT_EFI_ERROR(Status);
+
   return ;
 }
 
@@ -636,5 +674,7 @@ SmmMain (
 
   SmmCoreInstallLoadedImage ();
 
+  SmmCoreInitializeMemoryAttributesTable ();
+
   return EFI_SUCCESS;
 }
diff --git a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h
index f46ee72..e2fee54 100644
--- a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h
+++ b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.h
@@ -110,6 +110,8 @@ typedef struct {
   // Image Page Number
   //
   UINTN                           NumberOfPage;
+  EFI_HANDLE                      SmmImageHandle;
+  EFI_LOADED_IMAGE_PROTOCOL       SmmLoadedImage;
 } EFI_SMM_DRIVER_ENTRY;
 
 #define EFI_HANDLE_SIGNATURE            SIGNATURE_32('h','n','d','l')
@@ -551,6 +553,38 @@ SmmLocateProtocol (
   );
 
 /**
+  Function returns an array of handles that support the requested protocol
+  in a buffer allocated from pool. This is a version of SmmLocateHandle()
+  that allocates a buffer for the caller.
+
+  @param  SearchType             Specifies which handle(s) are to be returned.
+  @param  Protocol               Provides the protocol to search by.    This
+                                 parameter is only valid for SearchType
+                                 ByProtocol.
+  @param  SearchKey              Supplies the search key depending on the
+                                 SearchType.
+  @param  NumberHandles          The number of handles returned in Buffer.
+  @param  Buffer                 A pointer to the buffer to return the requested
+                                 array of  handles that support Protocol.
+
+  @retval EFI_SUCCESS            The result array of handles was returned.
+  @retval EFI_NOT_FOUND          No handles match the search.
+  @retval EFI_OUT_OF_RESOURCES   There is not enough pool memory to store the
+                                 matching results.
+  @retval EFI_INVALID_PARAMETER  One or more parameters are not valid.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmLocateHandleBuffer (
+  IN     EFI_LOCATE_SEARCH_TYPE  SearchType,
+  IN     EFI_GUID                *Protocol OPTIONAL,
+  IN     VOID                    *SearchKey OPTIONAL,
+  IN OUT UINTN                   *NumberHandles,
+  OUT    EFI_HANDLE              **Buffer
+  );
+
+/**
   Manage SMI of a particular type.
 
   @param  HandlerType    Points to the handler type or NULL for root SMI handlers.
@@ -980,9 +1014,66 @@ SmramProfileReadyToLock (
   VOID
   );
 
+/**
+  Initialize MemoryAttributes support.
+**/
+VOID
+EFIAPI
+SmmCoreInitializeMemoryAttributesTable (
+  VOID
+  );
+
+/**
+  This function returns a copy of the current memory map. The map is an array of
+  memory descriptors, each of which describes a contiguous block of memory.
+
+  @param[in, out]  MemoryMapSize          A pointer to the size, in bytes, of the
+                                          MemoryMap buffer. On input, this is the size of
+                                          the buffer allocated by the caller.  On output,
+                                          it is the size of the buffer returned by the
+                                          firmware  if the buffer was large enough, or the
+                                          size of the buffer needed  to contain the map if
+                                          the buffer was too small.
+  @param[in, out]  MemoryMap              A pointer to the buffer in which firmware places
+                                          the current memory map.
+  @param[out]      MapKey                 A pointer to the location in which firmware
+                                          returns the key for the current memory map.
+  @param[out]      DescriptorSize         A pointer to the location in which firmware
+                                          returns the size, in bytes, of an individual
+                                          EFI_MEMORY_DESCRIPTOR.
+  @param[out]      DescriptorVersion      A pointer to the location in which firmware
+                                          returns the version number associated with the
+                                          EFI_MEMORY_DESCRIPTOR.
+
+  @retval EFI_SUCCESS            The memory map was returned in the MemoryMap
+                                 buffer.
+  @retval EFI_BUFFER_TOO_SMALL   The MemoryMap buffer was too small. The current
+                                 buffer size needed to hold the memory map is
+                                 returned in MemoryMapSize.
+  @retval EFI_INVALID_PARAMETER  One of the parameters has an invalid value.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmCoreGetMemoryMap (
+  IN OUT UINTN                  *MemoryMapSize,
+  IN OUT EFI_MEMORY_DESCRIPTOR  *MemoryMap,
+  OUT UINTN                     *MapKey,
+  OUT UINTN                     *DescriptorSize,
+  OUT UINT32                    *DescriptorVersion
+  );
+
+///
+/// For generic EFI machines make the default allocations 4K aligned
+///
+#define EFI_ACPI_RUNTIME_PAGE_ALLOCATION_ALIGNMENT  (EFI_PAGE_SIZE)
+#define DEFAULT_PAGE_ALLOCATION                     (EFI_PAGE_SIZE)
+
 extern UINTN                    mFullSmramRangeCount;
 extern EFI_SMRAM_DESCRIPTOR     *mFullSmramRanges;
 
+extern EFI_SMM_DRIVER_ENTRY       *mSmmCoreDriverEntry;
+
 extern EFI_LOADED_IMAGE_PROTOCOL  *mSmmCoreLoadedImage;
 
 //
diff --git a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf
index 1f73cbb..c256e90 100644
--- a/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf
+++ b/MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf
@@ -38,6 +38,7 @@
   Smi.c
   InstallConfigurationTable.c
   SmramProfileRecord.c
+  MemoryAttributesTable.c
 
 [Packages]
   MdePkg/MdePkg.dec
@@ -96,6 +97,7 @@
   gEdkiiMemoryProfileGuid
   ## SOMETIMES_PRODUCES   ## GUID # Install protocol
   gEdkiiSmmMemoryProfileGuid
+  gEdkiiPiSmmMemoryAttributesTableGuid          ## SOMETIMES_PRODUCES   ## SystemTable
 
 [UserExtensions.TianoCore."ExtraFiles"]
   PiSmmCoreExtra.uni
diff --git a/MdeModulePkg/Core/PiSmmCore/Pool.c b/MdeModulePkg/Core/PiSmmCore/Pool.c
index 02dab01..dcfd13e 100644
--- a/MdeModulePkg/Core/PiSmmCore/Pool.c
+++ b/MdeModulePkg/Core/PiSmmCore/Pool.c
@@ -86,8 +86,24 @@ SmmInitializeMemoryServices (
   }
   //
   // Initialize free SMRAM regions
+  // Free memory must be added first, so that gSmmMemoryMap can record data
   //
   for (Index = 0; Index < SmramRangeCount; Index++) {
+    if ((SmramRanges[Index].RegionState & (EFI_ALLOCATED | EFI_NEEDS_TESTING | EFI_NEEDS_ECC_INITIALIZATION)) != 0) {
+      continue;
+    }
+    SmmAddMemoryRegion (
+      SmramRanges[Index].CpuStart,
+      SmramRanges[Index].PhysicalSize,
+      EfiConventionalMemory,
+      SmramRanges[Index].RegionState
+      );
+  }
+
+  for (Index = 0; Index < SmramRangeCount; Index++) {
+    if ((SmramRanges[Index].RegionState & (EFI_ALLOCATED | EFI_NEEDS_TESTING | EFI_NEEDS_ECC_INITIALIZATION)) == 0) {
+      continue;
+    }
     SmmAddMemoryRegion (
       SmramRanges[Index].CpuStart,
       SmramRanges[Index].PhysicalSize,
-- 
2.7.4.windows.1
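
Note on SmmCoreGetMemoryMap() above: the descriptor stride it reports is
deliberately larger than sizeof (EFI_MEMORY_DESCRIPTOR), so a consumer has to
step through the buffer with *DescriptorSize. A minimal sketch of that walk
follows. MEM_DESC, descriptor_stride() and total_pages() are hypothetical
stand-ins written for illustration in plain C; they are not part of the patch
or of EDK II.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for EFI_MEMORY_DESCRIPTOR (same field order as the UEFI type). */
typedef struct {
  uint32_t Type;
  uint64_t PhysicalStart;
  uint64_t VirtualStart;
  uint64_t NumberOfPages;
  uint64_t Attribute;
} MEM_DESC;

/* Same rounding as the patch: the result is always > sizeof (MEM_DESC). */
size_t
descriptor_stride (void)
{
  size_t size = sizeof (MEM_DESC);

  size += sizeof (uint64_t) - (size % sizeof (uint64_t));
  return size;
}

/*
 * Sum NumberOfPages over a map buffer, advancing by the reported stride
 * rather than by sizeof (MEM_DESC).
 */
uint64_t
total_pages (const void *map, size_t map_size, size_t stride)
{
  const unsigned char *p = map;
  size_t               offset;
  uint64_t             pages = 0;
  MEM_DESC             desc;

  for (offset = 0; offset + stride <= map_size; offset += stride) {
    /* Copy out the descriptor to avoid alignment assumptions on the buffer. */
    memcpy (&desc, p + offset, sizeof (desc));
    pages += desc.NumberOfPages;
  }
  return pages;
}
```

A caller of the real function would first pass a too-small buffer to learn the
required size from EFI_BUFFER_TOO_SMALL, then allocate and call again.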



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
                   ` (2 preceding siblings ...)
  2016-11-04  9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
@ 2016-11-04  9:30 ` Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC (permalink / raw)
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

If enabled, SMM will not use on-demand paging.
SMM will build a static page table for all memory.

The page table size depends on two things:
1) The 1G paging capability.
2) The whole system memory/MMIO addressing capability.

A) If the system only supports 2M paging:
When the whole memory/MMIO is 32bit, we only need 1+1+4=6 pages for 4G.
When the whole memory/MMIO is 39bit, we need 1+1+512 pages (~ 2M).
When the whole memory/MMIO is 48bit, we need 1+512+512*512 pages (~ 1G).

B) If the system supports 1G paging:
When the whole memory/MMIO is 32bit, we only need 1+1+4=6 pages for 4G.
(We still generate 2M pages for maintenance consideration.)
When the whole memory/MMIO is 39bit, we still need 6 pages.
(We set up 1G paging for >4G.)
When the whole memory/MMIO is 48bit, we need 1+512 pages (~ 2M).
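
The footprint estimate can be reproduced with a small calculation. This is
only a sketch: static_page_table_pages() is a hypothetical helper, not code
from this series. It assumes 4-level X64 paging, where each 4 KiB paging
structure holds 512 8-byte entries, 2M pages below 4G in all cases, and 1G
pages above 4G when the CPU supports them.

```c
#include <stdint.h>

/* Entries per 4 KiB paging structure: 4096 / sizeof (uint64_t). */
#define ENTRIES_PER_TABLE  512u

/*
 * Number of 4 KiB pages needed for a static identity map covering
 * 2^address_bits bytes: one PML4, the PDPTs it points to, and the
 * page directories needed for the ranges mapped with 2M pages.
 */
uint64_t
static_page_table_pages (unsigned address_bits, int has_1g_pages)
{
  uint64_t gig_regions;   /* number of 1G regions == PDPT entries in use */
  uint64_t pdpt_pages;    /* number of PDPTs == PML4 entries in use      */
  uint64_t pd_pages;      /* page directories (each maps 1G with 2M pages) */

  if (address_bits < 32) {
    address_bits = 32;
  }
  if (address_bits > 48) {
    address_bits = 48;
  }

  gig_regions = 1ull << (address_bits - 30);
  pdpt_pages  = (gig_regions + ENTRIES_PER_TABLE - 1) / ENTRIES_PER_TABLE;

  /* With 1G pages, only the low 4G still needs page directories. */
  pd_pages = has_1g_pages ? 4 : gig_regions;

  return 1 + pdpt_pages + pd_pages;
}
```

For example, static_page_table_pages (32, 0) evaluates to 6 and
static_page_table_pages (39, 0) to 514. With 1G paging and 48 address bits it
returns 517, because this sketch also counts the four page directories that
cover the low 4G.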

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
 UefiCpuPkg/UefiCpuPkg.dec | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/UefiCpuPkg/UefiCpuPkg.dec b/UefiCpuPkg/UefiCpuPkg.dec
index 8674533..a110820 100644
--- a/UefiCpuPkg/UefiCpuPkg.dec
+++ b/UefiCpuPkg/UefiCpuPkg.dec
@@ -199,6 +199,14 @@
   # @Prompt The specified AP target C-state for Mwait.
   gUefiCpuPkgTokenSpaceGuid.PcdCpuApTargetCstate|0|UINT8|0x00000007
 
+  ## Indicates if SMM uses static page table.
+  #  If enabled, SMM will not use on-demand paging. SMM will build a static page table for all memory.<BR><BR>
+  #  This flag only impacts the X64 build, because SMM always builds a static page table for IA32.
+  #   TRUE  - SMM uses static page table for all memory.<BR>
+  #   FALSE - SMM uses static page table for below 4G memory and uses on-demand paging for above 4G memory.<BR>
+  # @Prompt Use static page table for all memory in SMM.
+  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStaticPageTable|TRUE|BOOLEAN|0x3213210D
+
 [PcdsDynamic, PcdsDynamicEx]
   ## Contains the pointer to a CPU S3 data buffer of structure ACPI_CPU_DATA.
   # @Prompt The pointer to a CPU S3 data buffer.
-- 
2.7.4.windows.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
                   ` (3 preceding siblings ...)
  2016-11-04  9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
@ 2016-11-04  9:30 ` Jiewen Yao
  2016-11-04  9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC (permalink / raw)
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

PiSmmCpuDxeSmm consumes SmmAttributesTable and sets up the page table:
1) Code regions are marked read-only and data regions non-executable,
if the PE image is 4K aligned.
2) Important data structures, such as the GDT and IDT, are set to RO.
3) SmmSaveState is set to non-executable,
and SmmEntrypoint is set to read-only.
4) If the static page table is used, the page table itself is read-only.

We use the page table to protect other components, and itself.

If we use dynamic paging, we can still provide *partial* protection,
but must rely on other components not modifying the page table.

The XD enabling code is moved to SmiEntry so that NX takes effect.
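
As a rough illustration of the protection rules above, the mapping from UEFI
memory attributes to x86 page table entry bits can be sketched as follows.
apply_attributes_to_pte() is a hypothetical helper written for this note, not
the routine added in SmmCpuMemoryManagement.c. The attribute values are the
UEFI definitions of EFI_MEMORY_RO and EFI_MEMORY_XP. The sketch assumes the
NX bit is only honoured once EFER.NXE is enabled, which is why the XD enabling
has to happen in SmiEntry.

```c
#include <stdint.h>

#define EFI_MEMORY_XP  0x0000000000004000ULL  /* UEFI: execute-protected */
#define EFI_MEMORY_RO  0x0000000000020000ULL  /* UEFI: read-only         */

#define PTE_RW  (1ULL << 1)    /* x86 paging: writable when set            */
#define PTE_NX  (1ULL << 63)   /* x86 paging: no-execute, needs EFER.NXE   */

/*
 * Return the page table entry with the requested protections applied.
 * RO clears the R/W bit; XP sets the NX bit, but only when the CPU
 * supports XD, since bit 63 is reserved otherwise.
 */
uint64_t
apply_attributes_to_pte (uint64_t pte, uint64_t attributes, int xd_supported)
{
  if ((attributes & EFI_MEMORY_RO) != 0) {
    pte &= ~PTE_RW;
  }
  if (((attributes & EFI_MEMORY_XP) != 0) && xd_supported) {
    pte |= PTE_NX;
  }
  return pte;
}
```

Under this model a code page would be passed EFI_MEMORY_RO and a data page
EFI_MEMORY_XP; the read-only setting only takes effect for supervisor writes
while CR0.WP is set.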

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c           |  71 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S          |  67 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm        |  68 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm       |  70 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S      | 226 +----
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm    |  36 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm   |  36 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c      |  37 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c    |   4 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              | 127 ++-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c         | 142 +++-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         | 156 +++-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf       |   5 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c | 871 ++++++++++++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c             |  39 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h             |  15 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c            | 274 +++++-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S           |  51 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm         |  54 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm        |  61 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S       | 250 +-----
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm     |  35 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm    |  31 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c       |  30 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c     |   7 +-
 25 files changed, 1988 insertions(+), 775 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
index a871bef..65f09e5 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
@@ -58,7 +58,7 @@ SmmInitPageTable (
   if (FeaturePcdGet (PcdCpuSmmStackGuard)) {
     InitializeIDTSmmStackGuard ();
   }
-  return Gen4GPageTable (0, TRUE);
+  return Gen4GPageTable (TRUE);
 }
 
 /**
@@ -99,7 +99,7 @@ SmiPFHandler (
   if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
       (PFAddress >= mCpuHotPlugData.SmrrBase) &&
       (PFAddress < (mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize))) {
-    DEBUG ((EFI_D_ERROR, "SMM stack overflow!\n"));
+    DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
     CpuDeadLoop ();
   }
 
@@ -109,7 +109,7 @@ SmiPFHandler (
   if ((PFAddress < mCpuHotPlugData.SmrrBase) ||
       (PFAddress >= mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize)) {
     if ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0) {
-      DEBUG ((EFI_D_ERROR, "Code executed on IP(0x%x) out of SMM range after SMM is locked!\n", PFAddress));
+      DEBUG ((DEBUG_ERROR, "Code executed on IP(0x%x) out of SMM range after SMM is locked!\n", PFAddress));
       DEBUG_CODE (
         DumpModuleInfoByIp (*(UINTN *)(UINTN)SystemContext.SystemContextIa32->Esp);
       );
@@ -128,3 +128,68 @@ SmiPFHandler (
 
   ReleaseSpinLock (mPFLock);
 }
+
+/**
+  This function sets memory attribute for page table.
+**/
+VOID
+SetPageTableAttributes (
+  VOID
+  )
+{
+  UINTN                 Index2;
+  UINTN                 Index3;
+  UINT64                *L1PageTable;
+  UINT64                *L2PageTable;
+  UINT64                *L3PageTable;
+  BOOLEAN               IsSplitted;
+  BOOLEAN               PageTableSplitted;
+
+  DEBUG ((DEBUG_INFO, "SetPageTableAttributes\n"));
+
+  //
+  // Disable write protection, because the page table itself must be marked read only.
+  // We need to *write* page table memory in order to mark it *read only*.
+  //
+  AsmWriteCr0 (AsmReadCr0() & ~CR0_WP);
+
+  do {
+    DEBUG ((DEBUG_INFO, "Start...\n"));
+    PageTableSplitted = FALSE;
+
+    L3PageTable = (UINT64 *)GetPageTableBase ();
+
+    SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L3PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+    PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+    for (Index3 = 0; Index3 < 4; Index3++) {
+      L2PageTable = (UINT64 *)(UINTN)(L3PageTable[Index3] & PAGING_4K_ADDRESS_MASK_64);
+      if (L2PageTable == NULL) {
+        continue;
+      }
+
+      SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L2PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+      PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+      for (Index2 = 0; Index2 < SIZE_4KB/sizeof(UINT64); Index2++) {
+        if ((L2PageTable[Index2] & IA32_PG_PS) != 0) {
+          // 2M
+          continue;
+        }
+        L1PageTable = (UINT64 *)(UINTN)(L2PageTable[Index2] & PAGING_4K_ADDRESS_MASK_64);
+        if (L1PageTable == NULL) {
+          continue;
+        }
+        SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L1PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+        PageTableSplitted = (PageTableSplitted || IsSplitted);
+      }
+    }
+  } while (PageTableSplitted);
+
+  //
+  // Re-enable write protection after the page table is updated.
+  //
+  AsmWriteCr0 (AsmReadCr0() | CR0_WP);
+
+  return ;
+}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S
index ec5b9a0..93f11e2 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S
@@ -1,6 +1,6 @@
 #------------------------------------------------------------------------------
 #
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 # This program and the accompanying materials
 # are licensed and made available under the terms and conditions of the BSD License
 # which accompanies this distribution.  The full text of the license may be found at
@@ -24,9 +24,13 @@ ASM_GLOBAL  ASM_PFX(gcSmiHandlerSize)
 ASM_GLOBAL  ASM_PFX(gSmiCr3)
 ASM_GLOBAL  ASM_PFX(gSmiStack)
 ASM_GLOBAL  ASM_PFX(gSmbase)
+ASM_GLOBAL  ASM_PFX(mXdSupported)
 ASM_GLOBAL  ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))
 ASM_GLOBAL  ASM_PFX(gSmiHandlerIdtr)
 
+.equ            MSR_EFER, 0xc0000080
+.equ            MSR_EFER_XD, 0x800
+
 .equ            DSC_OFFSET, 0xfb00
 .equ            DSC_GDTPTR, 0x30
 .equ            DSC_GDTSIZ, 0x38
@@ -122,8 +126,39 @@ L11:
     orl     $BIT10, %eax
 L12:                                       # as cr4.PGE is not set here, refresh cr3
     movl    %eax, %cr4                     # in PreModifyMtrrs() to flush TLB.
+
+    cmpb    $0, ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))
+    jz      L5
+# Load TSS
+    movb    $0x89, (TSS_SEGMENT + 5)(%ebp) # clear busy flag
+    movl    $TSS_SEGMENT, %eax
+    ltrw    %ax
+L5:
+
+# enable NXE if supported
+    .byte   0xb0                           # mov al, imm8
+ASM_PFX(mXdSupported): .byte 1
+    cmpb    $0, %al
+    jz      L14
+#
+# Check XD disable bit
+#
+    movl    $MSR_IA32_MISC_ENABLE, %ecx
+    rdmsr
+    pushl   %edx                           # save MSR_IA32_MISC_ENABLE[63-32]
+    testl   $BIT2, %edx                    # MSR_IA32_MISC_ENABLE[34]
+    jz      L13
+    andw    $0x0FFFB, %dx                  # clear XD Disable bit if it is set
+    wrmsr
+L13:
+    movl    $MSR_EFER, %ecx
+    rdmsr
+    orw     $MSR_EFER_XD,%ax               # enable NXE
+    wrmsr
+L14:
+
     movl    %cr0, %ebx
-    orl     $0x080010000, %ebx             # enable paging + WP
+    orl     $0x080010023, %ebx             # enable paging + WP + NE + MP + PE
     movl    %ebx, %cr0
     leal    DSC_OFFSET(%edi),%ebx
     movw    DSC_DS(%ebx),%ax
@@ -135,35 +170,35 @@ L12:                                       # as cr4.PGE is not set here, refresh
     movw    DSC_SS(%ebx),%ax
     movl    %eax, %ss
 
-    cmpb    $0, ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))
-    jz      L5
-
-# Load TSS
-    movb    $0x89, (TSS_SEGMENT + 5)(%ebp) # clear busy flag
-    movl    $TSS_SEGMENT, %eax
-    ltrw    %ax
-L5:
-
 #   jmp     _SmiHandler                 # instruction is not needed
 
 _SmiHandler:
-    movl    (%esp), %ebx
+    movl    4(%esp), %ebx
 
     pushl   %ebx
     movl    $ASM_PFX(CpuSmmDebugEntry), %eax
     call    *%eax
-    popl    %ecx
-    
+    addl    $4, %esp
+
     pushl   %ebx
     movl    $ASM_PFX(SmiRendezvous), %eax
     call    *%eax
-    popl    %ecx
+    addl    $4, %esp
 
     pushl   %ebx
     movl    $ASM_PFX(CpuSmmDebugExit), %eax
     call    *%eax
-    popl    %ecx
+    addl    $4, %esp
+
+    popl    %edx                        # get saved MSR_IA32_MISC_ENABLE[63-32]
+    testl   $BIT2, %edx
+    jz      L16
+    movl    $MSR_IA32_MISC_ENABLE, %ecx
+    rdmsr
+    orw     $BIT2, %dx                  # set XD Disable bit if it was set before entering into SMM
+    wrmsr
 
+L16:
     rsm
 
 ASM_PFX(gcSmiHandlerSize):    .word      . - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm
index ac1a9b4..1e5db55 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm
@@ -1,5 +1,5 @@
 ;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 ; This program and the accompanying materials
 ; are licensed and made available under the terms and conditions of the BSD License
 ; which accompanies this distribution.  The full text of the license may be found at
@@ -22,6 +22,10 @@
     .model  flat,C
     .xmm
 
+MSR_IA32_MISC_ENABLE  EQU     1A0h
+MSR_EFER      EQU     0c0000080h
+MSR_EFER_XD   EQU     0800h
+
 DSC_OFFSET    EQU     0fb00h
 DSC_GDTPTR    EQU     30h
 DSC_GDTSIZ    EQU     38h
@@ -43,6 +47,7 @@ EXTERNDEF   gcSmiHandlerSize:WORD
 EXTERNDEF   gSmiCr3:DWORD
 EXTERNDEF   gSmiStack:DWORD
 EXTERNDEF   gSmbase:DWORD
+EXTERNDEF   mXdSupported:BYTE
 EXTERNDEF   FeaturePcdGet (PcdCpuSmmStackGuard):BYTE
 EXTERNDEF   gSmiHandlerIdtr:FWORD
 
@@ -128,8 +133,39 @@ gSmiCr3     DD      ?
     or      eax, BIT10
 @@:                                     ; as cr4.PGE is not set here, refresh cr3
     mov     cr4, eax                    ; in PreModifyMtrrs() to flush TLB.
+
+    cmp     FeaturePcdGet (PcdCpuSmmStackGuard), 0
+    jz      @F
+; Load TSS
+    mov     byte ptr [ebp + TSS_SEGMENT + 5], 89h ; clear busy flag
+    mov     eax, TSS_SEGMENT
+    ltr     ax
+@@:
+
+; enable NXE if supported
+    DB      0b0h                        ; mov al, imm8
+mXdSupported     DB      1
+    cmp     al, 0
+    jz      @SkipXd
+;
+; Check XD disable bit
+;
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    push    edx                        ; save MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2                  ; MSR_IA32_MISC_ENABLE[34]
+    jz      @f
+    and     dx, 0FFFBh                 ; clear XD Disable bit if it is set
+    wrmsr
+@@:
+    mov     ecx, MSR_EFER
+    rdmsr
+    or      ax, MSR_EFER_XD             ; enable NXE
+    wrmsr
+@SkipXd:
+
     mov     ebx, cr0
-    or      ebx, 080010000h             ; enable paging + WP
+    or      ebx, 080010023h             ; enable paging + WP + NE + MP + PE
     mov     cr0, ebx
     lea     ebx, [edi + DSC_OFFSET]
     mov     ax, [ebx + DSC_DS]
@@ -141,34 +177,34 @@ gSmiCr3     DD      ?
     mov     ax, [ebx + DSC_SS]
     mov     ss, eax
 
-    cmp     FeaturePcdGet (PcdCpuSmmStackGuard), 0
-    jz      @F
-
-; Load TSS
-    mov     byte ptr [ebp + TSS_SEGMENT + 5], 89h ; clear busy flag
-    mov     eax, TSS_SEGMENT
-    ltr     ax
-@@:
 ;   jmp     _SmiHandler                 ; instruction is not needed
 
 _SmiHandler PROC
-    mov     ebx, [esp]                  ; CPU Index
-
+    mov     ebx, [esp + 4]                  ; CPU Index
     push    ebx
     mov     eax, CpuSmmDebugEntry
     call    eax
-    pop     ecx
+    add     esp, 4
 
     push    ebx
     mov     eax, SmiRendezvous
     call    eax
-    pop     ecx
-    
+    add     esp, 4
+
     push    ebx
     mov     eax, CpuSmmDebugExit
     call    eax
-    pop     ecx
+    add     esp, 4
 
+    pop     edx                       ; get saved MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2
+    jz      @f
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    or      dx, BIT2                  ; set XD Disable bit if it was set before entering into SMM
+    wrmsr
+
+@@:
     rsm
 _SmiHandler ENDP
 
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm
index 4fb0c13..2d81dde 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm
@@ -18,6 +18,10 @@
 ;
 ;-------------------------------------------------------------------------------
 
+%define MSR_IA32_MISC_ENABLE 0x1A0
+%define MSR_EFER      0xc0000080
+%define MSR_EFER_XD   0x800
+
 %define DSC_OFFSET 0xfb00
 %define DSC_GDTPTR 0x30
 %define DSC_GDTSIZ 0x38
@@ -40,6 +44,7 @@ global ASM_PFX(gcSmiHandlerSize)
 global ASM_PFX(gSmiCr3)
 global ASM_PFX(gSmiStack)
 global ASM_PFX(gSmbase)
+global ASM_PFX(mXdSupported)
 extern ASM_PFX(gSmiHandlerIdtr)
 
     SECTION .text
@@ -56,7 +61,7 @@ _SmiEntryPoint:
     mov     ebp, eax                      ; ebp = GDT base
 o32 lgdt    [cs:bx]                       ; lgdt fword ptr cs:[bx]
     mov     ax, PROTECT_MODE_CS
-    mov     [cs:bx-0x2],ax    
+    mov     [cs:bx-0x2],ax
     DB      0x66, 0xbf                   ; mov edi, SMBASE
 ASM_PFX(gSmbase): DD 0
     lea     eax, [edi + (@32bit - _SmiEntryPoint) + 0x8000]
@@ -66,7 +71,7 @@ ASM_PFX(gSmbase): DD 0
     or      ebx, 0x23
     mov     cr0, ebx
     jmp     dword 0x0:0x0
-_GdtDesc:   
+_GdtDesc:
     DW 0
     DD 0
 
@@ -115,8 +120,39 @@ ASM_PFX(gSmiCr3): DD 0
     or      eax, BIT10
 .4:                                     ; as cr4.PGE is not set here, refresh cr3
     mov     cr4, eax                    ; in PreModifyMtrrs() to flush TLB.
+
+    cmp     byte [dword ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))], 0
+    jz      .6
+; Load TSS
+    mov     byte [ebp + TSS_SEGMENT + 5], 0x89 ; clear busy flag
+    mov     eax, TSS_SEGMENT
+    ltr     ax
+.6:
+
+; enable NXE if supported
+    DB      0b0h                        ; mov al, imm8
+ASM_PFX(mXdSupported):     DB      1
+    cmp     al, 0
+    jz      @SkipXd
+;
+; Check XD disable bit
+;
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    push    edx                        ; save MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2                  ; MSR_IA32_MISC_ENABLE[34]
+    jz      .5
+    and     dx, 0xFFFB                 ; clear XD Disable bit if it is set
+    wrmsr
+.5:
+    mov     ecx, MSR_EFER
+    rdmsr
+    or      ax, MSR_EFER_XD             ; enable NXE
+    wrmsr
+@SkipXd:
+
     mov     ebx, cr0
-    or      ebx, 0x080010000            ; enable paging + WP
+    or      ebx, 0x80010023             ; enable paging + WP + NE + MP + PE
     mov     cr0, ebx
     lea     ebx, [edi + DSC_OFFSET]
     mov     ax, [ebx + DSC_DS]
@@ -128,35 +164,35 @@ ASM_PFX(gSmiCr3): DD 0
     mov     ax, [ebx + DSC_SS]
     mov     ss, eax
 
-    cmp     byte [dword ASM_PFX(FeaturePcdGet (PcdCpuSmmStackGuard))], 0
-    jz      .5
-
-; Load TSS
-    mov     byte [ebp + TSS_SEGMENT + 5], 0x89 ; clear busy flag
-    mov     eax, TSS_SEGMENT
-    ltr     ax
-.5:
 ;   jmp     _SmiHandler                 ; instruction is not needed
 
 global ASM_PFX(SmiHandler)
 ASM_PFX(SmiHandler):
-    mov     ebx, [esp]                  ; CPU Index
-
+    mov     ebx, [esp + 4]              ; CPU Index
     push    ebx
     mov     eax, ASM_PFX(CpuSmmDebugEntry)
     call    eax
-    pop     ecx
+    add     esp, 4
 
     push    ebx
     mov     eax, ASM_PFX(SmiRendezvous)
     call    eax
-    pop     ecx
-    
+    add     esp, 4
+
     push    ebx
     mov     eax, ASM_PFX(CpuSmmDebugExit)
     call    eax
-    pop     ecx
+    add     esp, 4
+
+    pop     edx                       ; get saved MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2
+    jz      .7
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    or      dx, BIT2                  ; set XD Disable bit if it was set before entering into SMM
+    wrmsr
 
+.7:
     rsm
 
 ASM_PFX(gcSmiHandlerSize): DW $ - _SmiEntryPoint
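
Note on the SmiEntry.nasm hunks above: the entry stub clears XD Disable (IA32_MISC_ENABLE[34]) when it is set, turns on EFER.NXE, and sets XD Disable again just before rsm. The C sketch below models only that bit manipulation on plain 64-bit values. The helper names are hypothetical and not part of this patch, and no real MSR access is performed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MISC_ENABLE_XD_DISABLE  (1ULL << 34)  /* IA32_MISC_ENABLE[34], BIT2 of EDX after rdmsr */
#define EFER_NXE                (1ULL << 11)  /* 0x800, named MSR_EFER_XD in the patch */

/* Hypothetical helper: value to write back on SMI entry. Reports whether
   XD Disable was set so that the exit path can restore it. */
static uint64_t
XdEntryMiscEnable (uint64_t MiscEnable, bool *WasDisabled)
{
  *WasDisabled = (MiscEnable & MISC_ENABLE_XD_DISABLE) != 0;
  return MiscEnable & ~MISC_ENABLE_XD_DISABLE;
}

/* Hypothetical helper: EFER value with no-execute paging enabled. */
static uint64_t
XdEnableNxe (uint64_t Efer)
{
  return Efer | EFER_NXE;
}

/* Hypothetical helper: value to write back just before rsm. */
static uint64_t
XdExitMiscEnable (uint64_t MiscEnable, bool WasDisabled)
{
  return WasDisabled ? (MiscEnable | MISC_ENABLE_XD_DISABLE) : MiscEnable;
}
```

In the assembly, the saved upper half of IA32_MISC_ENABLE stays on the stack across the C calls, which is presumably why SmiHandler now reads the CPU index from [esp + 4] and cleans up arguments with add esp, 4 instead of pop ecx.
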
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S
index 4130bf5..cf5ef82 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S
@@ -1,6 +1,6 @@
 #------------------------------------------------------------------------------
 #
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 # This program and the accompanying materials
 # are licensed and made available under the terms and conditions of the BSD License
 # which accompanies this distribution.  The full text of the license may be found at
@@ -24,6 +24,7 @@ ASM_GLOBAL  ASM_PFX(PageFaultStubFunction)
 ASM_GLOBAL  ASM_PFX(gSmiMtrrs)
 ASM_GLOBAL  ASM_PFX(gcSmiIdtr)
 ASM_GLOBAL  ASM_PFX(gcSmiGdtr)
+ASM_GLOBAL  ASM_PFX(gTaskGateDescriptor)
 ASM_GLOBAL  ASM_PFX(gcPsd)
 ASM_GLOBAL  ASM_PFX(FeaturePcdGet (PcdCpuSmmProfileEnable))
 
@@ -236,207 +237,10 @@ ASM_PFX(gcPsd):
 ASM_PFX(gcSmiGdtr):  .word      GDT_SIZE - 1
                      .long      NullSeg
 
-ASM_PFX(gcSmiIdtr):  .word      IDT_SIZE - 1
-                     .long      _SmiIDT
-
-_SmiIDT:
-# The following segment repeats 32 times:
-# No. 1
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 2
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 3
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 4
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 5
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 6
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 7
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 8
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 9
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 10
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 11
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 12
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 13
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 14
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 15
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 16
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 17
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 18
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 19
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 20
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 21
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 22
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 23
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 24
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 25
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 26
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 27
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 28
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 29
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 30
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 31
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-# No. 32
-    .word 0                             # Offset 0:15
-    .word      CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-
-.equ  IDT_SIZE, . - _SmiIDT
-
-TaskGateDescriptor:
+ASM_PFX(gcSmiIdtr):  .word      0
+                     .long      0
+
+ASM_PFX(gTaskGateDescriptor):
     .word      0                        # Reserved
     .word      EXCEPTION_TSS_SEL        # TSS Segment selector
     .byte      0                        # Reserved
@@ -891,21 +695,3 @@ ASM_PFX(PageFaultStubFunction):
 #
     clts
     iret
-
-ASM_GLOBAL ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
-    pushl   %ebx
-#
-# If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
-# is a Task Gate Descriptor so that when a Page Fault Exception occurs,
-# the processors can use a known good stack in case stack ran out.
-#
-    leal    _SmiIDT + 14 * 8, %ebx
-    leal    TaskGateDescriptor, %edx
-    movl    (%edx), %eax
-    movl    %eax, (%ebx)
-    movl    4(%edx), %eax
-    movl    %eax, 4(%ebx)
-
-    popl    %ebx
-    ret
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm
index b4eb492..7b162f8 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm
@@ -1,5 +1,5 @@
 ;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 ; This program and the accompanying materials
 ; are licensed and made available under the terms and conditions of the BSD License
 ; which accompanies this distribution.  The full text of the license may be found at
@@ -26,6 +26,7 @@ EXTERNDEF   PageFaultStubFunction:PROC
 EXTERNDEF   gSmiMtrrs:QWORD
 EXTERNDEF   gcSmiIdtr:FWORD
 EXTERNDEF   gcSmiGdtr:FWORD
+EXTERNDEF   gTaskGateDescriptor:QWORD
 EXTERNDEF   gcPsd:BYTE
 EXTERNDEF   FeaturePcdGet (PcdCpuSmmProfileEnable):BYTE
 
@@ -252,20 +253,10 @@ gcSmiGdtr   LABEL   FWORD
     DD      offset NullSeg
 
 gcSmiIdtr   LABEL   FWORD
-    DW      IDT_SIZE - 1
-    DD      offset _SmiIDT
-
-_SmiIDT     LABEL   QWORD
-REPEAT      32
-    DW      0                           ; Offset 0:15
-    DW      CODE_SEL                    ; Segment selector
-    DB      0                           ; Unused
-    DB      8eh                         ; Interrupt Gate, Present
-    DW      0                           ; Offset 16:31
-            ENDM
-IDT_SIZE = $ - offset _SmiIDT
-
-TaskGateDescriptor LABEL DWORD
+    DW      0
+    DD      0
+
+gTaskGateDescriptor LABEL QWORD
     DW      0                           ; Reserved
     DW      EXCEPTION_TSS_SEL           ; TSS Segment selector
     DB      0                           ; Reserved
@@ -720,19 +711,4 @@ PageFaultStubFunction   PROC
     iretd
 PageFaultStubFunction   ENDP
 
-InitializeIDTSmmStackGuard   PROC    USES    ebx
-;
-; If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
-; is a Task Gate Descriptor so that when a Page Fault Exception occurs,
-; the processors can use a known good stack in case stack is ran out.
-;
-    lea     ebx, _SmiIDT + 14 * 8
-    lea     edx, TaskGateDescriptor
-    mov     eax, [edx]
-    mov     [ebx], eax
-    mov     eax, [edx + 4]
-    mov     [ebx + 4], eax
-    ret
-InitializeIDTSmmStackGuard   ENDP
-
     END
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm
index 6a32828..4d58999 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm
@@ -1,5 +1,5 @@
 ;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 ; This program and the accompanying materials
 ; are licensed and made available under the terms and conditions of the BSD License
 ; which accompanies this distribution.  The full text of the license may be found at
@@ -24,6 +24,7 @@ extern  ASM_PFX(SmiPFHandler)
 
 global  ASM_PFX(gcSmiIdtr)
 global  ASM_PFX(gcSmiGdtr)
+global  ASM_PFX(gTaskGateDescriptor)
 global  ASM_PFX(gcPsd)
 
     SECTION .data
@@ -250,21 +251,10 @@ ASM_PFX(gcSmiGdtr):
     DD      NullSeg
 
 ASM_PFX(gcSmiIdtr):
-    DW      IDT_SIZE - 1
-    DD      _SmiIDT
+    DW      0
+    DD      0
 
-_SmiIDT:
-%rep 32
-    DW      0                           ; Offset 0:15
-    DW      CODE_SEL                    ; Segment selector
-    DB      0                           ; Unused
-    DB      0x8e                         ; Interrupt Gate, Present
-    DW      0                           ; Offset 16:31
-%endrep
-
-IDT_SIZE equ $ - _SmiIDT
-
-TaskGateDescriptor:
+ASM_PFX(gTaskGateDescriptor):
     DW      0                           ; Reserved
     DW      EXCEPTION_TSS_SEL           ; TSS Segment selector
     DB      0                           ; Reserved
@@ -717,19 +707,3 @@ ASM_PFX(PageFaultStubFunction):
     clts
     iretd
 
-global ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
-    push    ebx
-;
-; If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
-; is a Task Gate Descriptor so that when a Page Fault Exception occurrs,
-; the processors can use a known good stack in case stack is ran out.
-;
-    lea     ebx, [_SmiIDT + 14 * 8]
-    lea     edx, [TaskGateDescriptor]
-    mov     eax, [edx]
-    mov     [ebx], eax
-    mov     eax, [edx + 4]
-    mov     [ebx + 4], eax
-    pop     ebx
-    ret
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
index 545b534..e87bf7b 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
@@ -1,7 +1,7 @@
 /** @file
   SMM CPU misc functions for Ia32 arch specific.
   
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2016, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -14,6 +14,33 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 
 #include "PiSmmCpuDxeSmm.h"
 
+extern UINT64 gTaskGateDescriptor;
+
+EFI_PHYSICAL_ADDRESS                mGdtBuffer;
+UINTN                               mGdtBufferSize;
+
+/**
+  Initialize IDT for SMM Stack Guard.
+
+**/
+VOID
+EFIAPI
+InitializeIDTSmmStackGuard (
+  VOID
+  )
+{
+  IA32_IDT_GATE_DESCRIPTOR  *IdtGate;
+
+  //
+  // If SMM Stack Guard feature is enabled, the Page Fault Exception entry in IDT
+  // is a Task Gate Descriptor so that when a Page Fault Exception occurs,
+  // the processors can use a known good stack in case the stack has run out.
+  //
+  IdtGate = (IA32_IDT_GATE_DESCRIPTOR *)gcSmiIdtr.Base;
+  IdtGate += EXCEPT_IA32_PAGE_FAULT;
+  IdtGate->Uint64 = gTaskGateDescriptor;
+}
+
 /**
   Initialize Gdt for all processors.
   
@@ -49,8 +76,10 @@ InitGdt (
     gcSmiGdtr.Limit += (UINT16)(2 * sizeof (IA32_SEGMENT_DESCRIPTOR));
 
     GdtTssTableSize = (gcSmiGdtr.Limit + 1 + TSS_SIZE * 2 + 7) & ~7; // 8 bytes aligned
-    GdtTssTables = (UINT8*)AllocatePages (EFI_SIZE_TO_PAGES (GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+    mGdtBufferSize = GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus;
+    GdtTssTables = (UINT8*)AllocateCodePages (EFI_SIZE_TO_PAGES (mGdtBufferSize));
     ASSERT (GdtTssTables != NULL);
+    mGdtBuffer = (UINTN)GdtTssTables;
     GdtTableStepSize = GdtTssTableSize;
 
     for (Index = 0; Index < gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus; Index++) {
@@ -82,8 +111,10 @@ InitGdt (
     // Just use original table, AllocatePage and copy them here to make sure GDTs are covered in page memory.
     //
     GdtTssTableSize = gcSmiGdtr.Limit + 1;
-    GdtTssTables = (UINT8*)AllocatePages (EFI_SIZE_TO_PAGES (GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+    mGdtBufferSize = GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus;
+    GdtTssTables = (UINT8*)AllocateCodePages (EFI_SIZE_TO_PAGES (mGdtBufferSize));
     ASSERT (GdtTssTables != NULL);
+    mGdtBuffer = (UINTN)GdtTssTables;
     GdtTableStepSize = GdtTssTableSize;
 
     for (Index = 0; Index < gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus; Index++) {
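
Note on the new C version of InitializeIDTSmmStackGuard() above: it overwrites IDT entry 14 (page fault) with the 8-byte task gate exported as gTaskGateDescriptor. The sketch below shows, under stated assumptions, how such a descriptor packs into a 64-bit value and how the entry is indexed. The selector value 0x40 and all helper names are hypothetical; the real selector is EXCEPTION_TSS_SEL, and the 0x85 type byte (present, DPL 0, task gate) is taken from the IA-32 architecture definition rather than from the lines shown in this patch.

```c
#include <stdint.h>

#define EXAMPLE_IDT_ENTRIES    32    /* processor exceptions only, as in the patch */
#define EXAMPLE_PF_VECTOR      14    /* EXCEPT_IA32_PAGE_FAULT */
#define EXAMPLE_TSS_SELECTOR   0x40  /* hypothetical stand-in for EXCEPTION_TSS_SEL */

/* Build an IA-32 task gate as a little-endian 64-bit value:
   bits 31:16 hold the TSS selector, bits 47:40 hold the type byte 0x85,
   and every other bit is reserved and left as zero. */
static uint64_t
MakeTaskGate (uint16_t TssSelector)
{
  return ((uint64_t)TssSelector << 16) | ((uint64_t)0x85 << 40);
}

/* Replace the page-fault entry; Idt must have EXAMPLE_IDT_ENTRIES slots. */
static void
InstallPageFaultTaskGate (uint64_t *Idt, uint64_t TaskGate)
{
  Idt[EXAMPLE_PF_VECTOR] = TaskGate;
}

/* IDTR limit for a table of 8-byte gates: size in bytes minus one. */
static uint16_t
ExampleIdtLimit (void)
{
  return (uint16_t)(EXAMPLE_IDT_ENTRIES * sizeof (uint64_t) - 1);
}
```
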
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c
index 767cb69..724cd92 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c
@@ -1,7 +1,7 @@
 /** @file
 IA-32 processor specific functions to enable SMM profile.
 
-Copyright (c) 2012 - 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2012 - 2016, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -24,7 +24,7 @@ InitSmmS3Cr3 (
   VOID
   )
 {
-  mSmmS3ResumeState->SmmS3Cr3 = Gen4GPageTable (0, TRUE);
+  mSmmS3ResumeState->SmmS3Cr3 = Gen4GPageTable (TRUE);
 
   return ;
 }
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 12466ef..d0092d2 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -734,14 +734,12 @@ APHandler (
 /**
   Create 4G PageTable in SMRAM.
 
-  @param          ExtraPages       Additional page numbers besides for 4G memory
-  @param          Is32BitPageTable Whether the page table is 32-bit PAE
+  @param[in]      Is32BitPageTable Whether the page table is 32-bit PAE
   @return         PageTable Address
 
 **/
 UINT32
 Gen4GPageTable (
-  IN      UINTN                     ExtraPages,
   IN      BOOLEAN                   Is32BitPageTable
   )
 {
@@ -775,10 +773,10 @@ Gen4GPageTable (
   //
   // Allocate the page table
   //
-  PageTable = AllocatePageTableMemory (ExtraPages + 5 + PagesNeeded);
+  PageTable = AllocatePageTableMemory (5 + PagesNeeded);
   ASSERT (PageTable != NULL);
 
-  PageTable = (VOID *)((UINTN)PageTable + EFI_PAGES_TO_SIZE (ExtraPages));
+  PageTable = (VOID *)((UINTN)PageTable);
   Pte = (UINT64*)PageTable;
 
   //
@@ -903,13 +901,13 @@ SetCacheability (
   PageTable[PTIndex] |= (UINT64)Cacheability;
 }
 
-
 /**
   Schedule a procedure to run on the specified CPU.
 
-  @param  Procedure                The address of the procedure to run
-  @param  CpuIndex                 Target CPU Index
-  @param  ProcArguments            The parameter to pass to the procedure
+  @param[in]       Procedure                The address of the procedure to run
+  @param[in]       CpuIndex                 Target CPU Index
+  @param[in, out]  ProcArguments            The parameter to pass to the procedure
+  @param[in]       BlockingMode             Startup AP in blocking mode or not
 
   @retval EFI_INVALID_PARAMETER    CpuNumber not valid
   @retval EFI_INVALID_PARAMETER    CpuNumber specifying BSP
@@ -919,26 +917,44 @@ SetCacheability (
 
 **/
 EFI_STATUS
-EFIAPI
-SmmStartupThisAp (
+InternalSmmStartupThisAp (
   IN      EFI_AP_PROCEDURE          Procedure,
   IN      UINTN                     CpuIndex,
-  IN OUT  VOID                      *ProcArguments OPTIONAL
+  IN OUT  VOID                      *ProcArguments OPTIONAL,
+  IN      BOOLEAN                   BlockingMode
   )
 {
-  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
-      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
-      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
-      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove ||
-      !AcquireSpinLockOrFail (mSmmMpSyncData->CpuData[CpuIndex].Busy)) {
+  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus) {
+    DEBUG((DEBUG_ERROR, "CpuIndex(%d) >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus(%d)\n", CpuIndex, gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+    return EFI_INVALID_PARAMETER;
+  }
+  if (CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu) {
+    DEBUG((DEBUG_ERROR, "CpuIndex(%d) == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu\n", CpuIndex));
     return EFI_INVALID_PARAMETER;
   }
+  if (!(*(mSmmMpSyncData->CpuData[CpuIndex].Present))) {
+    DEBUG((DEBUG_ERROR, "!mSmmMpSyncData->CpuData[%d].Present\n", CpuIndex));
+    return EFI_INVALID_PARAMETER;
+  }
+  if (gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
+    DEBUG((DEBUG_ERROR, "gSmmCpuPrivate->Operation[%d] == SmmCpuRemove\n", CpuIndex));
+    return EFI_INVALID_PARAMETER;
+  }
+
+  if (BlockingMode) {
+    AcquireSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
+  } else {
+    if (!AcquireSpinLockOrFail (mSmmMpSyncData->CpuData[CpuIndex].Busy)) {
+      DEBUG((DEBUG_ERROR, "mSmmMpSyncData->CpuData[%d].Busy\n", CpuIndex));
+      return EFI_INVALID_PARAMETER;
+    }
+  }
 
   mSmmMpSyncData->CpuData[CpuIndex].Procedure = Procedure;
   mSmmMpSyncData->CpuData[CpuIndex].Parameter = ProcArguments;
   ReleaseSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);
 
-  if (FeaturePcdGet (PcdCpuSmmBlockStartupThisAp)) {
+  if (BlockingMode) {
     AcquireSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
     ReleaseSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
   }
@@ -946,6 +962,56 @@ SmmStartupThisAp (
 }
 
 /**
+  Schedule a procedure to run on the specified CPU in blocking mode.
+
+  @param[in]       Procedure                The address of the procedure to run
+  @param[in]       CpuIndex                 Target CPU Index
+  @param[in, out]  ProcArguments            The parameter to pass to the procedure
+
+  @retval EFI_INVALID_PARAMETER    CpuNumber not valid
+  @retval EFI_INVALID_PARAMETER    CpuNumber specifying BSP
+  @retval EFI_INVALID_PARAMETER    The AP specified by CpuNumber did not enter SMM
+  @retval EFI_INVALID_PARAMETER    The AP specified by CpuNumber is busy
+  @retval EFI_SUCCESS              The procedure has been successfully scheduled
+
+**/
+EFI_STATUS
+EFIAPI
+SmmBlockingStartupThisAp (
+  IN      EFI_AP_PROCEDURE          Procedure,
+  IN      UINTN                     CpuIndex,
+  IN OUT  VOID                      *ProcArguments OPTIONAL
+  )
+{
+  return InternalSmmStartupThisAp (Procedure, CpuIndex, ProcArguments, TRUE);
+}
+
+/**
+  Schedule a procedure to run on the specified CPU.
+
+  @param[in]       Procedure                The address of the procedure to run
+  @param[in]       CpuIndex                 Target CPU Index
+  @param[in, out]  ProcArguments            The parameter to pass to the procedure
+
+  @retval EFI_INVALID_PARAMETER    CpuNumber not valid
+  @retval EFI_INVALID_PARAMETER    CpuNumber specifying BSP
+  @retval EFI_INVALID_PARAMETER    The AP specified by CpuNumber did not enter SMM
+  @retval EFI_INVALID_PARAMETER    The AP specified by CpuNumber is busy
+  @retval EFI_SUCCESS              The procedure has been successfully scheduled
+
+**/
+EFI_STATUS
+EFIAPI
+SmmStartupThisAp (
+  IN      EFI_AP_PROCEDURE          Procedure,
+  IN      UINTN                     CpuIndex,
+  IN OUT  VOID                      *ProcArguments OPTIONAL
+  )
+{
+  return InternalSmmStartupThisAp (Procedure, CpuIndex, ProcArguments, FeaturePcdGet (PcdCpuSmmBlockStartupThisAp));
+}
+
+/**
   This function sets DR6 & DR7 according to SMM save state, before running SMM C code.
   They are useful when you want to enable hardware breakpoints in SMM without entry SMM mode.
 
@@ -1022,8 +1088,6 @@ SmiRendezvous (
   BOOLEAN                        BspInProgress;
   UINTN                          Index;
   UINTN                          Cr2;
-  BOOLEAN                        XdDisableFlag;
-  MSR_IA32_MISC_ENABLE_REGISTER  MiscEnableMsr;
 
   //
   // Save Cr2 because Page Fault exception in SMM may override its value
@@ -1082,20 +1146,6 @@ SmiRendezvous (
       InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
     }
 
-    //
-    // Try to enable XD
-    //
-    XdDisableFlag = FALSE;
-    if (mXdSupported) {
-      MiscEnableMsr.Uint64 = AsmReadMsr64 (MSR_IA32_MISC_ENABLE);
-      if (MiscEnableMsr.Bits.XD == 1) {
-        XdDisableFlag = TRUE;
-        MiscEnableMsr.Bits.XD = 0;
-        AsmWriteMsr64 (MSR_IA32_MISC_ENABLE, MiscEnableMsr.Uint64);
-      }
-      ActivateXd ();
-    }
-
     if (FeaturePcdGet (PcdCpuSmmProfileEnable)) {
       ActivateSmmProfile (CpuIndex);
     }
@@ -1176,15 +1226,6 @@ SmiRendezvous (
     //
     while (*mSmmMpSyncData->AllCpusInSync) {
       CpuPause ();
-     }
-
-    //
-    // Restore XD
-    //
-    if (XdDisableFlag) {
-      MiscEnableMsr.Uint64 = AsmReadMsr64 (MSR_IA32_MISC_ENABLE);
-      MiscEnableMsr.Bits.XD = 1;
-      AsmWriteMsr64 (MSR_IA32_MISC_ENABLE, MiscEnableMsr.Uint64);
     }
   }
 
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
index 852b5c7..8ef6695 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
@@ -113,6 +113,19 @@ InitializeSmmIdt (
   EFI_STATUS               Status;
   BOOLEAN                  InterruptState;
   IA32_DESCRIPTOR          DxeIdtr;
+
+  //
+  // There are 32 (not 256) entries in it since only processor-generated
+  // exceptions will be handled.
+  //
+  gcSmiIdtr.Limit = (sizeof(IA32_IDT_GATE_DESCRIPTOR) * 32) - 1;
+  //
+  // Allocate page aligned IDT, because it might be set as read only.
+  //
+  gcSmiIdtr.Base = (UINTN)AllocateCodePages (EFI_SIZE_TO_PAGES(gcSmiIdtr.Limit + 1));
+  ASSERT (gcSmiIdtr.Base != 0);
+  ZeroMem ((VOID *)gcSmiIdtr.Base, gcSmiIdtr.Limit + 1);
+
   //
   // Disable Interrupt and save DXE IDT table
   //
@@ -731,9 +744,9 @@ PiCpuSmmEntry (
   //
   BufferPages = EFI_SIZE_TO_PAGES (SIZE_32KB + TileSize * (mMaxNumberOfCpus - 1));
   if ((FamilyId == 4) || (FamilyId == 5)) {
-    Buffer = AllocateAlignedPages (BufferPages, SIZE_32KB);
+    Buffer = AllocateAlignedCodePages (BufferPages, SIZE_32KB);
   } else {
-    Buffer = AllocateAlignedPages (BufferPages, SIZE_4KB);
+    Buffer = AllocateAlignedCodePages (BufferPages, SIZE_4KB);
   }
   ASSERT (Buffer != NULL);
   DEBUG ((EFI_D_INFO, "SMRAM SaveState Buffer (0x%08x, 0x%08x)\n", Buffer, EFI_PAGES_TO_SIZE(BufferPages)));
@@ -1137,6 +1150,17 @@ ConfigSmmCodeAccessCheck (
 }
 
 /**
+  Set code regions to be read-only and data regions to be execute-disabled.
+**/
+VOID
+SetRegionAttributes (
+  VOID
+  )
+{
+  SetMemMapAttributes ();
+}
+
+/**
   This API provides a way to allocate memory for page table.
 
   This API can be called more once to allocate memory for page tables.
@@ -1166,6 +1190,109 @@ AllocatePageTableMemory (
 }
 
 /**
+  Allocate pages for code.
+
+  @param[in]  Pages Number of pages to be allocated.
+
+  @return Allocated memory.
+**/
+VOID *
+AllocateCodePages (
+  IN UINTN           Pages
+  )
+{
+  EFI_STATUS            Status;
+  EFI_PHYSICAL_ADDRESS  Memory;
+
+  if (Pages == 0) {
+    return NULL;
+  }
+
+  Status = gSmst->SmmAllocatePages (AllocateAnyPages, EfiRuntimeServicesCode, Pages, &Memory);
+  if (EFI_ERROR (Status)) {
+    return NULL;
+  }
+  return (VOID *) (UINTN) Memory;
+}
+
+/**
+  Allocate aligned pages for code.
+
+  @param[in]  Pages                 Number of pages to be allocated.
+  @param[in]  Alignment             The requested alignment of the allocation.
+                                    Must be a power of two.
+                                    If Alignment is zero, then byte alignment is used.
+
+  @return Allocated memory.
+**/
+VOID *
+AllocateAlignedCodePages (
+  IN UINTN            Pages,
+  IN UINTN            Alignment
+  )
+{
+  EFI_STATUS            Status;
+  EFI_PHYSICAL_ADDRESS  Memory;
+  UINTN                 AlignedMemory;
+  UINTN                 AlignmentMask;
+  UINTN                 UnalignedPages;
+  UINTN                 RealPages;
+
+  //
+  // Alignment must be a power of two or zero.
+  //
+  ASSERT ((Alignment & (Alignment - 1)) == 0);
+
+  if (Pages == 0) {
+    return NULL;
+  }
+  if (Alignment > EFI_PAGE_SIZE) {
+    //
+    // Calculate the total number of pages since alignment is larger than page size.
+    //
+    AlignmentMask  = Alignment - 1;
+    RealPages      = Pages + EFI_SIZE_TO_PAGES (Alignment);
+    //
+    // Make sure that Pages plus EFI_SIZE_TO_PAGES (Alignment) does not overflow.
+    //
+    ASSERT (RealPages > Pages);
+
+    Status         = gSmst->SmmAllocatePages (AllocateAnyPages, EfiRuntimeServicesCode, RealPages, &Memory);
+    if (EFI_ERROR (Status)) {
+      return NULL;
+    }
+    AlignedMemory  = ((UINTN) Memory + AlignmentMask) & ~AlignmentMask;
+    UnalignedPages = EFI_SIZE_TO_PAGES (AlignedMemory - (UINTN) Memory);
+    if (UnalignedPages > 0) {
+      //
+      // Free first unaligned page(s).
+      //
+      Status = gSmst->SmmFreePages (Memory, UnalignedPages);
+      ASSERT_EFI_ERROR (Status);
+    }
+    Memory         = (EFI_PHYSICAL_ADDRESS) (AlignedMemory + EFI_PAGES_TO_SIZE (Pages));
+    UnalignedPages = RealPages - Pages - UnalignedPages;
+    if (UnalignedPages > 0) {
+      //
+      // Free last unaligned page(s).
+      //
+      Status = gSmst->SmmFreePages (Memory, UnalignedPages);
+      ASSERT_EFI_ERROR (Status);
+    }
+  } else {
+    //
+    // Do not over-allocate pages in this case.
+    //
+    Status = gSmst->SmmAllocatePages (AllocateAnyPages, EfiRuntimeServicesCode, Pages, &Memory);
+    if (EFI_ERROR (Status)) {
+      return NULL;
+    }
+    AlignedMemory  = (UINTN) Memory;
+  }
+  return (VOID *) AlignedMemory;
+}
+
+/**
   Perform the remaining tasks.
 
 **/
@@ -1185,6 +1312,17 @@ PerformRemainingTasks (
     // Create a mix of 2MB and 4KB page table. Update some memory ranges absent and execute-disable.
     //
     InitPaging ();
+
+    //
+    // Mark critical region to be read-only in page table
+    //
+    SetRegionAttributes ();
+
+    //
+    // Set page table itself to be read-only
+    //
+    SetPageTableAttributes ();
+
     //
     // Configure SMM Code Access Check feature if available.
     //
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 9b119c8..6a1582b 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -25,6 +25,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #include <Protocol/SmmCpuService.h>
 
 #include <Guid/AcpiS3Context.h>
+#include <Guid/PiSmmMemoryAttributesTable.h>
 
 #include <Library/BaseLib.h>
 #include <Library/IoLib.h>
@@ -83,13 +84,38 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #define IA32_PG_PMNT                BIT62
 #define IA32_PG_NX                  BIT63
 
-#define PAGE_ATTRIBUTE_BITS         (IA32_PG_RW | IA32_PG_P)
+#define PAGE_ATTRIBUTE_BITS         (IA32_PG_D | IA32_PG_A | IA32_PG_U | IA32_PG_RW | IA32_PG_P)
 //
 // Bits 1, 2, 5, 6 are reserved in the IA32 PAE PDPTE
 // X64 PAE PDPTE does not have such restriction
 //
 #define IA32_PAE_PDPTE_ATTRIBUTE_BITS    (IA32_PG_P)
 
+#define PAGE_PROGATE_BITS           (IA32_PG_NX | PAGE_ATTRIBUTE_BITS)
+
+#define PAGING_4K_MASK  0xFFF
+#define PAGING_2M_MASK  0x1FFFFF
+#define PAGING_1G_MASK  0x3FFFFFFF
+
+#define PAGING_PAE_INDEX_MASK  0x1FF
+
+#define PAGING_4K_ADDRESS_MASK_64 0x000FFFFFFFFFF000ull
+#define PAGING_2M_ADDRESS_MASK_64 0x000FFFFFFFE00000ull
+#define PAGING_1G_ADDRESS_MASK_64 0x000FFFFFC0000000ull
+
+typedef enum {
+  PageNone,
+  Page4K,
+  Page2M,
+  Page1G,
+} PAGE_ATTRIBUTE;
+
+typedef struct {
+  PAGE_ATTRIBUTE   Attribute;
+  UINT64           Length;
+  UINT64           AddressMask;
+} PAGE_ATTRIBUTE_TABLE;
+
 //
 // Size of Task-State Segment defined in IA32 Manual
 //
@@ -98,6 +124,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #define TSS_IA32_CR3_OFFSET   28
 #define TSS_IA32_ESP_OFFSET   56
 
+#define CR0_WP                BIT16
+
 //
 // Code select value
 //
@@ -395,6 +423,8 @@ typedef struct {
 } SMM_CPU_SEMAPHORES;
 
 extern IA32_DESCRIPTOR                     gcSmiGdtr;
+extern EFI_PHYSICAL_ADDRESS                mGdtBuffer;
+extern UINTN                               mGdtBufferSize;
 extern IA32_DESCRIPTOR                     gcSmiIdtr;
 extern VOID                                *gcSmiIdtrPtr;
 extern CONST PROCESSOR_SMM_DESCRIPTOR      gcPsd;
@@ -414,14 +444,12 @@ extern SPIN_LOCK                           *mMemoryMappedLock;
 /**
   Create 4G PageTable in SMRAM.
 
-  @param          ExtraPages       Additional page numbers besides for 4G memory
-  @param          Is32BitPageTable Whether the page table is 32-bit PAE
+  @param[in]      Is32BitPageTable Whether the page table is 32-bit PAE
   @return         PageTable Address
 
 **/
 UINT32
 Gen4GPageTable (
-  IN      UINTN                     ExtraPages,
   IN      BOOLEAN                   Is32BitPageTable
   );
 
@@ -482,7 +510,7 @@ InitializeIDTSmmStackGuard (
 
 /**
   Initialize Gdt for all processors.
-  
+
   @param[in]   Cr3          CR3 value.
   @param[out]  GdtStepSize  The step size for GDT table.
 
@@ -761,6 +789,96 @@ DumpModuleInfoByIp (
   );
 
 /**
+  This function sets memory attribute according to MemoryAttributesTable.
+**/
+VOID
+SetMemMapAttributes (
+  VOID
+  );
+
+/**
+  This function sets memory attribute for page table.
+**/
+VOID
+SetPageTableAttributes (
+  VOID
+  );
+
+/**
+  Return page table base.
+
+  @return page table base.
+**/
+UINTN
+GetPageTableBase (
+  VOID
+  );
+
+/**
+  This function sets the attributes for the memory region specified by BaseAddress and
+  Length from their current attributes to the attributes specified by Attributes.
+
+  @param[in]   BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]   Length           The size in bytes of the memory region.
+  @param[in]   Attributes       The bit mask of attributes to set for the memory region.
+  @param[out]  IsSplitted       TRUE means the page table was split. FALSE means it was not split.
+
+  @retval EFI_SUCCESS           The attributes were set for the memory region.
+  @retval EFI_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                BaseAddress and Length cannot be modified.
+  @retval EFI_INVALID_PARAMETER Length is zero.
+                                Attributes specified an illegal combination of attributes that
+                                cannot be set together.
+  @retval EFI_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                the memory resource range.
+  @retval EFI_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                resource range specified by BaseAddress and Length.
+                                The bit mask of attributes is not supported for the memory resource
+                                range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmSetMemoryAttributesEx (
+  IN  EFI_PHYSICAL_ADDRESS                       BaseAddress,
+  IN  UINT64                                     Length,
+  IN  UINT64                                     Attributes,
+  OUT BOOLEAN                                    *IsSplitted  OPTIONAL
+  );
+
+/**
+  This function clears the attributes specified by Attributes for the memory region
+  specified by BaseAddress and Length.
+
+  @param[in]   BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]   Length           The size in bytes of the memory region.
+  @param[in]   Attributes       The bit mask of attributes to clear for the memory region.
+  @param[out]  IsSplitted       TRUE means the page table was split. FALSE means it was not split.
+
+  @retval EFI_SUCCESS           The attributes were cleared for the memory region.
+  @retval EFI_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                BaseAddress and Length cannot be modified.
+  @retval EFI_INVALID_PARAMETER Length is zero.
+                                Attributes specified an illegal combination of attributes that
+                                cannot be set together.
+  @retval EFI_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                the memory resource range.
+  @retval EFI_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                resource range specified by BaseAddress and Length.
+                                The bit mask of attributes is not supported for the memory resource
+                                range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmClearMemoryAttributesEx (
+  IN  EFI_PHYSICAL_ADDRESS                       BaseAddress,
+  IN  UINT64                                     Length,
+  IN  UINT64                                     Attributes,
+  OUT BOOLEAN                                    *IsSplitted  OPTIONAL
+  );
+
+/**
   This API provides a way to allocate memory for page table.
 
   This API can be called more once to allocate memory for page tables.
@@ -780,6 +898,34 @@ AllocatePageTableMemory (
   IN UINTN           Pages
   );
 
+/**
+  Allocate pages for code.
+
+  @param[in]  Pages Number of pages to be allocated.
+
+  @return Allocated memory.
+**/
+VOID *
+AllocateCodePages (
+  IN UINTN           Pages
+  );
+
+/**
+  Allocate aligned pages for code.
+
+  @param[in]  Pages                 Number of pages to be allocated.
+  @param[in]  Alignment             The requested alignment of the allocation.
+                                    Must be a power of two.
+                                    If Alignment is zero, then byte alignment is used.
+
+  @return Allocated memory.
+**/
+VOID *
+AllocateAlignedCodePages (
+  IN UINTN            Pages,
+  IN UINTN            Alignment
+  );
+
 
 //
 // S3 related global variable and function prototype.
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
index 5d598d6..d409edf 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
@@ -4,7 +4,7 @@
 # This SMM driver performs SMM initialization, deploy SMM Entry Vector,
 # provides CPU specific services in SMM.
 #
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 #
 # This program and the accompanying materials
 # are licensed and made available under the terms and conditions of the BSD License
@@ -44,6 +44,7 @@
   SmmProfile.h
   SmmProfileInternal.h
   SmramSaveState.c
+  SmmCpuMemoryManagement.c
 
 [Sources.Ia32]
   Ia32/Semaphore.c
@@ -133,6 +134,7 @@
   gEfiGlobalVariableGuid                   ## SOMETIMES_PRODUCES ## Variable:L"SmmProfileData"
   gEfiAcpi20TableGuid                      ## SOMETIMES_CONSUMES ## SystemTable
   gEfiAcpi10TableGuid                      ## SOMETIMES_CONSUMES ## SystemTable
+  gEdkiiPiSmmMemoryAttributesTableGuid     ## CONSUMES ## SystemTable
 
 [FeaturePcd]
   gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmDebug                         ## CONSUMES
@@ -153,6 +155,7 @@
   gUefiCpuPkgTokenSpaceGuid.PcdCpuHotPlugDataAddress               ## SOMETIMES_PRODUCES
   gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmCodeAccessCheckEnable         ## CONSUMES
   gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode                      ## CONSUMES
+  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStaticPageTable               ## CONSUMES
   gEfiMdeModulePkgTokenSpaceGuid.PcdAcpiS3Enable                   ## CONSUMES
 
 [Depex]
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
new file mode 100644
index 0000000..4c1f900
--- /dev/null
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
@@ -0,0 +1,871 @@
+/** @file
+
+Copyright (c) 2016, Intel Corporation. All rights reserved.<BR>
+This program and the accompanying materials
+are licensed and made available under the terms and conditions of the BSD License
+which accompanies this distribution.  The full text of the license may be found at
+http://opensource.org/licenses/bsd-license.php
+
+THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+
+**/
+
+#include "PiSmmCpuDxeSmm.h"
+
+#define NEXT_MEMORY_DESCRIPTOR(MemoryDescriptor, Size) \
+  ((EFI_MEMORY_DESCRIPTOR *)((UINT8 *)(MemoryDescriptor) + (Size)))
+
+PAGE_ATTRIBUTE_TABLE mPageAttributeTable[] = {
+  {Page4K,  SIZE_4KB, PAGING_4K_ADDRESS_MASK_64},
+  {Page2M,  SIZE_2MB, PAGING_2M_ADDRESS_MASK_64},
+  {Page1G,  SIZE_1GB, PAGING_1G_ADDRESS_MASK_64},
+};
+
+/**
+  Return page table base.
+
+  @return page table base.
+**/
+UINTN
+GetPageTableBase (
+  VOID
+  )
+{
+  return (AsmReadCr3 () & PAGING_4K_ADDRESS_MASK_64);
+}
+
+/**
+  Return length according to page attributes.
+
+  @param[in]  PageAttribute    The page attribute of the page entry.
+
+  @return The length of page entry.
+**/
+UINTN
+PageAttributeToLength (
+  IN PAGE_ATTRIBUTE  PageAttribute
+  )
+{
+  UINTN  Index;
+  for (Index = 0; Index < sizeof(mPageAttributeTable)/sizeof(mPageAttributeTable[0]); Index++) {
+    if (PageAttribute == mPageAttributeTable[Index].Attribute) {
+      return (UINTN)mPageAttributeTable[Index].Length;
+    }
+  }
+  return 0;
+}
+
+/**
+  Return address mask according to page attributes.
+
+  @param[in]  PageAttribute    The page attribute of the page entry.
+
+  @return The address mask of page entry.
+**/
+UINTN
+PageAttributeToMask (
+  IN PAGE_ATTRIBUTE  PageAttribute
+  )
+{
+  UINTN  Index;
+  for (Index = 0; Index < sizeof(mPageAttributeTable)/sizeof(mPageAttributeTable[0]); Index++) {
+    if (PageAttribute == mPageAttributeTable[Index].Attribute) {
+      return (UINTN)mPageAttributeTable[Index].AddressMask;
+    }
+  }
+  return 0;
+}
+
+/**
+  Return page table entry to match the address.
+
+  @param[in]   Address          The address to be checked.
+  @param[out]  PageAttribute    The page attribute of the page entry.
+
+  @return The page entry.
+**/
+VOID *
+GetPageTableEntry (
+  IN  PHYSICAL_ADDRESS                  Address,
+  OUT PAGE_ATTRIBUTE                    *PageAttribute
+  )
+{
+  UINTN                 Index1;
+  UINTN                 Index2;
+  UINTN                 Index3;
+  UINTN                 Index4;
+  UINT64                *L1PageTable;
+  UINT64                *L2PageTable;
+  UINT64                *L3PageTable;
+  UINT64                *L4PageTable;
+
+  Index4 = ((UINTN)RShiftU64 (Address, 39)) & PAGING_PAE_INDEX_MASK;
+  Index3 = ((UINTN)Address >> 30) & PAGING_PAE_INDEX_MASK;
+  Index2 = ((UINTN)Address >> 21) & PAGING_PAE_INDEX_MASK;
+  Index1 = ((UINTN)Address >> 12) & PAGING_PAE_INDEX_MASK;
+
+  if (sizeof(UINTN) == sizeof(UINT64)) {
+    L4PageTable = (UINT64 *)GetPageTableBase ();
+    if (L4PageTable[Index4] == 0) {
+      *PageAttribute = PageNone;
+      return NULL;
+    }
+
+    L3PageTable = (UINT64 *)(UINTN)(L4PageTable[Index4] & PAGING_4K_ADDRESS_MASK_64);
+  } else {
+    L3PageTable = (UINT64 *)GetPageTableBase ();
+  }
+  if (L3PageTable[Index3] == 0) {
+    *PageAttribute = PageNone;
+    return NULL;
+  }
+  if ((L3PageTable[Index3] & IA32_PG_PS) != 0) {
+    // 1G
+    *PageAttribute = Page1G;
+    return &L3PageTable[Index3];
+  }
+
+  L2PageTable = (UINT64 *)(UINTN)(L3PageTable[Index3] & PAGING_4K_ADDRESS_MASK_64);
+  if (L2PageTable[Index2] == 0) {
+    *PageAttribute = PageNone;
+    return NULL;
+  }
+  if ((L2PageTable[Index2] & IA32_PG_PS) != 0) {
+    // 2M
+    *PageAttribute = Page2M;
+    return &L2PageTable[Index2];
+  }
+
+  // 4k
+  L1PageTable = (UINT64 *)(UINTN)(L2PageTable[Index2] & PAGING_4K_ADDRESS_MASK_64);
+  if ((L1PageTable[Index1] == 0) && (Address != 0)) {
+    *PageAttribute = PageNone;
+    return NULL;
+  }
+  *PageAttribute = Page4K;
+  return &L1PageTable[Index1];
+}
+
+/**
+  Return memory attributes of page entry.
+
+  @param[in]  PageEntry        The page entry.
+
+  @return Memory attributes of page entry.
+**/
+UINT64
+GetAttributesFromPageEntry (
+  IN  UINT64                            *PageEntry
+  )
+{
+  UINT64  Attributes;
+  Attributes = 0;
+  if ((*PageEntry & IA32_PG_P) == 0) {
+    Attributes |= EFI_MEMORY_RP;
+  }
+  if ((*PageEntry & IA32_PG_RW) == 0) {
+    Attributes |= EFI_MEMORY_RO;
+  }
+  if ((*PageEntry & IA32_PG_NX) != 0) {
+    Attributes |= EFI_MEMORY_XP;
+  }
+  return Attributes;
+}
+
+/**
+  Modify memory attributes of page entry.
+
+  @param[in]   PageEntry        The page entry.
+  @param[in]   Attributes       The bit mask of attributes to modify for the memory region.
+  @param[in]   IsSet            TRUE means to set attributes. FALSE means to clear attributes.
+  @param[out]  IsModified       TRUE means page table modified. FALSE means page table not modified.
+**/
+VOID
+ConvertPageEntryAttribute (
+  IN  UINT64                            *PageEntry,
+  IN  UINT64                            Attributes,
+  IN  BOOLEAN                           IsSet,
+  OUT BOOLEAN                           *IsModified
+  )
+{
+  UINT64  CurrentPageEntry;
+  UINT64  NewPageEntry;
+
+  CurrentPageEntry = *PageEntry;
+  NewPageEntry = CurrentPageEntry;
+  if ((Attributes & EFI_MEMORY_RP) != 0) {
+    if (IsSet) {
+      NewPageEntry &= ~(UINT64)IA32_PG_P;
+    } else {
+      NewPageEntry |= IA32_PG_P;
+    }
+  }
+  if ((Attributes & EFI_MEMORY_RO) != 0) {
+    if (IsSet) {
+      NewPageEntry &= ~(UINT64)IA32_PG_RW;
+    } else {
+      NewPageEntry |= IA32_PG_RW;
+    }
+  }
+  if ((Attributes & EFI_MEMORY_XP) != 0) {
+    if (IsSet) {
+      NewPageEntry |= IA32_PG_NX;
+    } else {
+      NewPageEntry &= ~IA32_PG_NX;
+    }
+  }
+  *PageEntry = NewPageEntry;
+  if (CurrentPageEntry != NewPageEntry) {
+    *IsModified = TRUE;
+    DEBUG ((DEBUG_INFO, "ConvertPageEntryAttribute 0x%lx", CurrentPageEntry));
+    DEBUG ((DEBUG_INFO, "->0x%lx\n", NewPageEntry));
+  } else {
+    *IsModified = FALSE;
+  }
+}
+
+/**
+  This function returns whether the page entry needs to be split.
+
+  @param[in]  BaseAddress      The base address to be checked.
+  @param[in]  Length           The length to be checked.
+  @param[in]  PageEntry        The page entry to be checked.
+  @param[in]  PageAttribute    The page attribute of the page entry.
+
+  @return The page size to split into, or PageNone if no split is needed.
+**/
+PAGE_ATTRIBUTE
+NeedSplitPage (
+  IN  PHYSICAL_ADDRESS                  BaseAddress,
+  IN  UINT64                            Length,
+  IN  UINT64                            *PageEntry,
+  IN  PAGE_ATTRIBUTE                    PageAttribute
+  )
+{
+  UINT64                PageEntryLength;
+
+  PageEntryLength = PageAttributeToLength (PageAttribute);
+
+  if (((BaseAddress & (PageEntryLength - 1)) == 0) && (Length >= PageEntryLength)) {
+    return PageNone;
+  }
+
+  if (((BaseAddress & PAGING_2M_MASK) != 0) || (Length < SIZE_2MB)) {
+    return Page4K;
+  }
+
+  return Page2M;
+}
+
+/**
+  This function splits one page entry into smaller page entries.
+
+  @param[in]  PageEntry        The page entry to be split.
+  @param[in]  PageAttribute    The page attribute of the page entry.
+  @param[in]  SplitAttribute   How to split the page entry.
+
+  @retval RETURN_SUCCESS            The page entry was split.
+  @retval RETURN_UNSUPPORTED        The page entry cannot be split.
+  @retval RETURN_OUT_OF_RESOURCES   No resource to split page entry.
+**/
+RETURN_STATUS
+SplitPage (
+  IN  UINT64                            *PageEntry,
+  IN  PAGE_ATTRIBUTE                    PageAttribute,
+  IN  PAGE_ATTRIBUTE                    SplitAttribute
+  )
+{
+  UINT64   BaseAddress;
+  UINT64   *NewPageEntry;
+  UINTN    Index;
+
+  ASSERT (PageAttribute == Page2M || PageAttribute == Page1G);
+
+  if (PageAttribute == Page2M) {
+    //
+    // Split 2M to 4K
+    //
+    ASSERT (SplitAttribute == Page4K);
+    if (SplitAttribute == Page4K) {
+      NewPageEntry = AllocatePageTableMemory (1);
+      DEBUG ((DEBUG_INFO, "Split - 0x%p\n", NewPageEntry));
+      if (NewPageEntry == NULL) {
+        return RETURN_OUT_OF_RESOURCES;
+      }
+      BaseAddress = *PageEntry & PAGING_2M_ADDRESS_MASK_64;
+      for (Index = 0; Index < SIZE_4KB / sizeof(UINT64); Index++) {
+        NewPageEntry[Index] = BaseAddress + SIZE_4KB * Index + ((*PageEntry) & PAGE_PROGATE_BITS);
+      }
+      (*PageEntry) = (UINT64)(UINTN)NewPageEntry + ((*PageEntry) & PAGE_PROGATE_BITS);
+      return RETURN_SUCCESS;
+    } else {
+      return RETURN_UNSUPPORTED;
+    }
+  } else if (PageAttribute == Page1G) {
+    //
+    // Split 1G to 2M
+    // There is no need to support 1G->4K directly; use 1G->2M, then 2M->4K, to get a more compact page table.
+    //
+    ASSERT (SplitAttribute == Page2M || SplitAttribute == Page4K);
+    if ((SplitAttribute == Page2M || SplitAttribute == Page4K)) {
+      NewPageEntry = AllocatePageTableMemory (1);
+      DEBUG ((DEBUG_INFO, "Split - 0x%p\n", NewPageEntry));
+      if (NewPageEntry == NULL) {
+        return RETURN_OUT_OF_RESOURCES;
+      }
+      BaseAddress = *PageEntry & PAGING_1G_ADDRESS_MASK_64;
+      for (Index = 0; Index < SIZE_4KB / sizeof(UINT64); Index++) {
+        NewPageEntry[Index] = BaseAddress + SIZE_2MB * Index + IA32_PG_PS + ((*PageEntry) & PAGE_PROGATE_BITS);
+      }
+      (*PageEntry) = (UINT64)(UINTN)NewPageEntry + ((*PageEntry) & PAGE_PROGATE_BITS);
+      return RETURN_SUCCESS;
+    } else {
+      return RETURN_UNSUPPORTED;
+    }
+  } else {
+    return RETURN_UNSUPPORTED;
+  }
+}
+
+/**
+  This function modifies the page attributes for the memory region specified by BaseAddress and
+  Length from their current attributes to the attributes specified by Attributes.
+
+  The caller should make sure that BaseAddress and Length are aligned to a page boundary.
+
+  @param[in]   BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]   Length           The size in bytes of the memory region.
+  @param[in]   Attributes       The bit mask of attributes to modify for the memory region.
+  @param[in]   IsSet            TRUE means to set attributes. FALSE means to clear attributes.
+  @param[out]  IsSplitted       TRUE means the page table was split. FALSE means it was not split.
+  @param[out]  IsModified       TRUE means the page table was modified. FALSE means it was not modified.
+
+  @retval RETURN_SUCCESS           The attributes were modified for the memory region.
+  @retval RETURN_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                   BaseAddress and Length cannot be modified.
+  @retval RETURN_INVALID_PARAMETER Length is zero.
+                                   Attributes specified an illegal combination of attributes that
+                                   cannot be set together.
+  @retval RETURN_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                   the memory resource range.
+  @retval RETURN_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                   resource range specified by BaseAddress and Length.
+                                   The bit mask of attributes is not supported for the memory resource
+                                   range specified by BaseAddress and Length.
+**/
+RETURN_STATUS
+EFIAPI
+ConvertMemoryPageAttributes (
+  IN  PHYSICAL_ADDRESS                  BaseAddress,
+  IN  UINT64                            Length,
+  IN  UINT64                            Attributes,
+  IN  BOOLEAN                           IsSet,
+  OUT BOOLEAN                           *IsSplitted,  OPTIONAL
+  OUT BOOLEAN                           *IsModified   OPTIONAL
+  )
+{
+  UINT64                            *PageEntry;
+  PAGE_ATTRIBUTE                    PageAttribute;
+  UINTN                             PageEntryLength;
+  PAGE_ATTRIBUTE                    SplitAttribute;
+  RETURN_STATUS                     Status;
+  BOOLEAN                           IsEntryModified;
+
+  ASSERT (Attributes != 0);
+  ASSERT ((Attributes & ~(EFI_MEMORY_RP | EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0);
+
+  ASSERT ((BaseAddress & (SIZE_4KB - 1)) == 0);
+  ASSERT ((Length & (SIZE_4KB - 1)) == 0);
+
+  if (Length == 0) {
+    return RETURN_INVALID_PARAMETER;
+  }
+
+//  DEBUG ((DEBUG_ERROR, "ConvertMemoryPageAttributes(%x) - %016lx, %016lx, %02lx\n", IsSet, BaseAddress, Length, Attributes));
+
+  if (IsSplitted != NULL) {
+    *IsSplitted = FALSE;
+  }
+  if (IsModified != NULL) {
+    *IsModified = FALSE;
+  }
+
+  //
+  // The logic below checks 2M/4K pages to make sure we do not waste memory.
+  //
+  while (Length != 0) {
+    PageEntry = GetPageTableEntry (BaseAddress, &PageAttribute);
+    if (PageEntry == NULL) {
+      return RETURN_UNSUPPORTED;
+    }
+    PageEntryLength = PageAttributeToLength (PageAttribute);
+    SplitAttribute = NeedSplitPage (BaseAddress, Length, PageEntry, PageAttribute);
+    if (SplitAttribute == PageNone) {
+      ConvertPageEntryAttribute (PageEntry, Attributes, IsSet, &IsEntryModified);
+      if (IsEntryModified) {
+        if (IsModified != NULL) {
+          *IsModified = TRUE;
+        }
+      }
+      //
+      // Conversion succeeded, move to the next entry
+      //
+      BaseAddress += PageEntryLength;
+      Length -= PageEntryLength;
+    } else {
+      Status = SplitPage (PageEntry, PageAttribute, SplitAttribute);
+      if (RETURN_ERROR (Status)) {
+        return RETURN_UNSUPPORTED;
+      }
+      if (IsSplitted != NULL) {
+        *IsSplitted = TRUE;
+      }
+      if (IsModified != NULL) {
+        *IsModified = TRUE;
+      }
+      //
+      // Just split the current page.
+      // The conversion happens in the next iteration.
+      //
+    }
+  }
+
+  return RETURN_SUCCESS;
+}
+
+/**
+  FlushTlb on current processor.
+
+  @param[in,out] Buffer  Pointer to private data buffer.
+**/
+VOID
+EFIAPI
+FlushTlbOnCurrentProcessor (
+  IN OUT VOID  *Buffer
+  )
+{
+  CpuFlushTlb ();
+}
+
+/**
+  FlushTlb for all processors.
+**/
+VOID
+FlushTlbForAll (
+  VOID
+  )
+{
+  UINTN       Index;
+
+  FlushTlbOnCurrentProcessor (NULL);
+
+  for (Index = 0; Index < gSmst->NumberOfCpus; Index++) {
+    if (Index != gSmst->CurrentlyExecutingCpu) {
+      // Force the AP to start up in blocking mode.
+      SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
+      // Do not check return status, because AP might not be present in some corner cases.
+    }
+  }
+}
+
+/**
+  This function sets the attributes for the memory region specified by BaseAddress and
+  Length from their current attributes to the attributes specified by Attributes.
+
+  @param[in]   BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]   Length           The size in bytes of the memory region.
+  @param[in]   Attributes       The bit mask of attributes to set for the memory region.
+  @param[out]  IsSplitted       TRUE means the page table was split. FALSE means it was not split.
+
+  @retval EFI_SUCCESS           The attributes were set for the memory region.
+  @retval EFI_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                BaseAddress and Length cannot be modified.
+  @retval EFI_INVALID_PARAMETER Length is zero.
+                                Attributes specified an illegal combination of attributes that
+                                cannot be set together.
+  @retval EFI_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                the memory resource range.
+  @retval EFI_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                resource range specified by BaseAddress and Length.
+                                The bit mask of attributes is not supported for the memory resource
+                                range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmSetMemoryAttributesEx (
+  IN  EFI_PHYSICAL_ADDRESS                       BaseAddress,
+  IN  UINT64                                     Length,
+  IN  UINT64                                     Attributes,
+  OUT BOOLEAN                                    *IsSplitted  OPTIONAL
+  )
+{
+  EFI_STATUS  Status;
+  BOOLEAN     IsModified;
+
+  Status = ConvertMemoryPageAttributes (BaseAddress, Length, Attributes, TRUE, IsSplitted, &IsModified);
+  if (!EFI_ERROR(Status)) {
+    if (IsModified) {
+      //
+      // Flush TLB as last step
+      //
+      FlushTlbForAll();
+    }
+  }
+
+  return Status;
+}
+
+/**
+  This function clears the attributes specified by Attributes for the memory region
+  specified by BaseAddress and Length.
+
+  @param[in]   BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]   Length           The size in bytes of the memory region.
+  @param[in]   Attributes       The bit mask of attributes to clear for the memory region.
+  @param[out]  IsSplitted       TRUE means the page table was split. FALSE means it was not split.
+
+  @retval EFI_SUCCESS           The attributes were cleared for the memory region.
+  @retval EFI_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                BaseAddress and Length cannot be modified.
+  @retval EFI_INVALID_PARAMETER Length is zero.
+                                Attributes specified an illegal combination of attributes that
+                                cannot be set together.
+  @retval EFI_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                the memory resource range.
+  @retval EFI_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                resource range specified by BaseAddress and Length.
+                                The bit mask of attributes is not supported for the memory resource
+                                range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmClearMemoryAttributesEx (
+  IN  EFI_PHYSICAL_ADDRESS                       BaseAddress,
+  IN  UINT64                                     Length,
+  IN  UINT64                                     Attributes,
+  OUT BOOLEAN                                    *IsSplitted  OPTIONAL
+  )
+{
+  EFI_STATUS  Status;
+  BOOLEAN     IsModified;
+
+  Status = ConvertMemoryPageAttributes (BaseAddress, Length, Attributes, FALSE, IsSplitted, &IsModified);
+  if (!EFI_ERROR(Status)) {
+    if (IsModified) {
+      //
+      // Flush TLB as last step
+      //
+      FlushTlbForAll();
+    }
+  }
+
+  return Status;
+}
+
+/**
+  This function sets the attributes for the memory region specified by BaseAddress and
+  Length from their current attributes to the attributes specified by Attributes.
+
+  @param[in]  BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]  Length           The size in bytes of the memory region.
+  @param[in]  Attributes       The bit mask of attributes to set for the memory region.
+
+  @retval EFI_SUCCESS           The attributes were set for the memory region.
+  @retval EFI_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                BaseAddress and Length cannot be modified.
+  @retval EFI_INVALID_PARAMETER Length is zero.
+                                Attributes specified an illegal combination of attributes that
+                                cannot be set together.
+  @retval EFI_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                the memory resource range.
+  @retval EFI_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                resource range specified by BaseAddress and Length.
+                                The bit mask of attributes is not supported for the memory resource
+                                range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmSetMemoryAttributes (
+  IN  EFI_PHYSICAL_ADDRESS                       BaseAddress,
+  IN  UINT64                                     Length,
+  IN  UINT64                                     Attributes
+  )
+{
+  return SmmSetMemoryAttributesEx (BaseAddress, Length, Attributes, NULL);
+}
+
+/**
+  This function clears the attributes specified by Attributes for the memory region
+  specified by BaseAddress and Length, leaving all other current attributes unchanged.
+
+  @param[in]  BaseAddress      The physical address that is the start address of a memory region.
+  @param[in]  Length           The size in bytes of the memory region.
+  @param[in]  Attributes       The bit mask of attributes to clear for the memory region.
+
+  @retval EFI_SUCCESS           The attributes were cleared for the memory region.
+  @retval EFI_ACCESS_DENIED     The attributes for the memory resource range specified by
+                                BaseAddress and Length cannot be modified.
+  @retval EFI_INVALID_PARAMETER Length is zero.
+                                Attributes specified an illegal combination of attributes that
+                                cannot be set together.
+  @retval EFI_OUT_OF_RESOURCES  There are not enough system resources to modify the attributes of
+                                the memory resource range.
+  @retval EFI_UNSUPPORTED       The processor does not support one or more bytes of the memory
+                                resource range specified by BaseAddress and Length.
+                                The bit mask of attributes is not supported for the memory resource
+                                range specified by BaseAddress and Length.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmClearMemoryAttributes (
+  IN  EFI_PHYSICAL_ADDRESS                       BaseAddress,
+  IN  UINT64                                     Length,
+  IN  UINT64                                     Attributes
+  )
+{
+  return SmmClearMemoryAttributesEx (BaseAddress, Length, Attributes, NULL);
+}
+
+
+
+/**
+  Retrieves a pointer to the system configuration table from the SMM System Table
+  based on a specified GUID.
+
+  @param[in]   TableGuid       The pointer to table's GUID type.
+  @param[out]  Table           The pointer to the table associated with TableGuid in the SMM System Table.
+
+  @retval EFI_SUCCESS     A configuration table matching TableGuid was found.
+  @retval EFI_NOT_FOUND   A configuration table matching TableGuid could not be found.
+
+**/
+EFI_STATUS
+EFIAPI
+SmmGetSystemConfigurationTable (
+  IN  EFI_GUID  *TableGuid,
+  OUT VOID      **Table
+  )
+{
+  UINTN             Index;
+
+  ASSERT (TableGuid != NULL);
+  ASSERT (Table != NULL);
+
+  *Table = NULL;
+  for (Index = 0; Index < gSmst->NumberOfTableEntries; Index++) {
+    if (CompareGuid (TableGuid, &(gSmst->SmmConfigurationTable[Index].VendorGuid))) {
+      *Table = gSmst->SmmConfigurationTable[Index].VendorTable;
+      return EFI_SUCCESS;
+    }
+  }
+
+  return EFI_NOT_FOUND;
+}
+
+/**
+  This function sets SMM save state buffer to be RW and XP.
+**/
+VOID
+PatchSmmSaveStateMap (
+  VOID
+  )
+{
+  UINTN  Index;
+  UINTN  TileCodeSize;
+  UINTN  TileDataSize;
+  UINTN  TileSize;
+
+  TileCodeSize = GetSmiHandlerSize ();
+  TileCodeSize = ALIGN_VALUE(TileCodeSize, SIZE_4KB);
+  TileDataSize = sizeof (SMRAM_SAVE_STATE_MAP) + sizeof (PROCESSOR_SMM_DESCRIPTOR);
+  TileDataSize = ALIGN_VALUE(TileDataSize, SIZE_4KB);
+  TileSize = TileDataSize + TileCodeSize - 1;
+  TileSize = 2 * GetPowerOfTwo32 ((UINT32)TileSize);
+
+  DEBUG ((DEBUG_INFO, "PatchSmmSaveStateMap:\n"));
+  for (Index = 0; Index < mMaxNumberOfCpus - 1; Index++) {
+    //
+    // Code
+    //
+    SmmSetMemoryAttributes (
+      mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET,
+      TileCodeSize,
+      EFI_MEMORY_RO
+      );
+    SmmClearMemoryAttributes (
+      mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET,
+      TileCodeSize,
+      EFI_MEMORY_XP
+      );
+
+    //
+    // Data
+    //
+    SmmClearMemoryAttributes (
+      mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET + TileCodeSize,
+      TileSize - TileCodeSize,
+      EFI_MEMORY_RO
+      );
+    SmmSetMemoryAttributes (
+      mCpuHotPlugData.SmBase[Index] + SMM_HANDLER_OFFSET + TileCodeSize,
+      TileSize - TileCodeSize,
+      EFI_MEMORY_XP
+      );
+  }
+
+  //
+  // Code
+  //
+  SmmSetMemoryAttributes (
+    mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET,
+    TileCodeSize,
+    EFI_MEMORY_RO
+    );
+  SmmClearMemoryAttributes (
+    mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET,
+    TileCodeSize,
+    EFI_MEMORY_XP
+    );
+
+  //
+  // Data
+  //
+  SmmClearMemoryAttributes (
+    mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET + TileCodeSize,
+    SIZE_32KB - TileCodeSize,
+    EFI_MEMORY_RO
+    );
+  SmmSetMemoryAttributes (
+    mCpuHotPlugData.SmBase[mMaxNumberOfCpus - 1] + SMM_HANDLER_OFFSET + TileCodeSize,
+    SIZE_32KB - TileCodeSize,
+    EFI_MEMORY_XP
+    );
+}
+
+/**
+  This function sets GDT/IDT buffer to be RO and XP.
+**/
+VOID
+PatchGdtIdtMap (
+  VOID
+  )
+{
+  EFI_PHYSICAL_ADDRESS       BaseAddress;
+  UINTN                      Size;
+
+  //
+  // GDT
+  //
+  DEBUG ((DEBUG_INFO, "PatchGdtIdtMap - GDT:\n"));
+
+  BaseAddress = mGdtBuffer;
+  Size = ALIGN_VALUE(mGdtBufferSize, SIZE_4KB);
+  SmmSetMemoryAttributes (
+    BaseAddress,
+    Size,
+    EFI_MEMORY_RO
+    );
+  SmmSetMemoryAttributes (
+    BaseAddress,
+    Size,
+    EFI_MEMORY_XP
+    );
+
+  //
+  // IDT
+  //
+  DEBUG ((DEBUG_INFO, "PatchGdtIdtMap - IDT:\n"));
+
+  BaseAddress = gcSmiIdtr.Base;
+  Size = ALIGN_VALUE(gcSmiIdtr.Limit + 1, SIZE_4KB);
+  SmmSetMemoryAttributes (
+    BaseAddress,
+    Size,
+    EFI_MEMORY_RO
+    );
+  SmmSetMemoryAttributes (
+    BaseAddress,
+    Size,
+    EFI_MEMORY_XP
+    );
+}
+
+/**
+  This function sets memory attribute according to MemoryAttributesTable.
+**/
+VOID
+SetMemMapAttributes (
+  VOID
+  )
+{
+  EFI_MEMORY_DESCRIPTOR                     *MemoryMap;
+  EFI_MEMORY_DESCRIPTOR                     *MemoryMapStart;
+  UINTN                                     MemoryMapEntryCount;
+  UINTN                                     DescriptorSize;
+  UINTN                                     Index;
+  EDKII_PI_SMM_MEMORY_ATTRIBUTES_TABLE      *MemoryAttributesTable;
+
+  SmmGetSystemConfigurationTable (&gEdkiiPiSmmMemoryAttributesTableGuid, (VOID **)&MemoryAttributesTable);
+  if (MemoryAttributesTable == NULL) {
+    DEBUG ((DEBUG_INFO, "MemoryAttributesTable - NULL\n"));
+    return ;
+  }
+
+  DEBUG ((DEBUG_INFO, "MemoryAttributesTable:\n"));
+  DEBUG ((DEBUG_INFO, "  Version                   - 0x%08x\n", MemoryAttributesTable->Version));
+  DEBUG ((DEBUG_INFO, "  NumberOfEntries           - 0x%08x\n", MemoryAttributesTable->NumberOfEntries));
+  DEBUG ((DEBUG_INFO, "  DescriptorSize            - 0x%08x\n", MemoryAttributesTable->DescriptorSize));
+
+  MemoryMapEntryCount = MemoryAttributesTable->NumberOfEntries;
+  DescriptorSize = MemoryAttributesTable->DescriptorSize;
+  MemoryMapStart = (EFI_MEMORY_DESCRIPTOR *)(MemoryAttributesTable + 1);
+  MemoryMap = MemoryMapStart;
+  for (Index = 0; Index < MemoryMapEntryCount; Index++) {
+    DEBUG ((DEBUG_INFO, "Entry (0x%p)\n", MemoryMap));
+    DEBUG ((DEBUG_INFO, "  Type              - 0x%x\n", MemoryMap->Type));
+    DEBUG ((DEBUG_INFO, "  PhysicalStart     - 0x%016lx\n", MemoryMap->PhysicalStart));
+    DEBUG ((DEBUG_INFO, "  VirtualStart      - 0x%016lx\n", MemoryMap->VirtualStart));
+    DEBUG ((DEBUG_INFO, "  NumberOfPages     - 0x%016lx\n", MemoryMap->NumberOfPages));
+    DEBUG ((DEBUG_INFO, "  Attribute         - 0x%016lx\n", MemoryMap->Attribute));
+    MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+  }
+
+  MemoryMap = MemoryMapStart;
+  for (Index = 0; Index < MemoryMapEntryCount; Index++) {
+    DEBUG ((DEBUG_INFO, "SetAttribute: Memory Entry - 0x%lx, 0x%lx\n", MemoryMap->PhysicalStart, MemoryMap->NumberOfPages));
+    switch (MemoryMap->Type) {
+    case EfiRuntimeServicesCode:
+      SmmSetMemoryAttributes (
+        MemoryMap->PhysicalStart,
+        EFI_PAGES_TO_SIZE((UINTN)MemoryMap->NumberOfPages),
+        EFI_MEMORY_RO
+        );
+      break;
+    case EfiRuntimeServicesData:
+      SmmSetMemoryAttributes (
+        MemoryMap->PhysicalStart,
+        EFI_PAGES_TO_SIZE((UINTN)MemoryMap->NumberOfPages),
+        EFI_MEMORY_XP
+        );
+      break;
+    default:
+      SmmSetMemoryAttributes (
+        MemoryMap->PhysicalStart,
+        EFI_PAGES_TO_SIZE((UINTN)MemoryMap->NumberOfPages),
+        EFI_MEMORY_XP
+        );
+      break;
+    }
+    MemoryMap = NEXT_MEMORY_DESCRIPTOR(MemoryMap, DescriptorSize);
+  }
+
+  PatchSmmSaveStateMap ();
+  PatchGdtIdtMap ();
+
+  return ;
+}
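
Note on the tile size arithmetic in PatchSmmSaveStateMap(): the following is a minimal standalone sketch, not part of the patch, of how the per-CPU tile size is derived. It assumes GetPowerOfTwo32() returns the highest power of two not exceeding its operand, so that doubling it after subtracting one rounds the sum up to a power of two. The helper names (align_up, highest_power_of_two, tile_size) and the byte counts used with them are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_4KB 0x1000u

/* Round value up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

/* Highest power of two <= x, or 0 when x == 0.
   Stand-in for GetPowerOfTwo32 (assumed semantics). */
static uint32_t highest_power_of_two(uint32_t x)
{
  uint32_t p;

  if (x == 0) {
    return 0;
  }
  p = 1;
  while ((x >>= 1) != 0) {
    p <<= 1;
  }
  return p;
}

/* Page-align code and data, then round their sum up to a power of two. */
static uint32_t tile_size(uint32_t code_bytes, uint32_t data_bytes)
{
  uint32_t code = align_up(code_bytes, SIZE_4KB);
  uint32_t data = align_up(data_bytes, SIZE_4KB);

  return 2 * highest_power_of_two(code + data - 1);
}
```

With one page of code and one page of data the tile is 0x2000 bytes; once the data spills into a second page the tile grows to 0x4000 bytes.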
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c
index 329574e..4b7fad2 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c
@@ -30,11 +30,6 @@ UINTN                     mSmmProfileSize;
 UINTN                     mMsrDsAreaSize   = SMM_PROFILE_DTS_SIZE;
 
 //
-// The flag indicates if execute-disable is supported by processor.
-//
-BOOLEAN                   mXdSupported     = TRUE;
-
-//
 // The flag indicates if execute-disable is enabled on processor.
 //
 BOOLEAN                   mXdEnabled       = FALSE;
@@ -529,6 +524,12 @@ InitPaging (
         //
         continue;
       }
+      if ((*Pde & IA32_PG_PS) != 0) {
+        //
+        // This is 1G entry, skip it
+        //
+        continue;
+      }
       Pte = (UINT64 *)(UINTN)(*Pde & PHYSICAL_ADDRESS_MASK);
       if (Pte == 0) {
         continue;
@@ -587,6 +588,15 @@ InitPaging (
         //
         continue;
       }
+      if ((*Pde & IA32_PG_PS) != 0) {
+        //
+        // This is 1G entry, set NX bit and skip it
+        //
+        if (mXdSupported) {
+          *Pde = *Pde | IA32_PG_NX;
+        }
+        continue;
+      }
       Pte = (UINT64 *)(UINTN)(*Pde & PHYSICAL_ADDRESS_MASK);
       if (Pte == 0) {
         continue;
@@ -976,25 +986,6 @@ CheckFeatureSupported (
 }
 
 /**
-  Enable XD feature.
-
-**/
-VOID
-ActivateXd (
-  VOID
-  )
-{
-  UINT64           MsrRegisters;
-
-  MsrRegisters = AsmReadMsr64 (MSR_EFER);
-  if ((MsrRegisters & MSR_EFER_XD) != 0) {
-    return ;
-  }
-  MsrRegisters |= MSR_EFER_XD;
-  AsmWriteMsr64 (MSR_EFER, MsrRegisters);
-}
-
-/**
   Enable single step.
 
 **/
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
index 13ff675..b6fb5cf 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
@@ -97,15 +97,6 @@ CheckFeatureSupported (
   );
 
 /**
-  Enable XD feature.
-
-**/
-VOID
-ActivateXd (
-  VOID
-  );
-
-/**
   Update page table according to protected memory ranges and the 4KB-page mapped memory ranges.
 
 **/
@@ -114,7 +105,13 @@ InitPaging (
   VOID
   );
 
+//
+// The flag indicates if execute-disable is supported by processor.
+//
 extern BOOLEAN    mXdSupported;
+//
+// The flag indicates if execute-disable is enabled on processor.
+//
 extern BOOLEAN    mXdEnabled;
 
 #endif // _SMM_PROFILE_H_
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
index 9cee784..b3e50a4 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
@@ -18,6 +18,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #define ACC_MAX_BIT                 BIT3
 LIST_ENTRY                          mPagePool = INITIALIZE_LIST_HEAD_VARIABLE (mPagePool);
 BOOLEAN                             m1GPageTableSupport = FALSE;
+UINT8                               mPhysicalAddressBits;
+BOOLEAN                             mCpuSmmStaticPageTable;
 
 /**
   Check if 1-GByte pages is supported by processor or not.
@@ -86,6 +88,146 @@ GetSubEntriesNum (
 }
 
 /**
+  Calculate the maximum supported physical address width.
+
+  @return the number of physical address bits supported, capped at 48.
+**/
+UINT8
+CalculateMaximumSupportAddress (
+  VOID
+  )
+{
+  UINT32                                        RegEax;
+  UINT8                                         PhysicalAddressBits;
+  VOID                                          *Hob;
+
+  //
+  // Get physical address bits supported.
+  //
+  Hob = GetFirstHob (EFI_HOB_TYPE_CPU);
+  if (Hob != NULL) {
+    PhysicalAddressBits = ((EFI_HOB_CPU *) Hob)->SizeOfMemorySpace;
+  } else {
+    AsmCpuid (0x80000000, &RegEax, NULL, NULL, NULL);
+    if (RegEax >= 0x80000008) {
+      AsmCpuid (0x80000008, &RegEax, NULL, NULL, NULL);
+      PhysicalAddressBits = (UINT8) RegEax;
+    } else {
+      PhysicalAddressBits = 36;
+    }
+  }
+
+  //
+  // IA-32e paging translates 48-bit linear addresses to 52-bit physical addresses.
+  //
+  ASSERT (PhysicalAddressBits <= 52);
+  if (PhysicalAddressBits > 48) {
+    PhysicalAddressBits = 48;
+  }
+  return PhysicalAddressBits;
+}
+
+/**
+  Set static page table.
+
+  @param[in] PageTable     Address of page table.
+**/
+VOID
+SetStaticPageTable (
+  IN UINTN               PageTable
+  )
+{
+  UINT64                                        PageAddress;
+  UINTN                                         NumberOfPml4EntriesNeeded;
+  UINTN                                         NumberOfPdpEntriesNeeded;
+  UINTN                                         IndexOfPml4Entries;
+  UINTN                                         IndexOfPdpEntries;
+  UINTN                                         IndexOfPageDirectoryEntries;
+  UINT64                                        *PageMapLevel4Entry;
+  UINT64                                        *PageMap;
+  UINT64                                        *PageDirectoryPointerEntry;
+  UINT64                                        *PageDirectory1GEntry;
+  UINT64                                        *PageDirectoryEntry;
+
+  if (mPhysicalAddressBits <= 39 ) {
+    NumberOfPml4EntriesNeeded = 1;
+    NumberOfPdpEntriesNeeded = (UINT32)LShiftU64 (1, (mPhysicalAddressBits - 30));
+  } else {
+    NumberOfPml4EntriesNeeded = (UINT32)LShiftU64 (1, (mPhysicalAddressBits - 39));
+    NumberOfPdpEntriesNeeded = 512;
+  }
+
+  //
+  // By architecture only one PageMapLevel4 exists, so use the one passed in by the caller.
+  //
+  PageMap         = (VOID *) PageTable;
+
+  PageMapLevel4Entry = PageMap;
+  PageAddress        = 0;
+  for (IndexOfPml4Entries = 0; IndexOfPml4Entries < NumberOfPml4EntriesNeeded; IndexOfPml4Entries++, PageMapLevel4Entry++) {
+    //
+    // Each PML4 entry points to a page of Page Directory Pointer entries.
+    //
+    PageDirectoryPointerEntry = (UINT64 *) ((*PageMapLevel4Entry) & gPhyMask);
+    if (PageDirectoryPointerEntry == NULL) {
+      PageDirectoryPointerEntry = AllocatePageTableMemory (1);
+      ASSERT(PageDirectoryPointerEntry != NULL);
+      ZeroMem (PageDirectoryPointerEntry, EFI_PAGES_TO_SIZE(1));
+
+      *PageMapLevel4Entry = ((UINTN)PageDirectoryPointerEntry & gPhyMask)  | PAGE_ATTRIBUTE_BITS;
+    }
+
+    if (m1GPageTableSupport) {
+      PageDirectory1GEntry = PageDirectoryPointerEntry;
+      for (IndexOfPageDirectoryEntries = 0; IndexOfPageDirectoryEntries < 512; IndexOfPageDirectoryEntries++, PageDirectory1GEntry++, PageAddress += SIZE_1GB) {
+        if (IndexOfPml4Entries == 0 && IndexOfPageDirectoryEntries < 4) {
+          //
+          // Skip the < 4G entries
+          //
+          continue;
+        }
+        //
+        // Fill in the Page Directory entries
+        //
+        *PageDirectory1GEntry = (PageAddress & gPhyMask) | IA32_PG_PS | PAGE_ATTRIBUTE_BITS;
+      }
+    } else {
+      PageAddress = BASE_4GB;
+      for (IndexOfPdpEntries = 0; IndexOfPdpEntries < NumberOfPdpEntriesNeeded; IndexOfPdpEntries++, PageDirectoryPointerEntry++) {
+        if (IndexOfPml4Entries == 0 && IndexOfPdpEntries < 4) {
+          //
+          // Skip the < 4G entries
+          //
+          continue;
+        }
+        //
+        // Each Page Directory Pointer entry points to a page of Page Directory entries.
+        // So allocate space for them and fill them in within the IndexOfPageDirectoryEntries loop.
+        //
+        PageDirectoryEntry = (UINT64 *) ((*PageDirectoryPointerEntry) & gPhyMask);
+        if (PageDirectoryEntry == NULL) {
+          PageDirectoryEntry = AllocatePageTableMemory (1);
+          ASSERT(PageDirectoryEntry != NULL);
+          ZeroMem (PageDirectoryEntry, EFI_PAGES_TO_SIZE(1));
+
+          //
+          // Fill in the Page Directory Pointer entry
+          //
+          *PageDirectoryPointerEntry = (UINT64)(UINTN)PageDirectoryEntry | PAGE_ATTRIBUTE_BITS;
+        }
+
+        for (IndexOfPageDirectoryEntries = 0; IndexOfPageDirectoryEntries < 512; IndexOfPageDirectoryEntries++, PageDirectoryEntry++, PageAddress += SIZE_2MB) {
+          //
+          // Fill in the Page Directory entries
+          //
+          *PageDirectoryEntry = (UINT64)PageAddress | IA32_PG_PS | PAGE_ATTRIBUTE_BITS;
+        }
+      }
+    }
+  }
+}
+
+/**
   Create PageTable for SMM use.
 
   @return The address of PML4 (to set CR3).
@@ -108,11 +250,17 @@ SmmInitPageTable (
   //
   InitializeSpinLock (mPFLock);
 
+  mCpuSmmStaticPageTable = PcdGetBool (PcdCpuSmmStaticPageTable);
   m1GPageTableSupport = Is1GPageSupport ();
+  DEBUG ((DEBUG_INFO, "1GPageTableSupport - 0x%x\n", m1GPageTableSupport));
+  DEBUG ((DEBUG_INFO, "PcdCpuSmmStaticPageTable - 0x%x\n", mCpuSmmStaticPageTable));
+
+  mPhysicalAddressBits = CalculateMaximumSupportAddress ();
+  DEBUG ((DEBUG_INFO, "PhysicalAddressBits - 0x%x\n", mPhysicalAddressBits));
   //
   // Generate PAE page table for the first 4GB memory space
   //
-  Pages = Gen4GPageTable (PAGE_TABLE_PAGES + 1, FALSE);
+  Pages = Gen4GPageTable (FALSE);
 
   //
   // Set IA32_PG_PMNT bit to mask this entry
@@ -125,21 +273,28 @@ SmmInitPageTable (
   //
   // Fill Page-Table-Level4 (PML4) entry
   //
-  PTEntry = (UINT64*)(UINTN)(Pages - EFI_PAGES_TO_SIZE (PAGE_TABLE_PAGES + 1));
-  *PTEntry = Pages + PAGE_ATTRIBUTE_BITS;
+  PTEntry = (UINT64*)AllocatePageTableMemory (1);
+  ASSERT (PTEntry != NULL);
+  *PTEntry = Pages | PAGE_ATTRIBUTE_BITS;
   ZeroMem (PTEntry + 1, EFI_PAGE_SIZE - sizeof (*PTEntry));
+
   //
   // Set sub-entries number
   //
   SetSubEntriesNum (PTEntry, 3);
 
-  //
-  // Add remaining pages to page pool
-  //
-  FreePage = (LIST_ENTRY*)(PTEntry + EFI_PAGE_SIZE / sizeof (*PTEntry));
-  while ((UINTN)FreePage < Pages) {
-    InsertTailList (&mPagePool, FreePage);
-    FreePage += EFI_PAGE_SIZE / sizeof (*FreePage);
+  if (mCpuSmmStaticPageTable) {
+    SetStaticPageTable ((UINTN)PTEntry);
+  } else {
+    //
+    // Add pages to page pool
+    //
+    FreePage = (LIST_ENTRY*)AllocatePageTableMemory (PAGE_TABLE_PAGES);
+    ASSERT (FreePage != NULL);
+    for (Index = 0; Index < PAGE_TABLE_PAGES; Index++) {
+      InsertTailList (&mPagePool, FreePage);
+      FreePage += EFI_PAGE_SIZE / sizeof (*FreePage);
+    }
   }
 
   if (FeaturePcdGet (PcdCpuSmmProfileEnable)) {
@@ -561,7 +716,7 @@ SmiDefaultPFHandler (
     break;
   case SmmPageSize1G:
     if (!m1GPageTableSupport) {
-      DEBUG ((EFI_D_ERROR, "1-GByte pages is not supported!"));
+      DEBUG ((DEBUG_ERROR, "1-GByte pages is not supported!"));
       ASSERT (FALSE);
     }
     //
@@ -612,8 +767,8 @@ SmiDefaultPFHandler (
       // Check if the entry has already existed, this issue may occur when the different
       // size page entries created under the same entry
       //
-      DEBUG ((EFI_D_ERROR, "PageTable = %lx, PTIndex = %x, PageTable[PTIndex] = %lx\n", PageTable, PTIndex, PageTable[PTIndex]));
-      DEBUG ((EFI_D_ERROR, "New page table overlapped with old page table!\n"));
+      DEBUG ((DEBUG_ERROR, "PageTable = %lx, PTIndex = %x, PageTable[PTIndex] = %lx\n", PageTable, PTIndex, PageTable[PTIndex]));
+      DEBUG ((DEBUG_ERROR, "New page table overlapped with old page table!\n"));
       ASSERT (FALSE);
     }
     //
@@ -654,13 +809,18 @@ SmiPFHandler (
 
   PFAddress = AsmReadCr2 ();
 
+  if (mCpuSmmStaticPageTable && (PFAddress >= LShiftU64 (1, (mPhysicalAddressBits - 1)))) {
+    DEBUG ((DEBUG_ERROR, "Address 0x%lx is not supported by processor!\n", PFAddress));
+    CpuDeadLoop ();
+  }
+
   //
   // If a page fault occurs in SMRAM range, it should be in a SMM stack guard page.
   //
   if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
       (PFAddress >= mCpuHotPlugData.SmrrBase) &&
       (PFAddress < (mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize))) {
-    DEBUG ((EFI_D_ERROR, "SMM stack overflow!\n"));
+    DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
     CpuDeadLoop ();
   }
 
@@ -670,7 +830,7 @@ SmiPFHandler (
   if ((PFAddress < mCpuHotPlugData.SmrrBase) ||
       (PFAddress >= mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize)) {
     if ((SystemContext.SystemContextX64->ExceptionData & IA32_PF_EC_ID) != 0) {
-      DEBUG ((EFI_D_ERROR, "Code executed on IP(0x%lx) out of SMM range after SMM is locked!\n", PFAddress));
+      DEBUG ((DEBUG_ERROR, "Code executed on IP(0x%lx) out of SMM range after SMM is locked!\n", PFAddress));
       DEBUG_CODE (
         DumpModuleInfoByIp (*(UINTN *)(UINTN)SystemContext.SystemContextX64->Rsp);
       );
@@ -689,3 +849,87 @@ SmiPFHandler (
 
   ReleaseSpinLock (mPFLock);
 }
+
+/**
+  This function sets memory attribute for page table.
+**/
+VOID
+SetPageTableAttributes (
+  VOID
+  )
+{
+  UINTN                 Index2;
+  UINTN                 Index3;
+  UINTN                 Index4;
+  UINT64                *L1PageTable;
+  UINT64                *L2PageTable;
+  UINT64                *L3PageTable;
+  UINT64                *L4PageTable;
+  BOOLEAN               IsSplitted;
+  BOOLEAN               PageTableSplitted;
+
+  if (!mCpuSmmStaticPageTable) {
+    return ;
+  }
+
+  DEBUG ((DEBUG_INFO, "SetPageTableAttributes\n"));
+
+  //
+  // Disable write protection, because we need to mark the page table as write protected.
+  // We need to *write* page table memory in order to mark it *read only*.
+  //
+  AsmWriteCr0 (AsmReadCr0() & ~CR0_WP);
+
+  do {
+    DEBUG ((DEBUG_INFO, "Start...\n"));
+    PageTableSplitted = FALSE;
+
+    L4PageTable = (UINT64 *)GetPageTableBase ();
+    SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L4PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+    PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+    for (Index4 = 0; Index4 < SIZE_4KB/sizeof(UINT64); Index4++) {
+      L3PageTable = (UINT64 *)(UINTN)(L4PageTable[Index4] & PAGING_4K_ADDRESS_MASK_64);
+      if (L3PageTable == NULL) {
+        continue;
+      }
+
+      SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L3PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+      PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+      for (Index3 = 0; Index3 < SIZE_4KB/sizeof(UINT64); Index3++) {
+        if ((L3PageTable[Index3] & IA32_PG_PS) != 0) {
+          // 1G
+          continue;
+        }
+        L2PageTable = (UINT64 *)(UINTN)(L3PageTable[Index3] & PAGING_4K_ADDRESS_MASK_64);
+        if (L2PageTable == NULL) {
+          continue;
+        }
+
+        SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L2PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+        PageTableSplitted = (PageTableSplitted || IsSplitted);
+
+        for (Index2 = 0; Index2 < SIZE_4KB/sizeof(UINT64); Index2++) {
+          if ((L2PageTable[Index2] & IA32_PG_PS) != 0) {
+            // 2M
+            continue;
+          }
+          L1PageTable = (UINT64 *)(UINTN)(L2PageTable[Index2] & PAGING_4K_ADDRESS_MASK_64);
+          if (L1PageTable == NULL) {
+            continue;
+          }
+          SmmSetMemoryAttributesEx ((EFI_PHYSICAL_ADDRESS)(UINTN)L1PageTable, SIZE_4KB, EFI_MEMORY_RO, &IsSplitted);
+          PageTableSplitted = (PageTableSplitted || IsSplitted);
+        }
+      }
+    }
+  } while (PageTableSplitted);
+
+  //
+  // Enable write protection, after page table updated.
+  //
+  AsmWriteCr0 (AsmReadCr0() | CR0_WP);
+
+  return ;
+}
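
Note on the entry counts in SetStaticPageTable(): the sketch below, which is not part of the patch, restates how the number of PML4 and PDP entries follows from the physical address width. Each PDP entry covers 1GB (2^30) and each PML4 entry covers 512GB (2^39). The type and function names are hypothetical, and the width is assumed to lie between 30 and 48 bits, matching the cap applied by CalculateMaximumSupportAddress().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this illustration only. */
typedef struct {
  uint32_t pml4_entries;
  uint32_t pdp_entries;   /* PDP entries per PML4 entry */
} entry_counts;

/* Derive table entry counts from the physical address width in bits. */
static entry_counts count_entries(uint8_t physical_address_bits)
{
  entry_counts counts;

  assert(physical_address_bits >= 30 && physical_address_bits <= 48);

  if (physical_address_bits <= 39) {
    /* A single PML4 entry spans the whole address space. */
    counts.pml4_entries = 1;
    counts.pdp_entries  = (uint32_t)1 << (physical_address_bits - 30);
  } else {
    /* Several PML4 entries, each with a full page of PDP entries. */
    counts.pml4_entries = (uint32_t)1 << (physical_address_bits - 39);
    counts.pdp_entries  = 512;
  }
  return counts;
}
```

For example, a 36-bit width needs one PML4 entry with 64 PDP entries, while a 48-bit width needs 512 PML4 entries with 512 PDP entries each.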
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S
index 7e9ac58..a425830 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S
@@ -1,6 +1,6 @@
 #------------------------------------------------------------------------------
 #
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 # This program and the accompanying materials
 # are licensed and made available under the terms and conditions of the BSD License
 # which accompanies this distribution.  The full text of the license may be found at
@@ -24,8 +24,13 @@ ASM_GLOBAL  ASM_PFX(gcSmiHandlerSize)
 ASM_GLOBAL  ASM_PFX(gSmiCr3)
 ASM_GLOBAL  ASM_PFX(gSmiStack)
 ASM_GLOBAL  ASM_PFX(gSmbase)
+ASM_GLOBAL  ASM_PFX(mXdSupported)
 ASM_GLOBAL  ASM_PFX(gSmiHandlerIdtr)
 
+.equ            MSR_IA32_MISC_ENABLE, 0x1A0
+.equ            MSR_EFER, 0xc0000080
+.equ            MSR_EFER_XD, 0x800
+
 #
 # Constants relating to PROCESSOR_SMM_DESCRIPTOR
 #
@@ -132,6 +137,29 @@ ASM_PFX(gSmiCr3):    .space  4
     movl    $TSS_SEGMENT, %eax
     ltr     %ax
 
+# enable NXE if supported
+    .byte   0xb0                        # mov al, imm8
+ASM_PFX(mXdSupported): .byte 1
+    cmpb    $0, %al
+    jz      NxeDone
+#
+# Check XD disable bit
+#
+    movl    $MSR_IA32_MISC_ENABLE, %ecx
+    rdmsr
+    subl    $4, %esp
+    pushq   %rdx                       # save MSR_IA32_MISC_ENABLE[63-32]
+    testl   $BIT2, %edx                # MSR_IA32_MISC_ENABLE[34]
+    jz      L13
+    andw    $0x0FFFB, %dx              # clear XD Disable bit if it is set
+    wrmsr
+L13:
+    movl    $MSR_EFER, %ecx
+    rdmsr
+    orw     $MSR_EFER_XD,%ax            # enable NXE
+    wrmsr
+NxeDone:
+
     #
     # Switch to LongMode
     #
@@ -139,12 +167,13 @@ ASM_PFX(gSmiCr3):    .space  4
     call     Base                         # push return address for retf later
 Base:
     addl    $(LongMode - Base), (%rsp)  # offset for far retf, seg is the 1st arg
-    movl    $0xc0000080, %ecx
+
+    movl    $MSR_EFER, %ecx
     rdmsr
-    orb     $1,%ah
+    orb     $1,%ah                      # enable LME
     wrmsr
     movq    %cr0, %rbx
-    orl     $0x080010000, %ebx          # enable paging + WP
+    orl     $0x080010023, %ebx          # enable paging + WP + NE + MP + PE
     movq    %rbx, %cr0
     retf
 LongMode:                               # long mode (64-bit code) starts here
@@ -162,10 +191,10 @@ LongMode:                               # long mode (64-bit code) starts here
 #   jmp     _SmiHandler                 ; instruction is not needed
 
 _SmiHandler:
-    movq    (%rsp), %rbx
+    movq    8(%rsp), %rbx
     # Save FP registers
 
-    subq    $0x208, %rsp
+    subq    $0x200, %rsp
     .byte   0x48                        # FXSAVE64
     fxsave  (%rsp)
 
@@ -191,6 +220,16 @@ _SmiHandler:
     .byte   0x48                        # FXRSTOR64
     fxrstor (%rsp)
 
+    addq    $0x200, %rsp
+    popq    %rdx                        # get saved MSR_IA32_MISC_ENABLE[63-32]
+    testl   $BIT2, %edx
+    jz      L16
+    movl    $MSR_IA32_MISC_ENABLE, %ecx
+    rdmsr
+    orw     $BIT2, %dx                  # set XD Disable bit if it was set before entering into SMM
+    wrmsr
+
+L16:
     rsm
 
 ASM_PFX(gcSmiHandlerSize):    .word      . - _SmiEntryPoint
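
Note on the bit masks in the SMI entry code: rdmsr returns a 64-bit MSR split across EDX:EAX, so MSR_IA32_MISC_ENABLE[34] is tested as BIT2 of EDX, while the EFER bits (LME at bit 8, NXE at bit 11) live in EAX. The sketch below, which is not part of the patch, checks those constants and the CR0 mask against their bit positions. The helper names are hypothetical.

```c
#include <stdint.h>

#define BIT(n) (1ull << (n))

/* Bit positions named in the patch comments. */
enum {
  MISC_ENABLE_XD_DISABLE_BIT = 34,
  EFER_LME_BIT               = 8,
  EFER_NXE_BIT               = 11
};

/* Mask of a 64-bit MSR bit as seen in EDX (high dword after rdmsr). */
static uint32_t mask_in_edx(unsigned bit)
{
  return (uint32_t)(BIT(bit) >> 32);
}

/* Mask of a 64-bit MSR bit as seen in EAX (low dword after rdmsr). */
static uint32_t mask_in_eax(unsigned bit)
{
  return (uint32_t)BIT(bit);
}

/* CR0 bits ORed in before the far return: PG, WP, NE, MP, PE. */
static uint32_t cr0_enable_mask(void)
{
  return (uint32_t)(BIT(31) | BIT(16) | BIT(5) | BIT(1) | BIT(0));
}
```

The results line up with the literals in the assembly: 0x4 (BIT2) for the XD Disable test on EDX, 0x800 for MSR_EFER_XD, 0x100 for the "or ah, 1" LME update, and 0x80010023 for CR0.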
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm
index 094cf2c..74d320e 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm
@@ -1,5 +1,5 @@
 ;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 ; This program and the accompanying materials
 ; are licensed and made available under the terms and conditions of the BSD License
 ; which accompanies this distribution.  The full text of the license may be found at
@@ -29,8 +29,12 @@ EXTERNDEF   gcSmiHandlerSize:WORD
 EXTERNDEF   gSmiCr3:DWORD
 EXTERNDEF   gSmiStack:DWORD
 EXTERNDEF   gSmbase:DWORD
+EXTERNDEF   mXdSupported:BYTE
 EXTERNDEF   gSmiHandlerIdtr:FWORD
 
+MSR_IA32_MISC_ENABLE  EQU     1A0h
+MSR_EFER      EQU     0c0000080h
+MSR_EFER_XD   EQU     0800h
 
 ;
 ; Constants relating to PROCESSOR_SMM_DESCRIPTOR
@@ -130,17 +134,41 @@ gSmiCr3     DD      ?
     mov     eax, TSS_SEGMENT
     ltr     ax
 
+; enable NXE if supported
+    DB      0b0h                        ; mov al, imm8
+mXdSupported     DB      1
+    cmp     al, 0
+    jz      @SkipXd
+;
+; Check XD disable bit
+;
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    sub     esp, 4
+    push    rdx                        ; save MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2                  ; MSR_IA32_MISC_ENABLE[34]
+    jz      @f
+    and     dx, 0FFFBh                 ; clear XD Disable bit if it is set
+    wrmsr
+@@:
+    mov     ecx, MSR_EFER
+    rdmsr
+    or      ax, MSR_EFER_XD            ; enable NXE
+    wrmsr
+@SkipXd:
+
 ; Switch into @LongMode
     push    LONG_MODE_CS                ; push cs hardcore here
     call    Base                       ; push return address for retf later
 Base:
     add     dword ptr [rsp], @LongMode - Base; offset for far retf, seg is the 1st arg
-    mov     ecx, 0c0000080h
+
+    mov     ecx, MSR_EFER
     rdmsr
-    or      ah, 1
+    or      ah, 1                      ; enable LME
     wrmsr
     mov     rbx, cr0
-    or      ebx, 080010000h            ; enable paging + WP
+    or      ebx, 080010023h            ; enable paging + WP + NE + MP + PE
     mov     cr0, rbx
     retf
 @LongMode:                              ; long mode (64-bit code) starts here
@@ -163,7 +191,7 @@ _SmiHandler:
     ;
     ; Save FP registers
     ;
-    sub     rsp, 208h
+    sub     rsp, 200h
     DB      48h                         ; FXSAVE64
     fxsave  [rsp]
 
@@ -172,15 +200,15 @@ _SmiHandler:
     mov     rcx, rbx
     mov     rax, CpuSmmDebugEntry
     call    rax
-    
+
     mov     rcx, rbx
     mov     rax, SmiRendezvous          ; rax <- absolute addr of SmiRedezvous
     call    rax
-    
+
     mov     rcx, rbx
     mov     rax, CpuSmmDebugExit
     call    rax
-    
+
     add     rsp, 20h
 
     ;
@@ -189,6 +217,16 @@ _SmiHandler:
     DB      48h                         ; FXRSTOR64
     fxrstor [rsp]
 
+    add     rsp, 200h
+    pop     rdx                       ; get saved MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2
+    jz      @f
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    or      dx, BIT2                  ; set XD Disable bit if it was set before entering into SMM
+    wrmsr
+
+@@:
     rsm
 
 gcSmiHandlerSize    DW      $ - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm
index b717cda..5eb5cc6 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm
@@ -22,6 +22,10 @@
 ; Variables referrenced by C code
 ;
 
+%define MSR_IA32_MISC_ENABLE 0x1A0
+%define MSR_EFER      0xc0000080
+%define MSR_EFER_XD   0x800
+
 ;
 ; Constants relating to PROCESSOR_SMM_DESCRIPTOR
 ;
@@ -50,6 +54,7 @@ extern ASM_PFX(CpuSmmDebugEntry)
 extern ASM_PFX(CpuSmmDebugExit)
 
 global ASM_PFX(gSmbase)
+global ASM_PFX(mXdSupported)
 global ASM_PFX(gSmiStack)
 global ASM_PFX(gSmiCr3)
 global ASM_PFX(gcSmiHandlerTemplate)
@@ -69,7 +74,7 @@ _SmiEntryPoint:
     mov     [cs:bx + 2], eax
 o32 lgdt    [cs:bx]                       ; lgdt fword ptr cs:[bx]
     mov     ax, PROTECT_MODE_CS
-    mov     [cs:bx-0x2],ax    
+    mov     [cs:bx-0x2],ax
     DB      0x66, 0xbf                   ; mov edi, SMBASE
 ASM_PFX(gSmbase): DD 0
     lea     eax, [edi + (@ProtectedMode - _SmiEntryPoint) + 0x8000]
@@ -79,7 +84,7 @@ ASM_PFX(gSmbase): DD 0
     or      ebx, 0x23
     mov     cr0, ebx
     jmp     dword 0x0:0x0
-_GdtDesc:   
+_GdtDesc:
     DW 0
     DD 0
 
@@ -112,17 +117,41 @@ ASM_PFX(gSmiCr3): DD 0
     mov     eax, TSS_SEGMENT
     ltr     ax
 
+; enable NXE if supported
+    DB      0xb0                        ; mov al, imm8
+ASM_PFX(mXdSupported):     DB      1
+    cmp     al, 0
+    jz      @SkipXd
+;
+; Check XD disable bit
+;
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    sub     esp, 4
+    push    rdx                        ; save MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2                  ; MSR_IA32_MISC_ENABLE[34]
+    jz      .0
+    and     dx, 0xFFFB                 ; clear XD Disable bit if it is set
+    wrmsr
+.0:
+    mov     ecx, MSR_EFER
+    rdmsr
+    or      ax, MSR_EFER_XD            ; enable NXE
+    wrmsr
+@SkipXd:
+
 ; Switch into @LongMode
     push    LONG_MODE_CS                ; push cs hardcore here
-    call    Base                       ; push reture address for retf later
+    call    Base                       ; push return address for retf later
 Base:
     add     dword [rsp], @LongMode - Base; offset for far retf, seg is the 1st arg
-    mov     ecx, 0xc0000080
+
+    mov     ecx, MSR_EFER
     rdmsr
-    or      ah, 1
+    or      ah, 1                      ; enable LME
     wrmsr
     mov     rbx, cr0
-    or      ebx, 080010000h            ; enable paging + WP
+    or      ebx, 0x80010023            ; enable paging + WP + NE + MP + PE
     mov     cr0, rbx
     retf
 @LongMode:                              ; long mode (64-bit code) starts here
@@ -140,12 +169,12 @@ Base:
 ;   jmp     _SmiHandler                 ; instruction is not needed
 
 _SmiHandler:
-    mov     rbx, [rsp]                  ; rbx <- CpuIndex
+    mov     rbx, [rsp + 0x8]             ; rbx <- CpuIndex
 
     ;
     ; Save FP registers
     ;
-    sub     rsp, 0x208
+    sub     rsp, 0x200
     DB      0x48                         ; FXSAVE64
     fxsave  [rsp]
 
@@ -154,15 +183,15 @@ _SmiHandler:
     mov     rcx, rbx
     mov     rax, CpuSmmDebugEntry
     call    rax
-    
+
     mov     rcx, rbx
     mov     rax, SmiRendezvous          ; rax <- absolute addr of SmiRedezvous
     call    rax
-    
+
     mov     rcx, rbx
     mov     rax, CpuSmmDebugExit
     call    rax
-    
+
     add     rsp, 0x20
 
     ;
@@ -171,6 +200,16 @@ _SmiHandler:
     DB      0x48                         ; FXRSTOR64
     fxrstor [rsp]
 
+    add     rsp, 0x200
+    pop     rdx                       ; get saved MSR_IA32_MISC_ENABLE[63-32]
+    test    edx, BIT2
+    jz      .1
+    mov     ecx, MSR_IA32_MISC_ENABLE
+    rdmsr
+    or      dx, BIT2                  ; set XD Disable bit if it was set before entering into SMM
+    wrmsr
+
+.1:
     rsm
 
 gcSmiHandlerSize    DW      $ - _SmiEntryPoint
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S
index 2ae6f2c..2e2792d 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S
@@ -1,6 +1,6 @@
 #------------------------------------------------------------------------------
 #
-# Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+# Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 # This program and the accompanying materials
 # are licensed and made available under the terms and conditions of the BSD License
 # which accompanies this distribution.  The full text of the license may be found at
@@ -128,244 +128,8 @@ ASM_PFX(gcSmiGdtr):
     .quad      NullSeg
 
 ASM_PFX(gcSmiIdtr):
-    .word      IDT_SIZE - 1
-    .quad      _SmiIDT
-
-
-#
-# Here is the IDT. There are 32 (not 255) entries in it since only processor
-# generated exceptions will be handled.
-#
-_SmiIDT:
-# The following segment repeats 32 times:
-# No. 1
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 2
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 3
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 4
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 5
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 6
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 7
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 8
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 9
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 10
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 11
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 12
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 13
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 14
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 15
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 16
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 17
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 18
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 19
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 20
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 21
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 22
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 23
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 24
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 25
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 26
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 27
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 28
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 29
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 30
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 31
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-# No. 32
-    .word 0                             # Offset 0:15
-    .word CODE_SEL
-    .byte 0                             # Unused
-    .byte 0x8e                          # Interrupt Gate, Present
-    .word 0                             # Offset 16:31
-    .quad 0                             # Offset 32:63
-
-_SmiIDTEnd:
-
-.equ  IDT_SIZE, (_SmiIDTEnd - _SmiIDT)
+    .word      0
+    .quad      0
 
     .text
 
@@ -600,11 +364,3 @@ L5:
     addq    $16, %rsp                    # skip INT# & ErrCode
     iretq
 
-ASM_GLOBAL ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
-# If SMM Stack Guard feature is enabled, set the IST field of
-# the interrupt gate for Page Fault Exception to be 1
-#
-    movabsq  $_SmiIDT + 14 * 16, %rax
-    movb     $1, 4(%rax)
-    ret
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm
index ab71645..f55ba72 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm
@@ -1,5 +1,5 @@
 ;------------------------------------------------------------------------------ ;
-; Copyright (c) 2009 - 2015, Intel Corporation. All rights reserved.<BR>
+; Copyright (c) 2009 - 2016, Intel Corporation. All rights reserved.<BR>
 ; This program and the accompanying materials
 ; are licensed and made available under the terms and conditions of the BSD License
 ; which accompanies this distribution.  The full text of the license may be found at
@@ -144,27 +144,8 @@ gcSmiGdtr   LABEL   FWORD
     DQ      offset NullSeg
 
 gcSmiIdtr   LABEL   FWORD
-    DW      IDT_SIZE - 1
-    DQ      offset _SmiIDT
-
-    .data
-
-;
-; Here is the IDT. There are 32 (not 255) entries in it since only processor
-; generated exceptions will be handled.
-;
-_SmiIDT:
-REPEAT      32
-    DW      0                           ; Offset 0:15
-    DW      CODE_SEL                    ; Segment selector
-    DB      0                           ; Unused
-    DB      8eh                         ; Interrupt Gate, Present
-    DW      0                           ; Offset 16:31
-    DQ      0                           ; Offset 32:63
-            ENDM
-_SmiIDTEnd:
-
-IDT_SIZE = (offset _SmiIDTEnd - offset _SmiIDT)
+    DW      0
+    DQ      0
 
     .code
 
@@ -400,14 +381,4 @@ PageFaultIdtHandlerSmmProfile    PROC
     iretq
 PageFaultIdtHandlerSmmProfile ENDP
 
-InitializeIDTSmmStackGuard   PROC
-;
-; If SMM Stack Guard feature is enabled, set the IST field of
-; the interrupt gate for Page Fault Exception to be 1
-;
-    lea     rax, _SmiIDT + 14 * 16
-    mov     byte ptr [rax + 4], 1
-    ret
-InitializeIDTSmmStackGuard   ENDP
-
     END
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm
index 821ee18..bc8d95d 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm
@@ -145,25 +145,8 @@ ASM_PFX(gcSmiGdtr):
     DQ        NullSeg
 
 ASM_PFX(gcSmiIdtr):
-    DW      IDT_SIZE - 1
-    DQ        _SmiIDT
-
-;
-; Here is the IDT. There are 32 (not 255) entries in it since only processor
-; generated exceptions will be handled.
-;
-_SmiIDT:
-%rep 32
-    DW      0                           ;   0:15
-    DW      CODE_SEL                    ; Segment selector
-    DB      0                           ; Unused
-    DB      0x8e                         ; Interrupt Gate, Present
-    DW      0                           ;   16:31
-    DQ      0                           ;   32:63
-%endrep
-_SmiIDTEnd:
-
-IDT_SIZE equ  _SmiIDTEnd -   _SmiIDT
+    DW      0
+    DQ      0
 
     DEFAULT REL
     SECTION .text
@@ -400,13 +383,3 @@ ASM_PFX(PageFaultIdtHandlerSmmProfile):
     add     rsp, 16           ; skip INT# & ErrCode
     iretq
 
-global ASM_PFX(InitializeIDTSmmStackGuard)
-ASM_PFX(InitializeIDTSmmStackGuard):
-;
-; If SMM Stack Guard feature is enabled, set the IST field of
-; the interrupt gate for Page Fault Exception to be 1
-;
-    lea     rax, [_SmiIDT + 14 * 16]
-    mov     byte [rax + 4], 1
-    ret
-
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index b53aa45..e2eca73 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -1,7 +1,7 @@
 /** @file
   SMM CPU misc functions for x64 arch specific.
   
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2016, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -14,6 +14,30 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 
 #include "PiSmmCpuDxeSmm.h"
 
+EFI_PHYSICAL_ADDRESS                mGdtBuffer;
+UINTN                               mGdtBufferSize;
+
+/**
+  Initialize IDT for SMM Stack Guard.
+
+**/
+VOID
+EFIAPI
+InitializeIDTSmmStackGuard (
+  VOID
+  )
+{
+  IA32_IDT_GATE_DESCRIPTOR  *IdtGate;
+
+  //
+  // If SMM Stack Guard feature is enabled, set the IST field of
+  // the interrupt gate for Page Fault Exception to be 1
+  //
+  IdtGate = (IA32_IDT_GATE_DESCRIPTOR *)gcSmiIdtr.Base;
+  IdtGate += EXCEPT_IA32_PAGE_FAULT;
+  IdtGate->Bits.Reserved_0 = 1;
+}
+
 /**
   Initialize Gdt for all processors.
   
@@ -41,8 +65,10 @@ InitGdt (
   // on each SMI entry.
   //
   GdtTssTableSize = (gcSmiGdtr.Limit + 1 + TSS_SIZE + 7) & ~7; // 8 bytes aligned
-  GdtTssTables = (UINT8*)AllocatePages (EFI_SIZE_TO_PAGES (GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus));
+  mGdtBufferSize = GdtTssTableSize * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus;
+  GdtTssTables = (UINT8*)AllocateCodePages (EFI_SIZE_TO_PAGES (mGdtBufferSize));
   ASSERT (GdtTssTables != NULL);
+  mGdtBuffer = (UINTN)GdtTssTables;
   GdtTableStepSize = GdtTssTableSize;
 
   for (Index = 0; Index < gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus; Index++) {
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c
index 065fb2c..cc393dc 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c
@@ -1,7 +1,7 @@
 /** @file
 X64 processor specific functions to enable SMM profile.
 
-Copyright (c) 2012 - 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2012 - 2016, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -45,12 +45,13 @@ InitSmmS3Cr3 (
   //
   // Generate PAE page table for the first 4GB memory space
   //
-  Pages = Gen4GPageTable (1, FALSE);
+  Pages = Gen4GPageTable (FALSE);
 
   //
   // Fill Page-Table-Level4 (PML4) entry
   //
-  PTEntry = (UINT64*)(UINTN)(Pages - EFI_PAGES_TO_SIZE (1));
+  PTEntry = (UINT64*)AllocatePageTableMemory (1);
+  ASSERT (PTEntry != NULL);
   *PTEntry = Pages | PAGE_ATTRIBUTE_BITS;
   ZeroMem (PTEntry + 1, EFI_PAGE_SIZE - sizeof (*PTEntry));
 
-- 
2.7.4.windows.1
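
Note for readers of the SmiEntry hunks in this patch: the XD handling can be modelled in plain C. This is only a sketch under stated assumptions. The MSRs are simulated with struct fields instead of rdmsr/wrmsr, the mXdSupported check is left out, and the names SIM_CPU, SmiEntryEnableXd and SmiExitRestoreXd are hypothetical (they do not appear in the patch). It illustrates that bit 34 of MSR_IA32_MISC_ENABLE (BIT2 of the high dword held in edx) is saved on entry, cleared so that EFER.NXE can be set, and set again before rsm only if it was set on entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MISC_ENABLE_XD_DISABLE  (1ULL << 34)  /* MSR_IA32_MISC_ENABLE[34] */
#define EFER_NXE                (1ULL << 11)  /* same bit as MSR_EFER_XD (0x800) */
#define HIGH_DWORD_BIT2         (1u << 2)     /* bit 34 seen through edx */

/* Hypothetical stand-in for the two MSRs touched by the SMI entry code. */
typedef struct {
  uint64_t MiscEnable;
  uint64_t Efer;
} SIM_CPU;

/*
 * Entry path: remember the high dword of MISC_ENABLE (the value the
 * assembly pushes on the stack), clear XD Disable if it is set, then
 * enable NXE.
 */
uint32_t SmiEntryEnableXd(SIM_CPU *Cpu)
{
  uint32_t SavedHigh;

  assert(Cpu != NULL);
  SavedHigh = (uint32_t)(Cpu->MiscEnable >> 32);
  if ((SavedHigh & HIGH_DWORD_BIT2) != 0) {
    Cpu->MiscEnable &= ~MISC_ENABLE_XD_DISABLE;
  }
  Cpu->Efer |= EFER_NXE;
  return SavedHigh;
}

/*
 * Exit path: put XD Disable back only if it was set before entering SMM.
 */
void SmiExitRestoreXd(SIM_CPU *Cpu, uint32_t SavedHigh)
{
  assert(Cpu != NULL);
  if ((SavedHigh & HIGH_DWORD_BIT2) != 0) {
    Cpu->MiscEnable |= MISC_ENABLE_XD_DISABLE;
  }
}
```

In this model a caller passes the value returned by SmiEntryEnableXd() to SmiExitRestoreXd(), which mirrors the push/pop pair around the FXSAVE area in the assembly.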

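A second note on the same patch: the removed assembly wrote 1 to byte 4 of the gate at _SmiIDT + 14 * 16, while the new C version indexes the IDT by EXCEPT_IA32_PAGE_FAULT and sets one field. The sketch below shows why the two are equivalent. The struct SKETCH_IDT_GATE and the function SetPageFaultIst are hypothetical names made up for illustration, and the layout is assumed from the comments in the removed _SmiIDT table (16-byte gates, "Unused" byte at offset 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_FAULT_VECTOR  14   /* value of EXCEPT_IA32_PAGE_FAULT */

/* Hypothetical mirror of one 64-bit interrupt gate (16 bytes). */
typedef struct {
  uint16_t OffsetLow;    /* Offset 0:15 */
  uint16_t Selector;     /* Segment selector */
  uint8_t  Ist;          /* "Unused" byte in the old table; holds the IST index */
  uint8_t  GateType;     /* 0x8e: Interrupt Gate, Present */
  uint16_t OffsetHigh;   /* Offset 16:31 */
  uint32_t OffsetUpper;  /* Offset 32:63 */
  uint32_t Reserved;
} SKETCH_IDT_GATE;

/*
 * Set the IST index of the page fault gate, so that the CPU switches to
 * the corresponding TSS stack when a page fault is delivered.
 */
void SetPageFaultIst(SKETCH_IDT_GATE *Idt, size_t GateCount, uint8_t IstIndex)
{
  assert(Idt != NULL);
  assert(GateCount > PAGE_FAULT_VECTOR);
  assert(sizeof(SKETCH_IDT_GATE) == 16);

  /* Same byte as "_SmiIDT + 14 * 16 + 4" in the removed assembly. */
  Idt[PAGE_FAULT_VECTOR].Ist = IstIndex;
}
```

Indexing a typed array keeps the "14 * 16 + 4" arithmetic out of the source, which matters once the IDT is no longer a fixed label in the assembly file.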


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm paging protection.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
                   ` (4 preceding siblings ...)
  2016-11-04  9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
@ 2016-11-04  9:30 ` Jiewen Yao
  2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
  2016-11-08  1:22 ` Laszlo Ersek
  7 siblings, 0 replies; 38+ messages in thread
From: Jiewen Yao @ 2016-11-04  9:30 UTC (permalink / raw)
  To: edk2-devel
  Cc: Michael D Kinney, Kelly Steele, Jeff Fan, Feng Tian, Star Zeng,
	Laszlo Ersek

Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Kelly Steele <kelly.steele@intel.com>
Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
---
 QuarkPlatformPkg/Quark.dsc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/QuarkPlatformPkg/Quark.dsc b/QuarkPlatformPkg/Quark.dsc
index d5988da..9804b70 100644
--- a/QuarkPlatformPkg/Quark.dsc
+++ b/QuarkPlatformPkg/Quark.dsc
@@ -891,3 +891,9 @@
 
 [BuildOptions.common.EDKII.DXE_RUNTIME_DRIVER]
   MSFT:*_*_*_DLINK_FLAGS = /ALIGN:4096
+
+# Force PE/COFF sections to be aligned at 4KB boundaries to support page level protection of DXE_SMM_DRIVER/SMM_CORE modules
+[BuildOptions.common.EDKII.DXE_SMM_DRIVER, BuildOptions.common.EDKII.SMM_CORE]
+  MSFT:*_*_*_DLINK_FLAGS = /ALIGN:4096
+  GCC:*_*_*_DLINK_FLAGS = -z common-page-size=0x1000
+
-- 
2.7.4.windows.1
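
Note on the alignment requirement in this patch: page attributes such as RO and XD apply to whole 4 KB pages, so code and data can only be protected separately when no page holds both. The sketch below illustrates the arithmetic only. It is not the check PiSmmCore performs, and the names SKETCH_PAGE_SIZE, AlignUpToPage and SectionAlignmentAllowsPageProtection are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  0x1000u   /* 4 KB, matching /ALIGN:4096 */

/* Round a size up to the next multiple of the page size. */
uint64_t AlignUpToPage(uint64_t Size)
{
  return (Size + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * A section alignment that is a non-zero multiple of 4 KB guarantees that
 * every section starts on its own page, so a code section and a data
 * section never share one.
 */
bool SectionAlignmentAllowsPageProtection(uint32_t SectionAlignment)
{
  return SectionAlignment != 0 && (SectionAlignment % SKETCH_PAGE_SIZE) == 0;
}
```

With a 4096-byte alignment the predicate holds; with a smaller alignment such as 32 bytes it does not, because the first bytes of a data section could then fall in the last code page.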



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
                   ` (5 preceding siblings ...)
  2016-11-04  9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
@ 2016-11-04 22:40 ` Laszlo Ersek
  2016-11-04 22:46   ` Yao, Jiewen
  2016-11-08  1:22 ` Laszlo Ersek
  7 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-04 22:40 UTC (permalink / raw)
  To: Jiewen Yao, edk2-devel; +Cc: Michael D Kinney, Feng Tian, Jeff Fan, Star Zeng

On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.

Jiewen, can you please push this series to a new branch in your repo?

I see a branch called "SmmProtection_V2", but it seems to end with an
incomplete patch (26f482d8b611d0fcb07d3ffbf3f4468fd249767b, subject
"pismmcpu"), so I figured I'd ask explicitly.

Thanks
Laszlo

> ==== below is V1 description ====
> This patch series enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and sets XD for data pages and RO for code pages.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets important data structures to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
> 
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
> 
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
> 
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.
> 
>  MdeModulePkg/Core/PiSmmCore/Dispatcher.c               |   66 +
>  MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c    | 1509 ++++++++++++++++++++
>  MdeModulePkg/Core/PiSmmCore/Page.c                     |  775 +++++++++-
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.c                |   40 +
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.h                |   91 ++
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf              |    2 +
>  MdeModulePkg/Core/PiSmmCore/Pool.c                     |   16 +
>  MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h |   51 +
>  MdeModulePkg/MdeModulePkg.dec                          |    3 +
>  QuarkPlatformPkg/Quark.dsc                             |    6 +
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c               |   71 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S              |   67 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm            |   68 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm           |   70 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S          |  226 +--
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm        |   36 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm       |   36 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c          |   37 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c        |    4 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c                  |  127 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c             |  142 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h             |  156 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf           |    5 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c     |  871 +++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c                 |   39 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h                 |   15 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c                |  274 +++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S               |   51 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm             |   54 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm            |   61 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S           |  250 +---
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm         |   35 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm        |   31 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c           |   30 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c         |    7 +-
>  UefiCpuPkg/UefiCpuPkg.dec                              |    8 +
>  36 files changed, 4529 insertions(+), 801 deletions(-)
>  create mode 100644 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
>  create mode 100644 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
>  create mode 100644 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
@ 2016-11-04 22:46   ` Yao, Jiewen
  2016-11-04 23:08     ` Laszlo Ersek
  0 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-04 22:46 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@ml01.01.org
  Cc: Kinney, Michael D, Tian, Feng, Fan, Jeff, Zeng, Star

Ah, yes. Laszlo. You are right.

I forgot to push the last update yesterday. Thank you for reminding me.
Now it is synced.

Thank you
Yao Jiewen

From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Saturday, November 5, 2016 6:40 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; edk2-devel@ml01.01.org
Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.

Jiewen, can you please push this series to a new branch in your repo?

I see a branch called "SmmProtection_V2", but it seems to end with an
incomplete patch (26f482d8b611d0fcb07d3ffbf3f4468fd249767b, subject
"pismmcpu"), so I figured I'd ask explicitly.

Thanks
Laszlo

> ==== below is V1 description ====
> This patch series enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and sets XD for data pages and RO for code pages.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets important data structures to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.
>
>  MdeModulePkg/Core/PiSmmCore/Dispatcher.c               |   66 +
>  MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c    | 1509 ++++++++++++++++++++
>  MdeModulePkg/Core/PiSmmCore/Page.c                     |  775 +++++++++-
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.c                |   40 +
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.h                |   91 ++
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf              |    2 +
>  MdeModulePkg/Core/PiSmmCore/Pool.c                     |   16 +
>  MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h |   51 +
>  MdeModulePkg/MdeModulePkg.dec                          |    3 +
>  QuarkPlatformPkg/Quark.dsc                             |    6 +
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c               |   71 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S              |   67 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm            |   68 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm           |   70 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S          |  226 +--
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm        |   36 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm       |   36 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c          |   37 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c        |    4 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c                  |  127 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c             |  142 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h             |  156 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf           |    5 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c     |  871 +++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c                 |   39 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h                 |   15 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c                |  274 +++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S               |   51 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm             |   54 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm            |   61 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S           |  250 +---
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm         |   35 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm        |   31 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c           |   30 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c         |    7 +-
>  UefiCpuPkg/UefiCpuPkg.dec                              |    8 +
>  36 files changed, 4529 insertions(+), 801 deletions(-)
>  create mode 100644 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
>  create mode 100644 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
>  create mode 100644 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-04 22:46   ` Yao, Jiewen
@ 2016-11-04 23:08     ` Laszlo Ersek
  0 siblings, 0 replies; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-04 23:08 UTC (permalink / raw)
  To: Yao, Jiewen, edk2-devel@ml01.01.org
  Cc: Kinney, Michael D, Tian, Feng, Fan, Jeff, Zeng, Star

On 11/04/16 23:46, Yao, Jiewen wrote:
> Ah, yes, Laszlo. You are right.
> 
> I forgot to push the last update yesterday. Thank you for reminding me.
> Now it is synced.

Thanks! The commit message updates and the v1->v2 differences look
good/reasonable to me (I diffed the code-level end results of the two
versions, plus I compared the commit messages pairwise). I hope to test
v2 sometime next week, and I intend to look into the S3 instability too
(I took note of Paolo's advice with the "info tlb" QEMU monitor command).

Going through the (now documented) SMRAM impact again, I realize the
platform can elect to set PcdCpuSmmStaticPageTable dynamically as well.
I'm sort of guessing that we might want to set the PCD in OVMF's
PlatformPei, based on the guest-phys address width (which we also
calculate in PlatformPei), in combination with availability of 1G
paging. The case we should likely avoid is

>     A) If the system only supports 2M paging,
>     When the whole memory/MMIO is 48bit, we need 1+256+256*256 pages
>       (~ 257M)

Anyway, I don't want to be too clever about this until we see a problem
(out-of-SMRAM) in practice.
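
As a quick sanity check of the quoted footprint figure, the arithmetic can be reproduced as follows. This is only a sketch: it assumes 4 KiB page-table pages and takes the page count 1+256+256*256 verbatim from the quoted commit message; the variable names are illustrative.

```python
# Sanity check of the quoted page-table footprint for the 2M-paging case.
# Assumption: every page-table page is 4 KiB; the page count formula is
# copied from the quoted commit message, not derived here.
PAGE_SIZE = 4096  # bytes per page-table page (assumed)

pages = 1 + 256 + 256 * 256
footprint_bytes = pages * PAGE_SIZE
footprint_mib = footprint_bytes / (1024 * 1024)

print(f"{pages} pages -> {footprint_mib:.1f} MiB")
```

Under these assumptions the result is consistent with the "~ 257M" in the commit message.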

Thanks!
Laszlo

> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Saturday, November 5, 2016 6:40 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>; edk2-devel@ml01.01.org
> Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> 
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
> 
> Jiewen, can you please push this series to a new branch in your repo?
> 
> I see a branch called "SmmProtection_V2", but it seems to end with an
> incomplete patch (26f482d8b611d0fcb07d3ffbf3f4468fd249767b, subject
> "pismmcpu"), so I figured I'd ask explicitly.
> 
> Thanks
> Laszlo
> 
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> Jiewen Yao (6):
>>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>>   QuarkPlatformPkg/dsc: enable Smm paging protection.
>>
>>  MdeModulePkg/Core/PiSmmCore/Dispatcher.c               |   66 +
>>  MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c    | 1509 ++++++++++++++++++++
>>  MdeModulePkg/Core/PiSmmCore/Page.c                     |  775 +++++++++-
>>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.c                |   40 +
>>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.h                |   91 ++
>>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf              |    2 +
>>  MdeModulePkg/Core/PiSmmCore/Pool.c                     |   16 +
>>  MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h |   51 +
>>  MdeModulePkg/MdeModulePkg.dec                          |    3 +
>>  QuarkPlatformPkg/Quark.dsc                             |    6 +
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c               |   71 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S              |   67 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm            |   68 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm           |   70 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S          |  226 +--
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm        |   36 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm       |   36 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c          |   37 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c        |    4 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c                  |  127 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c             |  142 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h             |  156 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf           |    5 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c     |  871 +++++++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c                 |   39 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h                 |   15 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c                |  274 +++-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S               |   51 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm             |   54 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm            |   61 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S           |  250 +---
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm         |   35 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm        |   31 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c           |   30 +-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c         |    7 +-
>>  UefiCpuPkg/UefiCpuPkg.dec                              |    8 +
>>  36 files changed, 4529 insertions(+), 801 deletions(-)
>>  create mode 100644 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
>>  create mode 100644 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
>>  create mode 100644 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
>>
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
                   ` (6 preceding siblings ...)
  2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
@ 2016-11-08  1:22 ` Laszlo Ersek
  2016-11-08 12:59   ` Yao, Jiewen
                     ` (2 more replies)
  7 siblings, 3 replies; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-08  1:22 UTC (permalink / raw)
  To: Jiewen Yao
  Cc: edk2-devel, Michael D Kinney, Feng Tian, Jeff Fan, Star Zeng,
	Paolo Bonzini

On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>
> ==== below is V1 description ====
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>

I have new test results. Let's start with the table again:

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
 2 no      Ia32     255                             n/a                      52x2x2   pass       untested
 3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
 4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
 5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
 6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
 7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
 8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
 9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested

* Case 8: this test case failed with v2 as well, but this time with
  different symptoms:

> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)

  I didn't try to narrow this down.

* Case 13 (the "unreliable S3 resume" case): Here the news is both bad
  and good. The good news is for Jiewen: this patch series does not
  cause the unreliability, it "only" amplifies it severely. The bad news
  is correspondingly for everyone else: S3 resume is actually unreliable
  even in case 4, that is, without this series applied; it's just that
  the failure rate is much, much lower.

  Namely, in my new testing, in case 13, S3 resume failed 8 times out of
  21 tries. (I stopped testing at the 8th failure.)

  Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
  exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
  #12 that failed; I continued testing and aborted the test after the
  55th try.)

  So, while the series hugely amplifies the failure rate, the failure
  does exist without the series. Which is why I modified the case 4
  results in the table, and also lower-cased the word "unreliable" in
  case 13.

  Below I will return to this problem separately; let's go over the rest
  of the table first.
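
To put the two observed failure counts side by side, here is a minimal sketch that computes the failure rates and a one-sided hypergeometric (Fisher-style) tail probability. The counts come from the text above; the function names are made up for illustration, and because testing in case 13 was stopped at the 8th failure, the tail probability is only indicative.

```python
from math import comb

# Observed S3 resume results quoted above: (failures, tries).
with_series = (8, 21)     # case 13
without_series = (1, 55)  # case 4

def failure_rate(failures, tries):
    """Fraction of tries that failed."""
    return failures / tries

def one_sided_fisher(fail_a, n_a, fail_b, n_b):
    """P(at least fail_a failures land in group A) if all failures were
    spread at random over both groups (hypergeometric upper tail)."""
    total, fails = n_a + n_b, fail_a + fail_b
    upper = min(fails, n_a)
    tail = sum(comb(fails, k) * comb(total - fails, n_a - k)
               for k in range(fail_a, upper + 1))
    return tail / comb(total, n_a)

rate_13 = failure_rate(*with_series)
rate_4 = failure_rate(*without_series)
p_value = one_sided_fisher(8, 21, 1, 55)
print(f"case 13: {rate_13:.1%}, case 4: {rate_4:.1%}, p = {p_value:.2g}")
```

A small tail probability here would support reading the difference as an amplification by the series and not as noise, while the single failure in case 4 still shows that the problem predates it.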

* Case 17: I guess this is not a real failure, I'm just including it for
  completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
  additional SMRAM demand (see the commit message on patch V2 4/6). This
  case fails with

> SmmLockBox Command - 4
> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
> SmmLockBox SmmLockBoxHandler Exit
> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)

  which is an SMRAM allocation failure. If I lower the VCPU count to
  50x2x2, then the guest boots fine.

----*----

Before I get to the S3 resume problem (which, again, reproduces without
this series, although much less frequently), I'd like to comment on the
removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
function, on the return value of SmmBlockingStartupThisAp(). This change
allows v2 to proceed past that point; however, I'm seeing a whole lot of

> !mSmmMpSyncData->CpuData[1].Present
> !mSmmMpSyncData->CpuData[2].Present
> !mSmmMpSyncData->CpuData[3].Present
> ...

messages in the OVMF boot log, interspersed with

> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065

style messages. (That is, one error message for each AP, per
ConvertPageEntryAttribute() message.)
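
For readers decoding the quoted ConvertPageEntryAttribute message: the two values differ in a single low bit. A minimal sketch, assuming the standard x86 paging-entry flag layout (bit 0 Present, bit 1 Read/Write, bit 2 User/Supervisor, bit 5 Accessed, bit 6 Dirty); the helper name is made up for illustration.

```python
# Low flag bits of an x86 paging-structure entry (standard layout).
PTE_FLAGS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD", 5: "A", 6: "D"}

def decode_flags(entry):
    """Return the names of the low flag bits set in a page-table entry."""
    return [name for bit, name in PTE_FLAGS.items() if entry & (1 << bit)]

before, after = 0x7F92B067, 0x7F92B065
cleared = decode_flags(before & ~after)  # bits set before but not after
print(decode_flags(before), "->", decode_flags(after), "cleared:", cleared)
```

Clearing the Read/Write bit is what one would expect when a page is being marked read-only, in line with the series description.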

Is this okay / intentional? The number of these messages can go up to
several thousands and that sort of drowns out everything else in the
log.

It's also not easy to mask the message, because it's logged on the
DEBUG_ERROR level.
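
Until the log level question is settled, the flood can be handled on the reader's side. The following is a rough sketch of a post-processing filter that counts both message kinds and drops them from the log. The regular expressions are modeled only on the two message shapes quoted above, and the function and variable names are hypothetical.

```python
import re
from collections import Counter

# Patterns modeled on the two message shapes quoted above.
AP_ABSENT = re.compile(r"!mSmmMpSyncData->CpuData\[(\d+)\]\.Present")
CONVERT = re.compile(r"ConvertPageEntryAttribute 0x[0-9A-Fa-f]+->0x[0-9A-Fa-f]+")

def summarize(log_lines):
    """Count both message kinds and return the remaining log lines."""
    counts = Counter()
    kept = []
    for line in log_lines:
        if AP_ABSENT.search(line):
            counts["ap_absent"] += 1
        elif CONVERT.search(line):
            counts["convert"] += 1
        else:
            kept.append(line)
    return counts, kept

sample = [
    "ConvertPageEntryAttribute 0x7F92B067->0x7F92B065",
    "!mSmmMpSyncData->CpuData[1].Present",
    "!mSmmMpSyncData->CpuData[2].Present",
    "!mSmmMpSyncData->CpuData[3].Present",
    "SmmLockBox Command - 4",
]
counts, kept = summarize(sample)
print(dict(counts), kept)
```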

----*----

* Okay, so the S3 problem. Last time I suspected that the failure point
  (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
  9A1D0, according to the OVMF log). In order to test this idea, I
  exercised this series with S3 against a Windows 8.1 guest (--> case 13
  again). The failure reproduced on the second S3 resume, with identical
  RIP, despite the Windows wakeup vector being located elsewhere (at
  0x1000).

  Quoting the OVMF log leading up to the resume:

> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 1000

  QEMU log (same as before):

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  So, we can exclude the suspicion that the problem is guest OS
  dependent.

* Then I looked for the base address of the page containing the
  RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
  some firmware component might have allocated that area actually. Here
  we go:

> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000

  That is, the failure hits (when it hits -- not always) in the area
  where the CpuMpPei driver *borrows* memory for the startup vector of
  the APs, for the purposes of the MP service PPI. ("Wakeup" is an
  overloaded word here; the "wakeup buffer" has nothing to do with S3
  resume, it just serves for booting the APs temporarily in PEI, for
  implementing the MP service PPI.)

  When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
  the original contents of this area. This occurs just before
  transferring control to the guest OS wakeup vector: see the
  "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
  quoted from the OVMF log.

  I documented (parts of) this logic in OVMF commit

    https://github.com/tianocore/edk2/commit/e3e3090a959a0

  (see the code comments as well).
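
The address relationship described above can be checked mechanically. This sketch uses only the values quoted from the logs (WakeupBufferStart, WakeupBufferSize, the failing RIP, and the two OS waking vectors); the constant and function names are illustrative.

```python
# Values quoted from the OVMF and QEMU logs above.
WAKEUP_BUFFER_START = 0x9F000
WAKEUP_BUFFER_SIZE = 0x1000
FAILING_RIP = 0x9F0FD
OS_WAKING_VECTORS = {"Linux": 0x9A1D0, "Windows 8.1": 0x1000}

def in_buffer(addr, start=WAKEUP_BUFFER_START, size=WAKEUP_BUFFER_SIZE):
    """True if addr falls inside the half-open range [start, start + size)."""
    return start <= addr < start + size

print(f"RIP offset into buffer: {FAILING_RIP - WAKEUP_BUFFER_START:#x}")
for name, vector in OS_WAKING_VECTORS.items():
    print(name, "vector in buffer:", in_buffer(vector))
```

The failing RIP lies inside the borrowed page, while neither OS waking vector does, which matches the conclusion that the failure is not tied to the guest OS wakeup code.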

* At that time, I thought I had identified a memory management bug in
  CpuMpPei; see the following discussion and bug report for details:

    https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
    https://bugzilla.tianocore.org/show_bug.cgi?id=67

  However, with the extraction / introduction of MpInitLib, this issue
  has been fixed: GetWakeupBuffer() now calls
  CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
  no longer; we shouldn't be looking there for the root cause.

* Either way, I don't understand why anything would want to execute code
  in the one page that happens to host the MP services PPI startup
  buffer for APs during PEI.

  Not understanding the "why", I looked at the "what", and resorted to
  tracing KVM. Because the problem readily reproduces with this series
  applied (case 13), it wasn't hard to start the tracing while the guest
  was suspended, and capture just the actions that led from the
  KVM-level wakeup to the failure.

  The QEMU state dumps are visible above in the email. I've also
  uploaded the compressed OVMF log and the textual KVM trace here:

    http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/

  I sincerely hope that Paolo will have a field day with the KVM trace
  :) I managed to identify the following curiosities (remember this is
  all on the S3 resume path):

  * First, the VCPUs (there are four of them) enter and leave SMM in a
    really funky pattern:

      vcpu#0  vcpu#1  vcpu#2  vcpu#3
      ------  ------  ------  ------
              enter
               |
              leave

                      enter
                        |
                      leave

                              enter
                                |
                              leave

      enter
        |
      leave

              enter           enter
       enter    |     enter     |
         |      |       |       |
       leave    |       |       |
                |       |       |
       enter    |       |       |
         |      |       |       |
       leave  leave   leave   leave

    That is, first we have each VCPU enter and leave SMM in complete
    isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
    followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
    temporarily (it comes back in later), while the other three remain
    in SMM. Finally all four of them leave SMM together.

    After which the problem occurs.

  * Second, the instruction that causes things to blow up is <0f aa>,
    i.e., RSM. I have absolutely no clue why RSM is executed:

    (a) in the area that used to host the AP startup routine for the MP
    services PPI -- note that we also have "Transfer to 16bit OS waking
    vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
    area completely! --,

    (b) and why *after* all four VCPUs have just left SMM, together.

  * The RSM instruction is handled successfully elsewhere, for example
    when all four VCPUs leave SMM, at the bottom of the diagram above:

> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa

  * The guest-phys address 7ff7f000 that we see just before the error:

> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)

    can be found higher up in the trace; namely, it is written to CR3
    several times. It's the root of the page tables.

  * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
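
For anyone who wants to repeat this search on the uploaded trace, here is a small sketch that picks the RSM emulations (opcode bytes 0f aa) out of kvm_emulate_insn lines and flags the ones marked FAIL. The line layout is inferred only from the excerpts quoted above; the field in front of the RIP is not interpreted, and the function names are hypothetical.

```python
import re

# Layout inferred from the quoted excerpts:
#   <task> [<cpu>] <time>: kvm_emulate_insn: <field>:<rip>: <bytes> [FAIL]
LINE = re.compile(
    r"^(\S+)\s+\[(\d+)\]\s+([\d.]+):\s+kvm_emulate_insn:\s+"
    r"([0-9a-f]+):([0-9a-f]+):\s+(.*)$"
)
RSM = bytes([0x0F, 0xAA])

def parse(line):
    """Return (task, rip, opcode_bytes, failed), or None for other events."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    task, _cpu, _time, _field, rip, rest = match.groups()
    tokens = rest.split()
    failed = bool(tokens) and tokens[-1] == "FAIL"
    opcode = bytes(int(tok, 16) for tok in tokens if tok != "FAIL")
    return task, int(rip, 16), opcode, failed

trace = [
    "CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa",
    "CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83",
    "CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa",
    "CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL",
]
events = [ev for ev in map(parse, trace) if ev is not None]
rsm_events = [ev for ev in events if ev[2] == RSM]
failed_rsm = [(task, rip) for task, rip, _op, failed in rsm_events if failed]
print(len(rsm_events), [(t, hex(r)) for t, r in failed_rsm])
```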

* I also tried the "info tlb" monitor command, via "virsh
  qemu-monitor-command --hmp", while the guest was auto-paused after the
  crash.

  I cannot provide results: QEMU appeared to return a message that would
  be longer than 16MB after encoding by libvirt, and libvirt rejected
  that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).

  Anyway, the KVM trace, and the QEMU register dump, look consistent
  with what Paolo said about "Code=?? ?? ??...":

    The question marks usually mean that the page tables do not map a
    page at that address.

  CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
  (SMM=0). We can't translate *any* guest-virtual address, as we can't
  even begin walking the page tables.
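
The inconsistency can be restated as a few bit and range checks on the quoted register dump. This is a sketch, assuming the architectural bit positions CR0.PG = bit 31 and EFER.LMA = bit 10, and assuming (as the text above implies) that SMRAM contents are not readable outside SMM; the variable names are illustrative.

```python
# Values quoted from the QEMU register dump and the trace notes above.
SMRAM_FIRST, SMRAM_LAST = 0x7F80_1000, 0x7FFF_FFFF  # inclusive range
CR0, CR3, EFER = 0xE000_0011, 0x7FF7_F000, 0x500
SMM = 0  # "SMM=0" in the dump

CR0_PG = 1 << 31    # paging enabled
EFER_LMA = 1 << 10  # long mode active

paging_on = bool(CR0 & CR0_PG)
long_mode = bool(EFER & EFER_LMA)
cr3_in_smram = SMRAM_FIRST <= CR3 <= SMRAM_LAST

# Paging is on, yet its root lies in SMRAM while the CPU is outside SMM.
inconsistent = paging_on and cr3_in_smram and SMM == 0
print(paging_on, long_mode, cr3_in_smram, inconsistent)
```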

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-08  1:22 ` Laszlo Ersek
@ 2016-11-08 12:59   ` Yao, Jiewen
  2016-11-08 13:22     ` Laszlo Ersek
  2016-11-09  6:25   ` Yao, Jiewen
  2016-11-09 11:23   ` Paolo Bonzini
  2 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-08 12:59 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Hi Laszlo,
Thanks for the detailed test results.

Quick comments on the debug messages:

1)      For "ConvertPageEntryAttribute 0x7F92B067->0x7F92B065", I agree to change it to DEBUG_VERBOSE, because it is purely for debugging.

2)      For "!mSmmMpSyncData->CpuData[1].Present", I think people are interested in knowing the reason for a startup failure. I would prefer to keep the current DEBUG_ERROR.

At the same time, I understand your OVMF concern about too many debug messages in FlushTlbForAll(). So I plan to resolve the problem in another way.

I will check "mSmmMpSyncData->CpuData[1].Present" before calling SmmBlockingStartupThisAp(). So you will not see any debug messages in FlushTlbForAll(). :)

What do you think?
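
To make the proposal concrete, here is a toy model of it. This is not the firmware code: the Python functions below are stand-ins named after FlushTlbForAll() and SmmBlockingStartupThisAp(), the `present` field models mSmmMpSyncData->CpuData[i].Present, and treating index 0 as the BSP is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass
class CpuData:
    present: bool  # models mSmmMpSyncData->CpuData[i].Present

def blocking_startup_this_ap(index, cpus, log):
    """Stand-in for SmmBlockingStartupThisAp(): logs and fails if absent."""
    if not cpus[index].present:
        log.append(f"!mSmmMpSyncData->CpuData[{index}].Present")
        return False
    return True

def flush_tlb_for_all(cpus, log, check_present_first):
    """Model of FlushTlbForAll(): try every AP (index 0 is taken as the BSP)."""
    started = 0
    for index in range(1, len(cpus)):
        if check_present_first and not cpus[index].present:
            continue  # proposed change: skip APs that are not in SMM
        if blocking_startup_this_ap(index, cpus, log):
            started += 1
    return started

cpus = [CpuData(True), CpuData(False), CpuData(True), CpuData(False)]
old_log, new_log = [], []
started_old = flush_tlb_for_all(cpus, old_log, check_present_first=False)
started_new = flush_tlb_for_all(cpus, new_log, check_present_first=True)
print(len(old_log), len(new_log), started_old, started_new)
```

In this model the same APs are started either way; only the error messages for absent APs disappear, so DEBUG_ERROR can stay in place for genuine startup failures.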

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Tuesday, November 8, 2016 9:22 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>
> ==== below is V1 description ====
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>

I have new test results. Let's start with the table again:

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
 2 no      Ia32     255                             n/a                      52x2x2   pass       untested
 3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
 4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
 5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
 6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
 7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
 8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
 9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
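For readers who want to check the table's figures, here is a minimal sketch
of the arithmetic. It assumes the "AxBxC" topology strings mean sockets x
cores x threads (the product is the same whichever order is meant); the
function names are made up for illustration and are not from edk2.

```c
#include <assert.h>
#include <stdint.h>

/* Total VCPU count for an "AxBxC" topology entry from the table.
 * Assumption: the factors are sockets, cores and threads. */
unsigned vcpu_count(unsigned sockets, unsigned cores, unsigned threads)
{
    return sockets * cores * threads;
}

/* Size in bytes of a physical address space that is `bits` wide. */
uint64_t phys_space_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

With these, 1x2x2 gives 4 VCPUs, 52x2x2 gives 208 and 54x2x2 gives 216
(both below the PcdCpuMaxLogicalProcessorNumber value of 255), and a 36-bit
address width gives 64GB.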

* Case 8: this test case failed with v2 as well, but this time with
  different symptoms:

> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)

  I didn't try to narrow this down.

* Case 13 (the "unreliable S3 resume" case): Here the news is both bad
  and good. The good news is for Jiewen: this patch series does not
  cause the unreliability, it "only" amplifies it severely. The bad news
  is correspondingly for everyone else: S3 resume is actually unreliable
  even in case 4, that is, without this series applied; it's just that
  the failure rate is much, much lower.

  Namely, in my new testing, in case 13, S3 resume failed 8 times out of
  21 tries. (I stopped testing at the 8th failure.)

  Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
  exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
  #12 that failed; I continued testing and aborted the test after the
  55th try.)

  So, while the series hugely amplifies the failure rate, the failure
  does exist without the series. Which is why I modified the case 4
  results in the table, and also lower-cased the word "unreliable" in
  case 13.

  Below I will return to this problem separately; let's go over the rest
  of the table first.

* Case 17: I guess this is not a real failure, I'm just including it for
  completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
  additional SMRAM demand (see the commit message on patch V2 4/6). This
  case fails with

> SmmLockBox Command - 4
> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
> SmmLockBox SmmLockBoxHandler Exit
> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)

  which is an SMRAM allocation failure. If I lower the VCPU count to
  50x2x2, then the guest boots fine.
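To put the S3 resume counts from case 13 (8 failures in 21 tries) and case
4 (1 failure in 55 tries) side by side, a minimal sketch follows. The
helper name is hypothetical. The samples are small and the case 13 run was
stopped at the 8th failure, so the resulting percentages are rough
estimates only.

```c
#include <assert.h>

/* Failure rate in percent for `failures` out of `tries` attempts. */
double failure_rate_percent(unsigned failures, unsigned tries)
{
    assert(failures <= tries);
    return tries == 0 ? 0.0 : 100.0 * failures / tries;
}
```

This gives roughly 38% for case 13 and a bit under 2% for case 4, that is,
about a twentyfold difference in the observed rate.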

----*----

Before I get to the S3 resume problem (which, again, reproduces without
this series, although much less frequently), I'd like to comment on the
removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
function, on the return value of SmmBlockingStartupThisAp(). This change
allows v2 to proceed past that point; however, I'm seeing a whole lot of

> !mSmmMpSyncData->CpuData[1].Present
> !mSmmMpSyncData->CpuData[2].Present
> !mSmmMpSyncData->CpuData[3].Present
> ...

messages in the OVMF boot log, interspersed with

> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065

style messages. (That is, one error message for each AP, per
ConvertPageEntryAttribute() message.)
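As an aside, the two values in a message such as "0x7F92B067->0x7F92B065"
can be decoded with a small sketch like the one below. It assumes the
logged values are x86 page table entries with the architectural flag bits
in the low byte; the macro and function names are illustrative and not
taken from PiSmmCpuDxeSmm.

```c
#include <stdint.h>

/* Architectural x86 page table entry flag bits (low byte). */
#define PTE_P   (1u << 0)  /* present */
#define PTE_RW  (1u << 1)  /* writable */
#define PTE_US  (1u << 2)  /* user accessible */
#define PTE_A   (1u << 5)  /* accessed */
#define PTE_D   (1u << 6)  /* dirty */

/* Bits that differ between the old and the new entry value. */
uint64_t pte_changed_bits(uint64_t before, uint64_t after)
{
    return before ^ after;
}

/* Nonzero if the entry permits writes. */
int pte_is_writable(uint64_t entry)
{
    return (entry & PTE_RW) != 0;
}
```

Under that assumption, the only bit that changes between 0x7F92B067 and
0x7F92B065 is R/W (bit 1), i.e. the entry goes from writable to read-only,
which would be consistent with the series marking pages read-only.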

Is this okay / intentional? The number of these messages can go up to
several thousands and that sort of drowns out everything else in the
log.

It's also not easy to mask the message, because it's logged on the
DEBUG_ERROR level.

----*----

* Okay, so the S3 problem. Last time I suspected that the failure point
  (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
  9A1D0, according to the OVMF log). In order to test this idea, I
  exercised this series with S3 against a Windows 8.1 guest (--> case 13
  again). The failure reproduced on the second S3 resume, with identical
  RIP, despite the Windows wakeup vector being located elsewhere (at
  0x1000).

  Quoting the OVMF log leading up to the resume:

> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 1000

  QEMU log (same as before):

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  So, we can exclude the suspicion that the problem is guest OS
  dependent.

* Then I looked for the base address of the page containing the
  RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
  some firmware component might have allocated that area actually. Here
  we go:

> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000

  That is, the failure hits (when it hits -- not always) in the area
  where the CpuMpPei driver *borrows* memory for the startup vector of
  the APs, for the purposes of the MP service PPI. ("Wakeup" is an
  overloaded word here; the "wakeup buffer" has nothing to do with S3
  resume, it just serves for booting the APs temporarily in PEI, for
  implementing the MP service PPI.)

  When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
  the original contents of this area. This occurs just before
  transferring control to the guest OS wakeup vector: see the
  "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
  quoted from the OVMF log.

  I documented (parts of) this logic in OVMF commit

    https://github.com/tianocore/edk2/commit/e3e3090a959a0

  (see the code comments as well).
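The lookup described in this item (the page containing RIP=9f0fd versus
"WakeupBufferStart = 9F000, WakeupBufferSize = 1000") comes down to the
following check. This is only a sketch with made-up helper names, assuming
4KB pages.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u

/* Base address of the 4KB page that contains `addr`. */
uint64_t page_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PAGE_SIZE_4K - 1);
}

/* Nonzero if `addr` lies in the buffer [start, start + size). */
int in_buffer(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr >= start && addr - start < size;
}
```

The failing RIP 0x9f0fd has page base 0x9F000 and falls inside the
0x9F000 + 0x1000 wakeup buffer, while neither the Linux wakeup vector
(0x9A1D0) nor the Windows one (0x1000) does.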

* At that time, I thought I had identified a memory management bug in
  CpuMpPei; see the following discussion and bug report for details:

    https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
    https://bugzilla.tianocore.org/show_bug.cgi?id=67

  However, with the extraction / introduction of MpInitLib, this issue
  has been fixed: GetWakeupBuffer() now calls
  CheckOverlapWithAllocatedBuffer(), so that "memory management bug" no
  longer exists; we shouldn't be looking there for the root cause.

* Either way, I don't understand why anything would want to execute code
  in the one page that happens to host the MP services PPI startup
  buffer for APs during PEI.

  Not understanding the "why", I looked at the "what", and resorted to
  tracing KVM. Because the problem readily reproduces with this series
  applied (case 13), it wasn't hard to start the tracing while the guest
  was suspended, and capture just the actions that led from the
  KVM-level wakeup to the failure.

  The QEMU state dumps are visible above in the email. I've also
  uploaded the compressed OVMF log and the textual KVM trace here:

    http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/

  I sincerely hope that Paolo will have a field day with the KVM trace
  :) I managed to identify the following curiosities (remember this is
  all on the S3 resume path):

  * First, the VCPUs (there are four of them) enter and leave SMM in a
    really funky pattern:

      vcpu#0  vcpu#1  vcpu#2  vcpu#3
      ------  ------  ------  ------
              enter
               |
              leave

                      enter
                        |
                      leave

                              enter
                                |
                              leave

      enter
        |
      leave

              enter           enter
       enter    |     enter     |
         |      |       |       |
       leave    |       |       |
                |       |       |
       enter    |       |       |
         |      |       |       |
       leave  leave   leave   leave

    That is, first we have each VCPU enter and leave SMM in complete
    isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
    followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
    temporarily (it comes back in later), while the other three remain
    in SMM. Finally all four of them leave SMM together.

    After which the problem occurs.

  * Second, the instruction that causes things to blow up is <0f aa>,
    i.e., RSM. I have absolutely no clue why RSM is executed:

    (a) in the area that used to host the AP startup routine for the MP
    services PPI -- note that we also have "Transfer to 16bit OS waking
    vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
    area completely! --,

    (b) and why *after* all four VCPUs have just left SMM, together.

  * The RSM instruction is handled successfully elsewhere, for example
    when all four VCPUs leave SMM, at the bottom of the diagram above:

> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa

  * The guest-phys address 7ff7f000 that we see just before the error:

> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)

    can be found higher up in the trace; namely, it is written to CR3
    several times. It's the root of the page tables.

  * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.

* I also tried the "info tlb" monitor command, via "virsh
  qemu-monitor-command --hmp", while the guest was auto-paused after the
  crash.

  I cannot provide results: QEMU appeared to return a message that would
  be longer than 16MB after encoding by libvirt, and libvirt rejected
  that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).

  Anyway, the KVM trace, and the QEMU register dump, look consistent
  with what Paolo said about "Code=?? ?? ??...":

    The question marks usually mean that the page tables do not map a
    page at that address.

  CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
  (SMM=0). We can't translate *any* guest-virtual address, as we can't
  even begin walking the page tables.
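The last observation can be written down as a small check. This is a
sketch only: it takes the SMRAM range (7F80_1000..7FFF_FFFF) and the
register values from the dump above, and assumes, as stated above, that
SMRAM is not accessible outside of SMM. The names are hypothetical.

```c
#include <stdint.h>

/* SMRAM range as quoted above (guest-phys, inclusive). */
#define SMRAM_FIRST 0x7F801000ull
#define SMRAM_LAST  0x7FFFFFFFull

/* Nonzero if the guest-phys address lies in the quoted SMRAM range. */
int in_smram(uint64_t gpa)
{
    return gpa >= SMRAM_FIRST && gpa <= SMRAM_LAST;
}

/* Nonzero if CR0.PG (bit 31) is set, i.e. paging is enabled. */
int paging_enabled(uint32_t cr0)
{
    return (int)((cr0 >> 31) & 1u);
}

/* Nonzero if address translation cannot even start: paging is on, the
 * CPU is outside SMM, and the page table root sits in SMRAM. */
int page_walk_blocked(uint32_t cr0, uint64_t cr3, int in_smm)
{
    return paging_enabled(cr0) && !in_smm && in_smram(cr3 & ~0xFFFull);
}
```

For the dumped state (CR0=e0000011, CR3=000000007ff7f000, SMM=0) the check
is true; with the same registers inside SMM it would be false.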

Thanks
Laszlo



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-08 12:59   ` Yao, Jiewen
@ 2016-11-08 13:22     ` Laszlo Ersek
  2016-11-08 13:41       ` Yao, Jiewen
  0 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-08 13:22 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/08/16 13:59, Yao, Jiewen wrote:
> Hi Laszlo
> 
> Thanks for the detailed test results.
> 
> Quick comment on the debug messages:
> 
> 1)      For “ConvertPageEntryAttribute 0x7F92B067->0x7F92B065”, I agree
> to change it to DEBUG_VERBOSE, because it is purely for debugging.
> 
> 2)      For “!mSmmMpSyncData->CpuData[1].Present”, I think people are
> interested in knowing the reason for a startup failure. I would prefer
> to keep the current DEBUG_ERROR.

I agree that DEBUG_ERROR is appropriate for messages that can directly
relate to startup failures.

However, does this condition unavoidably imply startup failure? Because,
as demonstrated by QEMU + OVMF, a platform where an SMI does not pull
all processors into SMM at once can still work with PiSmmCpuDxeSmm,
assuming the appropriate PCD settings.

Therefore, can we make this error message conditional on

  (mSmmMpSyncData->EffectiveSyncMode == SmmCpuSyncModeTradition)

? Because, "not present" is an error for the traditional sync mode, but
for the relaxed / directed mode, "not present" is expected. Isn't it?
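To make the suggestion concrete, here is a minimal, self-contained sketch
of the condition. The enum and the function are stand-ins invented for
illustration; they only mirror the names used above and are not the actual
PiSmmCpuDxeSmm definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's effective sync mode. */
typedef enum {
    SyncModeTradition,
    SyncModeRelaxedAp
} SYNC_MODE;

/* Report "AP not present" as an error only when all processors are
 * expected to be in SMM, i.e. in the traditional sync mode. */
bool should_report_absent_ap(SYNC_MODE effective_mode, bool ap_present)
{
    return !ap_present && effective_mode == SyncModeTradition;
}
```

With such a predicate guarding the DEBUG_ERROR call, an absent AP would
stay silent in the relaxed mode and still be reported in the traditional
one.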


> At the same time, I understand your OVMF concern about too many debug
> messages in FlushTlb. So I plan to resolve the problem in another way.
> 
> I will check “mSmmMpSyncData->CpuData[1].Present” before calling
> SmmBlockingStartupThisAp(). So you will not see any debug message in
> FlushTlb(). :)
> 
> What do you think?

If we cannot omit (or downgrade) the message for
SmmCpuSyncModeRelaxedAp, then decreasing its frequency would be appreciated.

Thanks
Laszlo

>  
> 
> *From:*edk2-devel [mailto:edk2-devel-bounces@lists.01.org] *On Behalf Of
> *Laszlo Ersek
> *Sent:* Tuesday, November 8, 2016 9:22 AM
> *To:* Yao, Jiewen <jiewen.yao@intel.com>
> *Cc:* Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney,
> Michael D <michael.d.kinney@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star
> <star.zeng@intel.com>
> *Subject:* Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> 
>  
> 
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
> 
> I have new test results. Let's start with the table again:
> 
> Legend:
> 
> - "untested" means the test was not executed because the same test
>   failed or proved unreliable in a less demanding configuration already,
> 
> - "n/a" means a setting or test case was impossible,
> 
> - "fail" and "unreliable" (lower case) are outside the scope of this
>   series; they either capture the pre-series status, or are expected
>   even with the series applied due to the pre-series status,
> 
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>   series.
> 
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
> 
>    series  OVMF                                                              VCPU     boot       S3 resume
>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
> 
> * Case 8: this test case failed with v2 as well, but this time with
>   different symptoms:
> 
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
> 
>   I didn't try to narrow this down.
> 
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>   and good. The good news is for Jiewen: this patch series does not
>   cause the unreliability, it "only" amplifies it severely. The bad news
>   is correspondingly for everyone else: S3 resume is actually unreliable
>   even in case 4, that is, without this series applied; it's just that
>   the failure rate is much, much lower.
> 
>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>   21 tries. (I stopped testing at the 8th failure.)
> 
>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>   #12 that failed; I continued testing and aborted the test after the
>   55th try.)
> 
>   So, while the series hugely amplifies the failure rate, the failure
>   does exist without the series. Which is why I modified the case 4
>   results in the table, and also lower-cased the word "unreliable" in
>   case 13.
> 
>   Below I will return to this problem separately; let's go over the rest
>   of the table first.
> 
> * Case 17: I guess this is not a real failure, I'm just including it for
>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>   case fails with
> 
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
> 
>   which is an SMRAM allocation failure. If I lower the VCPU count to
>   50x2x2, then the guest boots fine.
> 
> ----*----
> 
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
> 
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
> 
> messages in the OVMF boot log, interspersed with
> 
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
> 
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
> 
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
> 
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
> 
> ----*----
> 
> * Okay, so the S3 problem. Last time I suspected that the failure point
>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>   9A1D0, according to the OVMF log). In order to test this idea, I
>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>   again). The failure reproduced on the second S3 resume, with identical
>   RIP, despite the Windows wakeup vector being located elsewhere (at
>   0x1000).
> 
>   Quoting the OVMF log leading up to the resume:
> 
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
> 
>   QEMU log (same as before):
> 
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> 
>   So, we can exclude the suspicion that the problem is guest OS
>   dependent.
> 
> * Then I looked for the base address of the page containing the
>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>   some firmware component might have allocated that area actually. Here
>   we go:
> 
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> 
>   That is, the failure hits (when it hits -- not always) in the area
>   where the CpuMpPei driver *borrows* memory for the startup vector of
>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>   resume, it just serves for booting the APs temporarily in PEI, for
>   implementing the MP service PPI.)
> 
>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>   the original contents of this area. This occurs just before
>   transferring control to the guest OS wakeup vector: see the
>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>   quoted from the OVMF log.
> 
>   I documented (parts of) this logic in OVMF commit
> 
>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
> 
>   (see the code comments as well).
> 
> * At that time, I thought I had identified a memory management bug in
>   CpuMpPei; see the following discussion and bug report for details:
> 
>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
> 
>   However, with the extraction / introduction of MpInitLib, this issue
>   has been fixed: GetWakeupBuffer() now calls
>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" no
>   longer exists; we shouldn't be looking there for the root cause.
> 
> * Either way, I don't understand why anything would want to execute code
>   in the one page that happens to host the MP services PPI startup
>   buffer for APs during PEI.
> 
>   Not understanding the "why", I looked at the "what", and resorted to
>   tracing KVM. Because the problem readily reproduces with this series
>   applied (case 13), it wasn't hard to start the tracing while the guest
>   was suspended, and capture just the actions that led from the
>   KVM-level wakeup to the failure.
> 
>   The QEMU state dumps are visible above in the email. I've also
>   uploaded the compressed OVMF log and the textual KVM trace here:
> 
>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
> 
>   I sincerely hope that Paolo will have a field day with the KVM trace
>   :) I managed to identify the following curiosities (remember this is
>   all on the S3 resume path):
> 
>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>     really funky pattern:
> 
>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>       ------  ------  ------  ------
>               enter
>                |
>               leave
> 
>                       enter
>                         |
>                       leave
> 
>                               enter
>                                 |
>                               leave
> 
>       enter
>         |
>       leave
> 
>               enter           enter
>        enter    |     enter     |
>          |      |       |       |
>        leave    |       |       |
>                 |       |       |
>        enter    |       |       |
>          |      |       |       |
>        leave  leave   leave   leave
> 
>     That is, first we have each VCPU enter and leave SMM in complete
>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>     temporarily (it comes back in later), while the other three remain
>     in SMM. Finally all four of them leave SMM together.
> 
>     After which the problem occurs.
> 
>   * Second, the instruction that causes things to blow up is <0f aa>,
>     i.e., RSM. I have absolutely no clue why RSM is executed:
> 
>     (a) in the area that used to host the AP startup routine for the MP
>     services PPI -- note that we also have "Transfer to 16bit OS waking
>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>     area completely! --,
> 
>     (b) and why *after* all four VCPUs have just left SMM, together.
> 
>   * The RSM instruction is handled successfully elsewhere, for example
>     when all four VCPUs leave SMM, at the bottom of the diagram above:
> 
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
> 
>   * The guest-phys address 7ff7f000 that we see just before the error:
> 
>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
> 
>     can be found higher up in the trace; namely, it is written to CR3
>     several times. It's the root of the page tables.
> 
>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
> 
> * I also tried the "info tlb" monitor command, via "virsh
>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>   crash.
> 
>   I cannot provide results: QEMU appeared to return a message that would
>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
> 
>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>   with what Paolo said about "Code=?? ?? ??...":
> 
>     The question marks usually mean that the page tables do not map a
>     page at that address.
> 
>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>   even begin walking the page tables.
> 
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 




* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-08 13:22     ` Laszlo Ersek
@ 2016-11-08 13:41       ` Yao, Jiewen
  0 siblings, 0 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-08 13:41 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Yes, it is a good idea to check "(mSmmMpSyncData->EffectiveSyncMode == SmmCpuSyncModeTradition)".
I agree.

Thank you
Yao Jiewen

From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Tuesday, November 8, 2016 9:23 PM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/08/16 13:59, Yao, Jiewen wrote:
> Hi Laszlo
>
> Thanks for the detailed test results.
>
>
>
> Quick comment for the debug message:
>
> 1)      For "ConvertPageEntryAttribute 0x7F92B067->0x7F92B065", I agree
> to change it to DEBUG_VERBOSE, because it is purely for debugging.
>
>
>
> 2)      For "!mSmmMpSyncData->CpuData[1].Present", I think people are
> interested in knowing the startup failure reason. I would prefer to keep
> the current DEBUG_ERROR.

I agree that DEBUG_ERROR is appropriate for messages that can directly
relate to startup failures.

However, does this condition unavoidably imply startup failure? Because,
as demonstrated by QEMU + OVMF, a platform where an SMI does not pull
all processors into SMM at once can still work with PiSmmCpuDxeSmm,
assuming the appropriate PCD settings.

Therefore, can we make this error message conditional on

  (mSmmMpSyncData->EffectiveSyncMode == SmmCpuSyncModeTradition)

? Because, "not present" is an error for the traditional sync mode, but
for the relaxed / directed mode, "not present" is expected. Isn't it?
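
A minimal, self-contained C sketch of the check proposed above follows. The
enum constants mirror the identifiers mentioned in this thread
(SmmCpuSyncModeTradition, SmmCpuSyncModeRelaxedAp); everything else (the
LOG_* levels and the ApNotPresentLogLevel / ReportApNotPresent helpers) is
hypothetical and only illustrates the idea. It is not the actual
PiSmmCpuDxeSmm code.

```c
#include <stdio.h>

/* Names mirror the identifiers discussed in this thread; the numeric values
   are arbitrary and only serve this sketch. */
typedef enum {
  SmmCpuSyncModeTradition,
  SmmCpuSyncModeRelaxedAp
} SMM_CPU_SYNC_MODE;

/* Hypothetical log levels standing in for DEBUG_ERROR / DEBUG_VERBOSE. */
typedef enum {
  LOG_ERROR,
  LOG_VERBOSE
} LOG_LEVEL;

/* Hypothetical helper: choose the log level for an AP that is not in SMM.
   "Not present" is an error only in the traditional sync mode. */
LOG_LEVEL
ApNotPresentLogLevel (SMM_CPU_SYNC_MODE EffectiveSyncMode)
{
  if (EffectiveSyncMode == SmmCpuSyncModeTradition) {
    return LOG_ERROR;
  }
  return LOG_VERBOSE;
}

/* Hypothetical helper: report a missing AP.
   Returns 1 if an error-level message was printed, 0 if it was suppressed. */
int
ReportApNotPresent (SMM_CPU_SYNC_MODE EffectiveSyncMode, unsigned CpuIndex)
{
  if (ApNotPresentLogLevel (EffectiveSyncMode) == LOG_ERROR) {
    fprintf (stderr, "!mSmmMpSyncData->CpuData[%u].Present\n", CpuIndex);
    return 1;
  }
  return 0;
}
```

With this shape, a platform running in the relaxed mode would no longer see
one error line per AP, while the traditional mode keeps the error.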


> At the same time, I understand your OVMF concern about too many debug
> messages in FlushTlb. So I plan to resolve the problem in another way.
>
> I will check "mSmmMpSyncData->CpuData[1].Present" before calling
> SmmBlockingStartupThisAp(). So you will not see any debug message in
> FlushTlb(). :)
>
>
>
> What do you think?

If we cannot omit (or downgrade) the message for
SmmCpuSyncModeRelaxedAp, then decreasing its frequency would be appreciated.

Thanks
Laszlo

>
>
> *From:*edk2-devel [mailto:edk2-devel-bounces@lists.01.org] *On Behalf Of
> *Laszlo Ersek
> *Sent:* Tuesday, November 8, 2016 9:22 AM
> *To:* Yao, Jiewen <jiewen.yao@intel.com>
> *Cc:* Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney,
> Michael D <michael.d.kinney@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star
> <star.zeng@intel.com>
> *Subject:* Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
>
>
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> I have new test results. Let's start with the table again:
>
> Legend:
>
> - "untested" means the test was not executed because the same test
>   failed or proved unreliable in a less demanding configuration already,
>
> - "n/a" means a setting or test case was impossible,
>
> - "fail" and "unreliable" (lower case) are outside the scope of this
>   series; they either capture the pre-series status, or are expected
>   even with the series applied due to the pre-series status,
>
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>   series.
>
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
>
>    series  OVMF                                                              VCPU     boot       S3 resume
>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
>
> * Case 8: this test case failed with v2 as well, but this time with
>   different symptoms:
>
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>
>   I didn't try to narrow this down.
>
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>   and good. The good news is for Jiewen: this patch series does not
>   cause the unreliability, it "only" amplifies it severely. The bad news
>   is correspondingly for everyone else: S3 resume is actually unreliable
>   even in case 4, that is, without this series applied, it's just the
>   failure rate is much-much lower.
>
>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>   21 tries. (I stopped testing at the 8th failure.)
>
>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>   #12 that failed; I continued testing and aborted the test after the
>   55th try.)
>
>   So, while the series hugely amplifies the failure rate, the failure
>   does exist without the series. Which is why I modified the case 4
>   results in the table, and also lower-cased the word "unreliable" in
>   case 13.
>
>   Below I will return to this problem separately; let's go over the rest
>   of the table first.
>
> * Case 17: I guess this is not a real failure, I'm just including it for
>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>   case fails with
>
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>
>   which is an SMRAM allocation failure. If I lower the VCPU count to
>   50x2x2, then the guest boots fine.
>
> ----*----
>
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
>
> messages in the OVMF boot log, interspersed with
>
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
>
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
>
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
>
> ----*----
>
> * Okay, so the S3 problem. Last time I suspected that the failure point
>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>   9A1D0, according to the OVMF log). In order to test this idea, I
>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>   again). The failure reproduced on the second S3 resume, with identical
>   RIP, despite the Windows wakeup vector being located elsewhere (at
>   0x1000).
>
>   Quoting the OVMF log leading up to the resume:
>
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
>
>   QEMU log (same as before):
>
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
>   So, we can exclude the suspicion that the problem is guest OS
>   dependent.
>
> * Then I looked for the base address of the page containing the
>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>   some firmware component might have allocated that area actually. Here
>   we go:
>
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>
>   That is, the failure hits (when it hits -- not always) in the area
>   where the CpuMpPei driver *borrows* memory for the startup vector of
>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>   resume, it just serves for booting the APs temporarily in PEI, for
>   implementing the MP service PPI.)
>
>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>   the original contents of this area. This occurs just before
>   transferring control to the guest OS wakeup vector: see the
>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>   quoted from the OVMF log.
>
>   I documented (parts of) this logic in OVMF commit
>
>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
>
>   (see the code comments as well).
>
> * At that time, I thought I had identified a memory management bug in
>   CpuMpPei; see the following discussion and bug report for details:
>
>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
>
>   However, with the extraction / introduction of MpInitLib, this issue
>   has been fixed: GetWakeupBuffer() now calls
>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>   no longer; we shouldn't be looking there for the root cause.
>
> * Either way, I don't understand why anything would want to execute code
>   in the one page that happens to host the MP services PPI startup
>   buffer for APs during PEI.
>
>   Not understanding the "why", I looked at the "what", and resorted to
>   tracing KVM. Because the problem readily reproduces with this series
>   applied (case 13), it wasn't hard to start the tracing while the guest
>   was suspended, and capture just the actions that led from the
>   KVM-level wakeup to the failure.
>
>   The QEMU state dumps are visible above in the email. I've also
>   uploaded the compressed OVMF log and the textual KVM trace here:
>
>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>
>   I sincerely hope that Paolo will have a field day with the KVM trace
>   :) I managed to identify the following curiosities (remember this is
>   all on the S3 resume path):
>
>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>     really funky pattern:
>
>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>       ------  ------  ------  ------
>               enter
>                |
>               leave
>
>                       enter
>                         |
>                       leave
>
>                               enter
>                                 |
>                               leave
>
>       enter
>         |
>       leave
>
>               enter           enter
>        enter    |     enter     |
>          |      |       |       |
>        leave    |       |       |
>                 |       |       |
>        enter    |       |       |
>          |      |       |       |
>        leave  leave   leave   leave
>
>     That is, first we have each VCPU enter and leave SMM in complete
>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>     temporarily (it comes back in later), while the other three remain
>     in SMM. Finally all four of them leave SMM together.
>
>     After which the problem occurs.
>
>   * Second, the instruction that causes things to blow up is <0f aa>,
>     i.e., RSM. I have absolutely no clue why RSM is executed:
>
>     (a) in the area that used to host the AP startup routine for the MP
>     services PPI -- note that we also have "Transfer to 16bit OS waking
>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>     area completely! --,
>
>     (b) and why *after* all four VCPUs have just left SMM, together.
>
>   * The RSM instruction is handled successfully elsewhere, for example
>     when all four VCPUs leave SMM, at the bottom of the diagram above:
>
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
>
>   * The guest-phys address 7ff7f000 that we see just before the error:
>
>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
>
>     can be found higher up in the trace; namely, it is written to CR3
>     several times. It's the root of the page tables.
>
>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>   crash.
>
>   I cannot provide results: QEMU appeared to return a message that would
>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>   with what Paolo said about "Code=?? ?? ??...":
>
>     The question marks usually mean that the page tables do not map a
>     page at that address.
>
>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>   even begin walking the page tables.
>
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-08  1:22 ` Laszlo Ersek
  2016-11-08 12:59   ` Yao, Jiewen
@ 2016-11-09  6:25   ` Yao, Jiewen
  2016-11-09 11:30     ` Paolo Bonzini
  2016-11-09 20:46     ` Laszlo Ersek
  2016-11-09 11:23   ` Paolo Bonzini
  2 siblings, 2 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-09  6:25 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Hi Laszlo,
I will fix the DEBUG message issue in the V3 patch.

Below are the remaining issues:


* Case 13: S3 fails randomly.
Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.


1)      We believe the dead CPU is an AP, not the BSP.
The reasons are:

1.1)   The BSP has already transferred control to the OS waking vector, so its GDT/IDT/CR3 would have been set by the OS.

1.2)   The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are the typical BIOS X64 mode settings.

1.3)   The dead CPU still has the SMM CR3 (which is obviously wrong).


2)      Based on 1), we reviewed the S3 resume AP flow.
Currently the BSP wakes up the APs in SMRAM, for security reasons. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
Currently the BSP just uses its own context to initialize the APs, so each AP takes the BSP's CR3, which unfortunately is the SMM CR3.
After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because an X64 mode halt loop still needs paging.


3)      The error happens as follows: once the AP receives an interrupt (for whatever reason), it starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not usable. And then we see this failure.


4)      I guess the reason we did not see the error before, and the reason the issue is RANDOM, is that it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.


5)      The fix, I think, should be as follows:
We should always put the APs into protected mode, so that no paging is needed.
We should park the APs in reserved memory above 1M, instead of memory below 1M, because the memory below 1M is restored.


Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix this critical issue.

There is no need to do more investigation. Thanks for your great help on that. :)
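
The observations behind the case 13 analysis can be cross-checked with simple
address arithmetic. The sketch below is illustrative only: the constants are
copied from the logs quoted in this thread (WakeupBufferStart and
WakeupBufferSize, the failing RIP and CR3, and the SMRAM range Laszlo
reported), while the helper names InRange and PageWalkPossible are
hypothetical, and the model of SMRAM access (readable only while in SMM) is a
deliberate simplification.

```c
#include <stdint.h>

/* Values copied from the logs quoted in this thread. */
#define WAKEUP_BUFFER_START  0x9F000ULL
#define WAKEUP_BUFFER_SIZE   0x1000ULL
#define FAILING_RIP          0x9F0FDULL
#define FAILING_CR3          0x7FF7F000ULL
#define SMRAM_FIRST          0x7F801000ULL   /* 7F80_1000 */
#define SMRAM_LAST           0x7FFFFFFFULL   /* 7FFF_FFFF */

/* Hypothetical helper: returns 1 if Address lies in [First, Last], else 0. */
int
InRange (uint64_t Address, uint64_t First, uint64_t Last)
{
  return (Address >= First) && (Address <= Last);
}

/* Hypothetical, simplified model: a page walk can only begin if the page
   table root is readable, and SMRAM is readable only while in SMM. */
int
PageWalkPossible (uint64_t Cr3, int InSmm)
{
  if (InRange (Cr3, SMRAM_FIRST, SMRAM_LAST)) {
    return InSmm;
  }
  return 1;
}
```

With these values, the failing RIP falls inside the one-page wakeup buffer,
and the CR3 falls inside SMRAM while the register dump shows SMM=0, so under
this simplified model no page walk can start.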




* Case 17: I do not think it is a real issue, because SMM is simply out of resources.


* Case 8: that is a very weird issue. I talked with Jeff again. We do not have a clear clue yet.
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
Here is the code. We do not know why some code needs to call InitializeSpinLock after ExitBootServices.
SPIN_LOCK *
EFIAPI
InitializeSpinLock (
  OUT      SPIN_LOCK                 *SpinLock
  )
{
  ASSERT (SpinLock != NULL);

  _ReadWriteBarrier();
  *SpinLock = SPIN_LOCK_RELEASED;
  _ReadWriteBarrier();

  return SpinLock;
}

If you could quickly check the following, that would be great.

1)      Which processor triggers this ASSERT: the BSP or an AP?

2)      Which module triggers this ASSERT? That is, which module contains the current RIP value?

Also, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
If you could share step-by-step instructions with me, that would be great.

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Tuesday, November 8, 2016 9:22 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/04/16 10:30, Jiewen Yao wrote:
> ==== below is V2 description ====
> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
> 2) PiSmmCpu: Add debug info on StartupAp() fails.
> 3) PiSmmCpu: Add ASSERT for AllocatePages().
> 4) PiSmmCpu: Add protection detail in commit message.
> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>
> ==== below is V1 description ====
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>

I have new test results. Let's start with the table again:

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
 2 no      Ia32     255                             n/a                      52x2x2   pass       untested
 3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
 4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
 5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
 6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
 7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
 8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
 9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested

* Case 8: this test case failed with v2 as well, but this time with
  different symptoms:

> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> PixelBlueGreenRedReserved8BitPerColor
> ConvertPages: Incompatible memory types
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)

  I didn't try to narrow this down.

* Case 13 (the "unreliable S3 resume" case): Here the news is both bad
  and good. The good news is for Jiewen: this patch series does not
  cause the unreliability, it "only" amplifies it severely. The bad news
  is correspondingly for everyone else: S3 resume is actually unreliable
  even in case 4, that is, without this series applied; it's just that
  the failure rate is much lower.

  Namely, in my new testing, in case 13, S3 resume failed 8 times out of
  21 tries. (I stopped testing at the 8th failure.)

  Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
  exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
  #12 that failed; I continued testing and aborted the test after the
  55th try.)

  So, while the series hugely amplifies the failure rate, the failure
  does exist without the series. Which is why I modified the case 4
  results in the table, and also lower-cased the word "unreliable" in
  case 13.

  Below I will return to this problem separately; let's go over the rest
  of the table first.
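
The two failure counts above (8 of 21 with the series, 1 of 55 without)
can be compared with a few lines of arithmetic. This is only a rough
sketch: the function names are made up for illustration, and because the
case 13 run was stopped at the 8th failure, the tail probability should
be read as an indication rather than a rigorous test.

```python
from math import comb


def failure_rate(failures, tries):
    """Fraction of S3 resume attempts that failed."""
    return failures / tries


def fisher_one_sided(fail_a, tries_a, fail_b, tries_b):
    """Probability of seeing at least fail_a failures in group A if both
    groups shared one failure rate (hypergeometric tail)."""
    total = tries_a + tries_b
    fails = fail_a + fail_b
    upper = min(fails, tries_a)
    tail = sum(
        comb(fails, k) * comb(total - fails, tries_a - k)
        for k in range(fail_a, upper + 1)
    )
    return tail / comb(total, tries_a)


rate_with_series = failure_rate(8, 21)      # case 13
rate_without_series = failure_rate(1, 55)   # case 4
p_value = fisher_one_sided(8, 21, 1, 55)

print(f"case 13: {rate_with_series:.1%}, case 4: {rate_without_series:.1%}")
print(f"amplification: {rate_with_series / rate_without_series:.1f}x")
print(f"one-sided p-value: {p_value:.2e}")
```

The point of the comparison is only that the same failure exists in both
configurations while the rates differ by more than an order of magnitude.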

* Case 17: I guess this is not a real failure, I'm just including it for
  completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
  additional SMRAM demand (see the commit message on patch V2 4/6). This
  case fails with

> SmmLockBox Command - 4
> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
> SmmLockBox SmmLockBoxHandler Exit
> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)

  which is an SMRAM allocation failure. If I lower the VCPU count to
  50x2x2, then the guest boots fine.

----*----

Before I get to the S3 resume problem (which, again, reproduces without
this series, although much less frequently), I'd like to comment on the
removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
function, on the return value of SmmBlockingStartupThisAp(). This change
allows v2 to proceed past that point; however, I'm seeing a whole lot of

> !mSmmMpSyncData->CpuData[1].Present
> !mSmmMpSyncData->CpuData[2].Present
> !mSmmMpSyncData->CpuData[3].Present
> ...

messages in the OVMF boot log, interspersed with

> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065

style messages. (That is, one error message for each AP, per
ConvertPageEntryAttribute() message.)
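
For readers who want to see what such a ConvertPageEntryAttribute()
message changes, the low bits of the two values can be decoded as x86
page-table entry flags. This is a sketch that assumes the logged numbers
are page-table entries and looks only at the standard low flag bits; the
helper names are invented for the example.

```python
# Low flag bits of an x86 page-table entry.
PTE_FLAGS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD", 5: "A", 6: "D"}


def pte_flags(entry):
    """Return the set of flag names that are set in the entry."""
    return {name for bit, name in PTE_FLAGS.items() if entry & (1 << bit)}


def changed_flags(old, new):
    """Flags that differ between the old and the new entry."""
    return pte_flags(old) ^ pte_flags(new)


old_entry, new_entry = 0x7F92B067, 0x7F92B065
print(sorted(pte_flags(old_entry)))
print(sorted(pte_flags(new_entry)))
print("changed:", sorted(changed_flags(old_entry, new_entry)))
```

Under that assumption the only difference is the RW bit being cleared,
which would match the series marking a page read-only.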

Is this okay / intentional? The number of these messages can go up to
several thousands and that sort of drowns out everything else in the
log.

It's also not easy to mask the message, because it's logged on the
DEBUG_ERROR level.

----*----

* Okay, so the S3 problem. Last time I suspected that the failure point
  (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
  9A1D0, according to the OVMF log). In order to test this idea, I
  exercised this series with S3 against a Windows 8.1 guest (--> case 13
  again). The failure reproduced on the second S3 resume, with identical
  RIP, despite the Windows wakeup vector being located elsewhere (at
  0x1000).

  Quoting the OVMF log leading up to the resume:

> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 1000

  QEMU log (same as before):

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  So, we can exclude the suspicion that the problem is guest OS
  dependent.

* Then I looked for the base address of the page containing the
  RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
  some firmware component might have allocated that area actually. Here
  we go:

> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000

  That is, the failure hits (when it hits -- not always) in the area
  where the CpuMpPei driver *borrows* memory for the startup vector of
  the APs, for the purposes of the MP service PPI. ("Wakeup" is an
  overloaded word here; the "wakeup buffer" has nothing to do with S3
  resume, it just serves for booting the APs temporarily in PEI, for
  implementing the MP service PPI.)

  When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
  the original contents of this area. This occurs just before
  transferring control to the guest OS wakeup vector: see the
  "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
  quoted from the OVMF log.

  I documented (parts of) this logic in OVMF commit

    https://github.com/tianocore/edk2/commit/e3e3090a959a0

  (see the code comments as well).
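
The page lookup described above is simple address arithmetic. A minimal
sketch, using only the addresses quoted from the logs (the helper names
are hypothetical):

```python
PAGE_SIZE = 0x1000


def page_base(address):
    """Base address of the 4 KiB page containing the address."""
    return address & ~(PAGE_SIZE - 1)


def in_buffer(address, start, size):
    """True if the address lies inside [start, start + size)."""
    return start <= address < start + size


failing_rip = 0x9F0FD
wakeup_buffer_start, wakeup_buffer_size = 0x9F000, 0x1000
linux_wakeup_vector = 0x9A1D0
windows_wakeup_vector = 0x1000

print(hex(page_base(failing_rip)))
print(in_buffer(failing_rip, wakeup_buffer_start, wakeup_buffer_size))
print(in_buffer(linux_wakeup_vector, wakeup_buffer_start, wakeup_buffer_size))
print(in_buffer(windows_wakeup_vector, wakeup_buffer_start, wakeup_buffer_size))
```

The failing RIP falls inside the CpuMpPei wakeup buffer, while neither
guest OS waking vector does.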

* At that time, I thought I had identified a memory management bug in
  CpuMpPei; see the following discussion and bug report for details:

    https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
    https://bugzilla.tianocore.org/show_bug.cgi?id=67

  However, with the extraction / introduction of MpInitLib, this issue
  has been fixed: GetWakeupBuffer() now calls
  CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
  gone; we shouldn't be looking there for the root cause.

* Either way, I don't understand why anything would want to execute code
  in the one page that happens to host the MP services PPI startup
  buffer for APs during PEI.

  Not understanding the "why", I looked at the "what", and resorted to
  tracing KVM. Because the problem readily reproduces with this series
  applied (case 13), it wasn't hard to start the tracing while the guest
  was suspended, and capture just the actions that led from the
  KVM-level wakeup to the failure.

  The QEMU state dumps are visible above in the email. I've also
  uploaded the compressed OVMF log and the textual KVM trace here:

    http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/

  I sincerely hope that Paolo will have a field day with the KVM trace
  :) I managed to identify the following curiosities (remember this is
  all on the S3 resume path):

  * First, the VCPUs (there are four of them) enter and leave SMM in a
    really funky pattern:

      vcpu#0  vcpu#1  vcpu#2  vcpu#3
      ------  ------  ------  ------
              enter
               |
              leave

                      enter
                        |
                      leave

                              enter
                                |
                              leave

      enter
        |
      leave

              enter           enter
       enter    |     enter     |
         |      |       |       |
       leave    |       |       |
                |       |       |
       enter    |       |       |
         |      |       |       |
       leave  leave   leave   leave

    That is, first we have each VCPU enter and leave SMM in complete
    isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
    followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
    temporarily (it comes back in later), while the other three remain
    in SMM. Finally all four of them leave SMM together.

    After which the problem occurs.

  * Second, the instruction that causes things to blow up is <0f aa>,
    i.e., RSM. I have absolutely no clue why RSM is executed:

    (a) in the area that used to host the AP startup routine for the MP
    services PPI -- note that we also have "Transfer to 16bit OS waking
    vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
    area completely! --,

    (b) and why *after* all four VCPUs have just left SMM, together.

  * The RSM instruction is handled successfully elsewhere, for example
    when all four VCPUs leave SMM, at the bottom of the diagram above:

> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa

  * The guest-phys address 7ff7f000 that we see just before the error:

> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)

    can be found higher up in the trace; namely, it is written to CR3
    several times. It's the root of the page tables.

  * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.

* I also tried the "info tlb" monitor command, via "virsh
  qemu-monitor-command --hmp", while the guest was auto-paused after the
  crash.

  I cannot provide results: QEMU appeared to return a message that would
  be longer than 16MB after encoding by libvirt, and libvirt rejected
  that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).

  Anyway, the KVM trace, and the QEMU register dump, look consistent
  with what Paolo said about "Code=?? ?? ??...":

    The question marks usually mean that the page tables do not map a
    page at that address.

  CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
  (SMM=0). We can't translate *any* guest-virtual address, as we can't
  even begin walking the page tables.
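
The register dump can be decoded to make that argument explicit. The
sketch below uses the architectural bit positions of CR0.PG, CR4.PAE and
EFER.LME/LMA and the SMRAM range quoted above; the rule that SMRAM is
unreadable outside SMM is a modelling assumption (closed and locked
SMRAM), and the function names are illustrative only.

```python
CR0_PE = 1 << 0     # protected mode enable
CR0_PG = 1 << 31    # paging enable
CR4_PAE = 1 << 5    # physical address extension
EFER_LME = 1 << 8   # long mode enable
EFER_LMA = 1 << 10  # long mode active

# SMRAM range as quoted in the email (guest-physical, inclusive).
SMRAM_FIRST, SMRAM_LAST = 0x7F801000, 0x7FFFFFFF


def paging_mode(cr0, cr4, efer):
    """Classify the paging mode from the control registers."""
    if not cr0 & CR0_PG:
        return "none"
    if efer & EFER_LMA:
        return "4-level (long mode)"
    return "PAE" if cr4 & CR4_PAE else "32-bit"


def root_reachable(cr3, in_smm):
    """Model assumption: closed SMRAM is only readable from inside SMM."""
    root = cr3 & ~0xFFF
    in_smram = SMRAM_FIRST <= root <= SMRAM_LAST
    return in_smm or not in_smram


# Values from the QEMU register dump (SMM=0).
cr0, cr3, cr4, efer = 0xE0000011, 0x7FF7F000, 0x220, 0x500
print(paging_mode(cr0, cr4, efer))
print("page-table root reachable:", root_reachable(cr3, in_smm=False))
```

With paging enabled in long mode and the root of the page tables inside
SMRAM, no instruction fetch can be translated once the VCPU is outside SMM.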

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-08  1:22 ` Laszlo Ersek
  2016-11-08 12:59   ` Yao, Jiewen
  2016-11-09  6:25   ` Yao, Jiewen
@ 2016-11-09 11:23   ` Paolo Bonzini
  2016-11-09 15:16     ` Yao, Jiewen
  2 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 11:23 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Jiewen Yao, edk2-devel, Michael D Kinney, Feng Tian, Jeff Fan,
	Star Zeng


>   * Second, the instruction that causes things to blow up is <0f aa>,
>     i.e., RSM. I have absolutely no clue why RSM is executed:

It's probably not RSM.  RSM is probably the last instruction executed
before, and it's still in the buffer because, as you said, there's no
way that you can fetch an instruction while CR3 points into SMM.

My first thought was that the MMU is for some reason out of contact
with reality, but actually the CR3 write is correct:

             CPU-24446 [002] 39841.871040: kvm_exit:             reason CR_ACCESS rip 0x9f05e info 103 0
             CPU-24446 [002] 39841.871040: kvm_cr:               cr_write 3 = 0x7ff7f000

and it's coming from the stub as well.  So the second thought was that
the wrong CR3 had been put into the wakeup buffer's Cr3 location.
I'm glad I kept looking because it was much more entertaining.  Especially
knowing that I (probably) will not have to fix it. :)

The basic idea for debugging was to look for interesting events and
use 0x402 writes to correlate them to the debug log.  For example, most
accesses to 0x9f??? are obviously not traced by KVM, but the first ones
are:

31519-              CPU-24444 [006] 39841.783344: kvm_exit:             reason EPT_VIOLATION rip 0x855d82 info 181 0
31520:              CPU-24444 [006] 39841.783344: kvm_page_fault:       address 9f000 error_code 181
280224-             CPU-24444 [006] 39841.860940: kvm_exit:             reason EPT_VIOLATION rip 0x7ffd0d15 info 182 0
280225:             CPU-24444 [006] 39841.860940: kvm_page_fault:       address 9f000 error_code 182

(The number is just the line number in the trace).  Luckily your machine
didn't have EPT accessed/dirty bits, so KVM trapped both the first read
and the first write.
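
The read/write distinction can be seen in the low bits of the traced
values. The sketch below assumes that the "info" / "error_code" numbers
are the raw VMX exit qualification printed in hex (which I believe was
the case for KVM of that period) and uses the EPT-violation bit layout
from the Intel SDM as I recall it; treat both as assumptions.

```python
def decode_ept_violation(qualification):
    """Decode selected bits of an EPT-violation exit qualification."""
    return {
        "read": bool(qualification & (1 << 0)),
        "write": bool(qualification & (1 << 1)),
        "fetch": bool(qualification & (1 << 2)),
        "linear_address_valid": bool(qualification & (1 << 7)),
        # Set: access to the final translation; clear: access made
        # during the page walk itself (paging-structure entry).
        "final_translation": bool(qualification & (1 << 8)),
    }


def access_kind(qualification):
    """Short label such as 'read' or 'read+write'."""
    bits = decode_ept_violation(qualification)
    return "+".join(n for n in ("read", "write", "fetch") if bits[n])


for value in (0x181, 0x182, 0x83):
    print(hex(value), access_kind(value), decode_ept_violation(value))
```

Under these assumptions 0x181 is the first read and 0x182 the first
write of 0x9f000, while the 0x83 seen at 7ff7f000 just before the crash
would be an access made by the page walk itself.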

The read is at

WakeupBufferStart = 9F000, WakeupBufferSize = 1000

but it's not too interesting.  The second is a good one to start debugging
because it's from SMRAM (though not from SMM, since the first kvm_enter_smm
happens later at 305930).  So it makes sense that it writes an SMRAM CR3.
There is a write to the debug log just before, at 279993, and it writes
"SmmRestoreCpu()".  As expected, the write is followed by a flurry of MSR
writes, the APIC programming at 280131, so I am pretty sure that the write to
mExchangeInfo->Cr3 comes from PrepareApStartupVector.

FWIW, I first looked at the call chain up from BackupAndPrepareWakeupBuffer,
but that led me nowhere for an hour.  So I was a bit lucky indeed. :)

Anyhow, SmmRestoreCpu is the SmmS3ResumeEntryPoint for S3Resume2Pei, and
indeed, earlier in the log you have this debugging output from S3Resume2Pei:

SMM S3 CR3                      = 7FF7F000

Doh, maybe I should have looked at the log before the trace.  Who knows.
Anyway, the SMM_S3_RESUME_STATE is initialized by InitSmmS3ResumeState,
so the CR3 is the one that is initialized by InitSmmS3Cr3 in
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c.  At this point I
was still thinking that this CR3 was wrong, but by looking at the
places where SMM is entered, and correlating that with debug log writes,
the puzzle was relatively easy to solve:

1) SMBASE relocation, done by SmmRestoreCpu:

305930:             CPU-24445 [005] 39841.871264: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
306000:             CPU-24445 [005] 39841.871318: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
306051:             CPU-24446 [002] 39841.871349: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
306108:             CPU-24446 [002] 39841.871390: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
306161:             CPU-24447 [004] 39841.871421: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
306218:             CPU-24447 [004] 39841.871463: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
306254:             CPU-24444 [006] 39841.871473: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
306311:             CPU-24444 [006] 39841.871512: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000

2) S3ResumeExecuteBootScript (again, the previous 0x402 write ends
at 334597 and promptly gives us a clue):

334698:             CPU-24445 [005] 39841.882706: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
334699:             CPU-24447 [004] 39841.882706: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
334741:             CPU-24444 [006] 39841.882723: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
334742:             CPU-24446 [002] 39841.882724: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
334875:             CPU-24444 [006] 39841.882755: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000

Here is where I think things go awry.  The lines after
S3ResumeExecuteBootScript() are

   Close all SMRAM regions before executing boot script
   Lock all SMRAM regions before executing boot script

and indeed the first is at 334898, immediately after VCPU0 leaves
SMM.  But, closing and locking of SMRAM happens while the APs are
still in SMM!  The BSP instead goes on merrily and, after the debug
log has "PeiMpInitLib: CpuMpEndOfPeiCallback () invoked" (0x402
write ends at 364869) we have another access to 0x9f000, this time a
write.  It's RestoreWakeupBuffer:

364908-             CPU-24444 [006] 39841.890320: kvm_exit:             reason EPT_VIOLATION rip 0x855d82 info 182 0
364909:             CPU-24444 [006] 39841.890320: kvm_page_fault:       address 9f000 error_code 182

Again VCPUs 1..3 are still in SMM, but the BSP couldn't care less. :)
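
The "who is still in SMM" bookkeeping can be reproduced from the
kvm_enter_smm lines alone. A small sketch, using the five trace lines
quoted under 2) above; the regular expression and the function name are
my own, not part of any KVM tooling.

```python
import re

SMM_EVENT = re.compile(r"vcpu (\d+): (entering|leaving) SMM")

TRACE = [
    "334698:  CPU-24445 [005] 39841.882706: kvm_enter_smm:  vcpu 1: entering SMM, smbase 0x7ffb3000",
    "334699:  CPU-24447 [004] 39841.882706: kvm_enter_smm:  vcpu 3: entering SMM, smbase 0x7ffb7000",
    "334741:  CPU-24444 [006] 39841.882723: kvm_enter_smm:  vcpu 0: entering SMM, smbase 0x7ffb1000",
    "334742:  CPU-24446 [002] 39841.882724: kvm_enter_smm:  vcpu 2: entering SMM, smbase 0x7ffb5000",
    "334875:  CPU-24444 [006] 39841.882755: kvm_enter_smm:  vcpu 0: leaving SMM, smbase 0x7ffb1000",
]


def vcpus_in_smm(lines):
    """Replay enter/leave events and return the VCPUs left inside SMM."""
    inside = set()
    for line in lines:
        match = SMM_EVENT.search(line)
        if match is None:
            continue
        vcpu = int(match.group(1))
        if match.group(2) == "entering":
            inside.add(vcpu)
        else:
            inside.discard(vcpu)
    return inside


print("still in SMM when the BSP moves on:", sorted(vcpus_in_smm(TRACE)))
```

Replaying the events up to line 334875 leaves VCPUs 1, 2 and 3 inside
SMM at the moment the BSP starts closing and locking SMRAM.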

We're only 35% through the trace but we're actually close to the end.
At 365704 OVMF says it's transferring control to Linux's wakeup
vector, and Linux takes control real soon:

365805:             CPU-24444 [006] 39841.890477: kvm_exit:             reason CR_ACCESS rip 0x9aec5 info 4 0
365807:             CPU-24444 [006] 39841.890477: kvm_cr:               cr_write 4 = 0xb0
365817:             CPU-24444 [006] 39841.890479: kvm_entry:            vcpu 0

We don't even need to look closer at what happens after this point,
as we can imagine that the APs are just waiting for something to happen.
But if you do look, all you see is reads to the PMTimer, which makes sense.
And a while after, once they are fed up, they bring VCPU 0 back to SMM:

994855               CPU-24446 [000] 39841.982774: kvm_apic:             apic_write APIC_ICR = 0x4200
994856               CPU-24447 [002] 39841.982774: kvm_apic:             apic_write APIC_ICR = 0x4200
994857               CPU-24445 [005] 39841.982774: kvm_apic:             apic_write APIC_ICR = 0x4200
994858               CPU-24446 [000] 39841.982774: kvm_apic_ipi:         dst 0 vec 0 (SMI|physical|assert|edge|dst)
994859               CPU-24445 [005] 39841.982774: kvm_apic_ipi:         dst 0 vec 0 (SMI|physical|assert|edge|dst)
994860               CPU-24447 [002] 39841.982774: kvm_apic_ipi:         dst 0 vec 0 (SMI|physical|assert|edge|dst)
994861               CPU-24446 [000] 39841.982775: kvm_apic_accept_irq:  apicid 0 vec 0 (SMI|edge)
994862               CPU-24445 [005] 39841.982775: kvm_apic_accept_irq:  apicid 0 vec 0 (SMI|edge)
994863               CPU-24447 [002] 39841.982775: kvm_apic_accept_irq:  apicid 0 vec 0 (SMI|edge)

The rendezvous completes, the APs can finally leave SMM but all they can do
is meet their fate and crash horribly:

994869               CPU-24444 [006] 39841.982776: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a9548 info 0 800000fd
...
994880               CPU-24444 [006] 39841.982777: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
995135:             CPU-24444 [006] 39841.982821: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
995136:             CPU-24445 [005] 39841.982821: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
995137:             CPU-24446 [000] 39841.982821: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
995138:             CPU-24447 [002] 39841.982821: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
995148:             CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
995152:             CPU-24446 [000] 39841.982828: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL

I hope you enjoyed it more than the poor APs. :)

Paolo

>     (a) in the area that used to host the AP startup routine for the MP
>     services PPI -- note that we also have "Transfer to 16bit OS waking
>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>     area completely! --,
> 
>     (b) and why *after* all four VCPUs have just left SMM, together.
> 
>   * The RSM instruction is handled successfully elsewhere, for example
>     when all four VCPUs leave SMM, at the bottom of the diagram above:
> 
> > CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
> > CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
> > CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
> > CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
> 
>   * The guest-phys address 7ff7f000 that we see just before the error:
> 
> > CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000
> > error_code 83
> > CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000
> > error_code 83
> > CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
> > CPU-24444 [006] 39841.982827: kvm_exit:             reason
> > EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> > CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
> > CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason
> > KVM_EXIT_INTERNAL_ERROR (17)
> 
>     can be found higher up in the trace; namely, it is written to CR3
>     several times. It's the root of the page tables.
> 
>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
> 
> * I also tried the "info tlb" monitor command, via "virsh
>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>   crash.
> 
>   I cannot provide results: QEMU appeared to return a message that would
>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
> 
>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>   with what Paolo said about "Code=?? ?? ??...":
> 
>     The question marks usually mean that the page tables do not map a
>     page at that address.
> 
>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>   even begin walking the page tables.
> 
> Thanks
> Laszlo
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09  6:25   ` Yao, Jiewen
@ 2016-11-09 11:30     ` Paolo Bonzini
  2016-11-09 15:01       ` Yao, Jiewen
  2016-11-09 20:46     ` Laszlo Ersek
  1 sibling, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 11:30 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
	Zeng, Star



On 09/11/2016 07:25, Yao, Jiewen wrote:
> Current BSP just uses its own context to initialize AP. So that AP
> takes BSP CR3, which is SMM CR3, unfortunately. After BSP initialized
> APs, the AP is put to HALT-LOOP in X64 mode. It is the last straw,
> because X64 mode halt still needs paging.
> 
> 3)      The error happens once the AP receives an interrupt (for
> whatever reason) and starts executing code. However, at that time
> the AP might not be in SMM mode. It means SMM CR3 is not available.
> And then we see this.
> 
> 4)      I guess we did not see the error, or this is RANDOM issue,
> because it depends on if AP receives an interrupt before BSP send
> INIT-SIPI-SIPI.
> 
> 5)      The fix, I think, should be below: We should always put AP to
> protected mode, so that no paging is needed. We should put AP in
> above 1M reserved memory, instead of <1M memory, because <1M memory
> is restored.

For what it's worth, this is not what I observed.  What I found is that
the BSP doesn't wait for the AP rendezvous before closing SMRAM.

I'm not sure if the two things are related, but (3) would be a much
worse bug.  APs should not be receiving an interrupt.  Perhaps an NMI if
the AP is sitting in a CLI;HLT loop, but this is not what is happening.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 11:30     ` Paolo Bonzini
@ 2016-11-09 15:01       ` Yao, Jiewen
  2016-11-09 15:54         ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-09 15:01 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
	Zeng, Star

> What I found is that the BSP doesn't wait for the AP rendezvous before closing SMRAM.
[Jiewen] That is a good catch. Thanks for explaining.
I believe that is more convincing than the AP getting an interrupt. :)

We have some places where the BSP talks to the APs in S3.

1)      CpuS3.c - EarlyInitializeCpu()

2)      CpuS3.c - SmmRelocateBases()

3)      CpuS3.c - InitializeCpu()

4)      S3Resume.c - SendSmiIpiAllExcludingSelf()

I believe we can guarantee that 1/2/3 are good, because I found that the BSP checks mNumberToFinish.
4 is a risk, because there is no AP finish check. If the AP is below 1M with its CR3 in SMRAM, there will be trouble.

Once the AP executes RSM and returns to non-SMM, the CR3 is no longer valid and the AP must crash immediately. Wow!

The fix, I believe, is the same.
We should make sure that 1) the AP is in reserved memory above 1M, and 2) the AP is in protected mode with paging disabled.
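
For illustration, the kind of "AP finish check" mentioned for cases
1/2/3 can be modelled with threads. This is only a toy model written for
this note, not firmware code: real firmware would spin on a shared
counter such as mNumberToFinish, whereas the sketch uses a condition
variable, and every name in it is hypothetical.

```python
import threading


def resume_with_finish_check(num_aps):
    """Model a BSP that waits for all APs before closing SMRAM."""
    log = []
    lock = threading.Lock()
    all_done = threading.Condition(lock)
    remaining = num_aps  # plays the role of a "number to finish" counter

    def ap(index):
        nonlocal remaining
        with all_done:
            log.append(f"ap{index} left SMM")
            remaining -= 1
            all_done.notify_all()

    threads = [threading.Thread(target=ap, args=(i + 1,)) for i in range(num_aps)]
    for thread in threads:
        thread.start()

    # BSP side: do not close SMRAM until every AP has reported completion.
    with all_done:
        all_done.wait_for(lambda: remaining == 0)
        log.append("bsp closes SMRAM")

    for thread in threads:
        thread.join()
    return log


print(resume_with_finish_check(3))
```

In this model the "bsp closes SMRAM" entry can only appear after every
AP entry, which is the ordering that is missing in case 4.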

Thank you
Yao Jiewen

From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Wednesday, November 9, 2016 7:30 PM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.



On 09/11/2016 07:25, Yao, Jiewen wrote:
> Current BSP just uses its own context to initialize AP. So that AP
> takes BSP CR3, which is SMM CR3, unfortunately. After BSP initialized
> APs, the AP is put to HALT-LOOP in X64 mode. It is the last straw,
> because X64 mode halt still needs paging.
>
> 3)      The error happens once the AP receives an interrupt (for
> whatever reason) and starts executing code. However, at that time
> the AP might not be in SMM mode. It means SMM CR3 is not available.
> And then we see this.
>
> 4)      I guess we did not see the error, or this is RANDOM issue,
> because it depends on if AP receives an interrupt before BSP send
> INIT-SIPI-SIPI.
>
> 5)      The fix, I think, should be below: We should always put AP to
> protected mode, so that no paging is needed. We should put AP in
> above 1M reserved memory, instead of <1M memory, because <1M memory
> is restored.

For what it's worth, this is not what I observed.  What I found is that
the BSP doesn't wait for the AP rendezvous before closing SMRAM.

I'm not sure if the two things are related, but (3) would be a much
worse bug.  APs should not be receiving an interrupt.  Perhaps an NMI if
the AP is sitting in a CLI;HLT loop, but this is not what is happening.

Paolo



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 11:23   ` Paolo Bonzini
@ 2016-11-09 15:16     ` Yao, Jiewen
  0 siblings, 0 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-09 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek
  Cc: edk2-devel@ml01.01.org, Kinney, Michael D, Tian, Feng, Fan, Jeff,
	Zeng, Star

Great work! I appreciate that.

It seems that the slow emulated SMM keeps exposing corner cases in the code. :)

We will fix the bad AP in another patch.

Thank you
Yao Jiewen

From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, November 9, 2016 7:24 PM
To: Laszlo Ersek <lersek@redhat.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.


>   * Second, the instruction that causes things to blow up is <0f aa>,
>     i.e., RSM. I have absolutely no clue why RSM is executed:

It's probably not RSM.  RSM is probably the last instruction executed
before, and it's still in the buffer because, as you said, there's no
way that you can fetch an instruction while CR3 points into SMM.

My first thought was that the MMU is for some reason out of contact
with reality, but actually the CR3 write is correct:

             CPU-24446 [002] 39841.871040: kvm_exit:             reason CR_ACCESS rip 0x9f05e info 103 0
             CPU-24446 [002] 39841.871040: kvm_cr:               cr_write 3 = 0x7ff7f000

and it's coming from the stub as well.  So the second thought was that
the wrong CR3 had been put into the wakeup buffer's Cr3 location.
I'm glad I kept looking because it was much more entertaining.  Especially
knowing that I (probably) will not have to fix it. :)

The basic idea for debugging was to look for interesting events and
use 0x402 writes to correlate them to the debug log.  For example, most
accesses to 0x9f??? are obviously not traced by KVM, but the first ones
are:

31519-              CPU-24444 [006] 39841.783344: kvm_exit:             reason EPT_VIOLATION rip 0x855d82 info 181 0
31520:              CPU-24444 [006] 39841.783344: kvm_page_fault:       address 9f000 error_code 181
280224-             CPU-24444 [006] 39841.860940: kvm_exit:             reason EPT_VIOLATION rip 0x7ffd0d15 info 182 0
280225:             CPU-24444 [006] 39841.860940: kvm_page_fault:       address 9f000 error_code 182

(The number is just the line number in the trace).  Luckily your machine
didn't have EPT accessed/dirty bits, so KVM trapped both the first read
and the first write.

The read is at

WakeupBufferStart = 9F000, WakeupBufferSize = 1000

but it's not too interesting.  The second is a good one to start debugging
because it's from SMRAM (though not from SMM, since the first kvm_enter_smm
happens later at 305930).  So it makes sense that it writes an SMRAM CR3.
There is a write to the debug log just before, at 279993, and it writes
"SmmRestoreCpu()".  As expected, the write is followed by a flurry of MSR
writes, the APIC programming at 280131, so I am pretty sure that the write to
mExchangeInfo->Cr3 comes from PrepareApStartupVector.

FWIW, I first looked at the call chain up from BackupAndPrepareWakeupBuffer,
but that led me nowhere for an hour.  So I was a bit lucky indeed. :)

Anyhow, SmmRestoreCpu is the SmmS3ResumeEntryPoint for S3Resume2Pei, and
indeed, earlier in the log you have this debugging output from S3Resume2Pei:

SMM S3 CR3                      = 7FF7F000

Doh, maybe I should have looked at the log before the trace.  Who knows.
Anyway, the SMM_S3_RESUME_STATE is initialized by InitSmmS3ResumeState,
so the CR3 is the one that is initialized by InitSmmS3Cr3 in
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c.  At this point I
was still thinking that this CR3 was wrong, but by looking at the
places where SMM is entered, and correlating that with debug log writes,
the puzzle was relatively easy to solve:

1) SMBASE relocation, done by SmmRestoreCpu:

305930:             CPU-24445 [005] 39841.871264: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
306000:             CPU-24445 [005] 39841.871318: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
306051:             CPU-24446 [002] 39841.871349: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
306108:             CPU-24446 [002] 39841.871390: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
306161:             CPU-24447 [004] 39841.871421: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
306218:             CPU-24447 [004] 39841.871463: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
306254:             CPU-24444 [006] 39841.871473: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
306311:             CPU-24444 [006] 39841.871512: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000

2) S3ResumeExecuteBootScript (again, the previous 0x402 write ends
at 334597 and promptly gives us a clue):

334698:             CPU-24445 [005] 39841.882706: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
334699:             CPU-24447 [004] 39841.882706: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
334741:             CPU-24444 [006] 39841.882723: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
334742:             CPU-24446 [002] 39841.882724: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
334875:             CPU-24444 [006] 39841.882755: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000

Here, I think, is where things go awry.  The lines after
S3ResumeExecuteBootScript() are

   Close all SMRAM regions before executing boot script
   Lock all SMRAM regions before executing boot script

and indeed the first is at 334898, immediately after VCPU0 leaves
SMM.  But, closing and locking of SMRAM happens while the APs are
still in SMM!  The BSP instead goes on merrily and, after the debug
log has "PeiMpInitLib: CpuMpEndOfPeiCallback () invoked" (0x402
write ends at 364869) we have another access to 0x9f000, this time a
write.  It's RestoreWakeupBuffer:

364908-             CPU-24444 [006] 39841.890320: kvm_exit:             reason EPT_VIOLATION rip 0x855d82 info 182 0
364909:             CPU-24444 [006] 39841.890320: kvm_page_fault:       address 9f000 error_code 182

Again VCPUs 1..3 are still in SMM, but the BSP couldn't care less. :)
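
As an illustration of the bookkeeping behind that conclusion, the
kvm_enter_smm events can be replayed to see which VCPUs are still inside
SMM at a given point in the trace.  The following is a minimal sketch, not
part of KVM or edk2: the helper name vcpus_in_smm() is hypothetical, and it
assumes the trace line layout shown in the excerpts above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of KVM or edk2): replay kvm_enter_smm
 * trace lines in order and return a bitmask of the VCPUs whose most
 * recent event was "entering SMM". Bit N set means VCPU N is in SMM.
 */
uint32_t vcpus_in_smm(const char *const *lines, size_t count)
{
    uint32_t mask = 0;

    assert(lines != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        unsigned vcpu;
        char verb[16];
        const char *event = strstr(lines[i], "kvm_enter_smm:");

        /* Skip lines that are not SMM entry/exit events. */
        if (event == NULL)
            continue;
        if (sscanf(event, "kvm_enter_smm: vcpu %u: %15s", &vcpu, verb) != 2)
            continue;
        if (vcpu >= 32)
            continue;   /* the mask only tracks VCPUs 0..31 */

        if (strcmp(verb, "entering") == 0)
            mask |= UINT32_C(1) << vcpu;      /* VCPU is now inside SMM */
        else if (strcmp(verb, "leaving") == 0)
            mask &= ~(UINT32_C(1) << vcpu);   /* VCPU has executed RSM */
    }
    return mask;
}
```

Fed the five events from trace lines 334698 through 334875, this should
leave bits 1 to 3 set, which matches the observation that only VCPU 0 has
left SMM by the time SMRAM is closed and locked.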

We're only 35% through the trace but we're actually close to the end.
At 365704 OVMF says it's transferring control to Linux's wakeup
vector, and Linux takes control real soon:

365805:             CPU-24444 [006] 39841.890477: kvm_exit:             reason CR_ACCESS rip 0x9aec5 info 4 0
365807:             CPU-24444 [006] 39841.890477: kvm_cr:               cr_write 4 = 0xb0
365817:             CPU-24444 [006] 39841.890479: kvm_entry:            vcpu 0

We don't even need to look closer at what happens after this point,
as we can imagine that the APs are just waiting for something to happen.
But if you do look, all you see is reads to the PMTimer, which makes sense.
And a while after, once they are fed up, they bring VCPU 0 back to SMM:

994855               CPU-24446 [000] 39841.982774: kvm_apic:             apic_write APIC_ICR = 0x4200
994856               CPU-24447 [002] 39841.982774: kvm_apic:             apic_write APIC_ICR = 0x4200
994857               CPU-24445 [005] 39841.982774: kvm_apic:             apic_write APIC_ICR = 0x4200
994858               CPU-24446 [000] 39841.982774: kvm_apic_ipi:         dst 0 vec 0 (SMI|physical|assert|edge|dst)
994859               CPU-24445 [005] 39841.982774: kvm_apic_ipi:         dst 0 vec 0 (SMI|physical|assert|edge|dst)
994860               CPU-24447 [002] 39841.982774: kvm_apic_ipi:         dst 0 vec 0 (SMI|physical|assert|edge|dst)
994861               CPU-24446 [000] 39841.982775: kvm_apic_accept_irq:  apicid 0 vec 0 (SMI|edge)
994862               CPU-24445 [005] 39841.982775: kvm_apic_accept_irq:  apicid 0 vec 0 (SMI|edge)
994863               CPU-24447 [002] 39841.982775: kvm_apic_accept_irq:  apicid 0 vec 0 (SMI|edge)

The rendezvous completes, the APs can finally leave SMM but all they can do
is meet their fate and crash horribly:

994869               CPU-24444 [006] 39841.982776: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a9548 info 0 800000fd
...
994880               CPU-24444 [006] 39841.982777: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
995135:             CPU-24444 [006] 39841.982821: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
995136:             CPU-24445 [005] 39841.982821: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
995137:             CPU-24446 [000] 39841.982821: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
995138:             CPU-24447 [002] 39841.982821: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
995148:             CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
995152:             CPU-24446 [000] 39841.982828: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL

I hope you enjoyed it more than the poor APs. :)

Paolo

>     (a) in the area that used to host the AP startup routine for the MP
>     services PPI -- note that we also have "Transfer to 16bit OS waking
>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>     area completely! --,
>
>     (b) and why *after* all four VCPUs have just left SMM, together.
>
>   * The RSM instruction is handled successfully elsewhere, for example
>     when all four VCPUs leave SMM, at the bottom of the diagram above:
>
> > CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
> > CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
> > CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
> > CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
>
>   * The guest-phys address 7ff7f000 that we see just before the error:
>
> > CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000
> > error_code 83
> > CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000
> > error_code 83
> > CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
> > CPU-24444 [006] 39841.982827: kvm_exit:             reason
> > EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
> > CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
> > CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason
> > KVM_EXIT_INTERNAL_ERROR (17)
>
>     can be found higher up in the trace; namely, it is written to CR3
>     several times. It's the root of the page tables.
>
>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>   crash.
>
>   I cannot provide results: QEMU appeared to return a message that would
>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>   with what Paolo said about "Code=?? ?? ??...":
>
>     The question marks usually mean that the page tables do not map a
>     page at that address.
>
>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>   even begin walking the page tables.
>
> Thanks
> Laszlo
>


* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 15:01       ` Yao, Jiewen
@ 2016-11-09 15:54         ` Paolo Bonzini
  2016-11-09 16:06           ` Paolo Bonzini
  2016-11-09 22:28           ` Laszlo Ersek
  0 siblings, 2 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 15:54 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
	Zeng, Star



On 09/11/2016 16:01, Yao, Jiewen wrote:
> 1)      CpuS3.c – EarlyInitializeCpu()
> 2)      CpuS3.c – SmmRelocateBases()
> 3)      CpuS3.c – InitializeCpu()
> 4)      S3Resume.c – SendSmiIpiAllExcludingSelf()
> 
> I believe we can guarantee 1/2/3 is good, because I found we check BSP
> check mNumberToFinish.
> 
> 4 is a risk, because there is no AP finish check. If the AP is in below
> 1M with CR3 in SMRAM, it will be a trouble.
> 
> Once the AP executes RSM and return to non-SMM, the CR3 is no longer
> valid and AP must be crashed immediately. WoW!
> 
> The fix, I believe, is same.
> 
> We should make 1) AP is in above 1M reserved memory,

Is this because of the NMI case?

> and 2) AP is in protected mode with paging disabled.

It is not clear to me what the (4) SIPI done is there for, and why it is
triggered in S3Resume.c rather than CpuS3.c.  And why does it take so
much for APs to complete it?

That said, by the time you close and lock SMRAM, you aren't even sure
that you have reached the cli;hlt loop in the rendezvous funnel.  In
practice you will be there, but there is still a theoretical race.

InterlockedDecrement (&mNumberToFinish) should be moved from
EarlyMPRendezvousProcedure/MPRendezvousProcedure to GoToSleep, and
GoToSleep should leave 64-bit mode before doing it.  This will fix the
S3 bug as well.  It's only needed for 64-bit mode, but it is doable for
the Ia32 version as well.

Perhaps EarlyMPRendezvousProcedure and MPRendezvousProcedure can return
&mNumberToFinish; what do you think?
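
To make the ordering concrete, here is a minimal sketch of a counter-based
rendezvous in portable C11.  It is not the edk2 code: rendezvous_t and the
function names are hypothetical, the atomic counter only stands in for
mNumberToFinish, and the "leave 64-bit mode" step is not modeled.  The
point is just that each AP decrements the counter as its last action
before parking, and the BSP does not move on until the counter is zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared mNumberToFinish counter. */
typedef struct {
    atomic_uint number_to_finish;
} rendezvous_t;

/* BSP: record how many APs must check in before it may proceed. */
void rendezvous_init(rendezvous_t *r, unsigned ap_count)
{
    atomic_init(&r->number_to_finish, ap_count);
}

/*
 * AP: called as the last step before parking (the role GoToSleep would
 * play in the proposal), after the AP no longer depends on state that
 * the BSP is about to take away.
 */
void ap_signal_parked(rendezvous_t *r)
{
    unsigned before = atomic_fetch_sub_explicit(&r->number_to_finish, 1u,
                                                memory_order_release);
    assert(before > 0);   /* more check-ins than APs would be a bug */
    (void)before;
}

/* BSP: true once every AP has checked in. */
bool all_aps_parked(rendezvous_t *r)
{
    return atomic_load_explicit(&r->number_to_finish,
                                memory_order_acquire) == 0;
}

/* BSP: spin until every AP has checked in. */
void bsp_wait_for_aps(rendezvous_t *r)
{
    while (!all_aps_parked(r)) {
        /* busy-wait; real firmware would likely add a pause hint here */
    }
}
```

In this model the BSP would call bsp_wait_for_aps() before closing and
locking SMRAM, so no AP can still be running on SMM page tables at that
point.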

Paolo



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 15:54         ` Paolo Bonzini
@ 2016-11-09 16:06           ` Paolo Bonzini
  2016-11-09 22:28           ` Laszlo Ersek
  1 sibling, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 16:06 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
	Fan, Jeff



On 09/11/2016 16:54, Paolo Bonzini wrote:
>> > and 2) AP is in protected mode with paging disabled.
> It is not clear to me what the (4) SIPI done is there for, and why it is
> triggered in S3Resume.c rather than CpuS3.c.  And why does it take so
> much for APs to complete it?

SMI of course, not SIPI.

Paolo



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09  6:25   ` Yao, Jiewen
  2016-11-09 11:30     ` Paolo Bonzini
@ 2016-11-09 20:46     ` Laszlo Ersek
  2016-11-10 10:41       ` Yao, Jiewen
  1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-09 20:46 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/09/16 07:25, Yao, Jiewen wrote:
> Hi Laszlo
> I will fix DEBUG message issue in V3 patch.
> 
> Below is rest issues:
> 
> 
> * Case 13: S3 fails randomly.
> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
> 
> 
> 1)      We believe the dead CPU is AP. Not BSP.
> The reason is that:
> 
> 1.1)   The BSP already transfer control to OS waking vector. The GDT/IDT/CR3 should be set by OS.
> 
> 1.2)   The current dead CPU still has GDT/IDT point to a BIOS reserved memory. The CS/DS/SS is typical BIOS X64 mode setting.
> 
> 1.3)   The current dead CPU still has CR3 in SMM. (Which is obvious wrong)
> 
> 
> 2)      Based upon the 1), we reviewed S3 resume AP flow.
> Current BSP will wake up AP in SMRAM, for security consideration. At that time, we are using SMM mode CR3. It is OK for BSP because BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in first SMM rebase.
> Current BSP just uses its own context to initialize AP. So that AP takes BSP CR3, which is SMM CR3, unfortunately.
> After BSP initialized APs, the AP is put to HALT-LOOP in X64 mode. It is the last straw, because X64 mode halt still need paging.
> 
> 
> 3)      The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
> 
> 
> 4)      I guess we did not see the error, or this is RANDOM issue, because it depends on if AP receives an interrupt before BSP send INIT-SIPI-SIPI.
> 
> 
> 5)      The fix, I think, should be below:
> We should always put AP to protected mode, so that no paging is needed.
> We should put AP in above 1M reserved memory, instead of <1M memory, because <1M memory is restored.
> 
> 
> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix that critical issue.
> 
> There is no need to do more investigation. Thanks for your great help on that. :)

Thank you for your help!

I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is

    BSP exits SMM and closes SMRAM on the S3 resume path before
    meeting with AP(s)

I hope the title is mostly right. I didn't add any other details (I
haven't gone through the thread in detail yet, and without that I can't
even write up a semi-reasonable report myself). Instead, I referenced
this message of yours in the report, and I also linked Paolo's analysis
from elsewhere in the thread. I hope this will do for the report.

(Also, thank you Paolo, for the amazing analysis -- I haven't digested
it yet, but I can already tell it's amazing! :))

> * Case 17 - I do not think it is a real issue, because SMM is out of resource.
> 
> 
> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
> Here is the code. We do not know why some code needs to call InitializeSpinLock after ExitBootServices.
> SPIN_LOCK *
> EFIAPI
> InitializeSpinLock (
>   OUT      SPIN_LOCK                 *SpinLock
>   )
> {
>   ASSERT (SpinLock != NULL);
> 
>   _ReadWriteBarrier();
>   *SpinLock = SPIN_LOCK_RELEASED;
>   _ReadWriteBarrier();
> 
>   return SpinLock;
> }
> 
> If you can have a quick check on below, that would be great.
> 
> 1)      Which processor triggers this ASSERT? BSP or AP.
> 
> 2)      Which module triggers this ASSERT? Which module contains current RIP value?

First, one additional piece of info I have learned is that the issue
does not always present itself. Sometimes the boot just works fine,
other times the assert fires.

Using the QEMU monitor, I managed to get the following information with
the "info cpus" command:

* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
  CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
  CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
  CPU #3: pc=0x000000007ffd17ca thread_id=7838

VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
point into SMRAM again.

In the OVMF log, I see

Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
PiSmmCpuDxeSmm.efi

So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
entry point, 0x8577, 0x253 bytes less).
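
The arithmetic can be double-checked with a few lines of C.  This is only
a sketch: image_offset() is a hypothetical helper, and the constants are
the ones from the "info cpus" output and the OVMF log above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate a runtime address into an offset
 * relative to a base address (the image load address, or the entry
 * point), so that it can be looked up in the objdump listing.
 */
uint64_t image_offset(uint64_t address, uint64_t base)
{
    assert(address >= base);   /* the address must not precede the base */
    return address - base;
}
```

With these inputs, image_offset(0x7ffd17ca, 0x7FFC9000) gives 0x87CA, and
using the entry point 0x7FFC9253 as the base gives 0x8577.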

Running

  objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug

first I see confirmation that

  start address 0x00000253

and then

000087bd <CpuDeadLoop>:
VOID
EFIAPI
CpuDeadLoop (
  VOID
  )
{
    87bd:       55                      push   %ebp
    87be:       89 e5                   mov    %esp,%ebp
    87c0:       83 ec 10                sub    $0x10,%esp
  volatile UINTN  Index;

  for (Index = 0; Index == 0;);
    87c3:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
    87ca:       8b 45 fc                mov    -0x4(%ebp),%eax  <-- HERE
    87cd:       85 c0                   test   %eax,%eax
    87cf:       74 f9                   je     87ca <CpuDeadLoop+0xd>
}
    87d1:       c9                      leave
    87d2:       c3                      ret

This seems consistent with an assertion failure.

I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
like a possible caller:

      //
      // The BUSY lock is initialized to Released state. This needs to
      // be done early enough to be ready for BSP's SmmStartupThisAp()
      // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
      // called immediately after AP's present flag is detected.
      //
      InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);

Just a guess, of course.

> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
> If you can share step-by-step instructions with me, that would be great.

(1) Grab a host computer with a CPU that supports VMX and EPT.

(2) Download and install Fedora 24 (for example):

https://getfedora.org/en/workstation/download/
http://docs.fedoraproject.org/install-guide

(3) Install the "qemu-system-x86" package with DNF

dnf install qemu-system-x86

(4) clone edk2 with git

(5) embed OpenSSL optionally (for secure boot); see
"CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"

(6) build OVMF:

source edksetup.sh
make -C "$EDK_TOOLS_PATH"

# Ia32
build \
  -a IA32 \
  -p OvmfPkg/OvmfPkgIa32.dsc \
  -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
  -t GCC5 -b DEBUG

# Ia32X64
build \
  -a IA32 -a X64 \
  -p OvmfPkg/OvmfPkgIa32X64.dsc \
  -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
  -t GCC5 -b DEBUG

(7) Create disk images:

qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
  -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G

qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
  -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G

(8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
that you downloaded already (the ISO image).

For 32-bit guest OS, this one used to work:

https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/

minimally the 20141209 release. Hm... actually, I think the maintainer
of that image has discontinued the downloadable files :(

So, I don't know what 32-bit UEFI OS to recommend for testing.

32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
times, with some help from a Microsoft developer, but we couldn't solve
it), so I can't recommend Windows as an alternative.

Perhaps you can use

https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html

as a 32-bit guest OS; I have never tried it.

(9) Anyway, once you have an installer ISO, set the "ISO" environment
variable to the ISO image's full pathname, and then run QEMU like this:

# Settings for Ia32 only:

ISO=...
DISK=.../disk-ia32.img
FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-32.fd
QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
DEBUG=debug-32.log

# Settings for Ia32X64 only:

ISO=...
DISK=.../disk-ia32x64.img
FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-3264.fd
QEMU_COMMAND=qemu-system-x86_64
DEBUG=debug-3264.log

# Common commands for both target arches:

# create variable store from varstore template
# if the former doesn't exist yet
if ! [ -e "$VARS" ]; then
  cp -- "$TEMPLATE" "$VARS"
fi

$QEMU_COMMAND \
  -machine q35,smm=on,accel=kvm \
  -m 4096 \
  -smp sockets=1,cores=2,threads=2 \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
  -drive if=pflash,format=raw,unit=1,file=${VARS} \
  \
  -chardev file,id=debugfile,path=$DEBUG \
  -device isa-debugcon,iobase=0x402,chardev=debugfile \
  \
  -chardev stdio,id=char0,signal=off,mux=on \
  -mon chardev=char0,mode=readline,default \
  -serial chardev:char0 \
  \
  -drive id=iso,if=none,format=raw,readonly,file=$ISO \
  -drive id=disk,if=none,format=qcow2,file=$DISK \
  \
  -device virtio-scsi-pci,id=scsi0 \
  -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
  -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
  \
  -device VGA

This will capture the OVMF debug output in the $DEBUG file. Also, the
terminal where you run the command can be switched between the guest's
serial console and the QEMU monitor with [Ctrl-A C].

Thanks
Laszlo

> 
> Thank you
> Yao Jiewen
> 
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Tuesday, November 8, 2016 9:22 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> 
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
> 
> I have new test results. Let's start with the table again:
> 
> Legend:
> 
> - "untested" means the test was not executed because the same test
>   failed or proved unreliable in a less demanding configuration already,
> 
> - "n/a" means a setting or test case was impossible,
> 
> - "fail" and "unreliable" (lower case) are outside the scope of this
>   series; they either capture the pre-series status, or are expected
>   even with the series applied due to the pre-series status,
> 
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>   series.
> 
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
> 
>    series  OVMF                                                              VCPU     boot       S3 resume
>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
> 
> * Case 8: this test case failed with v2 as well, but this time with
>   different symptoms:
> 
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
> 
>   I didn't try to narrow this down.
> 
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>   and good. The good news is for Jiewen: this patch series does not
>   cause the unreliability, it "only" amplifies it severely. The bad news
>   is correspondingly for everyone else: S3 resume is actually unreliable
>   even in case 4, that is, without this series applied; it's just that
>   the failure rate is much, much lower.
> 
>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>   21 tries. (I stopped testing at the 8th failure.)
> 
>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>   #12 that failed; I continued testing and aborted the test after the
>   55th try.)
> 
>   So, while the series hugely amplifies the failure rate, the failure
>   does exist without the series. Which is why I modified the case 4
>   results in the table, and also lower-cased the word "unreliable" in
>   case 13.
> 
>   Below I will return to this problem separately; let's go over the rest
>   of the table first.
> 
> * Case 17: I guess this is not a real failure, I'm just including it for
>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>   case fails with
> 
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
> 
>   which is an SMRAM allocation failure. If I lower the VCPU count to
>   50x2x2, then the guest boots fine.
> 
> ----*----
> 
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
> 
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
> 
> messages in the OVMF boot log, interspersed with
> 
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
> 
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
> 
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
> 
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
> 
> ----*----
> 
> * Okay, so the S3 problem. Last time I suspected that the failure point
>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>   9A1D0, according to the OVMF log). In order to test this idea, I
>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>   again). The failure reproduced on the second S3 resume, with identical
>   RIP, despite the Windows wakeup vector being located elsewhere (at
>   0x1000).
> 
>   Quoting the OVMF log leading up to the resume:
> 
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
> 
>   QEMU log (same as before):
> 
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> 
>   So, we can exclude the suspicion that the problem is guest OS
>   dependent.
> 
> * Then I looked for the base address of the page containing the
>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>   some firmware component might have allocated that area actually. Here
>   we go:
> 
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> 
>   That is, the failure hits (when it hits -- not always) in the area
>   where the CpuMpPei driver *borrows* memory for the startup vector of
>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>   resume, it just serves for booting the APs temporarily in PEI, for
>   implementing the MP service PPI.)
> 
>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>   the original contents of this area. This occurs just before
>   transferring control to the guest OS wakeup vector: see the
>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>   quoted from the OVMF log.
> 
>   I documented (parts of) this logic in OVMF commit
> 
>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
> 
>   (see the code comments as well).
> 
> * At that time, I thought to have identified a memory management bug in
>   CpuMpPei; see the following discussion and bug report for details:
> 
>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
> 
>   However, with the extraction / introduction of MpInitLib, this issue
>   has been fixed: GetWakeupBuffer() now calls
>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>   no longer; we shouldn't be looking there for the root cause.
> 
> * Either way, I don't understand why anything would want to execute code
>   in the one page that happens to host the MP services PPI startup
>   buffer for APs during PEI.
> 
>   Not understanding the "why", I looked at the "what", and resorted to
>   tracing KVM. Because the problem readily reproduces with this series
>   applied (case 13), it wasn't hard to start the tracing while the guest
>   was suspended, and capture just the actions that led from the
>   KVM-level wakeup to the failure.
> 
>   The QEMU state dumps are visible above in the email. I've also
>   uploaded the compressed OVMF log and the textual KVM trace here:
> 
>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
> 
>   I sincerely hope that Paolo will have a field day with the KVM trace
>   :) I managed to identify the following curiosities (remember this is
>   all on the S3 resume path):
> 
>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>     really funky pattern:
> 
>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>       ------  ------  ------  ------
>               enter
>                |
>               leave
> 
>                       enter
>                         |
>                       leave
> 
>                               enter
>                                 |
>                               leave
> 
>       enter
>         |
>       leave
> 
>               enter           enter
>        enter    |     enter     |
>          |      |       |       |
>        leave    |       |       |
>                 |       |       |
>        enter    |       |       |
>          |      |       |       |
>        leave  leave   leave   leave
> 
>     That is, first we have each VCPU enter and leave SMM in complete
>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>     temporarily (it comes back in later), while the other three remain
>     in SMM. Finally all four of them leave SMM together.
> 
>     After which the problem occurs.
> 
>   * Second, the instruction that causes things to blow up is <0f aa>,
>     i.e., RSM. I have absolutely no clue why RSM is executed:
> 
>     (a) in the area that used to host the AP startup routine for the MP
>     services PPI -- note that we also have "Transfer to 16bit OS waking
>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>     area completely! --,
> 
>     (b) and why *after* all four VCPUs have just left SMM, together.
> 
>   * The RSM instruction is handled successfully elsewhere, for example
>     when all four VCPUs leave SMM, at the bottom of the diagram above:
> 
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
> 
>   * The guest-phys address 7ff7f000 that we see just before the error:
> 
>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
> 
>     can be found higher up in the trace; namely, it is written to CR3
>     several times. It's the root of the page tables.
> 
>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
> 
> * I also tried the "info tlb" monitor command, via "virsh
>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>   crash.
> 
>   I cannot provide results: QEMU appeared to return a message that would
>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
> 
>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>   with what Paolo said about "Code=?? ?? ??...":
> 
>     The question marks usually mean that the page tables do not map a
>     page at that address.
> 
>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>   even begin walking the page tables.
> 
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 15:54         ` Paolo Bonzini
  2016-11-09 16:06           ` Paolo Bonzini
@ 2016-11-09 22:28           ` Laszlo Ersek
  2016-11-09 22:59             ` Paolo Bonzini
  1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-09 22:28 UTC (permalink / raw)
  To: Paolo Bonzini, Yao, Jiewen
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
	Fan, Jeff

On 11/09/16 16:54, Paolo Bonzini wrote:
> 
> 
> On 09/11/2016 16:01, Yao, Jiewen wrote:
>> 1)      CpuS3.c – EarlyInitializeCpu()
>> 2)      CpuS3.c – SmmRelocateBases()
>> 3)      CpuS3.c – InitializeCpu()
>> 4)      S3Resume.c – SendSmiIpiAllExcludingSelf()
>>
>> I believe we can guarantee 1/2/3 is good, because I found we check BSP
>> check mNumberToFinish.
>>
>> 4 is a risk, because there is no AP finish check. If the AP is in below
>> 1M with CR3 in SMRAM, it will be a trouble.
>>
>> Once the AP executes RSM and return to non-SMM, the CR3 is no longer
>> valid and AP must be crashed immediately. WoW!
>>
>> The fix, I believe, is same.
>>
>> We should make 1) AP is in above 1M reserved memory,
> 
> Is this because of the NMI case?
> 
>> and 2) AP is in protected mode with paging disabled.
> 
> It is not clear to me what the (4) SIPI done is there for,

After reading through your great analysis with a keen focus :), I wanted
to ask the exact same thing. I managed to follow / recall the control
flow mostly, but when I saw that SMI, I didn't (and don't) understand
what it was (is) good for.

After all, we're not setting up any request parameters etc. for the
processors to handle in SMM. What's happening there?

Another question I have -- and I feel I should really know it, but I
don't... -- is *why* the APs are executing code from the page at
0x9f000. When the BSP exits SMM, replays the S3 boot script, and finally
finishes off the PEI phase and restores the page at 0x9f000, the APs
seem to be affected -- but why do they care about that page at all? That
page never belonged to PiSmmCpuDxeSmm, it belongs to CpuMpPei.

I do understand that the CR3 registers for the APs point into SMRAM,
while they wait for the BSP in SMM. Thus, the BSP closing/locking down
SMRAM, in S3ResumeExecuteBootScript(), breaks the APs -- that's
understandable.

What I don't get is, again:
(1) why S3ResumeExecuteBootScript() raises SMIs at all, before locking
down SMRAM,
(2) what the AP SMM routine (from PiSmmCpuDxeSmm) has to do with the
Wakeup buffer that is allocated and used *solely* by CpuMpPei.

I could be utterly and inexcusably wrong, but I think that the
RIP=0x9f0fd symptom is a red herring. I wrote,

>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>       ------  ------  ------  ------
>               enter
>                |
>               leave
>
>                       enter
>                         |
>                       leave
>
>                               enter
>                                 |
>                               leave
>
>       enter
>         |
>       leave
>
>               enter           enter
>        enter    |     enter     |
>          |      |       |       |
>        leave    |       |       |
>             <--------------------------- BAD
>        enter    |       |       |
>          |      |       |       |
>        leave  leave   leave   leave

Thanks to Paolo's analysis, we now know where that gap comes from and
what it does (so I marked it with BAD now) -- in the gap, the BSP leaves
SMM alone, closes/locks SMRAM, finishes off the PEI phase, restores the
contents of the borrowed wakeup buffer of CpuMpPei, and even transfers
control to Linux's S3 resume vector.

I don't understand why we don't get horrible faults on the APs
*immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
page tables, executable code, everything, will read as 0xff on QEMU. How
can the APs continue in SMM long enough to

(a) time out and pull the BSP back into SMM,
(b) complete the rendezvous and exit SMM?

... Anyway, I think I do have an idea for question (2). Namely, when the
BSP starts executing S3ResumeExecuteBootScript(), in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" -- for which the cue
is ultimately given by the DXE IPL PEIM, as the last action in PEI --,
CpuMpPei has been dispatched already! And, CpuMpPei has placed all the
APs into their comfy HLT loops, so that the MP services PPI could serve
multiprocessing requests.

Thus, the APs are executing code (the HLT loop) from CpuMpPei's wakeup
buffer on page 0x9f000 as *normal business*. That is where the SMI,
raised by the BSP in S3ResumeExecuteBootScript(), rips them out of. And
that's also where KVM tries to return them to, once they finish in SMM
and execute RSM. Too bad by the time KVM returns them there, the wakeup
page has been restored by the BSP.

In other words, the address RIP=0x9f0fd *is* a red herring, that's
simply where the APs happened to be when the SMI was raised, and where
KVM remembers to return the APs to, once the APs execute RSM.

I think I sort of answered question (2). (Apologies if Paolo and Jiewen
explained the exact same thing before; I had to spell it out for
myself.) That leaves question (1) open. Why enter SMM in
S3ResumeExecuteBootScript() at all?

Anyway, I think if the BSP and the APs are properly synchronized around
the SMI injections in S3ResumeExecuteBootScript(), then this bug is
fixed. In that case, the APs' RSMs will restore the full context for the
APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
buffer -- but the APs will sleep on), and then Linux will bring up the
APs, after taking control.
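
To spell out what I mean by "properly synchronized", here is a minimal
sketch of a check-in counter, loosely modeled on the mNumberToFinish
idea that Paolo mentions. It is only an illustration: the type and the
function names are made up, none of this is actual edk2 code, and I'm
assuming C11 atomics stand in for the edk2 interlocked primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handshake state; loosely modeled on mNumberToFinish. */
typedef struct {
    atomic_uint number_to_finish;   /* APs that have not checked in yet */
} s3_sync;

/* BSP side: arm the counter before raising the SMI on the APs. */
static void s3_sync_init(s3_sync *s, unsigned ap_count)
{
    assert(s != NULL);
    atomic_init(&s->number_to_finish, ap_count);
}

/* AP side: the very last step before the AP goes back to its parking loop. */
static void ap_report_done(s3_sync *s)
{
    assert(s != NULL);
    unsigned prev = atomic_fetch_sub(&s->number_to_finish, 1);
    assert(prev > 0);               /* more check-ins than APs would be a bug */
    (void)prev;
}

/* BSP side: only close/lock SMRAM and leave PEI once this returns true. */
static bool bsp_may_lock_smram(s3_sync *s)
{
    assert(s != NULL);
    return atomic_load(&s->number_to_finish) == 0;
}
```

The BSP would poll bsp_may_lock_smram() after sending the SMI, and
proceed to close SMRAM only once every AP has called ap_report_done().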

Thanks
Laszlo

> and why it is
> triggered in S3Resume.c rather than CpuS3.c.  And why does it take so
> much for APs to complete it?
> 
> That said, by the time you close and lock SMRAM, you aren't even sure
> that you have reached the cli;hlt loop in the rendezvous funnel.  In
> practice you will be there, but there is still a theoretical race.
> 
> InterlockedDecrement (&mNumberToFinish) should be moved from
> EarlyMPRendezvousProcedure/MPRendezvousProcedure to GoToSleep, and
> GoToSleep should leave 64-bit mode before doing it.  This will fix the
> S3 bug as well.  It's only needed for 64-bit mode, but it is doable for
> the Ia32 version as well.
> 
> Perhaps EarlyMPRendezvousProcedure and MPRendezvousProcedure can return
> &mNumberToFinish; what do you think?
> 
> Paolo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 22:28           ` Laszlo Ersek
@ 2016-11-09 22:59             ` Paolo Bonzini
  2016-11-09 23:27               ` Laszlo Ersek
                                 ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-09 22:59 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Jiewen Yao, Michael D Kinney, Feng Tian, edk2-devel, Star Zeng,
	Jeff Fan


> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.

This I can answer. :)

The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop.  When the AP exits SMM, it is in the JMP instruction.

As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?).  After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like

    POP EAX   ; pop return address
    POP EAX   ; pop Context1 which is &mNumberToFinish
    DEC [EAX]
 1: CLI
    HLT
    JMP 1

> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.

I wouldn't call it a red herring.  After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.

What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd.  That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.

> >       vcpu#0  vcpu#1  vcpu#2  vcpu#3
> >       ------  ------  ------  ------
> >               enter           enter
> >        enter    |     enter     |
> >          |      |       |       |
> >        leave    |       |       |
> >             <--------------------------- BAD
> >        enter    |       |       |
> >          |      |       |       |
> >        leave  leave   leave   leave
> 
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
> 
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?

Because the "0xff" only applies when you're out of SMM.  The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the AP is in SMM they are executing in a separate address space
where SMRAM is "not closed".  (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
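
As a rough illustration, here is a toy model of that visibility rule.
It is not QEMU code; every name in it is invented, and the "locked
state is sticky" part is my assumption about how the lock behaves until
the next reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; every name here is hypothetical, none is a QEMU or edk2 symbol. */
typedef enum { SMRAM_OPEN, SMRAM_CLOSED, SMRAM_CLOSED_LOCKED } smram_state;

typedef struct {
    smram_state state;
    uint8_t     content;   /* stand-in for one byte stored in SMRAM */
} smram_model;

/* What a CPU observes when it reads the modeled SMRAM byte. */
static uint8_t smram_read(const smram_model *m, bool cpu_in_smm)
{
    assert(m != NULL);
    if (cpu_in_smm) {
        return m->content;   /* inside SMM the open/closed state is irrelevant */
    }
    /* Outside SMM, anything but "open" reads back as 0xff. */
    return (m->state == SMRAM_OPEN) ? m->content : 0xff;
}

/* Assumption: once locked, the state cannot be changed again until reset. */
static bool smram_set_state(smram_model *m, smram_state next)
{
    assert(m != NULL);
    if (m->state == SMRAM_CLOSED_LOCKED) {
        return false;
    }
    m->state = next;
    return true;
}
```

So an AP that is still in SMM keeps seeing the real contents after the
BSP has closed and locked SMRAM; only a CPU outside SMM gets the 0xff.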

> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
> 
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.

Agreed.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 22:59             ` Paolo Bonzini
@ 2016-11-09 23:27               ` Laszlo Ersek
  2016-11-10  1:13                 ` Yao, Jiewen
  2016-11-10  0:49               ` Yao, Jiewen
  2016-11-10  0:50               ` Yao, Jiewen
  2 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-09 23:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jiewen Yao, Michael D Kinney, Feng Tian, edk2-devel, Star Zeng,
	Jeff Fan

On 11/09/16 23:59, Paolo Bonzini wrote:
> 
>> Another question I have -- and I feel I should really know it, but I
>> don't... -- is *why* the APs are executing code from the page at
>> 0x9f000.
> 
> This I can answer. :)
> 
> The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
> loop.  When the AP exits SMM, it is in the JMP instruction.
> 
> As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
> in the 0-640K area (perhaps it could be in what your doc calls the
> "permanent PEI memory for the S3 resume path"?).  After thinking a
> bit more about it, it seems simplest to me if CpuS3.c just uses
> SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
> to jump to a small stub like
> 
>     POP EAX   ; pop return address
>     POP EAX   ; pop Context1 which is &mNumberToFinish
>     DEC [EAX]
>  1: CLI
>     HLT
>     JMP 1
> 
>> I could be utterly and inexcusably wrong, but I think that the
>> RIP=0x9f0fd symptom is a red herring.
> 
> I wouldn't call it a red herring.  After all, CR3 points to SMM
> exactly because the CR3 that was set up for the 0x9f000 stub is
> CpuS3.c's SMRAM page table root.

Hrmpf. The stub at 0x9f000 does not belong to PiSmmCpuDxeSmm. Regardless
of the boot path (normal boot or S3 resume), it belongs to CpuMpPei, and
it partakes in the implementation of the MP services PPI. It is
practically the "parking lot" for the APs when they are not executing
any MP job, submitted by an MP services PPI client.

So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).

(*) For example, OVMF's PlatformPei uses this service to program
MSR_IA32_FEATURE_CONTROL from fw_cfg. On the resume path too, that
occurs before we do the SMBASE relocation.

(I.e., before S3RestoreConfig2() in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" calls
SmmRestoreCpu() in "UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c", via
SmmS3ResumeState->SmmS3ResumeEntryPoint.)

When an AP executes RSM, its CR3 should automatically be restored to the
original (non-SMM) value, should it not? I mean I do remember the CR3
value from the QEMU register dump, but now I don't understand how that's
possible with SMM=0.

Sorry if I'm being dense :)

> What _is_ a red herring is KVM's trace showing a RSM instruction
> at RIP=0x9f0fd.  That is clearly bogus, RSM was rather the last
> instruction executed _before_ getting to that RIP.
> 
>>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>>       ------  ------  ------  ------
>>>               enter           enter
>>>        enter    |     enter     |
>>>          |      |       |       |
>>>        leave    |       |       |
>>>             <--------------------------- BAD
>>>        enter    |       |       |
>>>          |      |       |       |
>>>        leave  leave   leave   leave
>>
>> I don't understand why we don't get horrible faults on the APs
>> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
>> page tables, executable code, everything, will read as 0xff on QEMU. How
>> can the APs continue in SMM long enough to
>>
>> (a) time out and pull the BSP back into SMM,
>> (b) complete the rendezvous and exit SMM?
> 
> Because the "0xff" only applies when you're out of SMM.  The three
> states (open, closed, closed/locked) only apply when you're not in SMM.
> While the AP is in SMM they are executing in a separate address space
> where SMRAM is "not closed".  (In QEMU that's a separate AddressSpace
> struct, smram_address_space in target-i386/kvm.c).

Sigh, in retrospect, this should have been obvious. :) Thanks for
pointing it out!

Laszlo

>> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
>> explained the exact same thing before; I had to spell it out for
>> myself.) That leaves question (1) open. Why enter SMM in
>> S3ResumeExecuteBootScript() at all?
>>
>> Anyway, I think if the BSP and the APs are properly synchronized around
>> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
>> fixed. In that case, the APs' RSMs will restore the full context for the
>> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
>> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
>> buffer -- but the APs will sleep on), and then Linux will bring up the
>> APs, after taking control.
> 
> Agreed.
> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 22:59             ` Paolo Bonzini
  2016-11-09 23:27               ` Laszlo Ersek
@ 2016-11-10  0:49               ` Yao, Jiewen
  2016-11-10  0:50               ` Yao, Jiewen
  2 siblings, 0 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10  0:49 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
	Fan, Jeff

> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.

Agreed.

[Jiewen] I hold a different opinion on that.

If the AP sits in a HLT loop somewhere in memory below 1M, with CR3 pointing into SMRAM, there are two issues.

* The memory below 1M (0x9f0fd) might be reused by the OS for other instructions.

* If the AP starts running code, it will take an exception because CR3 is obviously wrong. The AP cannot fetch any code.

There are at least two possible ways to trigger this scenario.

A)     Jeff and I have discovered one possible case: the AP may receive an NMI/SMI, such as a periodic SMI, before the OS sends INIT-SIPI-SIPI to wake up the AP.

B)     Paolo has found one real case: the AP is still in SMRAM when the BSP is out and about to close SMRAM.

IMHO, letting the BSP and the APs sync in S3 resume only resolves B). It does not help with A).
If the system has some special SMI, such as a periodic SMI, it will definitely trigger case A).

So we have to fix the AP state anyway.

Now, if the AP state is fixed, I do not think we need to worry about the BSP/AP out-of-sync issue.
The BSP and the APs can be independent. An AP can receive an NMI/SMI at any time and just keep working in the HLT loop.
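
To make the "fixed AP state" concrete, below is a small sketch that checks the two conditions (AP parked above 1M, paging disabled) against the values in the QEMU register dump Laszlo posted (RIP=0x9f0fd, CR0=e0000011). The helper names are hypothetical and this is not edk2 code; it also cannot tell whether the memory is reserved, which is a separate requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG   (UINT32_C(1) << 31)   /* CR0.PG: paging enable */
#define ONE_MIB  UINT64_C(0x100000)

static_assert(CR0_PG == UINT32_C(0x80000000), "CR0.PG is bit 31");

/* True if the parked AP would resume at an address below 1M. */
static bool ap_parked_below_1m(uint64_t rip)
{
    return rip < ONE_MIB;
}

/* True if the AP needs valid page tables (CR3) to fetch its next instruction. */
static bool ap_paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}

/* Hypothetical check for the parking state proposed in this thread. */
static bool ap_park_state_ok(uint64_t rip, uint32_t cr0)
{
    return !ap_parked_below_1m(rip) && !ap_paging_enabled(cr0);
}
```

With the dump values, ap_park_state_ok(0x9f0fd, 0xe0000011) is false on both counts: the RIP is below 1M and CR0.PG is set, so the AP depends on a CR3 that points into SMRAM.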

Thank you
Yao Jiewen

From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, November 10, 2016 7:00 AM
To: Laszlo Ersek <lersek@redhat.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Zeng, Star <star.zeng@intel.com>; Fan, Jeff <jeff.fan@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.


> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.

This I can answer. :)

The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop.  When the AP exits SMM, it is in the JMP instruction.

As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?).  After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like

    POP EAX   ; pop return address
    POP EAX   ; pop Context1 which is &mNumberToFinish
    DEC [EAX]
 1: CLI
    HLT
    JMP 1

> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.

I wouldn't call it a red herring.  After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.

What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd.  That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.

> >       vcpu#0  vcpu#1  vcpu#2  vcpu#3
> >       ------  ------  ------  ------
> >               enter           enter
> >        enter    |     enter     |
> >          |      |       |       |
> >        leave    |       |       |
> >             <--------------------------- BAD
> >        enter    |       |       |
> >          |      |       |       |
> >        leave  leave   leave   leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?

Because the "0xff" only applies when you're out of SMM.  The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the APs are in SMM they are executing in a separate address space
where SMRAM is "not closed".  (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
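
The visibility rule described above can be sketched as a tiny model. This is
only an illustration under stated assumptions, not QEMU code: the Smram
struct, the SmramState enum and SmramRead() are hypothetical names invented
for this sketch, and standard C types are used so it builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not QEMU code: SMRAM state as seen outside SMM. */
typedef enum { SMRAM_OPEN, SMRAM_CLOSED, SMRAM_CLOSED_LOCKED } SmramState;

typedef struct {
  SmramState state;
  uint8_t    backing[16];   /* stand-in for the real SMRAM contents */
} Smram;

/* Read one byte of SMRAM as a CPU would observe it.
   A CPU inside SMM always sees the real contents; a CPU outside SMM
   sees them only while SMRAM is open, and reads 0xff otherwise. */
uint8_t SmramRead(const Smram *smram, bool cpu_in_smm, size_t offset)
{
  assert(smram != NULL);
  assert(offset < sizeof smram->backing);

  if (cpu_in_smm || smram->state == SMRAM_OPEN) {
    return smram->backing[offset];
  }
  return 0xff;
}
```

In this model, closing or locking SMRAM changes nothing for a caller that
passes cpu_in_smm = true, which is why the APs can finish the rendezvous.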

> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.

Agreed.

Paolo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 22:59             ` Paolo Bonzini
  2016-11-09 23:27               ` Laszlo Ersek
  2016-11-10  0:49               ` Yao, Jiewen
@ 2016-11-10  0:50               ` Yao, Jiewen
  2016-11-10  1:02                 ` Fan, Jeff
  2 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10  0:50 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
	Fan, Jeff

Fix a typo.

From: Yao, Jiewen
Sent: Thursday, November 10, 2016 8:49 AM
To: 'Paolo Bonzini' <pbonzini@redhat.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Zeng, Star <star.zeng@intel.com>; Fan, Jeff <jeff.fan@intel.com>
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.

Agreed.

[Jiewen] I hold a different opinion on that.

If the AP is in a hlt-loop in memory below 1M with CR3 pointing to SMRAM, there are 2 issues.

* The memory below 1M (0x9f0fd) might be reused by the OS for other instructions.

* If the AP starts running code, it will get an exception because CR3 is obviously wrong. The AP cannot fetch any code.

There are at least 2 possible ways to trigger this scenario.

A)     Jeff and I have discovered one possible case: the AP may receive an NMI/SMI, such as a periodic SMI, before the OS sends INIT-SIPI-SIPI to wake up the AP.

B)     Paolo has found one real case: the AP is in SMRAM when the BSP is out and about to close SMRAM.

IMHO, letting BSP/AP sync in S3 resume only resolves B). It does not help with A).
If the system has some special SMI, such as a periodic SMI, it will definitely trigger case A).

So we have to fix the AP state anyway.

Now, if the AP state is fixed, I do not think we need to worry about the BSP/AP out-of-sync issue.
The BSP and the AP can be independent. The AP can receive an NMI/SMI at any time and just keep working in the HLT-loop.

Thank you
Yao Jiewen

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10  0:50               ` Yao, Jiewen
@ 2016-11-10  1:02                 ` Fan, Jeff
  0 siblings, 0 replies; 38+ messages in thread
From: Fan, Jeff @ 2016-11-10  1:02 UTC (permalink / raw)
  To: Yao, Jiewen, Paolo Bonzini, Laszlo Ersek
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star

I think it is necessary to place the APs into a safe state (hlt-loop, no page table required, in reserved non-SMM memory above 1MB), just as we do in MpInitExitBootServicesCallback() on the normal boot path.
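
The conditions listed above can be written down as a simple predicate. This
is a minimal sketch, not edk2 code: the ApParkState struct and
ApParkStateIsSafe() are hypothetical names made up for illustration, and the
"reserved" flag stands in for however the platform tells the OS not to reuse
the memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ONE_MB 0x100000ULL

/* Hypothetical description of where and how an AP is parked. */
typedef struct {
  uint64_t loop_address;    /* address of the hlt-loop code */
  bool     paging_enabled;  /* true if the loop needs a valid CR3 */
  bool     in_smram;        /* true if the loop lives inside SMRAM */
  bool     memory_reserved; /* true if the OS is told not to reuse it */
} ApParkState;

/* An AP is parked safely only if all four conditions from the thread hold:
   no paging, outside SMRAM, reserved memory, and at or above 1MB. */
bool ApParkStateIsSafe(const ApParkState *s)
{
  assert(s != NULL);
  return !s->paging_enabled
      && !s->in_smram
      && s->memory_reserved
      && s->loop_address >= ONE_MB;
}
```

Under this model, the state discussed in the thread (a long-mode loop at
0x9f0fd, below 1M, relying on a CR3 that points into SMRAM) fails the
predicate on several counts at once.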

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 23:27               ` Laszlo Ersek
@ 2016-11-10  1:13                 ` Yao, Jiewen
  2016-11-10  6:30                   ` Fan, Jeff
  0 siblings, 1 reply; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10  1:13 UTC (permalink / raw)
  To: Laszlo Ersek, Paolo Bonzini
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star,
	Fan, Jeff

> So, I don't understand how the CR3s that are used by the APs when they
> serve MP services PPI requests, throughout the PEI phase (*), have
> anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).

[Jiewen] It is very tricky.
First, in normal boot, SMM needs to prepare a CR3 for the SMM page table, which is obvious.

In S3, S3Resume calls AsmWriteCr3(SmmS3ResumeState->SmmS3Cr3) and then jumps to SmmS3ResumeState->SmmS3ResumeEntryPoint. Now the BSP holds SmmS3Cr3 but is in non-SMM mode.

In SmmRestoreCpu(), the BSP calls EarlyInitializeCpu()/PrepareApStartupVector() in non-SMM mode, and sets mExchangeInfo->Cr3 = (UINT32) (AsmReadCr3 ()). Now the AP holds SmmS3Cr3 in non-SMM mode. That is OK, because SMRAM is OPEN.

When SmmRelocateBases() is called, the AP is woken up and does the rebase. SmmS3Cr3 is used for the AP in SMM. But that does not change the fact that SmmS3Cr3 is also used in non-SMM mode.

Later in InitializeCpu(), the AP wakeup buffer is placed below 1M with SmmS3Cr3.
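
The hand-off described above can be sketched as three small steps. This is a
simplified model under stated assumptions, not the CpuS3.c implementation:
the Cpu and ExchangeInfo structs and the three functions are hypothetical
stand-ins, and only the Cr3 field name is borrowed from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; types are simplified for illustration. */
typedef struct { uint64_t cr3; } Cpu;
typedef struct { uint32_t Cr3; } ExchangeInfo;

/* Step 1: S3Resume loads SmmS3Cr3 into the BSP before jumping to the
   SMM S3 resume entry point. The BSP is not in SMM at this point. */
void BspLoadSmmS3Cr3(Cpu *bsp, uint64_t smm_s3_cr3)
{
  assert(bsp != NULL);
  bsp->cr3 = smm_s3_cr3;
}

/* Step 2: the BSP publishes its current CR3 for the APs to pick up. */
void BspPrepareApStartup(const Cpu *bsp, ExchangeInfo *info)
{
  assert(bsp != NULL && info != NULL);
  info->Cr3 = (uint32_t)bsp->cr3;
}

/* Step 3: a woken AP loads whatever CR3 the BSP published. */
void ApStartup(Cpu *ap, const ExchangeInfo *info)
{
  assert(ap != NULL && info != NULL);
  ap->cr3 = info->Cr3;
}
```

Running the three steps in order leaves the AP with the same page table root
as the BSP, which is the SMRAM one. Nothing in this chain switches the AP to
a non-SMRAM page table before it is parked.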

Thank you
Yao Jiewen

From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, November 10, 2016 7:27 AM
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Zeng, Star <star.zeng@intel.com>; Fan, Jeff <jeff.fan@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/09/16 23:59, Paolo Bonzini wrote:
>
>> Another question I have -- and I feel I should really know it, but I
>> don't... -- is *why* the APs are executing code from the page at
>> 0x9f000.
>
> This I can answer. :)
>
> The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
> loop.  When the AP exits SMM, it is in the JMP instruction.
>
> As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
> in the 0-640K area (perhaps it could be in what your doc calls the
> "permanent PEI memory for the S3 resume path"?).  After thinking a
> bit more about it, it seems simplest to me if CpuS3.c just uses
> SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
> to jump to a small stub like
>
>     POP EAX   ; pop return address
>     POP EAX   ; pop Context1 which is &mNumberToFinish
>     DEC [EAX]
>  1: CLI
>     HLT
>     JMP 1
>
>> I could be utterly and inexcusably wrong, but I think that the
>> RIP=0x9f0fd symptom is a red herring.
>
> I wouldn't call it a red herring.  After all, CR3 points to SMM
> exactly because the CR3 that was set up for the 0x9f000 stub is
> CpuS3.c's SMRAM page table root.

Hrmpf. The stub at 0x9f000 does not belong to PiSmmCpuDxeSmm. Regardless
of the boot path (normal boot or S3 resume), it belongs to CpuMpPei, and
it partakes in the implementation of the MP services PPI. It is
practically the "parking lot" for the APs when they are not executing
any MP job, submitted by an MP services PPI client.

So, I don't understand how the CR3s that are used by the APs when they
serve MP services PPI requests, throughout the PEI phase (*), have
anything to do with CpuS3.c's page tables (which live in SMRAM, AIUI).

(*) For example, OVMF's PlatformPei uses this service to program
MSR_IA32_FEATURE_CONTROL from fw_cfg. On the resume path too, that
occurs before we do the SMBASE relocation.

(I.e., before S3RestoreConfig2() in
"UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c" calls
SmmRestoreCpu() in "UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c", via
SmmS3ResumeState->SmmS3ResumeEntryPoint.)

When an AP executes RSM, its CR3 should automatically be restored to the
original (non-SMM) value, should it not? I mean I do remember the CR3
value from the QEMU register dump, but now I don't understand how that's
possible with SMM=0.

Sorry if I'm being dense :)

> What _is_ a red herring is KVM's trace showing an RSM instruction
> at RIP=0x9f0fd.  That is clearly bogus, RSM was rather the last
> instruction executed _before_ getting to that RIP.
>
>>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>>       ------  ------  ------  ------
>>>               enter           enter
>>>        enter    |     enter     |
>>>          |      |       |       |
>>>        leave    |       |       |
>>>             <--------------------------- BAD
>>>        enter    |       |       |
>>>          |      |       |       |
>>>        leave  leave   leave   leave
>>
>> I don't understand why we don't get horrible faults on the APs
>> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
>> page tables, executable code, everything, will read as 0xff on QEMU. How
>> can the APs continue in SMM long enough to
>>
>> (a) time out and pull the BSP back into SMM,
>> (b) complete the rendezvous and exit SMM?
>
> Because the "0xff" only applies when you're out of SMM.  The three
> states (open, closed, closed/locked) only apply when you're not in SMM.
> While the APs are in SMM they are executing in a separate address space
> where SMRAM is "not closed".  (In QEMU that's a separate AddressSpace
> struct, smram_address_space in target-i386/kvm.c).

Sigh, in retrospect, this should have been obvious. :) Thanks for
pointing it out!

Laszlo

>> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
>> explained the exact same thing before; I had to spell it out for
>> myself.) That leaves question (1) open. Why enter SMM in
>> S3ResumeExecuteBootScript() at all?
>>
>> Anyway, I think if the BSP and the APs are properly synchronized around
>> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
>> fixed. In that case, the APs' RSMs will restore the full context for the
>> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
>> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
>> buffer -- but the APs will sleep on), and then Linux will bring up the
>> APs, after taking control.
>
> Agreed.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10  1:13                 ` Yao, Jiewen
@ 2016-11-10  6:30                   ` Fan, Jeff
  0 siblings, 0 replies; 38+ messages in thread
From: Fan, Jeff @ 2016-11-10  6:30 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, Paolo Bonzini
  Cc: Kinney, Michael D, Tian, Feng, edk2-devel@ml01.01.org, Zeng, Star

Laszlo,

I just sent a patch to place the APs into safe hlt-loop code (in an NVS range above 1MB, in 32-bit protected mode).

Could you check whether it solves the S3 instability on OVMF?

Thanks!
Jeff


* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-09 20:46     ` Laszlo Ersek
@ 2016-11-10 10:41       ` Yao, Jiewen
  2016-11-10 12:01         ` Laszlo Ersek
  2016-11-10 12:27         ` Paolo Bonzini
  0 siblings, 2 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 10:41 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Thanks for reporting the case 3 issue on Bugzilla.

Let's focus on Case 8.
It seems to be another random failure.

I did more testing.

1)      I tested UEFI32 OS boot on some of our other internal real platforms. I cannot reproduce the ASSERT.

2)      I wrote a small test app that calls ExitBootServices and sends an SMI. I ran it on my current Windows QEMU, but I still cannot reproduce the ASSERT.

It seems your environment is the only way to reproduce the issue. I am trying to follow your steps to install an OS on QEMU/KVM. I haven't finished everything yet, because of a network proxy issue. :(

Your information and analysis are great. They give us some clues.

I had the same guess as you and checked: InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);

This address is initialized in InitializeMpSyncData(), using gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus, which is obtained from MpServices->GetNumberOfProcessors().
I do not know why this address is zero.
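
To make the question concrete, here is one way a per-CPU lock pointer could
end up NULL. This is purely an illustration under stated assumptions, not a
diagnosis and not the PiSmmCpuDxeSmm code: MpSyncData, InitMpSyncData(),
BusyLockFor() and MAX_CPUS are hypothetical names, and SpinLock is a
simplified stand-in for SPIN_LOCK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 8              /* hypothetical upper bound */

typedef volatile uintptr_t SpinLock;   /* simplified stand-in for SPIN_LOCK */

typedef struct { SpinLock *Busy; } CpuData;

typedef struct {
  CpuData  Cpu[MAX_CPUS];
  SpinLock Locks[MAX_CPUS];     /* backing storage for the Busy locks */
} MpSyncData;

/* Point Busy at real storage for the first number_of_cpus entries only;
   the remaining entries keep a NULL Busy pointer. */
void InitMpSyncData(MpSyncData *data, size_t number_of_cpus)
{
  assert(data != NULL);
  assert(number_of_cpus <= MAX_CPUS);

  for (size_t i = 0; i < MAX_CPUS; i++) {
    data->Locks[i] = 0;
    data->Cpu[i].Busy = (i < number_of_cpus) ? &data->Locks[i] : NULL;
  }
}

/* Returns the lock for cpu_index, or NULL if that CPU was never set up. */
SpinLock *BusyLockFor(const MpSyncData *data, size_t cpu_index)
{
  assert(data != NULL);
  return (cpu_index < MAX_CPUS) ? data->Cpu[cpu_index].Busy : NULL;
}
```

In this model, a CPU whose index is at or beyond the count used at
initialization time would hand a NULL pointer to the lock routine. Whether
anything like that happens in the failing boot is an open question.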

I also do not quite understand the log below.

* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
  CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
  CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
  CPU #3: pc=0x000000007ffd17ca thread_id=7838

As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM.
So why can #3 enter SMM? Is that expected behavior or not?
If it is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?

I will see if I can finish the QEMU/KVM installation tomorrow.

If you have some idea why and how #3 enters SMM, please let us know.


Thank you
Yao Jiewen


From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, November 10, 2016 4:46 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/09/16 07:25, Yao, Jiewen wrote:
> Hi Laszlo
> I will fix DEBUG message issue in V3 patch.
>
> Below is rest issues:
>
>
> * Case 13: S3 fails randomly.
> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>
>
> 1)      We believe the dead CPU is an AP, not the BSP.
> The reasons are:
>
> 1.1)   The BSP has already transferred control to the OS waking vector. The GDT/IDT/CR3 should have been set by the OS.
>
> 1.2)   The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are typical BIOS X64 mode settings.
>
> 1.3)   The dead CPU still has its CR3 in SMM. (Which is obviously wrong.)
>
>
> 2)      Based upon 1), we reviewed the S3 resume AP flow.
> Currently the BSP wakes up the APs from SMRAM, for security reasons. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
> Currently the BSP just uses its own context to initialize the APs, so the APs take the BSP's CR3, which unfortunately is the SMM CR3.
> After the BSP has initialized the APs, each AP is put into a HALT-LOOP in X64 mode. That is the last straw, because a halt loop in X64 mode still needs paging.
>
>
> 3)      The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not usable. And then we see this.
>
>
> 4)      I guess the reason we do not always see the error, i.e. why this is a RANDOM issue, is that it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>
>
> 5)      The fix, I think, should be as follows:
> We should always put the AP into protected mode, so that no paging is needed.
> We should put the AP in reserved memory above 1M, instead of memory below 1M, because memory below 1M is restored.
>
>
> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix this critical issue.
>
> There is no need to do more investigation. Thanks for your great help on that. :)

Thank you for your help!

I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is

    BSP exits SMM and closes SMRAM on the S3 resume path before
    meeting with AP(s)

I hope the title is mostly right. I didn't add any other details (I
haven't gone through the thread in detail yet, and without that I can't
even write up a semi-reasonable report myself). Instead, I referenced
this message of yours in the report, and I also linked Paolo's analysis
from elsewhere in the thread. I hope this will do for the report.

(Also, thank you Paolo, for the amazing analysis -- I haven't digested
it yet, but I can already tell it's amazing! :))

> * Case 17 - I do not think it is a real issue, because SMM is out of resources.
>
>
> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
> Here is the code. We do not know why some code needs to call InitializeSpinLock() after ExitBootServices().
> SPIN_LOCK *
> EFIAPI
> InitializeSpinLock (
>   OUT      SPIN_LOCK                 *SpinLock
>   )
> {
>   ASSERT (SpinLock != NULL);
>
>   _ReadWriteBarrier();
>   *SpinLock = SPIN_LOCK_RELEASED;
>   _ReadWriteBarrier();
>
>   return SpinLock;
> }
>
> If you could quickly check the following, that would be great.
>
> 1)      Which processor triggers this ASSERT? BSP or AP.
>
> 2)      Which module triggers this ASSERT? Which module contains current RIP value?

First, one additional piece of info I have learned is that the issue
does not always present itself. Sometimes the boot just works fine,
other times the assert fires.

Using the QEMU monitor, I managed to get the following information with
the "info cpus" command:

* CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
  CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
  CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
  CPU #3: pc=0x000000007ffd17ca thread_id=7838

VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
point into SMRAM again.
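
(For anyone repeating the check: a minimal sketch of the "is this pc in
SMRAM" test. It assumes the 7F80_1000..7FFF_FFFF SMRAM range quoted
further down in this thread, which holds for this guest configuration
only; "in_smram" is a hypothetical helper name, not a QEMU or edk2
tool.)

```sh
#!/bin/sh
# SMRAM bounds as quoted later in this thread (assumption: they match
# the guest configuration being debugged).
SMRAM_START=0x7F801000
SMRAM_END=0x7FFFFFFF

# in_smram PC: succeed if PC lies inside [SMRAM_START, SMRAM_END].
in_smram() {
  [ "$(( $1 ))" -ge "$(( $SMRAM_START ))" ] &&
  [ "$(( $1 ))" -le "$(( $SMRAM_END ))" ]
}

# pc values copied from the "info cpus" output above
for pc in 0x00000000c1844555 0x000000007ffd17ca; do
  if in_smram "$pc"; then
    echo "$pc: inside SMRAM"
  else
    echo "$pc: outside SMRAM"
  fi
done
```

Only the VCPU#3 value falls inside that range; the halted VCPUs are at
a guest OS address.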

In the OVMF log, I see

Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
PiSmmCpuDxeSmm.efi

So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
entry point, 0x8577, 0x253 bytes less).

Running

  objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug

first I see confirmation that

  start address 0x00000253

and then

000087bd <CpuDeadLoop>:
VOID
EFIAPI
CpuDeadLoop (
  VOID
  )
{
    87bd:       55                      push   %ebp
    87be:       89 e5                   mov    %esp,%ebp
    87c0:       83 ec 10                sub    $0x10,%esp
  volatile UINTN  Index;

  for (Index = 0; Index == 0;);
    87c3:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
    87ca:       8b 45 fc                mov    -0x4(%ebp),%eax  <-- HERE
    87cd:       85 c0                   test   %eax,%eax
    87cf:       74 f9                   je     87ca <CpuDeadLoop+0xd>
}
    87d1:       c9                      leave
    87d2:       c3                      ret

This seems consistent with an assertion failure.
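
(The address arithmetic above can be double-checked with plain shell
arithmetic. This is only a sketch; "image_offset" is a hypothetical
helper name, and the three constants are copied from the "info cpus"
output and the OVMF log line quoted above.)

```sh
#!/bin/sh
# image_offset PC BASE: print PC - BASE as lower-case hex.
image_offset() {
  printf '0x%x\n' "$(( $1 - $2 ))"
}

LOAD_BASE=0x0007FFC9000     # "Loading SMM driver at ..." in the OVMF log
ENTRY_POINT=0x0007FFC9253   # EntryPoint= on the same log line
PC=0x000000007ffd17ca       # VCPU#3 pc from "info cpus"

image_offset "$PC" "$LOAD_BASE"     # offset into the image
image_offset "$PC" "$ENTRY_POINT"   # offset relative to the entry point
```

The two results are 0x87ca and 0x8577, matching the offsets stated
above; the first one is the address to look up in the objdump listing.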

I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
like a possible caller:

      //
      // The BUSY lock is initialized to Released state. This needs to
      // be done early enough to be ready for BSP's SmmStartupThisAp()
      // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
      // called immediately after AP's present flag is detected.
      //
      InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);

Just a guess, of course.

> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
> If you can share step-by-step instructions with me, that would be great.

(1) Grab a host computer with a CPU that supports VMX and EPT.

(2) Download and install Fedora 24 (for example):

https://getfedora.org/en/workstation/download/
http://docs.fedoraproject.org/install-guide

(3) Install the "qemu-system-x86" package with DNF

dnf install qemu-system-x86

(4) clone edk2 with git

(5) embed OpenSSL optionally (for secure boot); see
"CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"

(6) build OVMF:

source edksetup.sh
make -C "$EDK_TOOLS_PATH"

# Ia32
build \
  -a IA32 \
  -p OvmfPkg/OvmfPkgIa32.dsc \
  -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
  -t GCC5 -b DEBUG

# Ia32X64
build \
  -a IA32 -a X64 \
  -p OvmfPkg/OvmfPkgIa32X64.dsc \
  -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
  -t GCC5 -b DEBUG

(7) Create disk images:

qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
  -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G

qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
  -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G

(8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
that you downloaded already (the ISO image).

For 32-bit guest OS, this one used to work:

https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/

minimally the 20141209 release. Hm... actually, I think the maintainer
of that image has discontinued the downloadable files :(

So, I don't know what 32-bit UEFI OS to recommend for testing.

32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
times, with some help from a Microsoft developer, but we couldn't solve
it), so I can't recommend Windows as an alternative.

Perhaps you can use

https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html

as a 32-bit guest OS; I never tried it.

(9) Anyway, once you have an installer ISO, set the "ISO" environment
variable to the ISO image's full pathname, and then run QEMU like this:

# Settings for Ia32 only:

ISO=...
DISK=.../disk-ia32.img
FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-32.fd
QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
DEBUG=debug-32.log

# Settings for Ia32X64 only:

ISO=...
DISK=.../disk-ia32x64.img
FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
VARS=vars-3264.fd
QEMU_COMMAND=qemu-system-x86_64
DEBUG=debug-3264.log

# Common commands for both target arches:

# create variable store from varstore template
# if the former doesn't exist yet
if ! [ -e "$VARS" ]; then
  cp -- "$TEMPLATE" "$VARS"
fi

$QEMU_COMMAND \
  -machine q35,smm=on,accel=kvm \
  -m 4096 \
  -smp sockets=1,cores=2,threads=2 \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
  -drive if=pflash,format=raw,unit=1,file=${VARS} \
  \
  -chardev file,id=debugfile,path=$DEBUG \
  -device isa-debugcon,iobase=0x402,chardev=debugfile \
  \
  -chardev stdio,id=char0,signal=off,mux=on \
  -mon chardev=char0,mode=readline,default \
  -serial chardev:char0 \
  \
  -drive id=iso,if=none,format=raw,readonly,file=$ISO \
  -drive id=disk,if=none,format=qcow2,file=$DISK \
  \
  -device virtio-scsi-pci,id=scsi0 \
  -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
  -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
  \
  -device VGA

This will capture the OVMF debug output in the $DEBUG file. Also, the
terminal where you run the command can be switched between the guest's
serial console and the QEMU monitor with [Ctrl-A C].
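
(Once the $DEBUG file exists, assertion failures can be located with a
plain grep. A minimal sketch follows; it assumes that ASSERT messages
start at the beginning of a line, as in the log excerpts quoted in this
thread, and "list_asserts" is a hypothetical helper name.)

```sh
#!/bin/sh
# list_asserts LOGFILE: print "line-number:text" for each ASSERT line.
list_asserts() {
  grep -n '^ASSERT ' "$1"
}

# Self-contained demo using two log lines quoted in this thread;
# in practice, pass the $DEBUG file instead of the temporary file.
demo_log=$(mktemp)
cat >"$demo_log" <<'EOF'
MpInitExitBootServicesCallback() done!
ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
EOF
list_asserts "$demo_log"
rm -f "$demo_log"
```

The line number in the output tells you how far into the boot log the
assertion fired, which helps when the log is flooded by other messages.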

Thanks
Laszlo

>
> Thank you
> Yao Jiewen
>
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Tuesday, November 8, 2016 9:22 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/04/16 10:30, Jiewen Yao wrote:
>> ==== below is V2 description ====
>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>> 4) PiSmmCpu: Add protection detail in commit message.
>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>
>> ==== below is V1 description ====
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>
>> Cc: Jeff Fan <jeff.fan@intel.com>
>> Cc: Feng Tian <feng.tian@intel.com>
>> Cc: Star Zeng <star.zeng@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>
> I have new test results. Let's start with the table again:
>
> Legend:
>
> - "untested" means the test was not executed because the same test
>   failed or proved unreliable in a less demanding configuration already,
>
> - "n/a" means a setting or test case was impossible,
>
> - "fail" and "unreliable" (lower case) are outside the scope of this
>   series; they either capture the pre-series status, or are expected
>   even with the series applied due to the pre-series status,
>
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>   series.
>
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
>
>    series  OVMF                                                              VCPU     boot       S3 resume
>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
>
> * Case 8: this test case failed with v2 as well, but this time with
>   different symptoms:
>
>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> PixelBlueGreenRedReserved8BitPerColor
>> ConvertPages: Incompatible memory types
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitExitBootServicesCallback() done!
>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>
>   I didn't try to narrow this down.
>
> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>   and good. The good news is for Jiewen: this patch series does not
>   cause the unreliability, it "only" amplifies it severely. The bad news
>   is correspondingly for everyone else: S3 resume is actually unreliable
>   even in case 4, that is, without this series applied, it's just the
>   failure rate is much-much lower.
>
>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>   21 tries. (I stopped testing at the 8th failure.)
>
>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>   #12 that failed; I continued testing and aborted the test after the
>   55th try.)
>
>   So, while the series hugely amplifies the failure rate, the failure
>   does exist without the series. Which is why I modified the case 4
>   results in the table, and also lower-cased the word "unreliable" in
>   case 13.
>
>   Below I will return to this problem separately; let's go over the rest
>   of the table first.
>
> * Case 17: I guess this is not a real failure, I'm just including it for
>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>   case fails with
>
>> SmmLockBox Command - 4
>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>> SmmLockBox SmmLockBoxHandler Exit
>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>
>   which is an SMRAM allocation failure. If I lower the VCPU count to
>   50x2x2, then the guest boots fine.
>
> ----*----
>
> Before I get to the S3 resume problem (which, again, reproduces without
> this series, although much less frequently), I'd like to comment on the
> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
> function, on the return value of SmmBlockingStartupThisAp(). This change
> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>
>> !mSmmMpSyncData->CpuData[1].Present
>> !mSmmMpSyncData->CpuData[2].Present
>> !mSmmMpSyncData->CpuData[3].Present
>> ...
>
> messages in the OVMF boot log, interspersed with
>
>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>
> style messages. (That is, one error message for each AP, per
> ConvertPageEntryAttribute() message.)
>
> Is this okay / intentional? The number of these messages can go up to
> several thousands and that sort of drowns out everything else in the
> log.
>
> It's also not easy to mask the message, because it's logged on the
> DEBUG_ERROR level.
>
> ----*----
>
> * Okay, so the S3 problem. Last time I suspected that the failure point
>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>   9A1D0, according to the OVMF log). In order to test this idea, I
>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>   again). The failure reproduced on the second S3 resume, with identical
>   RIP, despite the Windows wakeup vector being located elsewhere (at
>   0x1000).
>
>   Quoting the OVMF log leading up to the resume:
>
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 1000
>
>   QEMU log (same as before):
>
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
>   So, we can exclude the suspicion that the problem is guest OS
>   dependent.
>
> * Then I looked for the base address of the page containing the
>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>   some firmware component might have allocated that area actually. Here
>   we go:
>
>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>> AP Loop Mode is 1
>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>
>   That is, the failure hits (when it hits -- not always) in the area
>   where the CpuMpPei driver *borrows* memory for the startup vector of
>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>   resume, it just serves for booting the APs temporarily in PEI, for
>   implementing the MP service PPI.)
>
>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>   the original contents of this area. This occurs just before
>   transferring control to the guest OS wakeup vector: see the
>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>   quoted from the OVMF log.
>
>   I documented (parts of) this logic in OVMF commit
>
>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
>
>   (see the code comments as well).
>
> * At that time, I thought to have identified a memory management bug in
>   CpuMpPei; see the following discussion and bug report for details:
>
>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
>
>   However, with the extraction / introduction of MpInitLib, this issue
>   has been fixed: GetWakeupBuffer() now calls
>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>   no longer; we shouldn't be looking there for the root cause.
>
> * Either way, I don't understand why anything would want to execute code
>   in the one page that happens to host the MP services PPI startup
>   buffer for APs during PEI.
>
>   Not understanding the "why", I looked at the "what", and resorted to
>   tracing KVM. Because the problem readily reproduces with this series
>   applied (case 13), it wasn't hard to start the tracing while the guest
>   was suspended, and capture just the actions that led from the
>   KVM-level wakeup to the failure.
>
>   The QEMU state dumps are visible above in the email. I've also
>   uploaded the compressed OVMF log and the textual KVM trace here:
>
>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>
>   I sincerely hope that Paolo will have a field day with the KVM trace
>   :) I managed to identify the following curiosities (remember this is
>   all on the S3 resume path):
>
>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>     really funky pattern:
>
>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>       ------  ------  ------  ------
>               enter
>                |
>               leave
>
>                       enter
>                         |
>                       leave
>
>                               enter
>                                 |
>                               leave
>
>       enter
>         |
>       leave
>
>               enter           enter
>        enter    |     enter     |
>          |      |       |       |
>        leave    |       |       |
>                 |       |       |
>        enter    |       |       |
>          |      |       |       |
>        leave  leave   leave   leave
>
>     That is, first we have each VCPU enter and leave SMM in complete
>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>     temporarily (it comes back in later), while the other three remain
>     in SMM. Finally all four of them leave SMM together.
>
>     After which the problem occurs.
>
>   * Second, the instruction that causes things to blow up is <0f aa>,
>     i.e., RSM. I have absolutely no clue why RSM is executed:
>
>     (a) in the area that used to host the AP startup routine for the MP
>     services PPI -- note that we also have "Transfer to 16bit OS waking
>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>     area completely! --,
>
>     (b) and why *after* all four VCPUs have just left SMM, together.
>
>   * The RSM instruction is handled successfully elsewhere, for example
>     when all four VCPUs leave SMM, at the bottom of the diagram above:
>
>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
>
>   * The guest-phys address 7ff7f000 that we see just before the error:
>
>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
>
>     can be found higher up in the trace; namely, it is written to CR3
>     several times. It's the root of the page tables.
>
>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>
> * I also tried the "info tlb" monitor command, via "virsh
>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>   crash.
>
>   I cannot provide results: QEMU appeared to return a message that would
>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>
>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>   with what Paolo said about "Code=?? ?? ??...":
>
>     The question marks usually mean that the page tables do not map a
>     page at that address.
>
>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>   even begin walking the page tables.
>
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 10:41       ` Yao, Jiewen
@ 2016-11-10 12:01         ` Laszlo Ersek
  2016-11-10 14:48           ` Yao, Jiewen
  2016-11-10 12:27         ` Paolo Bonzini
  1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-10 12:01 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/10/16 11:41, Yao, Jiewen wrote:
> Thanks for reporting the case 3 issue in Bugzilla.
> 
> Let's focus on Case 8.
> It seems another random failure issue.
> 
> I did more testing.
> 
> 1)      I tested UEFI32 OS boot on some of our other internal real platforms. I cannot reproduce the ASSERT.
> 
> 2)      I wrote a small test app that calls ExitBootServices and sends an SMI. I ran it on my current Windows QEMU setup, but I still cannot reproduce the ASSERT.
> 
> It seems your environment is the only way to reproduce the issue. I am trying to follow your step-by-step instructions to install an OS on QEMU/KVM. I haven't finished everything yet, because of a network proxy issue. :(

Right, when you run a guest on TCG (QEMU's emulator) vs. on KVM (the virtualizer / accelerator in the host Linux kernel), you get very-very different timing behavior and interleaving of actions. For one, with KVM, the VCPUs really execute in parallel -- they are represented by host OS threads, and the host OS schedules them to separate "physical logical CPUs".

> 
> Your information and analysis are great. They give us some clues.
> 
> I guessed the same thing as you and checked: InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
> 
> This address is initialized in InitializeMpSyncData(), with gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus, which is obtained from MpServices->GetNumberOfProcessors().
> I do not know why this address is zero.
> 
> I also did not quite understand the log below.
> 
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
>   CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
>   CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
>   CPU #3: pc=0x000000007ffd17ca thread_id=7838
> 
> As I recall, on OVMF a write to B2 only causes the BSP to get an SMI. The APs do not enter SMM mode.
> So why can #3 enter SMM mode? Is that expected behavior, or unexpected?
> If it is expected, how does it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?

My theory is that the OS is calling a runtime variable service during boot. That is supposed to pull in all APs into SMM, one way or another.

Also, during boot, the OS may call the runtime variable services genuinely on VCPU#3.

> 
> I will see if I can finish QEMU/KVM installation tomorrow.

Thanks! Once you can test with KVM on your side, that should speed up debugging considerably, I think!

> If you have some idea on why and how #3 enter SMM, please let us know.

Well, I captured a KVM trace for this as well (fresh boot, up to the failure). Grepping the trace for entering / leaving SMM, we see:

(1) the initial SMBASE relocation:

             CPU-6948  [004] 11545.040294: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
             CPU-6948  [004] 11545.040335: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb5000
             CPU-6949  [000] 11545.040363: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
             CPU-6949  [000] 11545.040389: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb7000
             CPU-6950  [002] 11545.040417: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
             CPU-6950  [002] 11545.040443: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb9000
             CPU-6947  [007] 11545.040453: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
             CPU-6947  [007] 11545.040474: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb3000

(2) a long stretch of VCPU#0 entering and leaving SMM, while the firmware uses variable services and such:

             CPU-6947  [007] 11545.053169: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb3000
             CPU-6947  [007] 11545.061272: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb3000
             ...

(3) a write to ioport 0xB2 from VCPU#3, then VCPU#3 entering SMM, then hitting the assert very-very soon:

             CPU-6950  [005] 11550.521195: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521196: kvm_exit:             reason IO_INSTRUCTION rip 0xf7c937b6 info b20000 0
             CPU-6950  [005] 11550.521196: kvm_pio:              pio_write at 0xb2 size 1 count 1 val 0x0 
             CPU-6950  [005] 11550.521196: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6947  [003] 11550.521196: kvm_inj_virq:         irq 253
             CPU-6950  [005] 11550.521196: kvm_fpu:              unload
             CPU-6947  [003] 11550.521197: kvm_fpu:              load
             CPU-6947  [003] 11550.521197: kvm_entry:            vcpu 0
             CPU-6950  [005] 11550.521200: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb9000
             CPU-6947  [003] 11550.521207: kvm_eoi:              apicid 0 vector 253
             CPU-6950  [005] 11550.521207: kvm_fpu:              load
             CPU-6947  [003] 11550.521207: kvm_pv_eoi:           apicid 0 vector 253
             CPU-6950  [005] 11550.521207: kvm_entry:            vcpu 3
             CPU-6947  [003] 11550.521207: kvm_exit:             reason HLT rip 0xc1844554 info 0 0
             CPU-6950  [005] 11550.521209: kvm_exit:             reason CR_ACCESS rip 0x8045 info 300 0
             CPU-6950  [005] 11550.521209: kvm_cr:               cr_write 0 = 0x33
             CPU-6950  [005] 11550.521212: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521213: kvm_exit:             reason CR_ACCESS rip 0x7ffc107d info 3 0
             CPU-6950  [005] 11550.521213: kvm_cr:               cr_write 3 = 0x7ff9a000
             CPU-6950  [005] 11550.521214: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521214: kvm_exit:             reason CPUID rip 0x7ffc1085 info 0 0
             CPU-6950  [005] 11550.521214: kvm_cpuid:            func 1 rax 6e8 rbx 3040800 rcx 80200001 rdx 1f89fbff
             CPU-6950  [005] 11550.521215: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521215: kvm_exit:             reason CR_ACCESS rip 0x7ffc10c4 info 4 0
             CPU-6950  [005] 11550.521215: kvm_cr:               cr_write 4 = 0x668
             CPU-6950  [005] 11550.521217: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521218: kvm_exit:             reason CR_ACCESS rip 0x7ffc110e info 300 0
             CPU-6950  [005] 11550.521218: kvm_cr:               cr_write 0 = 0x80010033
             CPU-6950  [005] 11550.521220: kvm_entry:            vcpu 3
             CPU-6947  [003] 11550.521220: kvm_fpu:              unload
             CPU-6950  [005] 11550.521222: kvm_exit:             reason EPT_VIOLATION rip 0x7ffcbe46 info 181 0
             CPU-6950  [005] 11550.521223: kvm_page_fault:       address 22004ebc error_code 181
             CPU-6950  [005] 11550.521231: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521236: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521236: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x41  <----------------- "A"
             CPU-6950  [005] 11550.521237: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521237: kvm_fpu:              unload
             CPU-6950  [005] 11550.521253: kvm_fpu:              load
             CPU-6950  [005] 11550.521253: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521254: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521254: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x53  <----------------- "S"
             CPU-6950  [005] 11550.521254: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521254: kvm_fpu:              unload
             CPU-6950  [005] 11550.521257: kvm_fpu:              load
             CPU-6950  [005] 11550.521257: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521258: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521258: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x53  <----------------- "S"
             CPU-6950  [005] 11550.521258: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521258: kvm_fpu:              unload
             CPU-6950  [005] 11550.521260: kvm_fpu:              load
             CPU-6950  [005] 11550.521260: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521261: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521261: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x45  <----------------- "E"
             CPU-6950  [005] 11550.521261: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521262: kvm_fpu:              unload
             CPU-6950  [005] 11550.521264: kvm_fpu:              load
             CPU-6950  [005] 11550.521264: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521264: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521264: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x52  <----------------- "R"
             CPU-6950  [005] 11550.521264: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521265: kvm_fpu:              unload
             CPU-6950  [005] 11550.521267: kvm_fpu:              load
             CPU-6950  [005] 11550.521267: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521267: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521267: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x54  <----------------- "T"
             CPU-6950  [005] 11550.521268: kvm_userspace_exit:   reason KVM_EXIT_IO (2)

This seems to be consistent with the OS calling a variable service on VCPU#3.
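
As a cross-check of the arrows in the trace, the six byte values written to ioport 0x402 (the debug console port) can be decoded with a few lines of plain C. This is only an illustrative sketch, not edk2 code; the mDebugconBytes array and the DecodeDebugconBytes() helper are names made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte values copied from the "pio_write at 0x402" lines of the trace. */
static const unsigned char  mDebugconBytes[] = {
  0x41, 0x53, 0x53, 0x45, 0x52, 0x54
};

/*
 * Hypothetical helper: copy Count raw bytes into Out and NUL-terminate,
 * so the result can be printed or compared as a C string.
 */
static void
DecodeDebugconBytes (
  const unsigned char  *Bytes,
  size_t               Count,
  char                 *Out,
  size_t               OutSize
  )
{
  assert (Bytes != NULL && Out != NULL);
  assert (OutSize > Count);          /* room for the terminating NUL */
  memcpy (Out, Bytes, Count);
  Out[Count] = '\0';
}
```

Decoding the six bytes this way gives the string "ASSERT", that is, the start of the assertion message in the OVMF debug log.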

Also, as far as I can see, the above trace matches the assembly code in "UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm".
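
To make the two CR0 writes in the trace easier to read, here is a small standalone C sketch that names the architectural CR0 bits involved. It is not taken from edk2; the CR0_FLAG table and DescribeCr0() are hypothetical names, and only the six bits that occur in these two values are listed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  uint32_t    Mask;
  const char  *Name;
} CR0_FLAG;

/* Only the CR0 bits that appear in the two traced values. */
static const CR0_FLAG  mCr0Flags[] = {
  { 1u << 0,  "PE" },   /* protection enable      */
  { 1u << 1,  "MP" },   /* monitor coprocessor    */
  { 1u << 4,  "ET" },   /* extension type         */
  { 1u << 5,  "NE" },   /* numeric error          */
  { 1u << 16, "WP" },   /* write protect          */
  { 1u << 31, "PG" },   /* paging                 */
};

/*
 * Hypothetical helper: write the space-separated names of the listed
 * flags that are set in Value into Out. Bits not in the table are ignored.
 */
static void
DescribeCr0 (
  uint32_t  Value,
  char      *Out,
  size_t    OutSize
  )
{
  size_t  Index;

  assert (Out != NULL && OutSize > 0);
  Out[0] = '\0';
  for (Index = 0; Index < sizeof (mCr0Flags) / sizeof (mCr0Flags[0]); Index++) {
    if ((Value & mCr0Flags[Index].Mask) == 0) {
      continue;
    }
    if (Out[0] != '\0') {
      strncat (Out, " ", OutSize - strlen (Out) - 1);
    }
    strncat (Out, mCr0Flags[Index].Name, OutSize - strlen (Out) - 1);
  }
}
```

With that, 0x33 decodes to "PE MP ET NE" (protected mode, paging still off), and 0x80010033 to "PE MP ET NE WP PG"; the second write additionally turns on write protection and paging.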

Is perhaps CpuIndex out of bounds?... Hmm, with the following debug patch:

> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index d0092d2f145a..29f6e783c58f 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1143,6 +1143,9 @@ SmiRendezvous (
>        // E.g., with Relaxed AP flow, SmmStartupThisAp() may be called immediately
>        // after AP's present flag is detected.
>        //
> +      if (CpuIndex >= 4) {
> +        DEBUG ((EFI_D_ERROR, "CpuIndex=%u\n", (UINT32)CpuIndex));
> +      }
>        InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>      }
>  
> 

I get the following debug output (note that my SMP configuration is 1x2x2):

> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> CpuIndex=780161211
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)

Ehm... what? :)
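
To put the number in perspective, a minimal sketch (plain C, not edk2 code; NUMBER_OF_CPUS, IsValidCpuIndex() and FormatCpuIndex() are names invented for this example) that checks an index against the four VCPUs of my 1x2x2 configuration and renders it in hex:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 1 socket x 2 cores x 2 threads, as in the QEMU command line. */
#define NUMBER_OF_CPUS  4u

/* Returns 1 if CpuIndex could index a 4-entry CpuData[] array, else 0. */
static int
IsValidCpuIndex (
  uint32_t  CpuIndex
  )
{
  return CpuIndex < NUMBER_OF_CPUS;
}

/* Renders CpuIndex as an 8-digit hex string, e.g. for a debug message. */
static void
FormatCpuIndex (
  uint32_t  CpuIndex,
  char      *Out,
  size_t    OutSize
  )
{
  assert (Out != NULL && OutSize > 0);
  snprintf (Out, OutSize, "0x%08" PRIX32, CpuIndex);
}
```

For the value logged above, IsValidCpuIndex() returns 0, and the hex form is 0x2E8050BB. That only shows the index is garbage, not where it comes from.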

SmiRendezvous() is EFIAPI; is that calling convention followed in "Ia32/SmiEntry.nasm"?

Thanks,
Laszlo

> Thank you
> Yao Jiewen
> 
> 
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 10, 2016 4:46 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> 
> On 11/09/16 07:25, Yao, Jiewen wrote:
>> Hi Laszlo
>> I will fix DEBUG message issue in V3 patch.
>>
>> Below are the remaining issues:
>>
>>
>> * Case 13: S3 fails randomly.
>> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>>
>>
>> 1)      We believe the dead CPU is an AP, not the BSP.
>> The reasons are:
>>
>> 1.1)   The BSP has already transferred control to the OS waking vector. The GDT/IDT/CR3 should be set by the OS.
>>
>> 1.2)   The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are typical BIOS X64 mode settings.
>>
>> 1.3)   The dead CPU still has the SMM CR3. (Which is obviously wrong.)
>>
>>
>> 2)      Based upon 1), we reviewed the S3 resume AP flow.
>> Currently the BSP wakes up the APs in SMRAM, for security considerations. At that time, we are using the SMM mode CR3. It is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
>> Currently the BSP just uses its own context to initialize the APs, so the APs take the BSP's CR3, which is the SMM CR3, unfortunately.
>> After the BSP has initialized the APs, the APs are put into a HALT-LOOP in X64 mode. That is the last straw, because an X64 mode halt still needs paging.
>>
>>
>> 3)      The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>>
>>
>> 4)      I guess we did not see the error before, or it looks like a RANDOM issue, because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>>
>>
>> 5)      The fix, I think, should be as below:
>> We should always put the AP in protected mode, so that no paging is needed.
>> We should put the AP in reserved memory above 1M, instead of memory below 1M, because the memory below 1M is restored.
>>
>>
>> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix this critical issue.
>>
>> There is no need to do more investigation. Thanks for your great help on that. :)
> 
> Thank you for your help!
> 
> I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
> 
>     BSP exits SMM and closes SMRAM on the S3 resume path before
>     meeting with AP(s)
> 
> I hope the title is mostly right. I didn't add any other details (I
> haven't gone through the thread in detail yet, and without that I can't
> even write up a semi-reasonable report myself). Instead, I referenced
> this message of yours in the report, and I also linked Paolo's analysis
> from elsewhere in the thread. I hope this will do for the report.
> 
> (Also, thank you Paolo, for the amazing analysis -- I haven't digested
> it yet, but I can already tell it's amazing! :))
> 
>> * Case 17 - I do not think it is a real issue, because SMM is out of resources.
>>
>>
>> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>> Here is the code. We do not know why some code needs InitializeSpinLock after ExitBootServices.
>> SPIN_LOCK *
>> EFIAPI
>> InitializeSpinLock (
>>   OUT      SPIN_LOCK                 *SpinLock
>>   )
>> {
>>   ASSERT (SpinLock != NULL);
>>
>>   _ReadWriteBarrier();
>>   *SpinLock = SPIN_LOCK_RELEASED;
>>   _ReadWriteBarrier();
>>
>>   return SpinLock;
>> }
>>
>> If you can do a quick check on the below, that would be great.
>>
>> 1)      Which processor triggers this ASSERT? BSP or AP?
>>
>> 2)      Which module triggers this ASSERT? Which module contains the current RIP value?
> 
> First, one additional piece of info I have learned is that the issue
> does not always present itself. Sometimes the boot just works fine,
> other times the assert fires.
> 
> Using the QEMU monitor, I managed to get the following information with
> the "info cpus" command:
> 
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
>   CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
>   CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
>   CPU #3: pc=0x000000007ffd17ca thread_id=7838
> 
> VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
> point into SMRAM again.
> 
> In the OVMF log, I see
> 
> Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
> PiSmmCpuDxeSmm.efi
> 
> So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
> entry point, 0x8577, 0x253 bytes less).
> 
> Running
> 
>   objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
> 
> first I see confirmation that
> 
>   start address 0x00000253
> 
> and then
> 
> 000087bd <CpuDeadLoop>:
> VOID
> EFIAPI
> CpuDeadLoop (
>   VOID
>   )
> {
>     87bd:       55                      push   %ebp
>     87be:       89 e5                   mov    %esp,%ebp
>     87c0:       83 ec 10                sub    $0x10,%esp
>   volatile UINTN  Index;
> 
>   for (Index = 0; Index == 0;);
>     87c3:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
>     87ca:       8b 45 fc                mov    -0x4(%ebp),%eax  <-- HERE
>     87cd:       85 c0                   test   %eax,%eax
>     87cf:       74 f9                   je     87ca <CpuDeadLoop+0xd>
> }
>     87d1:       c9                      leave
>     87d2:       c3                      ret
> 
> This seems consistent with an assertion failure.
> 
> I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
> SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
> like a possible caller:
> 
>       //
>       // The BUSY lock is initialized to Released state. This needs to
>       // be done early enough to be ready for BSP's SmmStartupThisAp()
>       // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
>       // called immediately after AP's present flag is detected.
>       //
>       InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
> 
> Just a guess, of course.
> 
>> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
>> If you can share step-by-step instructions with me, that would be great.
> 
> (1) Grab a host computer with a CPU that supports VMX and EPT.
> 
> (2) Download and install Fedora 24 (for example):
> 
> https://getfedora.org/en/workstation/download/
> http://docs.fedoraproject.org/install-guide
> 
> (3) Install the "qemu-system-x86" package with DNF
> 
> dnf install qemu-system-x86
> 
> (4) clone edk2 with git
> 
> (5) embed OpenSSL optionally (for secure boot); see
> "CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
> 
> (6) build OVMF:
> 
> source edksetup.sh
> make -C "$EDK_TOOLS_PATH"
> 
> # Ia32
> build \
>   -a IA32 \
>   -p OvmfPkg/OvmfPkgIa32.dsc \
>   -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
>   -t GCC5 -b DEBUG
> 
> # Ia32X64
> build \
>   -a IA32 -a X64 \
>   -p OvmfPkg/OvmfPkgIa32X64.dsc \
>   -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
>   -t GCC5 -b DEBUG
> 
> (7) Create disk images:
> 
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
>   -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
> 
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
>   -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
> 
> (8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
> that you downloaded already (the ISO image).
> 
> For 32-bit guest OS, this one used to work:
> 
> https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
> 
> minimally the 20141209 release. Hm... actually, I think the maintainer
> of that image has discontinued the downloadable files :(
> 
> So, I don't know what 32-bit UEFI OS to recommend for testing.
> 
> 32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
> times, with some help from a Microsoft developer, but we couldn't solve
> it), so I can't recommend Windows as an alternative.
> 
> Perhaps you can use
> 
> https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
> 
> as a 32-bit guest OS; I have never tried it.
> 
> (9) Anyway, once you have an installer ISO, set the "ISO" environment
> variable to the ISO image's full pathname, and then run QEMU like this:
> 
> # Settings for Ia32 only:
> 
> ISO=...
> DISK=.../disk-ia32.img
> FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-32.fd
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> DEBUG=debug-32.log
> 
> # Settings for Ia32X64 only:
> 
> ISO=...
> DISK=.../disk-ia32x64.img
> FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-3264.fd
> QEMU_COMMAND=qemu-system-x86_64
> DEBUG=debug-3264.log
> 
> # Common commands for both target arches:
> 
> # create variable store from varstore template
> # if the former doesn't exist yet
> if ! [ -e "$VARS" ]; then
>   cp -- "$TEMPLATE" "$VARS"
> fi
> 
> $QEMU_COMMAND \
>   -machine q35,smm=on,accel=kvm \
>   -m 4096 \
>   -smp sockets=1,cores=2,threads=2 \
>   -global driver=cfi.pflash01,property=secure,value=on \
>   -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
>   -drive if=pflash,format=raw,unit=1,file=${VARS} \
>   \
>   -chardev file,id=debugfile,path=$DEBUG \
>   -device isa-debugcon,iobase=0x402,chardev=debugfile \
>   \
>   -chardev stdio,id=char0,signal=off,mux=on \
>   -mon chardev=char0,mode=readline,default \
>   -serial chardev:char0 \
>   \
>   -drive id=iso,if=none,format=raw,readonly,file=$ISO \
>   -drive id=disk,if=none,format=qcow2,file=$DISK \
>   \
>   -device virtio-scsi-pci,id=scsi0 \
>   -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
>   -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
>   \
>   -device VGA
> 
> This will capture the OVMF debug output in the $DEBUG file. Also, the
> terminal where you run the command can be switched between the guest's
> serial console and the QEMU monitor with [Ctrl-A C].
> 
> Thanks
> Laszlo
> 
>>
>> Thank you
>> Yao Jiewen
>>
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Tuesday, November 8, 2016 9:22 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>
>> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
>> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>>
>> On 11/04/16 10:30, Jiewen Yao wrote:
>>> ==== below is V2 description ====
>>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>>> 4) PiSmmCpu: Add protection detail in commit message.
>>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>>
>>> ==== below is V1 description ====
>>> This series patch enables SMM page level protection.
>>> Features are:
>>> 1) PiSmmCore reports SMM PE image code/data information
>>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>>> and set XD for data page and RO for code page.
>>> 3) PiSmmCpu enables Static Paging for X64 according to
>>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>>> is used as long as it is supported.
>>> 4) PiSmmCpu sets importance data structure to be read only,
>>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>>
>>> tested platform:
>>> 1) Intel internal platform (X64).
>>> 2) EDKII Quark IA32
>>> 3) EDKII Vlv2  X64
>>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>>
>>> Cc: Jeff Fan <jeff.fan@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Star Zeng <star.zeng@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> I have new test results. Let's start with the table again:
>>
>> Legend:
>>
>> - "untested" means the test was not executed because the same test
>>   failed or proved unreliable in a less demanding configuration already,
>>
>> - "n/a" means a setting or test case was impossible,
>>
>> - "fail" and "unreliable" (lower case) are outside the scope of this
>>   series; they either capture the pre-series status, or are expected
>>   even with the series applied due to the pre-series status,
>>
>> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>>   series.
>>
>> In all cases, 36 bits were used as address width in the CPU HOB (--> up
>> to 64GB guest-phys address space).
>>
>>    series  OVMF                                                              VCPU     boot       S3 resume
>>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
>> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
>> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
>> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
>> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
>> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
>> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
>> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
>> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
>> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
>> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
>>
>> * Case 8: this test case failed with v2 as well, but this time with
>>   different symptoms:
>>
>>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitExitBootServicesCallback() done!
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>>
>>   I didn't try to narrow this down.
>>
>> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>>   and good. The good news is for Jiewen: this patch series does not
>>   cause the unreliability, it "only" amplifies it severely. The bad news
>>   is correspondingly for everyone else: S3 resume is actually unreliable
>>   even in case 4, that is, without this series applied; it's just that
>>   the failure rate is much lower.
>>
>>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>>   21 tries. (I stopped testing at the 8th failure.)
>>
>>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>>   #12 that failed; I continued testing and aborted the test after the
>>   55th try.)
>>
>>   So, while the series hugely amplifies the failure rate, the failure
>>   does exist without the series. Which is why I modified the case 4
>>   results in the table, and also lower-cased the word "unreliable" in
>>   case 13.
>>
>>   Below I will return to this problem separately; let's go over the rest
>>   of the table first.
>>
>> * Case 17: I guess this is not a real failure, I'm just including it for
>>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>>   case fails with
>>
>>> SmmLockBox Command - 4
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>>> SmmLockBox SmmLockBoxHandler Exit
>>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>>
>>   which is an SMRAM allocation failure. If I lower the VCPU count to
>>   50x2x2, then the guest boots fine.
>>
>> ----*----
>>
>> Before I get to the S3 resume problem (which, again, reproduces without
>> this series, although much less frequently), I'd like to comment on the
>> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
>> function, on the return value of SmmBlockingStartupThisAp(). This change
>> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>>
>>> !mSmmMpSyncData->CpuData[1].Present
>>> !mSmmMpSyncData->CpuData[2].Present
>>> !mSmmMpSyncData->CpuData[3].Present
>>> ...
>>
>> messages in the OVMF boot log, interspersed with
>>
>>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>>
>> style messages. (That is, one error message for each AP, per
>> ConvertPageEntryAttribute() message.)
>>
>> Is this okay / intentional? The number of these messages can go up to
>> several thousands and that sort of drowns out everything else in the
>> log.
>>
>> It's also not easy to mask the message, because it's logged on the
>> DEBUG_ERROR level.
>>
>> ----*----
>>
>> * Okay, so the S3 problem. Last time I suspected that the failure point
>>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>>   9A1D0, according to the OVMF log). In order to test this idea, I
>>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>>   again). The failure reproduced on the second S3 resume, with identical
>>   RIP, despite the Windows wakeup vector being located elsewhere (at
>>   0x1000).
>>
>>   Quoting the OVMF log leading up to the resume:
>>
>>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>>> Install PPI: [PeiPostScriptTablePpi]
>>> Install PPI: [EfiEndOfPeiSignalPpi]
>>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>>> Transfer to 16bit OS waking vector - 1000
>>
>>   QEMU log (same as before):
>>
>>> KVM internal error. Suberror: 1
>>> KVM internal error. Suberror: 1
>>> emulation failure
>>> emulation failure
>>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT=     000000007f294000 00000047
>>> IDT=     000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT=     000000007f294000 00000047
>>> IDT=     000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>>   So, we can exclude the suspicion that the problem is guest OS
>>   dependent.
>>
>> * Then I looked for the base address of the page containing the
>>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>>   some firmware component might have allocated that area actually. Here
>>   we go:
>>
>>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>>> AP Loop Mode is 1
>>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>>
>>   That is, the failure hits (when it hits -- not always) in the area
>>   where the CpuMpPei driver *borrows* memory for the startup vector of
>>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>>   resume, it just serves for booting the APs temporarily in PEI, for
>>   implementing the MP service PPI.)
>>
>>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>>   the original contents of this area. This occurs just before
>>   transferring control to the guest OS wakeup vector: see the
>>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>>   quoted from the OVMF log.
>>
>>   I documented (parts of) this logic in OVMF commit
>>
>>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
>>
>>   (see the code comments as well).
>>
>> * At that time, I thought I had identified a memory management bug in
>>   CpuMpPei; see the following discussion and bug report for details:
>>
>>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
>>
>>   However, with the extraction / introduction of MpInitLib, this issue
>>   has been fixed: GetWakeupBuffer() now calls
>>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>>   no longer present; we shouldn't be looking there for the root cause.
>>
>> * Either way, I don't understand why anything would want to execute code
>>   in the one page that happens to host the MP services PPI startup
>>   buffer for APs during PEI.
>>
>>   Not understanding the "why", I looked at the "what", and resorted to
>>   tracing KVM. Because the problem readily reproduces with this series
>>   applied (case 13), it wasn't hard to start the tracing while the guest
>>   was suspended, and capture just the actions that led from the
>>   KVM-level wakeup to the failure.
>>
>>   The QEMU state dumps are visible above in the email. I've also
>>   uploaded the compressed OVMF log and the textual KVM trace here:
>>
>>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>>
>>   I sincerely hope that Paolo will have a field day with the KVM trace
>>   :) I managed to identify the following curiosities (remember this is
>>   all on the S3 resume path):
>>
>>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>>     really funky pattern:
>>
>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>       ------  ------  ------  ------
>>               enter
>>                |
>>               leave
>>
>>                       enter
>>                         |
>>                       leave
>>
>>                               enter
>>                                 |
>>                               leave
>>
>>       enter
>>         |
>>       leave
>>
>>               enter           enter
>>        enter    |     enter     |
>>          |      |       |       |
>>        leave    |       |       |
>>                 |       |       |
>>        enter    |       |       |
>>          |      |       |       |
>>        leave  leave   leave   leave
>>
>>     That is, first we have each VCPU enter and leave SMM in complete
>>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>>     temporarily (it comes back in later), while the other three remain
>>     in SMM. Finally all four of them leave SMM together.
>>
>>     After which the problem occurs.
>>
>>   * Second, the instruction that causes things to blow up is <0f aa>,
>>     i.e., RSM. I have absolutely no clue why RSM is executed:
>>
>>     (a) in the area that used to host the AP startup routine for the MP
>>     services PPI -- note that we also have "Transfer to 16bit OS waking
>>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>>     area completely! --,
>>
>>     (b) and why *after* all four VCPUs have just left SMM, together.
>>
>>   * The RSM instruction is handled successfully elsewhere, for example
>>     when all four VCPUs leave SMM, at the bottom of the diagram above:
>>
>>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
>>
>>   * The guest-phys address 7ff7f000 that we see just before the error:
>>
>>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
>>
>>     can be found higher up in the trace; namely, it is written to CR3
>>     several times. It's the root of the page tables.
>>
>>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>>
>> * I also tried the "info tlb" monitor command, via "virsh
>>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>>   crash.
>>
>>   I cannot provide results: QEMU appeared to return a message that would
>>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>>
>>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>>   with what Paolo said about "Code=?? ?? ??...":
>>
>>     The question marks usually mean that the page tables do not map a
>>     page at that address.
>>
>>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>>   even begin walking the page tables.
>>
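The last point can be written down as a small C sketch. It is a simplified model, not a description of how QEMU/KVM or real hardware implement SMRAM protection: it only assumes that guest-physical addresses in the SMRAM window quoted above (7F80_1000..7FFF_FFFF) are not accessible outside SMM. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SMRAM window as quoted in the thread (inclusive bounds). */
#define SMRAM_FIRST 0x7F801000ull
#define SMRAM_LAST  0x7FFFFFFFull

/* True if the guest-physical address lies in the SMRAM window. */
bool addr_in_smram(uint64_t guest_phys)
{
    return guest_phys >= SMRAM_FIRST && guest_phys <= SMRAM_LAST;
}

/*
 * Simplified model: a page walk can start only if the page-table root
 * is reachable, i.e. it is outside SMRAM or the CPU is currently in SMM.
 * The low 12 bits of CR3 are not part of the root's base address.
 */
bool page_walk_possible(uint64_t cr3, bool in_smm)
{
    uint64_t root = cr3 & ~0xFFFull;
    return in_smm || !addr_in_smram(root);
}
```

For the register dump above, `page_walk_possible(0x7ff7f000, false)` is false, which matches the "Code=?? ?? ??..." output, whereas the GDT base 0x7f294000 lies outside the window.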
>> Thanks
>> Laszlo
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 10:41       ` Yao, Jiewen
  2016-11-10 12:01         ` Laszlo Ersek
@ 2016-11-10 12:27         ` Paolo Bonzini
  1 sibling, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-10 12:27 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
	Zeng, Star



On 10/11/2016 11:41, Yao, Jiewen wrote:
> I also did not quite understand the log below.
> 
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
>   CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
>   CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
>   CPU #3: pc=0x000000007ffd17ca thread_id=7838
> 
> As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM mode.

It's not BSP that enters SMM, it's the currently executing processor.

So this means that CPU #3 has written to B2.

Thanks,

Paolo


> So why can #3 enter SMM mode? Is that expected or unexpected behavior?
> If it is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?
> 
> I will see if I can finish the QEMU/KVM installation tomorrow.
> 
> If you have some idea why and how #3 enters SMM, please let us know.
>
> 
> Thank you
> Yao Jiewen
> 
> 
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 10, 2016 4:46 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> 
> On 11/09/16 07:25, Yao, Jiewen wrote:
>> Hi Laszlo
>> I will fix the DEBUG message issue in the V3 patch.
>>
>> Below are the remaining issues:
>>
>>
>> *  Case 13: S3 fails randomly.
>> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>>
>>
>> 1)      We believe the dead CPU is an AP, not the BSP.
>> The reasons are:
>>
>> 1.1)   The BSP has already transferred control to the OS waking vector. The GDT/IDT/CR3 should have been set by the OS.
>>
>> 1.2)   The dead CPU still has its GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS values are the typical BIOS X64 mode settings.
>>
>> 1.3)   The dead CPU still has the SMM CR3 (which is obviously wrong).
>>
>>
>> 2)      Based upon 1), we reviewed the S3 resume AP flow.
>> Currently the BSP wakes up the APs in SMRAM, for security considerations. At that time, we are using the SMM mode CR3. That is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
>> Currently the BSP just uses its own context to initialize the APs, so each AP takes the BSP's CR3, which is the SMM CR3, unfortunately.
>> After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because an X64 mode halt loop still needs paging.
>>
>>
>> 3)      The error happens once an AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this failure.
>>
>>
>> 4)      I guess we do not always see the error (that is, the issue is RANDOM) because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>>
>>
>> 5)      The fix, I think, should be as follows:
>> We should always put the AP into protected mode, so that no paging is needed.
>> We should park the AP in reserved memory above 1M, instead of memory below 1M, because memory below 1M is restored.
>>
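To make the "X64 mode halt still needs paging" point concrete, here is a minimal C sketch that decodes the relevant bits of the register values from the QEMU dump quoted further down in this thread (CR0=e0000011, EFER=0000000000000500). The bit positions are the architectural ones (CR0.PE bit 0, CR0.PG bit 31, EFER.LMA bit 10); the helper names are made up for illustration and are not edk2 APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE   (1ull << 0)   /* protected mode enable */
#define CR0_PG   (1ull << 31)  /* paging enable */
#define EFER_LMA (1ull << 10)  /* long mode active */

/* True if protected mode is enabled in the given CR0 value. */
bool protected_mode_enabled(uint64_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}

/* True if paging is enabled; every fetch then depends on CR3. */
bool paging_enabled(uint64_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}

/* True if the CPU is running in long mode according to EFER. */
bool long_mode_active(uint64_t efer)
{
    return (efer & EFER_LMA) != 0;
}
```

With the dumped values, both `paging_enabled(0xe0000011)` and `long_mode_active(0x500)` are true. A CR0 value such as 0x00000011 (protected mode, PG clear) would not need page tables at all, which is what item 5) relies on.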
>>
>> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix this critical issue.
>>
>> There is no need to do more investigation. Thanks for your great help on that. :)
> 
> Thank you for your help!
> 
> I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
> 
>     BSP exits SMM and closes SMRAM on the S3 resume path before
>     meeting with AP(s)
> 
> I hope the title is mostly right. I didn't add any other details (I
> haven't gone through the thread in detail yet, and without that I can't
> even write up a semi-reasonable report myself). Instead, I referenced
> this message of yours in the report, and I also linked Paolo's analysis
> from elsewhere in the thread. I hope this will do for the report.
> 
> (Also, thank you Paolo, for the amazing analysis -- I haven't digested
> it yet, but I can already tell it's amazing! :))
> 
>> *  Case 17 - I do not think it is a real issue, because SMM is out of resources.
>>
>>
>> *  Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>> Here is the code. We do not know why any code needs to call InitializeSpinLock after ExitBootServices.
>> SPIN_LOCK *
>> EFIAPI
>> InitializeSpinLock (
>>   OUT      SPIN_LOCK                 *SpinLock
>>   )
>> {
>>   ASSERT (SpinLock != NULL);
>>
>>   _ReadWriteBarrier();
>>   *SpinLock = SPIN_LOCK_RELEASED;
>>   _ReadWriteBarrier();
>>
>>   return SpinLock;
>> }
>>
>> If you could quickly check the following, that would be great.
>>
>> 1)      Which processor triggers this ASSERT? BSP or AP.
>>
>> 2)      Which module triggers this ASSERT? Which module contains current RIP value?
> 
> First, one additional piece of info I have learned is that the issue
> does not always present itself. Sometimes the boot just works fine,
> other times the assert fires.
> 
> Using the QEMU monitor, I managed to get the following information with
> the "info cpus" command:
> 
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
>   CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
>   CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
>   CPU #3: pc=0x000000007ffd17ca thread_id=7838
> 
> VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
> point into SMRAM again.
> 
> In the OVMF log, I see
> 
> Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
> PiSmmCpuDxeSmm.efi
> 
> So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
> entry point, 0x8577, 0x253 bytes less).
> 
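The offset arithmetic above, spelled out as a small C sketch. The helper name `image_offset` is hypothetical; the numbers are the ones from the "info cpus" output and the "Loading SMM driver" log line.

```c
#include <stdint.h>

/* Offset of an address within a loaded image, given the image base. */
uint64_t image_offset(uint64_t addr, uint64_t image_base)
{
    return addr - image_base;
}
```

With the logged values, `image_offset(0x7ffd17ca, 0x7FFC9000)` is 0x87CA and `image_offset(0x7FFC9253, 0x7FFC9000)` is 0x253, so the PC is 0x87CA - 0x253 = 0x8577 bytes past the entry point.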
> Running
> 
>   objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
> 
> first I see confirmation that
> 
>   start address 0x00000253
> 
> and then
> 
> 000087bd <CpuDeadLoop>:
> VOID
> EFIAPI
> CpuDeadLoop (
>   VOID
>   )
> {
>     87bd:       55                      push   %ebp
>     87be:       89 e5                   mov    %esp,%ebp
>     87c0:       83 ec 10                sub    $0x10,%esp
>   volatile UINTN  Index;
> 
>   for (Index = 0; Index == 0;);
>     87c3:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
>     87ca:       8b 45 fc                mov    -0x4(%ebp),%eax  <-- HERE
>     87cd:       85 c0                   test   %eax,%eax
>     87cf:       74 f9                   je     87ca <CpuDeadLoop+0xd>
> }
>     87d1:       c9                      leave
>     87d2:       c3                      ret
> 
> This seems consistent with an assertion failure.
> 
> I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
> SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
> like a possible caller:
> 
>       //
>       // The BUSY lock is initialized to Released state. This needs to
>       // be done early enough to be ready for BSP's SmmStartupThisAp()
>       // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
>       // called immediately after AP's present flag is detected.
>       //
>       InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
> 
> Just a guess, of course.
> 
>> At the same time, all my OS testing is on a real platform. I have not set up an OVMF environment to run an OS yet.
>> If you can share step-by-step instructions with me, that would be great.
> 
> (1) Grab a host computer with a CPU that supports VMX and EPT.
> 
> (2) Download and install Fedora 24 (for example):
> 
> https://getfedora.org/en/workstation/download/
> http://docs.fedoraproject.org/install-guide
> 
> (3) Install the "qemu-system-x86" package with DNF
> 
> dnf install qemu-system-x86
> 
> (4) clone edk2 with git
> 
> (5) embed OpenSSL optionally (for secure boot); see
> "CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
> 
> (6) build OVMF:
> 
> source edksetup.sh
> make -C "$EDK_TOOLS_PATH"
> 
> # Ia32
> build \
>   -a IA32 \
>   -p OvmfPkg/OvmfPkgIa32.dsc \
>   -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
>   -t GCC5 -b DEBUG
> 
> # Ia32X64
> build \
>   -a IA32 -a X64 \
>   -p OvmfPkg/OvmfPkgIa32X64.dsc \
>   -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
>   -t GCC5 -b DEBUG
> 
> (7) Create disk images:
> 
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
>   -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
> 
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
>   -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
> 
> (8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
> that you downloaded already (the ISO image).
> 
> For 32-bit guest OS, this one used to work:
> 
> https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
> 
> minimally the 20141209 release. Hm... actually, I think the maintainer
> of that image has discontinued the downloadable files :(
> 
> So, I don't know what 32-bit UEFI OS to recommend for testing.
> 
> 32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
> times, with some help from a Microsoft developer, but we couldn't solve
> it), so I can't recommend Windows as an alternative.
> 
> Perhaps you can use
> 
> https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
> 
> as a 32-bit guest OS; I have never tried it.
> 
> (9) Anyway, once you have an installer ISO, set the "ISO" environment
> variable to the ISO image's full pathname, and then run QEMU like this:
> 
> # Settings for Ia32 only:
> 
> ISO=...
> DISK=.../disk-ia32.img
> FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-32.fd
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> DEBUG=debug-32.log
> 
> # Settings for Ia32X64 only:
> 
> ISO=...
> DISK=.../disk-ia32x64.img
> FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-3264.fd
> QEMU_COMMAND=qemu-system-x86_64
> DEBUG=debug-3264.log
> 
> # Common commands for both target arches:
> 
> # create variable store from varstore template
> # if the former doesn't exist yet
> if ! [ -e "$VARS" ]; then
>   cp -- "$TEMPLATE" "$VARS"
> fi
> 
> $QEMU_COMMAND \
>   -machine q35,smm=on,accel=kvm \
>   -m 4096 \
>   -smp sockets=1,cores=2,threads=2 \
>   -global driver=cfi.pflash01,property=secure,value=on \
>   -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
>   -drive if=pflash,format=raw,unit=1,file=${VARS} \
>   \
>   -chardev file,id=debugfile,path=$DEBUG \
>   -device isa-debugcon,iobase=0x402,chardev=debugfile \
>   \
>   -chardev stdio,id=char0,signal=off,mux=on \
>   -mon chardev=char0,mode=readline,default \
>   -serial chardev:char0 \
>   \
>   -drive id=iso,if=none,format=raw,readonly,file=$ISO \
>   -drive id=disk,if=none,format=qcow2,file=$DISK \
>   \
>   -device virtio-scsi-pci,id=scsi0 \
>   -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
>   -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
>   \
>   -device VGA
> 
> This will capture the OVMF debug output in the $DEBUG file. Also, the
> terminal where you run the command can be switched between the guest's
> serial console and the QEMU monitor with [Ctrl-A C].
> 
> Thanks
> Laszlo
> 
>>
>> Thank you
>> Yao Jiewen
>>
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Tuesday, November 8, 2016 9:22 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>
>> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
>> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>>
>> On 11/04/16 10:30, Jiewen Yao wrote:
>>> ==== below is V2 description ====
>>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>>> 4) PiSmmCpu: Add protection detail in commit message.
>>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>>
>>> ==== below is V1 description ====
>>> This patch series enables SMM page level protection.
>>> Features are:
>>> 1) PiSmmCore reports SMM PE image code/data information
>>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>>> and sets XD for data pages and RO for code pages.
>>> 3) PiSmmCpu enables Static Paging for X64 according to
>>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>>> is used as long as it is supported.
>>> 4) PiSmmCpu sets important data structures to be read only,
>>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>>
>>> tested platform:
>>> 1) Intel internal platform (X64).
>>> 2) EDKII Quark IA32
>>> 3) EDKII Vlv2  X64
>>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>>
>>> Cc: Jeff Fan <jeff.fan@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Star Zeng <star.zeng@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> I have new test results. Let's start with the table again:
>>
>> Legend:
>>
>> - "untested" means the test was not executed because the same test
>>   failed or proved unreliable in a less demanding configuration already,
>>
>> - "n/a" means a setting or test case was impossible,
>>
>> - "fail" and "unreliable" (lower case) are outside the scope of this
>>   series; they either capture the pre-series status, or are expected
>>   even with the series applied due to the pre-series status,
>>
>> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>>   series.
>>
>> In all cases, 36 bits were used as address width in the CPU HOB (--> up
>> to 64GB guest-phys address space).
>>
>>    series  OVMF                                                              VCPU     boot       S3 resume
>>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
>> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
>> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
>> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
>> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
>> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
>> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
>> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
>> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
>> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
>> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
>>
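As a reading aid for the "VCPU topology" column, here is a small C sketch that turns a topology into a logical processor count. It assumes the column reads sockets x cores x threads, which is consistent with the "-smp sockets=1,cores=2,threads=2" QEMU option and the four VCPUs mentioned elsewhere in this thread, but that reading is an assumption. The helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of logical processors for a sockets x cores x threads topology. */
uint32_t vcpu_count(uint32_t sockets, uint32_t cores, uint32_t threads)
{
    return sockets * cores * threads;
}

/* True if the count does not exceed the configured maximum. */
bool fits_max_cpus(uint32_t count, uint32_t max_logical_processors)
{
    return count <= max_logical_processors;
}
```

Under that assumption, 1x2x2 is 4 VCPUs, 52x2x2 is 208, 53x2x2 is 212 and 54x2x2 is 216; all of these are below a PcdCpuMaxLogicalProcessorNumber of 255.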
>> * Case 8: this test case failed with v2 as well, but this time with
>>   different symptoms:
>>
>>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitExitBootServicesCallback() done!
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>>
>>   I didn't try to narrow this down.
>>
>> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>>   and good. The good news is for Jiewen: this patch series does not
>>   cause the unreliability, it "only" amplifies it severely. The bad news
>>   is correspondingly for everyone else: S3 resume is actually unreliable
>>   even in case 4, that is, without this series applied; it's just that
>>   the failure rate is much, much lower.
>>
>>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>>   21 tries. (I stopped testing at the 8th failure.)
>>
>>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>>   #12 that failed; I continued testing and aborted the test after the
>>   55th try.)
>>
>>   So, while the series hugely amplifies the failure rate, the failure
>>   does exist without the series. Which is why I modified the case 4
>>   results in the table, and also lower-cased the word "unreliable" in
>>   case 13.
>>
>>   Below I will return to this problem separately; let's go over the rest
>>   of the table first.
>>
>> * Case 17: I guess this is not a real failure, I'm just including it for
>>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>>   case fails with
>>
>>> SmmLockBox Command - 4
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>>> SmmLockBox SmmLockBoxHandler Exit
>>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>>
>>   which is an SMRAM allocation failure. If I lower the VCPU count to
>>   50x2x2, then the guest boots fine.
>>
>> ----*----
>>
>> Before I get to the S3 resume problem (which, again, reproduces without
>> this series, although much less frequently), I'd like to comment on the
>> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
>> function, on the return value of SmmBlockingStartupThisAp(). This change
>> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>>
>>> !mSmmMpSyncData->CpuData[1].Present
>>> !mSmmMpSyncData->CpuData[2].Present
>>> !mSmmMpSyncData->CpuData[3].Present
>>> ...
>>
>> messages in the OVMF boot log, interspersed with
>>
>>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>>
>> style messages. (That is, one error message for each AP, per
>> ConvertPageEntryAttribute() message.)
>>
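For readers decoding the ConvertPageEntryAttribute() message, a minimal C sketch of the bit arithmetic follows. It assumes the logged values are x86 page-table entries with the architectural low flag bits (P bit 0, R/W bit 1) and a frame address in bits 12..51; the log does not say at which paging level the entry sits. The helper names are hypothetical and do not come from PiSmmCpuDxeSmm.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT   0x001ull               /* bit 0: P */
#define PTE_RW        0x002ull               /* bit 1: R/W */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull  /* bits 12..51: frame address */

/* Bits that differ between the old and the new entry. */
uint64_t pte_changed_bits(uint64_t old_entry, uint64_t new_entry)
{
    return old_entry ^ new_entry;
}

/* True if the entry was writable before and is read-only afterwards. */
bool pte_became_read_only(uint64_t old_entry, uint64_t new_entry)
{
    return (old_entry & PTE_RW) != 0 && (new_entry & PTE_RW) == 0;
}

/* Physical frame address the entry points to. */
uint64_t pte_frame(uint64_t entry)
{
    return entry & PTE_ADDR_MASK;
}
```

For 0x7F92B067->0x7F92B065, the only changed bit is 0x2 (R/W), so the entry for frame 0x7F92B000 goes from writable to read-only while staying present.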
>> Is this okay / intentional? The number of these messages can go up to
>> several thousands and that sort of drowns out everything else in the
>> log.
>>
>> It's also not easy to mask the message, because it's logged on the
>> DEBUG_ERROR level.
>>
>> ----*----
>>
>> * Okay, so the S3 problem. Last time I suspected that the failure point
>>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>>   9A1D0, according to the OVMF log). In order to test this idea, I
>>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>>   again). The failure reproduced on the second S3 resume, with identical
>>   RIP, despite the Windows wakeup vector being located elsewhere (at
>>   0x1000).
>>
>>   Quoting the OVMF log leading up to the resume:
>>
>>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>>> Install PPI: [PeiPostScriptTablePpi]
>>> Install PPI: [EfiEndOfPeiSignalPpi]
>>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>>> Transfer to 16bit OS waking vector - 1000
>>
>>   QEMU log (same as before):
>>
>>> KVM internal error. Suberror: 1
>>> KVM internal error. Suberror: 1
>>> emulation failure
>>> emulation failure
>>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT=     000000007f294000 00000047
>>> IDT=     000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT=     000000007f294000 00000047
>>> IDT=     000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>>   So, we can exclude the suspicion that the problem is guest OS
>>   dependent.
>>
>> * Then I looked for the base address of the page containing the
>>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>>   some firmware component might have allocated that area actually. Here
>>   we go:
>>
>>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>>> AP Loop Mode is 1
>>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>>
>>   That is, the failure hits (when it hits -- not always) in the area
>>   where the CpuMpPei driver *borrows* memory for the startup vector of
>>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>>   resume, it just serves for booting the APs temporarily in PEI, for
>>   implementing the MP service PPI.)
>>
>>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>>   the original contents of this area. This occurs just before
>>   transferring control to the guest OS wakeup vector: see the
>>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>>   quoted from the OVMF log.
>>
>>   I documented (parts of) this logic in OVMF commit
>>
>>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
>>
>>   (see the code comments as well).
>>
>> * At that time, I thought to have identified a memory management bug in
>>   CpuMpPei; see the following discussion and bug report for details:
>>
>>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
>>
>>   However, with the extraction / introduction of MpInitLib, this issue
>>   has been fixed: GetWakeupBuffer() now calls
>>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" is
>>   no longer; we shouldn't be looking there for the root cause.
>>
>> * Either way, I don't understand why anything would want to execute code
>>   in the one page that happens to host the MP services PPI startup
>>   buffer for APs during PEI.
>>
>>   Not understanding the "why", I looked at the "what", and resorted to
>>   tracing KVM. Because the problem readily reproduces with this series
>>   applied (case 13), it wasn't hard to start the tracing while the guest
>>   was suspended, and capture just the actions that led from the
>>   KVM-level wakeup to the failure.
>>
>>   The QEMU state dumps are visible above in the email. I've also
>>   uploaded the compressed OVMF log and the textual KVM trace here:
>>
>>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>>
>>   I sincerely hope that Paolo will have a field day with the KVM trace
>>   :) I managed to identify the following curiosities (remember this is
>>   all on the S3 resume path):
>>
>>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>>     really funky pattern:
>>
>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>       ------  ------  ------  ------
>>               enter
>>                |
>>               leave
>>
>>                       enter
>>                         |
>>                       leave
>>
>>                               enter
>>                                 |
>>                               leave
>>
>>       enter
>>         |
>>       leave
>>
>>               enter           enter
>>        enter    |     enter     |
>>          |      |       |       |
>>        leave    |       |       |
>>                 |       |       |
>>        enter    |       |       |
>>          |      |       |       |
>>        leave  leave   leave   leave
>>
>>     That is, first we have each VCPU enter and leave SMM in complete
>>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>>     temporarily (it comes back in later), while the other three remain
>>     in SMM. Finally all four of them leave SMM together.
>>
>>     After which the problem occurs.
>>
>>   * Second, the instruction that causes things to blow up is <0f aa>,
>>     i.e., RSM. I have absolutely no clue why RSM is executed:
>>
>>     (a) in the area that used to host the AP startup routine for the MP
>>     services PPI -- note that we also have "Transfer to 16bit OS waking
>>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>>     area completely! --,
>>
>>     (b) and why *after* all four VCPUs have just left SMM, together.
>>
>>   * The RSM instruction is handled successfully elsewhere, for example
>>     when all four VCPUs leave SMM, at the bottom of the diagram above:
>>
>>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
>>
>>   * The guest-phys address 7ff7f000 that we see just before the error:
>>
>>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
>>
>>     can be found higher up in the trace; namely, it is written to CR3
>>     several times. It's the root of the page tables.
>>
>>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
>>
>> * I also tried the "info tlb" monitor command, via "virsh
>>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>>   crash.
>>
>>   I cannot provide results: QEMU appeared to return a message that would
>>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>>
>>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>>   with what Paolo said about "Code=?? ?? ??...":
>>
>>     The question marks usually mean that the page tables do not map a
>>     page at that address.
>>
>>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>>   even begin walking the page tables.
>>
>> Thanks
>> Laszlo
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>



* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 12:01         ` Laszlo Ersek
@ 2016-11-10 14:48           ` Yao, Jiewen
  2016-11-10 14:53             ` Paolo Bonzini
  2016-11-10 16:25             ` Laszlo Ersek
  0 siblings, 2 replies; 38+ messages in thread
From: Yao, Jiewen @ 2016-11-10 14:48 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Nice shot!!!!

After reviewing the SMI entry code again, I know the root cause.

I made a huge mistake there.

We always save CpuIndex at the top of the stack.
However, during SMI entry, the code below *conditionally* pushes EDX:

; enable NXE if supported
    DB      0b0h                        ; mov al, imm8
ASM_PFX(mXdSupported):     DB      0
    cmp     al, 0
    jz      @SkipXd
;
; Check XD disable bit
;
    mov     ecx, MSR_IA32_MISC_ENABLE
    rdmsr
    push    edx                        ; save MSR_IA32_MISC_ENABLE[63-32]

Then later, the code below *unconditionally* loads CpuIndex from the slot above the pushed EDX, even when EDX was never pushed:
    mov     ebx, [esp + 4]                  ; CPU Index

I could not reproduce it before, because all my real hardware supports XD. My Windows QEMU also supports XD (to my surprise).

Now I have reproduced it, after hardcoding XD to be disabled.
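
To make the mismatch concrete, here is a minimal sketch in C that models the stack as an array of 4-byte slots. The type and function names are hypothetical and exist only for this illustration; the "matching offset" variant is just one possible remedy under that model, not necessarily the fix that will go into the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SLOTS 4u

/* Hypothetical model of the SMI entry stack: Slot[Esp] is the top of stack,
   and higher indices correspond to higher addresses. */
typedef struct {
  uint32_t Slot[MODEL_SLOTS];
  size_t   Esp;
} MODEL_STACK;

/* Models "push": the stack grows toward lower addresses. */
static void ModelPush (MODEL_STACK *Stack, uint32_t Value)
{
  assert (Stack->Esp > 0);
  Stack->Slot[--Stack->Esp] = Value;
}

/* Models "mov reg, [esp + ByteOffset]" for 4-byte slots. */
static uint32_t ModelLoad (const MODEL_STACK *Stack, size_t ByteOffset)
{
  assert (Stack->Esp + ByteOffset / 4 < MODEL_SLOTS);
  return Stack->Slot[Stack->Esp + ByteOffset / 4];
}

/* Builds the stack as the entry code leaves it: CpuIndex at the top, with
   EDX pushed on top of it only when XD is supported. */
static MODEL_STACK ModelEntryStack (uint32_t CpuIndex, int XdSupported)
{
  /* 0xDEADBEEF stands for whatever unrelated data sits above CpuIndex. */
  MODEL_STACK Stack = { { 0, 0, CpuIndex, 0xDEADBEEFu }, 2 };
  if (XdSupported) {
    ModelPush (&Stack, 0x12345678u); /* placeholder for the saved EDX */
  }
  return Stack;
}

/* The problematic pattern: always read [esp + 4]. */
uint32_t LoadCpuIndexFixedOffset (uint32_t CpuIndex, int XdSupported)
{
  MODEL_STACK Stack = ModelEntryStack (CpuIndex, XdSupported);
  return ModelLoad (&Stack, 4);
}

/* One possible remedy: the offset follows whether EDX was pushed. */
uint32_t LoadCpuIndexMatchingOffset (uint32_t CpuIndex, int XdSupported)
{
  MODEL_STACK Stack = ModelEntryStack (CpuIndex, XdSupported);
  return ModelLoad (&Stack, XdSupported ? 4 : 0);
}
```

With XD supported, both variants return the CpuIndex that was stored. With XD unsupported, the fixed-offset variant returns the unrelated slot, which is consistent with the garbage CpuIndex seen in the debug output.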


Laszlo, your analysis saves me a day of installing QEMU on Linux. :)

Thank you
Yao Jiewen


From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Thursday, November 10, 2016 8:02 PM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 11/10/16 11:41, Yao, Jiewen wrote:
> Thanks for reporting the case 3 issue in Bugzilla.
>
> Let's focus on Case 8.
> It seems to be another random failure.
>
> I did more testing.
>
> 1) I tested some of our other internal real platforms for UEFI32 OS boot. I cannot reproduce the ASSERT.
>
> 2) I wrote a small test app that calls ExitBootServices and sends an SMI. I ran it on my current Windows QEMU, but I still cannot reproduce the ASSERT.
>
> It seems your environment is the only way to reproduce the issue. I am trying to follow your step-by-step instructions to install an OS on QEMU/KVM. I haven't finished everything yet, because of some network proxy issue. :(

Right, when you run a guest on TCG (QEMU's emulator) vs. on KVM (the virtualizer / accelerator in the host Linux kernel), you get very-very different timing behavior and interleaving of actions. For one, with KVM, the VCPUs really execute in parallel -- they are represented by host OS threads, and the host OS schedules them to separate "physical logical CPUs".

>
> Your information and analysis are great. They give us some clues.
>
> I guessed the same thing as you and checked: InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> This address is initialized in InitializeMpSyncData(), with gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus, which is obtained from MpServices->GetNumberOfProcessors().
> I do not know why this address is zero.
>
> I also did not quite understand the log below.
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
>   CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
>   CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
>   CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> As I recall, writing to B2 only causes the BSP to get an SMI on OVMF. The APs do not enter SMM.
> So why can #3 enter SMM? Is that expected behavior or not?
> If this is expected, how did it happen? Does the OS send SendSmiIpiAllExcludingSelf after ExitBootServices()?

My theory is that the OS is calling a runtime variable service during boot. That is supposed to pull in all APs into SMM, one way or another.

Also, during boot, the OS may call the runtime variable services genuinely on VCPU#3.

>
> I will see if I can finish QEMU/KVM installation tomorrow.

Thanks! Once you can test with KVM on your side, that should speed up debugging considerably, I think!

> If you have any idea why and how #3 enters SMM, please let us know.

Well, I captured a KVM trace for this as well (fresh boot, up to the failure). Grepping the trace for entering / leaving SMM, we see:

(1) the initial SMBASE relocation:

             CPU-6948  [004] 11545.040294: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
             CPU-6948  [004] 11545.040335: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb5000
             CPU-6949  [000] 11545.040363: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
             CPU-6949  [000] 11545.040389: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb7000
             CPU-6950  [002] 11545.040417: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
             CPU-6950  [002] 11545.040443: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb9000
             CPU-6947  [007] 11545.040453: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
             CPU-6947  [007] 11545.040474: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb3000
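
The relocated SMBASE values in (1) follow a regular pattern: 0x7ffb3000 for VCPU#0 and a stride of 0x2000 per VCPU. A minimal sketch of that pattern, with the constants read off this particular trace only (they are not taken from the firmware source, and another boot or configuration may produce different values):

```c
#include <assert.h>
#include <stdint.h>

/* Constants observed in the kvm_enter_smm trace above; assumptions for
   this illustration, not values from the firmware source. */
#define TRACE_SMBASE_VCPU0   0x7ffb3000u
#define TRACE_SMBASE_STRIDE  0x2000u
#define TRACE_VCPU_COUNT     4u

/* Returns the SMBASE that the trace shows for the given VCPU after the
   initial relocation away from the default SMBASE of 0x30000. */
uint32_t TraceRelocatedSmbase (uint32_t VcpuIndex)
{
  assert (VcpuIndex < TRACE_VCPU_COUNT);
  return TRACE_SMBASE_VCPU0 + VcpuIndex * TRACE_SMBASE_STRIDE;
}
```

This makes it easy to map an "entering SMM, smbase ..." line later in the trace back to a VCPU number.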

(2) a long stretch of VCPU#0 entering and leaving SMM, while the firmware uses variable services and such:

             CPU-6947  [007] 11545.053169: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb3000
             CPU-6947  [007] 11545.061272: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb3000
             ...

(3) a write to ioport 0xB2 from VCPU#3, then VCPU#3 entering SMM, then hitting the assert very-very soon:

             CPU-6950  [005] 11550.521195: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521196: kvm_exit:             reason IO_INSTRUCTION rip 0xf7c937b6 info b20000 0
             CPU-6950  [005] 11550.521196: kvm_pio:              pio_write at 0xb2 size 1 count 1 val 0x0
             CPU-6950  [005] 11550.521196: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6947  [003] 11550.521196: kvm_inj_virq:         irq 253
             CPU-6950  [005] 11550.521196: kvm_fpu:              unload
             CPU-6947  [003] 11550.521197: kvm_fpu:              load
             CPU-6947  [003] 11550.521197: kvm_entry:            vcpu 0
             CPU-6950  [005] 11550.521200: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb9000
             CPU-6947  [003] 11550.521207: kvm_eoi:              apicid 0 vector 253
             CPU-6950  [005] 11550.521207: kvm_fpu:              load
             CPU-6947  [003] 11550.521207: kvm_pv_eoi:           apicid 0 vector 253
             CPU-6950  [005] 11550.521207: kvm_entry:            vcpu 3
             CPU-6947  [003] 11550.521207: kvm_exit:             reason HLT rip 0xc1844554 info 0 0
             CPU-6950  [005] 11550.521209: kvm_exit:             reason CR_ACCESS rip 0x8045 info 300 0
             CPU-6950  [005] 11550.521209: kvm_cr:               cr_write 0 = 0x33
             CPU-6950  [005] 11550.521212: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521213: kvm_exit:             reason CR_ACCESS rip 0x7ffc107d info 3 0
             CPU-6950  [005] 11550.521213: kvm_cr:               cr_write 3 = 0x7ff9a000
             CPU-6950  [005] 11550.521214: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521214: kvm_exit:             reason CPUID rip 0x7ffc1085 info 0 0
             CPU-6950  [005] 11550.521214: kvm_cpuid:            func 1 rax 6e8 rbx 3040800 rcx 80200001 rdx 1f89fbff
             CPU-6950  [005] 11550.521215: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521215: kvm_exit:             reason CR_ACCESS rip 0x7ffc10c4 info 4 0
             CPU-6950  [005] 11550.521215: kvm_cr:               cr_write 4 = 0x668
             CPU-6950  [005] 11550.521217: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521218: kvm_exit:             reason CR_ACCESS rip 0x7ffc110e info 300 0
             CPU-6950  [005] 11550.521218: kvm_cr:               cr_write 0 = 0x80010033
             CPU-6950  [005] 11550.521220: kvm_entry:            vcpu 3
             CPU-6947  [003] 11550.521220: kvm_fpu:              unload
             CPU-6950  [005] 11550.521222: kvm_exit:             reason EPT_VIOLATION rip 0x7ffcbe46 info 181 0
             CPU-6950  [005] 11550.521223: kvm_page_fault:       address 22004ebc error_code 181
             CPU-6950  [005] 11550.521231: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521236: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521236: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x41  <----------------- "A"
             CPU-6950  [005] 11550.521237: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521237: kvm_fpu:              unload
             CPU-6950  [005] 11550.521253: kvm_fpu:              load
             CPU-6950  [005] 11550.521253: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521254: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521254: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x53  <----------------- "S"
             CPU-6950  [005] 11550.521254: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521254: kvm_fpu:              unload
             CPU-6950  [005] 11550.521257: kvm_fpu:              load
             CPU-6950  [005] 11550.521257: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521258: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521258: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x53  <----------------- "S"
             CPU-6950  [005] 11550.521258: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521258: kvm_fpu:              unload
             CPU-6950  [005] 11550.521260: kvm_fpu:              load
             CPU-6950  [005] 11550.521260: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521261: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521261: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x45  <----------------- "E"
             CPU-6950  [005] 11550.521261: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521262: kvm_fpu:              unload
             CPU-6950  [005] 11550.521264: kvm_fpu:              load
             CPU-6950  [005] 11550.521264: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521264: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521264: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x52  <----------------- "R"
             CPU-6950  [005] 11550.521264: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
             CPU-6950  [005] 11550.521265: kvm_fpu:              unload
             CPU-6950  [005] 11550.521267: kvm_fpu:              load
             CPU-6950  [005] 11550.521267: kvm_entry:            vcpu 3
             CPU-6950  [005] 11550.521267: kvm_exit:             reason IO_INSTRUCTION rip 0x7ffd1e80 info 4020000 0
             CPU-6950  [005] 11550.521267: kvm_pio:              pio_write at 0x402 size 1 count 1 val 0x54  <----------------- "T"
             CPU-6950  [005] 11550.521268: kvm_userspace_exit:   reason KVM_EXIT_IO (2)

This seems to be consistent with the OS calling a variable service on VCPU#3.
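
For reference, the byte values written to ioport 0x402 (the debug console port) in the trace can be turned back into text mechanically. A small sketch; the function and array names are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The "val" fields of the six pio_write events at 0x402 in the trace. */
const uint8_t TraceDebugconVals[6] = { 0x41, 0x53, 0x53, 0x45, 0x52, 0x54 };

/* Copies Count bytes into Out as a NUL-terminated string, replacing
   non-printable bytes with '.'. Returns the number of characters stored. */
size_t DecodeDebugconBytes (const uint8_t *Vals, size_t Count,
                            char *Out, size_t OutSize)
{
  size_t Stored = 0;

  assert (Vals != NULL && Out != NULL && OutSize > 0);
  for (size_t Index = 0; Index < Count && Stored + 1 < OutSize; Index++) {
    uint8_t Byte = Vals[Index];
    Out[Stored++] = (Byte >= 0x20 && Byte < 0x7f) ? (char)Byte : '.';
  }
  Out[Stored] = '\0';
  return Stored;
}
```

Applied to the six values above, this yields the string "ASSERT", i.e. the start of the assertion message in the OVMF debug log.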

Also, as far as I can see, the above trace matches the assembly code in "UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm".
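
As a cross-check of the control register writes in the trace, the values can be decoded against the architectural bit positions. This is only a sketch; the macro and function names are local to the example, and only the bits relevant here are defined:

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions in CR0 and CR4 (subset). */
#define CR0_PE   (1u << 0)   /* protection enable   */
#define CR0_MP   (1u << 1)   /* monitor coprocessor */
#define CR0_ET   (1u << 4)   /* extension type      */
#define CR0_NE   (1u << 5)   /* numeric error       */
#define CR0_WP   (1u << 16)  /* write protect       */
#define CR0_PG   (1u << 31)  /* paging              */
#define CR4_PAE  (1u << 5)   /* physical address extension */

/* Bits that are set in After but were clear in Before. */
uint32_t NewlySetBits (uint32_t Before, uint32_t After)
{
  return After & ~Before;
}

/* Checks the three writes seen in the trace:
   cr_write 0 = 0x33, cr_write 4 = 0x668, cr_write 0 = 0x80010033. */
void CheckTraceControlRegisters (void)
{
  /* First CR0 write: protected mode, no paging yet. */
  assert (0x33u == (CR0_PE | CR0_MP | CR0_ET | CR0_NE));
  /* CR4 write: PAE is among the enabled bits. */
  assert ((0x668u & CR4_PAE) != 0);
  /* Second CR0 write: adds exactly WP and PG on top of the first. */
  assert (NewlySetBits (0x33u, 0x80010033u) == (CR0_WP | CR0_PG));
}
```

So the sequence is: enter protected mode, load CR3, enable PAE in CR4, then turn on paging together with write protection.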

Is perhaps CpuIndex out of bounds?... Hmm, with the following debug patch:

> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index d0092d2f145a..29f6e783c58f 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1143,6 +1143,9 @@ SmiRendezvous (
>        // E.g., with Relaxed AP flow, SmmStartupThisAp() may be called immediately
>        // after AP's present flag is detected.
>        //
> +      if (CpuIndex >= 4) {
> +        DEBUG ((EFI_D_ERROR, "CpuIndex=%u\n", (UINT32)CpuIndex));
> +      }
>        InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>      }
>
>

I get the following debug output (note that my SMP configuration is 1x2x2):

> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitExitBootServicesCallback() done!
> CpuIndex=780161211
> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)

Ehm... what? :)

SmiRendezvous() is EFIAPI, is the calling convention followed in "Ia32/SmiEntry.nasm"?
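
For what it's worth, 780161211 is 0x2E8050BB, which does not look like an index at all. The debug patch above hardcodes the limit 4; a more general guard could compare against the CPU count before touching the array. The following is only a sketch with stand-in types (MODEL_CPU_DATA and LookupBusyLock are hypothetical, not the PiSmmCpuDxeSmm definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU data that holds the Busy lock. */
typedef struct {
  volatile uint32_t *Busy;
} MODEL_CPU_DATA;

/* Returns the Busy lock pointer for CpuIndex, or NULL when the index is
   outside [0, NumberOfCpus). A NULL result here indicates a bad index
   rather than an uninitialized lock pointer. */
volatile uint32_t *LookupBusyLock (MODEL_CPU_DATA *CpuData,
                                   size_t NumberOfCpus,
                                   size_t CpuIndex)
{
  assert (CpuData != NULL);
  if (CpuIndex >= NumberOfCpus) {
    return NULL;
  }
  return CpuData[CpuIndex].Busy;
}
```

With four CPUs, an index of 3 returns the stored pointer, while 780161211 is rejected before any out-of-bounds read takes place.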

Thanks,
Laszlo

> Thank you
> Yao Jiewen
>
>
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 10, 2016 4:46 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>
> On 11/09/16 07:25, Yao, Jiewen wrote:
>> Hi Laszlo
>> I will fix the DEBUG message issue in the V3 patch.
>>
>> Below are the remaining issues:
>>
>>
>> * Case 13: S3 fails randomly.
>> Good news: I worked with Jeff Fan to root-cause the S3 resume issue. Here are the details.
>>
>>
>> 1) We believe the dead CPU is an AP, not the BSP.
>> The reasons are:
>>
>> 1.1) The BSP has already transferred control to the OS waking vector. The GDT/IDT/CR3 should be set by the OS.
>>
>> 1.2) The current dead CPU still has GDT/IDT pointing to BIOS reserved memory. The CS/DS/SS is a typical BIOS X64 mode setting.
>>
>> 1.3) The current dead CPU still has the SMM CR3 (which is obviously wrong).
>>
>>
>> 2) Based upon 1), we reviewed the S3 resume AP flow.
>> Currently the BSP wakes up the APs in SMRAM, for security considerations. At that time, we are using the SMM mode CR3. It is OK for the BSP because the BSP is NOT in SMM mode yet. Even after SMM rebase, we can still use it because SMRR is not set in the first SMM rebase.
>> Currently the BSP just uses its own context to initialize the APs, so the APs take the BSP CR3, which is the SMM CR3, unfortunately.
>> After the BSP has initialized the APs, each AP is put into a HALT loop in X64 mode. That is the last straw, because an X64 mode halt still needs paging.
>>
>>
>> 3) The error happens once the AP receives an interrupt (for whatever reason) and starts executing code. However, at that time the AP might not be in SMM mode, which means the SMM CR3 is not available. And then we see this.
>>
>>
>> 4) I guess we did not see the error before, or it appears RANDOM, because it depends on whether the AP receives an interrupt before the BSP sends INIT-SIPI-SIPI.
>>
>>
>> 5) The fix, I think, should be as follows:
>> We should always put the AP into protected mode, so that no paging is needed.
>> We should put the AP in reserved memory above 1M, instead of memory below 1M, because memory below 1M is restored.
>>
>>
>> Would you please file a Bugzilla entry? I think we need to assign a CPU owner to fix that critical issue.
>>
>> There is no need to do more investigation. Thanks for your great help on that. :)
>
> Thank you for your help!
>
> I filed <https://bugzilla.tianocore.org/show_bug.cgi?id=216>. The title is
>
>     BSP exits SMM and closes SMRAM on the S3 resume path before
>     meeting with AP(s)
>
> I hope the title is mostly right. I didn't add any other details (I
> haven't gone through the thread in detail yet, and without that I can't
> even write up a semi-reasonable report myself). Instead, I referenced
> this message of yours in the report, and I also linked Paolo's analysis
> from elsewhere in the thread. I hope this will do for the report.
>
> (Also, thank you Paolo, for the amazing analysis -- I haven't digested
> it yet, but I can already tell it's amazing! :))
>
>> * Case 17 - I do not think it is a real issue, because SMM is out of resources.
>>
>>
>> * Case 8 - that is a very weird issue. I talked with Jeff again. I do not have a clear clue yet.
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>> Here is the code. We do not know why some code needs InitializeSpinLock after ExitBootServices.
>> SPIN_LOCK *
>> EFIAPI
>> InitializeSpinLock (
>>   OUT      SPIN_LOCK                 *SpinLock
>>   )
>> {
>>   ASSERT (SpinLock != NULL);
>>
>>   _ReadWriteBarrier();
>>   *SpinLock = SPIN_LOCK_RELEASED;
>>   _ReadWriteBarrier();
>>
>>   return SpinLock;
>> }
>>
>> If you could quickly check the following, that would be great.
>>
>> 1) Which processor triggers this ASSERT? BSP or AP?
>>
>> 2) Which module triggers this ASSERT? Which module contains the current RIP value?
>
> First, one additional piece of info I have learned is that the issue
> does not always present itself. Sometimes the boot just works fine,
> other times the assert fires.
>
> Using the QEMU monitor, I managed to get the following information with
> the "info cpus" command:
>
> * CPU #0: pc=0x00000000c1844555 (halted) thread_id=7835
>   CPU #1: pc=0x00000000c1844555 (halted) thread_id=7836
>   CPU #2: pc=0x00000000c1844555 (halted) thread_id=7837
>   CPU #3: pc=0x000000007ffd17ca thread_id=7838
>
> VCPU#3 is an AP (the last AP), I think. The instruction pointer seems to
> point into SMRAM again.
>
> In the OVMF log, I see
>
> Loading SMM driver at 0x0007FFC9000 EntryPoint=0x0007FFC9253
> PiSmmCpuDxeSmm.efi
>
> So the offset into PiSmmCpuDxeSmm.efi is 0x87CA (or, relative to the
> entry point, 0x8577, 0x253 bytes less).
>
> Running
>
>   objdump -x -S Build/OvmfIa32/DEBUG_GCC48/IA32/PiSmmCpuDxeSmm.debug
>
> first I see confirmation that
>
>   start address 0x00000253
>
> and then
>
> 000087bd <CpuDeadLoop>:
> VOID
> EFIAPI
> CpuDeadLoop (
>   VOID
>   )
> {
>     87bd:       55                      push   %ebp
>     87be:       89 e5                   mov    %esp,%ebp
>     87c0:       83 ec 10                sub    $0x10,%esp
>   volatile UINTN  Index;
>
>   for (Index = 0; Index == 0;);
>     87c3:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
>     87ca:       8b 45 fc                mov    -0x4(%ebp),%eax  <-- HERE
>     87cd:       85 c0                   test   %eax,%eax
>     87cf:       74 f9                   je     87ca <CpuDeadLoop+0xd>
> }
>     87d1:       c9                      leave
>     87d2:       c3                      ret
>
> This seems consistent with an assertion failure.
>
> I searched UefiCpuPkg/PiSmmCpuDxeSmm/ for InitializeSpinLock(), and the
> SmiRendezvous() function [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c] looks
> like a possible caller:
>
>       //
>       // The BUSY lock is initialized to Released state. This needs to
>       // be done early enough to be ready for BSP's SmmStartupThisAp()
>       // call. E.g., with Relaxed AP flow, SmmStartupThisAp() may be
>       // called immediately after AP's present flag is detected.
>       //
>       InitializeSpinLock (mSmmMpSyncData->CpuData[CpuIndex].Busy);
>
> Just a guess, of course.
>
>> At the same time, all my OS testing is on real platforms. I have not set up an OVMF environment to run an OS yet.
>> If you could share step-by-step instructions with me, that would be great.
>
> (1) Grab a host computer with a CPU that supports VMX and EPT.
>
> (2) Download and install Fedora 24 (for example):
>
> https://getfedora.org/en/workstation/download/
> http://docs.fedoraproject.org/install-guide
>
> (3) Install the "qemu-system-x86" package with DNF
>
> dnf install qemu-system-x86
>
> (4) clone edk2 with git
>
> (5) embed OpenSSL optionally (for secure boot); see
> "CryptoPkg/Library/OpensslLib/Patch-HOWTO.txt"
>
> (6) build OVMF:
>
> source edksetup.sh
> make -C "$EDK_TOOLS_PATH"
>
> # Ia32
> build \
>   -a IA32 \
>   -p OvmfPkg/OvmfPkgIa32.dsc \
>   -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
>   -t GCC5 -b DEBUG
>
> # Ia32X64
> build \
>   -a IA32 -a X64 \
>   -p OvmfPkg/OvmfPkgIa32X64.dsc \
>   -D SMM_REQUIRE -D SECURE_BOOT_ENABLE \
>   -t GCC5 -b DEBUG
>
> (7) Create disk images:
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
>   -o preallocation=metadata -o lazy_refcounts=on disk-ia32.img 100G
>
> qemu-img create -f qcow2 -o compat=1.1 -o cluster_size=65536 \
>   -o preallocation=metadata -o lazy_refcounts=on disk-ia32x64.img 100G
>
> (8) For a 64-bit guest OS, you can again use the Fedora 24 Workstation
> that you downloaded already (the ISO image).
>
> For 32-bit guest OS, this one used to work:
>
> https://www.happyassassin.net/fedlet-a-fedora-remix-for-bay-trail-tablets/
>
> minimally the 20141209 release. Hm... actually, I think the maintainer
> of that image has discontinued the downloadable files :(
>
> So, I don't know what 32-bit UEFI OS to recommend for testing.
>
> 32-bit Windows doesn't boot on OVMF (I looked into that earlier, several
> times, with some help from a Microsoft developer, but we couldn't solve
> it), so I can't recommend Windows as an alternative.
>
> Perhaps you can use
>
> https://linuxiumcomau.blogspot.com/2016/10/running-ubuntu-on-intel-bay-trail-and.html
>
> as a 32-bit guest OS, I never tried.
>
> (9) Anyway, once you have an installer ISO, set the "ISO" environment
> variable to the ISO image's full pathname, and then run QEMU like this:
>
> # Settings for Ia32 only:
>
> ISO=...
> DISK=.../disk-ia32.img
> FW=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/OvmfIa32/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-32.fd
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> DEBUG=debug-32.log
>
> # Settings for Ia32X64 only:
>
> ISO=...
> DISK=.../disk-ia32x64.img
> FW=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_CODE.fd
> TEMPLATE=.../Build/Ovmf3264/DEBUG_GCC5/FV/OVMF_VARS.fd
> VARS=vars-3264.fd
> QEMU_COMMAND=qemu-system-x86_64
> DEBUG=debug-3264.log
>
> # Common commands for both target arches:
>
> # create variable store from varstore template
> # if the former doesn't exist yet
> if ! [ -e "$VARS" ]; then
>   cp -- "$TEMPLATE" "$VARS"
> fi
>
> $QEMU_COMMAND \
>   -machine q35,smm=on,accel=kvm \
>   -m 4096 \
>   -smp sockets=1,cores=2,threads=2 \
>   -global driver=cfi.pflash01,property=secure,value=on \
>   -drive if=pflash,format=raw,unit=0,file=${FW},readonly=on \
>   -drive if=pflash,format=raw,unit=1,file=${VARS} \
>   \
>   -chardev file,id=debugfile,path=$DEBUG \
>   -device isa-debugcon,iobase=0x402,chardev=debugfile \
>   \
>   -chardev stdio,id=char0,signal=off,mux=on \
>   -mon chardev=char0,mode=readline,default \
>   -serial chardev:char0 \
>   \
>   -drive id=iso,if=none,format=raw,readonly,file=$ISO \
>   -drive id=disk,if=none,format=qcow2,file=$DISK \
>   \
>   -device virtio-scsi-pci,id=scsi0 \
>   -device scsi-cd,drive=iso,bus=scsi0.0,bootindex=2 \
>   -device scsi-hd,drive=disk,bus=scsi0.0,bootindex=1 \
>   \
>   -device VGA
>
> This will capture the OVMF debug output in the $DEBUG file. Also, the
> terminal where you run the command can be switched between the guest's
> serial console and the QEMU monitor with [Ctrl-A C].
>
> Thanks
> Laszlo
>
>>
>> Thank you
>> Yao Jiewen
>>
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
>> Sent: Tuesday, November 8, 2016 9:22 AM
>> To: Yao, Jiewen <jiewen.yao@intel.com>
>> Cc: Tian, Feng <feng.tian@intel.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
>> Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
>>
>> On 11/04/16 10:30, Jiewen Yao wrote:
>>> ==== below is V2 description ====
>>> 1) PiSmmCpu: resolve OVMF multiple processors boot hang issue.
>>> 2) PiSmmCpu: Add debug info on StartupAp() fails.
>>> 3) PiSmmCpu: Add ASSERT for AllocatePages().
>>> 4) PiSmmCpu: Add protection detail in commit message.
>>> 5) UefiCpuPkg.dsc: Add page table footprint info in commit message.
>>>
>>> ==== below is V1 description ====
>>> This series patch enables SMM page level protection.
>>> Features are:
>>> 1) PiSmmCore reports SMM PE image code/data information
>>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>>> and set XD for data page and RO for code page.
>>> 3) PiSmmCpu enables Static Paging for X64 according to
>>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>>> is used as long as it is supported.
>>> 4) PiSmmCpu sets importance data structure to be read only,
>>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>>
>>> tested platform:
>>> 1) Intel internal platform (X64).
>>> 2) EDKII Quark IA32
>>> 3) EDKII Vlv2  X64
>>> 4) EDKII OVMF IA32 and IA32X64. (with -smp 8)
>>>
>>> Cc: Jeff Fan <jeff.fan@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Star Zeng <star.zeng@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
>>
>> I have new test results. Let's start with the table again:
>>
>> Legend:
>>
>> - "untested" means the test was not executed because the same test
>>   failed or proved unreliable in a less demanding configuration already,
>>
>> - "n/a" means a setting or test case was impossible,
>>
>> - "fail" and "unreliable" (lower case) are outside the scope of this
>>   series; they either capture the pre-series status, or are expected
>>   even with the series applied due to the pre-series status,
>>
>> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>>   series.
>>
>> In all cases, 36 bits were used as address width in the CPU HOB (--> up
>> to 64GB guest-phys address space).
>>
>>    series  OVMF                                                              VCPU     boot       S3 resume
>>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
>> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>>  1 no      Ia32      64                             n/a                      1x2x2    pass       unreliable
>>  2 no      Ia32     255                             n/a                      52x2x2   pass       untested
>>  3 no      Ia32     255                             n/a                      53x2x2   unreliable untested
>>  4 no      Ia32X64   64                             n/a                      1x2x2    pass       unreliable
>>  5 no      Ia32X64  255                             n/a                      52x2x2   pass       untested
>>  6 no      Ia32X64  255                             n/a                      54x2x2   fail       n/a
>>  7 v2      Ia32      64                             FALSE                    1x2x2    pass       untested
>>  8 v2      Ia32      64                             TRUE                     1x2x2    FAIL       untested
>>  9 v2      Ia32     255                             FALSE                    52x2x2   pass       untested
>> 10 v2      Ia32     255                             FALSE                    53x2x2   untested   untested
>> 11 v2      Ia32     255                             TRUE                     52x2x2   untested   untested
>> 12 v2      Ia32     255                             TRUE                     53x2x2   untested   untested
>> 13 v2      Ia32X64   64                             FALSE                    1x2x2    pass       unreliable
>> 14 v2      Ia32X64   64                             TRUE                     1x2x2    pass       untested
>> 15 v2      Ia32X64  255                             FALSE                    52x2x2   pass       untested
>> 16 v2      Ia32X64  255                             FALSE                    54x2x2   untested   untested
>> 17 v2      Ia32X64  255                             TRUE                     52x2x2   FAIL       untested
>> 18 v2      Ia32X64  255                             TRUE                     54x2x2   untested   untested
>>
>> * Case 8: this test case failed with v2 as well, but this time with
>>   different symptoms:
>>
>>> FSOpen: Open '\EFI\fedora\grubia32.efi' Success
>>> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E4037A8
>>> Loading driver at 0x0007DA7F000 EntryPoint=0x0007DA7F400
>>> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E403A90
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> PixelBlueGreenRedReserved8BitPerColor
>>> ConvertPages: Incompatible memory types
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitExitBootServicesCallback() done!
>>> ASSERT MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(73): SpinLock != ((void *) 0)
>>
>>   I didn't try to narrow this down.
>>
>> * Case 13 (the "unreliable S3 resume" case): Here the news is both bad
>>   and good. The good news is for Jiewen: this patch series does not
>>   cause the unreliability, it "only" amplifies it severely. The bad news
>>   is correspondingly for everyone else: S3 resume is actually unreliable
>>   even in case 4, that is, without this series applied; it's just that
>>   the failure rate is much, much lower.
>>
>>   Namely, in my new testing, in case 13, S3 resume failed 8 times out of
>>   21 tries. (I stopped testing at the 8th failure.)
>>
>>   Whereas in case 4, S3 resume failed with *identical symptoms* (e.g.,
>>   exact same RIP=000000000009f0fd), 1 time out of 55 tries. (It was try
>>   #12 that failed; I continued testing and aborted the test after the
>>   55th try.)
>>
>>   So, while the series hugely amplifies the failure rate, the failure
>>   does exist without the series. Which is why I modified the case 4
>>   results in the table, and also lower-cased the word "unreliable" in
>>   case 13.
>>
>>   Below I will return to this problem separately; let's go over the rest
>>   of the table first.
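For a rough sense of scale, the two failure rates quoted for case 13 and case 4 can be compared directly. This is only a back-of-the-envelope sketch: the counts come from the report, the variable names are illustrative, and because the case 13 run was stopped at the 8th failure, 8/21 should not be read as an unbiased estimate.

```python
# Counts taken from the test report; names are illustrative only.
case13_failures, case13_tries = 8, 21   # series applied (v2)
case4_failures, case4_tries = 1, 55     # series not applied

rate_with = case13_failures / case13_tries
rate_without = case4_failures / case4_tries
amplification = rate_with / rate_without

print(f"case 13: {rate_with:.1%} of resumes failed")
print(f"case 4:  {rate_without:.1%} of resumes failed")
print(f"observed amplification: about {amplification:.0f}x")
```

With a single failure in the case 4 run, the ratio is very sensitive to chance, so it supports "much lower" rather than any precise factor.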
>>
>> * Case 17: I guess this is not a real failure, I'm just including it for
>>   completeness, as PcdCpuSmmStaticPageTable==TRUE is known to present
>>   additional SMRAM demand (see the commit message on patch V2 4/6). This
>>   case fails with
>>
>>> SmmLockBox Command - 4
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Enter
>>> SmmLockBoxSmmLib SetLockBoxAttributes - Exit (Success)
>>> SmmLockBox SmmLockBoxHandler Exit
>>> SmmLockBoxDxeLib SetLockBoxAttributes - Exit (Success)
>>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c(892): mGdtForAp != ((void *) 0)
>>
>>   which is an SMRAM allocation failure. If I lower the VCPU count to
>>   50x2x2, then the guest boots fine.
>>
>> ----*----
>>
>> Before I get to the S3 resume problem (which, again, reproduces without
>> this series, although much less frequently), I'd like to comment on the
>> removal of the ASSERT(), from v1 to v2, in the FlushTlbForAll()
>> function, on the return value of SmmBlockingStartupThisAp(). This change
>> allows v2 to proceed past that point; however, I'm seeing a whole lot of
>>
>>> !mSmmMpSyncData->CpuData[1].Present
>>> !mSmmMpSyncData->CpuData[2].Present
>>> !mSmmMpSyncData->CpuData[3].Present
>>> ...
>>
>> messages in the OVMF boot log, interspersed with
>>
>>> ConvertPageEntryAttribute 0x7F92B067->0x7F92B065
>>
>> style messages. (That is, one error message for each AP, per
>> ConvertPageEntryAttribute() message.)
>>
>> Is this okay / intentional? The number of these messages can go up to
>> several thousands and that sort of drowns out everything else in the
>> log.
>>
>> It's also not easy to mask the message, because it's logged on the
>> DEBUG_ERROR level.
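To see what one of these ConvertPageEntryAttribute messages actually changes, the two entry values can be decoded bit by bit. The sketch below assumes the logged numbers are raw x86 page-table entries and uses the architectural flag positions; which paging level the entry belongs to is not known from the log, and the helper names are hypothetical.

```python
# Low flag bits of an x86 page-table entry (architectural bit positions).
PTE_FLAGS = {
    0: "P (present)",
    1: "R/W (writable)",
    2: "U/S (user)",
    3: "PWT",
    4: "PCD",
    5: "A (accessed)",
    6: "D (dirty)",
    7: "PS/PAT",
    63: "XD (execute disable)",
}

def pte_flags(entry):
    """Return the names of the flag bits set in a page-table entry."""
    return {name for bit, name in PTE_FLAGS.items() if (entry >> bit) & 1}

def describe_change(old, new):
    """Return (cleared, newly set) flag names between two entry values."""
    before, after = pte_flags(old), pte_flags(new)
    return sorted(before - after), sorted(after - before)

# Values from the log line "ConvertPageEntryAttribute 0x7F92B067->0x7F92B065".
cleared, newly_set = describe_change(0x7F92B067, 0x7F92B065)
print("cleared:", cleared)
print("set:    ", newly_set)
```

For the quoted example only the R/W bit differs, which is consistent with the series marking a page read-only.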
>>
>> ----*----
>>
>> * Okay, so the S3 problem. Last time I suspected that the failure point
>>   (RIP=9f0fd) was in the Linux guest's S3 wakeup vector (which starts at
>>   9A1D0, according to the OVMF log). In order to test this idea, I
>>   exercised this series with S3 against a Windows 8.1 guest (--> case 13
>>   again). The failure reproduced on the second S3 resume, with identical
>>   RIP, despite the Windows wakeup vector being located elsewhere (at
>>   0x1000).
>>
>>   Quoting the OVMF log leading up to the resume:
>>
>>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>>> Install PPI: [PeiPostScriptTablePpi]
>>> Install PPI: [EfiEndOfPeiSignalPpi]
>>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>>> Transfer to 16bit OS waking vector - 1000
>>
>>   QEMU log (same as before):
>>
>>> KVM internal error. Suberror: 1
>>> KVM internal error. Suberror: 1
>>> emulation failure
>>> emulation failure
>>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd058
>>> RSI=0000000000000004 RDI=000000007fedd040 RBP=0000000000000000 RSP=000000007e1a7000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT=     000000007f294000 00000047
>>> IDT=     000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdb168 RDX=000000007fedd070
>>> RSI=0000000000000004 RDI=000000007fedd058 RBP=0000000000000000 RSP=000000007e19f000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>>> GDT=     000000007f294000 00000047
>>> IDT=     000000007f294048 00000fff
>>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff7f000 CR4=00000220
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000500
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>>   So, we can exclude the suspicion that the problem is guest OS
>>   dependent.
>>
>> * Then I looked for the base address of the page containing the
>>   RIP=9f0fd address, in earlier parts of the OVMF log, on the hunch that
>>   some firmware component might have allocated that area actually. Here
>>   we go:
>>
>>> Loading PEIM at 0x000008552C0 EntryPoint=0x000008554E0 CpuMpPei.efi
>>> AP Loop Mode is 1
>>> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
>>
>>   That is, the failure hits (when it hits -- not always) in the area
>>   where the CpuMpPei driver *borrows* memory for the startup vector of
>>   the APs, for the purposes of the MP service PPI. ("Wakeup" is an
>>   overloaded word here; the "wakeup buffer" has nothing to do with S3
>>   resume, it just serves for booting the APs temporarily in PEI, for
>>   implementing the MP service PPI.)
>>
>>   When exiting the PEI phase (on the S3 resume path), CpuMpPei restores
>>   the original contents of this area. This occurs just before
>>   transferring control to the guest OS wakeup vector: see the
>>   "EfiEndOfPeiSignalPpi" and "CpuMpEndOfPeiCallback" strings just above,
>>   quoted from the OVMF log.
>>
>>   I documented (parts of) this logic in OVMF commit
>>
>>     https://github.com/tianocore/edk2/commit/e3e3090a959a0
>>
>>   (see the code comments as well).
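The observation that the failing RIP lies in the AP wakeup buffer is a simple range check, sketched here. The three constants are copied from the logs quoted in this message; the helper name is hypothetical.

```python
def in_range(addr, start, size):
    """True if addr lies in the half-open range [start, start + size)."""
    return start <= addr < start + size

WAKEUP_BUFFER_START = 0x9F000   # "WakeupBufferStart = 9F000"
WAKEUP_BUFFER_SIZE = 0x1000     # "WakeupBufferSize = 1000"
FAILING_RIP = 0x9F0FD           # RIP from the QEMU register dump

hit = in_range(FAILING_RIP, WAKEUP_BUFFER_START, WAKEUP_BUFFER_SIZE)
offset = FAILING_RIP - WAKEUP_BUFFER_START
print(f"RIP inside wakeup buffer: {hit}, offset {offset:#x}")

# Neither OS waking vector mentioned in the message falls in that page.
for name, vector in (("Linux", 0x9A1D0), ("Windows", 0x1000)):
    inside = in_range(vector, WAKEUP_BUFFER_START, WAKEUP_BUFFER_SIZE)
    print(f"{name} waking vector inside wakeup buffer: {inside}")
```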
>>
>> * At that time, I thought I had identified a memory management bug in
>>   CpuMpPei; see the following discussion and bug report for details:
>>
>>     https://www.mail-archive.com/edk2-devel@lists.01.org/msg13892.html
>>     https://bugzilla.tianocore.org/show_bug.cgi?id=67
>>
>>   However, with the extraction / introduction of MpInitLib, this issue
>>   has been fixed: GetWakeupBuffer() now calls
>>   CheckOverlapWithAllocatedBuffer(), so that "memory management bug" no
>>   longer exists; we shouldn't be looking there for the root cause.
>>
>> * Either way, I don't understand why anything would want to execute code
>>   in the one page that happens to host the MP services PPI startup
>>   buffer for APs during PEI.
>>
>>   Not understanding the "why", I looked at the "what", and resorted to
>>   tracing KVM. Because the problem readily reproduces with this series
>>   applied (case 13), it wasn't hard to start the tracing while the guest
>>   was suspended, and capture just the actions that led from the
>>   KVM-level wakeup to the failure.
>>
>>   The QEMU state dumps are visible above in the email. I've also
>>   uploaded the compressed OVMF log and the textual KVM trace here:
>>
>>     http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/
>>
>>   I sincerely hope that Paolo will have a field day with the KVM trace
>>   :) I managed to identify the following curiosities (remember this is
>>   all on the S3 resume path):
>>
>>   * First, the VCPUs (there are four of them) enter and leave SMM in a
>>     really funky pattern:
>>
>>       vcpu#0  vcpu#1  vcpu#2  vcpu#3
>>       ------  ------  ------  ------
>>               enter
>>                |
>>               leave
>>
>>                       enter
>>                         |
>>                       leave
>>
>>                               enter
>>                                 |
>>                               leave
>>
>>       enter
>>         |
>>       leave
>>
>>               enter           enter
>>        enter    |     enter     |
>>          |      |       |       |
>>        leave    |       |       |
>>                 |       |       |
>>        enter    |       |       |
>>          |      |       |       |
>>        leave  leave   leave   leave
>>
>>     That is, first we have each VCPU enter and leave SMM in complete
>>     isolation (1, 2, 3, 0). Then VCPUs 1 and 3 enter SMM together, soon
>>     followed by VCPUs 0 and 2, also together. VCPU#0 drops out of SMM
>>     temporarily (it comes back in later), while the other three remain
>>     in SMM. Finally all four of them leave SMM together.
>>
>>     After which the problem occurs.
>>
>>   * Second, the instruction that causes things to blow up is <0f aa>,
>>     i.e., RSM. I have absolutely no clue why RSM is executed:
>>
>>     (a) in the area that used to host the AP startup routine for the MP
>>     services PPI -- note that we also have "Transfer to 16bit OS waking
>>     vector" in the log, so CpuMpEndOfPeiCallback() restores the borrowed
>>     area completely! --,
>>
>>     (b) and why *after* all four VCPUs have just left SMM, together.
>>
>>   * The RSM instruction is handled successfully elsewhere, for example
>>     when all four VCPUs leave SMM, at the bottom of the diagram above:
>>
>>> CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa
>>> CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa
>>> CPU-24445 [005] 39841.982810: kvm_emulate_insn:     0:7ffbb179: 0f aa
>>> CPU-24444 [006] 39841.982811: kvm_emulate_insn:     0:7ffb9179: 0f aa
>>
>>   * The guest-phys address 7ff7f000 that we see just before the error:
>>
>>> CPU-24447 [002] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>>> CPU-24446 [000] 39841.982825: kvm_page_fault:       address 7ff7f000 error_code 83
>>> CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
>>> CPU-24444 [006] 39841.982827: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xffffffff813a954f info 0 800000fc
>>> CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL
>>> CPU-24447 [002] 39841.982827: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
>>
>>     can be found higher up in the trace; namely, it is written to CR3
>>     several times. It's the root of the page tables.
>>
>>   * The 7F80_1000..7FFF_FFFF guest-phys addresses are all in SMRAM.
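One way to pick such events out of a long textual KVM trace is to filter kvm_emulate_insn lines for RSM (0f aa) at addresses outside SMRAM. The sketch below is an assumption-laden illustration: the regular expression is fitted only to the line shape quoted in this message, the number after "CPU-" is presumed to be a thread id, and the SMRAM bounds are the ones stated in the bullet above.

```python
import re

# SMRAM range as stated in the message (inclusive bounds).
SMRAM_START, SMRAM_END = 0x7F801000, 0x7FFFFFFF
RSM = "0f aa"

# Matches lines such as:
#   CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa
LINE_RE = re.compile(
    r"CPU-(?P<tid>\d+)\s+\[\d+\]\s+(?P<ts>[\d.]+):\s+kvm_emulate_insn:\s+"
    r"(?P<cs>[0-9a-f]+):(?P<rip>[0-9a-f]+):\s+(?P<insn>(?:[0-9a-f]{2} ?)+)"
)

def rsm_outside_smram(lines):
    """Yield (thread id, rip) for every RSM emulated outside SMRAM."""
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("insn").strip() != RSM:
            continue
        rip = int(m.group("rip"), 16)
        if not SMRAM_START <= rip <= SMRAM_END:
            yield int(m.group("tid")), rip

sample = [
    "CPU-24447 [002] 39841.982810: kvm_emulate_insn:     0:7ffbf179: 0f aa",
    "CPU-24446 [000] 39841.982810: kvm_emulate_insn:     0:7ffbd179: 0f aa",
    "CPU-24447 [002] 39841.982826: kvm_emulate_insn:     0:9f0fd: 0f aa",
    "CPU-24447 [002] 39841.982827: kvm_emulate_insn:     0:9f0fd: 0f aa FAIL",
]
suspicious = list(rsm_outside_smram(sample))
print([(tid, hex(rip)) for tid, rip in suspicious])
```

On the four sample lines taken from this message, only the two events at 9f0fd are reported; the RSMs at 7ffb.179 fall inside SMRAM and are skipped.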
>>
>> * I also tried the "info tlb" monitor command, via "virsh
>>   qemu-monitor-command --hmp", while the guest was auto-paused after the
>>   crash.
>>
>>   I cannot provide results: QEMU appeared to return a message that would
>>   be longer than 16MB after encoding by libvirt, and libvirt rejected
>>   that ("Unable to encode message payload", see VIR_NET_MESSAGE_MAX).
>>
>>   Anyway, the KVM trace, and the QEMU register dump, look consistent
>>   with what Paolo said about "Code=?? ?? ??...":
>>
>>     The question marks usually mean that the page tables do not map a
>>     page at that address.
>>
>>   CR3=000000007ff7f000 points into SMRAM, but we are outside of SMM
>>   (SMM=0). We can't translate *any* guest-virtual address, as we can't
>>   even begin walking the page tables.
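The control-register values in the QEMU dump can be decoded to make this concrete. The following is a minimal sketch using the architectural bit positions for a subset of CR0, CR4 and EFER flags; the register values are copied from the dump, the SMRAM bounds from the trace discussion above, and the helper name is hypothetical.

```python
# Architectural bit positions; only a subset of flags is listed.
CR0_BITS = {0: "PE", 4: "ET", 16: "WP", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {4: "PSE", 5: "PAE", 7: "PGE", 9: "OSFXSR"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}

def decode(value, bits):
    """Return the names of the listed bits that are set in value."""
    return [name for bit, name in sorted(bits.items()) if (value >> bit) & 1]

# Values copied from the QEMU register dump.
cr0, cr3, cr4, efer = 0xE0000011, 0x7FF7F000, 0x00000220, 0x500

print("CR0 :", decode(cr0, CR0_BITS))
print("CR4 :", decode(cr4, CR4_BITS))
print("EFER:", decode(efer, EFER_BITS))

# SMRAM range as given in the trace discussion (inclusive bounds).
cr3_in_smram = 0x7F801000 <= cr3 <= 0x7FFFFFFF
print("CR3 in SMRAM:", cr3_in_smram)
```

Paging and long mode are active (PG, PAE, LME, LMA) while CR3 still points into SMRAM, which matches the conclusion that no guest-virtual address can be translated outside SMM. EFER.NXE is not among the set bits; that is only an observation from the dump, not a diagnosis.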
>>
>> Thanks
>> Laszlo
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 14:48           ` Yao, Jiewen
@ 2016-11-10 14:53             ` Paolo Bonzini
  2016-11-10 16:22               ` Laszlo Ersek
  2016-11-10 16:25             ` Laszlo Ersek
  1 sibling, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-10 14:53 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
	Zeng, Star



On 10/11/2016 15:48, Yao, Jiewen wrote:
> I could not reproduce it before, because all my real hardware supports XD.
> My Windows QEMU also supports XD (to my surprise.)

QEMU can be configured to support XD or not.  Possibly Laszlo was using
some different default, or testing both cases.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 14:53             ` Paolo Bonzini
@ 2016-11-10 16:22               ` Laszlo Ersek
  2016-11-10 16:39                 ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-10 16:22 UTC (permalink / raw)
  To: Paolo Bonzini, Yao, Jiewen
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D, Fan, Jeff,
	Zeng, Star

On 11/10/16 15:53, Paolo Bonzini wrote:
> 
> 
> On 10/11/2016 15:48, Yao, Jiewen wrote:
>> I could not reproduce it before, because all my real hardware supports XD.
>> My Windows QEMU also supports XD (to my surprise.)
> 
> QEMU can be configured to support XD or not.  Possibly Laszlo was using
> some different default, or testing both cases.

When QEMU emulates an Ia32 (32-bit) target, the SMM state save area has
no room for capturing whether NX is set or clear. This is an
issue that dates back to the inception of OVMF's SMM support. The
explanation was given by Paolo, actually :)

  https://www.mail-archive.com/edk2-devel@lists.01.org/msg00970.html

We adjusted the OvmfPkg/README file accordingly:

> * QEMU binary and options specific to 32-bit guests:
>
>   $ qemu-system-i386 -cpu coreduo,-nx \
>
>   or
>
>   $ qemu-system-x86_64 -cpu <MODEL>,-lm,-nx \
>

Note the "-nx" bit.

And, in my recent KVM / QEMU usage instructions for Jiewen:

  https://www.mail-archive.com/edk2-devel@lists.01.org/msg19446.html

I provided the following settings:

> # Settings for Ia32 only:
> [...]
> QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
>
> # Settings for Ia32X64 only:
> [...]
> QEMU_COMMAND=qemu-system-x86_64

I guess the "-nx" bit can be left off with TCG, but AFAIR it is required
for KVM.

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 14:48           ` Yao, Jiewen
  2016-11-10 14:53             ` Paolo Bonzini
@ 2016-11-10 16:25             ` Laszlo Ersek
  1 sibling, 0 replies; 38+ messages in thread
From: Laszlo Ersek @ 2016-11-10 16:25 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Tian, Feng, edk2-devel@ml01.01.org, Kinney, Michael D,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/10/16 15:48, Yao, Jiewen wrote:

> Laszlo, your analysis will save me one day to install the Linux QEMU. :)

Perfect; I can't wait till you guys adopt QEMU/KVM as a test platform! :)

Cheers
Laszlo



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 0/6] Enable SMM page level protection.
  2016-11-10 16:22               ` Laszlo Ersek
@ 2016-11-10 16:39                 ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2016-11-10 16:39 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Jiewen Yao, Feng Tian, edk2-devel, Michael D Kinney, Jeff Fan,
	Star Zeng

> And, in my recent KVM / QEMU usage instructions for Jiewen:
> 
>   https://www.mail-archive.com/edk2-devel@lists.01.org/msg19446.html
> 
> I provided the following settings:
> 
> > # Settings for Ia32 only:
> > [...]
> > QEMU_COMMAND="qemu-system-i386 -cpu coreduo,-nx"
> >
> > # Settings for Ia32X64 only:
> > [...]
> > QEMU_COMMAND=qemu-system-x86_64
> 
> I guess the "-nx" bit can be left off with TCG, but AFAIR it is required
> for KVM.

Oh right, now I remember.  The same problem exists: EFER is not saved in the
32-bit state save map.  AFAIK all processors with XD also have long mode.

That said, qemu-system-x86_64 with no -cpu option should work even with Ia32
PEI/DXE/SMM.  In that case you could use XD.

Now if only Intel made the *full* format of the state save map public, we
could emulate everything more accurately...  I'm told it's in the BIOS
writers guide.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-11-10 16:39 UTC | newest]

Thread overview: 38+ messages
2016-11-04  9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
2016-11-04  9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
2016-11-04  9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
2016-11-04  9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
2016-11-04  9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
2016-11-04  9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
2016-11-04  9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
2016-11-04 22:46   ` Yao, Jiewen
2016-11-04 23:08     ` Laszlo Ersek
2016-11-08  1:22 ` Laszlo Ersek
2016-11-08 12:59   ` Yao, Jiewen
2016-11-08 13:22     ` Laszlo Ersek
2016-11-08 13:41       ` Yao, Jiewen
2016-11-09  6:25   ` Yao, Jiewen
2016-11-09 11:30     ` Paolo Bonzini
2016-11-09 15:01       ` Yao, Jiewen
2016-11-09 15:54         ` Paolo Bonzini
2016-11-09 16:06           ` Paolo Bonzini
2016-11-09 22:28           ` Laszlo Ersek
2016-11-09 22:59             ` Paolo Bonzini
2016-11-09 23:27               ` Laszlo Ersek
2016-11-10  1:13                 ` Yao, Jiewen
2016-11-10  6:30                   ` Fan, Jeff
2016-11-10  0:49               ` Yao, Jiewen
2016-11-10  0:50               ` Yao, Jiewen
2016-11-10  1:02                 ` Fan, Jeff
2016-11-09 20:46     ` Laszlo Ersek
2016-11-10 10:41       ` Yao, Jiewen
2016-11-10 12:01         ` Laszlo Ersek
2016-11-10 14:48           ` Yao, Jiewen
2016-11-10 14:53             ` Paolo Bonzini
2016-11-10 16:22               ` Laszlo Ersek
2016-11-10 16:39                 ` Paolo Bonzini
2016-11-10 16:25             ` Laszlo Ersek
2016-11-10 12:27         ` Paolo Bonzini
2016-11-09 11:23   ` Paolo Bonzini
2016-11-09 15:16     ` Yao, Jiewen
