public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/6] Enable SMM page level protection.
@ 2016-11-03  6:53 Jiewen Yao
  2016-11-03  6:55 ` Yao, Jiewen
  2016-11-03 21:43 ` Laszlo Ersek
  0 siblings, 2 replies; 23+ messages in thread
From: Jiewen Yao @ 2016-11-03  6:53 UTC (permalink / raw)
  To: edk2-devel; +Cc: Jeff Fan, Feng Tian, Star Zeng, Michael D Kinney, Laszlo Ersek

This series patch enables SMM page level protection.
Features are:
1) PiSmmCore reports SMM PE image code/data information
in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
and set XD for data page and RO for code page.
3) PiSmmCpu enables Static Paging for X64 according to
PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
is used as long as it is supported.
4) PiSmmCpu sets importance data structure to be read only,
such as Gdt, Idt, SmmEntrypoint, and PageTable itself.

tested platform:
1) Intel internal platform (X64).
2) EDKII Quark IA32
3) EDKII Vlv2  X64
4) EDKII OVMF IA32 and IA32X64.

Cc: Jeff Fan <jeff.fan@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>

Jiewen Yao (6):
  MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
  MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
  MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
  UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
  UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
  QuarkPlatformPkg/dsc: enable Smm paging protection.

 MdeModulePkg/Core/PiSmmCore/Dispatcher.c               |   66 +
 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c    | 1509 ++++++++++++++++++++
 MdeModulePkg/Core/PiSmmCore/Page.c                     |  775 +++++++++-
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.c                |   40 +
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.h                |   91 ++
 MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf              |    2 +
 MdeModulePkg/Core/PiSmmCore/Pool.c                     |   16 +
 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h |   51 +
 MdeModulePkg/MdeModulePkg.dec                          |    3 +
 QuarkPlatformPkg/Quark.dsc                             |    6 +
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c               |   71 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S              |   67 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm            |   68 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm           |   70 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S          |  226 +--
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm        |   36 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm       |   36 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c          |   37 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c        |    4 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c                  |  110 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c             |  142 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h             |  156 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf           |    5 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c     |  872 +++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c                 |   39 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h                 |   15 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c                |  272 +++-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S               |   51 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm             |   54 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm            |   61 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S           |  250 +---
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm         |   35 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm        |   31 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c           |   30 +-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c         |    6 +-
 UefiCpuPkg/UefiCpuPkg.dec                              |    8 +
 36 files changed, 4513 insertions(+), 798 deletions(-)
 create mode 100644 MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
 create mode 100644 MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
 create mode 100644 UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c

-- 
2.7.4.windows.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-03  6:53 Jiewen Yao
@ 2016-11-03  6:55 ` Yao, Jiewen
  2016-11-03 21:43 ` Laszlo Ersek
  1 sibling, 0 replies; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-03  6:55 UTC (permalink / raw)
  To: Yao, Jiewen, edk2-devel@lists.01.org
  Cc: Kinney, Michael D, Tian, Feng, Laszlo Ersek, Fan, Jeff,
	Zeng, Star

I forget to mention - the whole series is also pushed to https://github.com/jyao1/edk2, SmmProtection_V1 branch.
You can also review the code there.

Thank you
Yao Jiewen

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Jiewen Yao
> Sent: Thursday, November 3, 2016 2:54 PM
> To: edk2-devel@lists.01.org
> Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Tian, Feng
> <feng.tian@intel.com>; Laszlo Ersek <lersek@redhat.com>; Fan, Jeff
> <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: [edk2] [PATCH 0/6] Enable SMM page level protection.
> 
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
> 
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.
> 
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
> 
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.
> 
>  MdeModulePkg/Core/PiSmmCore/Dispatcher.c               |   66 +
>  MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c    | 1509
> ++++++++++++++++++++
>  MdeModulePkg/Core/PiSmmCore/Page.c                     |  775
> +++++++++-
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.c                |   40
> +
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.h                |   91
> ++
>  MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf              |    2
> +
>  MdeModulePkg/Core/PiSmmCore/Pool.c                     |   16
> +
>  MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h |   51 +
>  MdeModulePkg/MdeModulePkg.dec                          |
> 3 +
>  QuarkPlatformPkg/Quark.dsc                             |    6 +
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c               |   71
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.S              |   67
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.asm            |   68
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiEntry.nasm           |   70
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.S          |  226
> +--
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.asm        |   36
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmiException.nasm       |   36
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c          |   37
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmProfileArch.c        |    4
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c                  |  110
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c             |
> 142 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h             |
> 156 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf           |
> 5 +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c     |
> 872 +++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.c                 |   39
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h                 |   15
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c                |  272
> +++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.S               |   51
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.asm             |   54
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiEntry.nasm            |   61
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.S           |  250
> +---
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.asm         |   35
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmiException.nasm        |   31
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c           |   30
> +-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmProfileArch.c         |    6
> +-
>  UefiCpuPkg/UefiCpuPkg.dec                              |    8 +
>  36 files changed, 4513 insertions(+), 798 deletions(-)
>  create mode 100644
> MdeModulePkg/Core/PiSmmCore/MemoryAttributesTable.c
>  create mode 100644
> MdeModulePkg/Include/Guid/PiSmmMemoryAttributesTable.h
>  create mode 100644
> UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c
> 
> --
> 2.7.4.windows.1
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-03  6:53 Jiewen Yao
  2016-11-03  6:55 ` Yao, Jiewen
@ 2016-11-03 21:43 ` Laszlo Ersek
  2016-11-03 23:51   ` Yao, Jiewen
  2016-11-04  9:35   ` Yao, Jiewen
  1 sibling, 2 replies; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-03 21:43 UTC (permalink / raw)
  To: Jiewen Yao
  Cc: edk2-devel, Michael D Kinney, Feng Tian, Jeff Fan, Star Zeng,
	Paolo Bonzini, Radim Krčmář

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
> 
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

> 
> Cc: Jeff Fan <jeff.fan@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com>
> 
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-03 21:43 ` Laszlo Ersek
@ 2016-11-03 23:51   ` Yao, Jiewen
  2016-11-04  0:09     ` Kinney, Michael D
  2016-11-04 15:22     ` Laszlo Ersek
  2016-11-04  9:35   ` Yao, Jiewen
  1 sibling, 2 replies; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-03 23:51 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Kinney, Michael D, Paolo Bonzini, Fan, Jeff, Zeng, Star

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Krčmář <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-03 23:51   ` Yao, Jiewen
@ 2016-11-04  0:09     ` Kinney, Michael D
  2016-11-04  1:15       ` Yao, Jiewen
  2016-11-04 15:22     ` Laszlo Ersek
  1 sibling, 1 reply; 23+ messages in thread
From: Kinney, Michael D @ 2016-11-04  0:09 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, Kinney, Michael D
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  0:09     ` Kinney, Michael D
@ 2016-11-04  1:15       ` Yao, Jiewen
  2016-11-04 16:15         ` Laszlo Ersek
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-04  1:15 UTC (permalink / raw)
  To: Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
@ 2016-11-04  2:49 Kinney, Michael D
  2016-11-04  2:55 ` Yao, Jiewen
  0 siblings, 1 reply; 23+ messages in thread
From: Kinney, Michael D @ 2016-11-04  2:49 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, Kinney, Michael D
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Jiewen,

This is how I launch Qemu on Windows for OvmfPkgIa32X64.dsc.  The flags are not identical to yours.  I have these additional flags: -cpu Nehalem -global ICH9-LPC.disable_s3=1

set QEMU_PATH=C:\Program Files\Qemu

set EDKII_BUILD_OUTPUT=%WORKSPACE%\Build\Ovmf3264\DEBUG_VS2015x86
echo %EDKII_BUILD_OUTPUT%

start "Monitor" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20717 /nossh
start "Console" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20716 /nosh

start "QEMU" /B "%QEMU_PATH%\qemu-system-x86_64w.exe" -machine q35,smm=on,accel=tcg -net none -cpu Nehalem -global ICH9-LPC.disable_s3=1 ^
-drive if=pflash,format=raw,unit=0,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_CODE.fd,readonly=on ^
-drive if=pflash,format=raw,unit=1,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_VARS.fd ^
-monitor tcp:localhost:20717,server ^
-serial tcp:localhost:20716,server ^
-smp 8 ^

Best regards,

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 6:16 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  2:49 [PATCH 0/6] Enable SMM page level protection Kinney, Michael D
@ 2016-11-04  2:55 ` Yao, Jiewen
  2016-11-04  3:05   ` Kinney, Michael D
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-04  2:55 UTC (permalink / raw)
  To: Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Thanks Mike.

I added  "-cpu Nehalem -global ICH9-LPC.disable_s3=1", and get same result. APs are not present.


Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 10:50 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

This is how I launch Qemu on Windows for OvmfPkgIa32X64.dsc.  The flags are not identical to yours.  I have these additional flags: -cpu Nehalem -global ICH9-LPC.disable_s3=1

set QEMU_PATH=C:\Program Files\Qemu

set EDKII_BUILD_OUTPUT=%WORKSPACE%\Build\Ovmf3264\DEBUG_VS2015x86
echo %EDKII_BUILD_OUTPUT%

start "Monitor" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20717 /nossh
start "Console" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20716 /nosh

start "QEMU" /B "%QEMU_PATH%\qemu-system-x86_64w.exe" -machine q35,smm=on,accel=tcg -net none -cpu Nehalem -global ICH9-LPC.disable_s3=1 ^
-drive if=pflash,format=raw,unit=0,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_CODE.fd,readonly=on ^
-drive if=pflash,format=raw,unit=1,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_VARS.fd ^
-monitor tcp:localhost:20717,server ^
-serial tcp:localhost:20716,server ^
-smp 8 ^

Best regards,

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 6:16 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  2:55 ` Yao, Jiewen
@ 2016-11-04  3:05   ` Kinney, Michael D
  2016-11-04  3:12     ` Kinney, Michael D
  2016-11-04  3:14     ` Yao, Jiewen
  0 siblings, 2 replies; 23+ messages in thread
From: Kinney, Michael D @ 2016-11-04  3:05 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, Kinney, Michael D
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Jiewen,

I remember now.  Ovmf uses a different PCD setting for PcdCpuSmmSyncMode.

!if $(SMM_REQUIRE) == TRUE
  gUefiCpuPkgTokenSpaceGuid. PcdCpuSmmSyncMode |0x01
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|100000
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStackSize|0x4000
!endif


  ## Indicates the CPU synchronization method used when processing an SMI.
  #   0x00  - Traditional CPU synchronization method.<BR>
  #   0x01  - Relaxed CPU synchronization method.<BR>
  # @Prompt SMM CPU Synchronization Method.
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode|0x00|UINT8|0x60000014


Try setting is to default 0x0 value and I think all CPUs will come into SMM.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 7:56 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thanks Mike.

I added  "-cpu Nehalem -global ICH9-LPC.disable_s3=1", and get same result. APs are not present.


Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 10:50 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

This is how I launch Qemu on Windows for OvmfPkgIa32X64.dsc.  The flags are not identical to yours.  I have these additional flags: -cpu Nehalem -global ICH9-LPC.disable_s3=1

set QEMU_PATH=C:\Program Files\Qemu

set EDKII_BUILD_OUTPUT=%WORKSPACE%\Build\Ovmf3264\DEBUG_VS2015x86
echo %EDKII_BUILD_OUTPUT%

start "Monitor" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20717 /nossh
start "Console" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20716 /nosh

start "QEMU" /B "%QEMU_PATH%\qemu-system-x86_64w.exe" -machine q35,smm=on,accel=tcg -net none -cpu Nehalem -global ICH9-LPC.disable_s3=1 ^
-drive if=pflash,format=raw,unit=0,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_CODE.fd,readonly=on ^
-drive if=pflash,format=raw,unit=1,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_VARS.fd ^
-monitor tcp:localhost:20717,server ^
-serial tcp:localhost:20716,server ^
-smp 8 ^

Best regards,

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 6:16 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  3:05   ` Kinney, Michael D
@ 2016-11-04  3:12     ` Kinney, Michael D
  2016-11-04  3:20       ` Yao, Jiewen
  2016-11-04  3:14     ` Yao, Jiewen
  1 sibling, 1 reply; 23+ messages in thread
From: Kinney, Michael D @ 2016-11-04  3:12 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, Kinney, Michael D
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Jiewen,

Here is the commits with detailed messages on this topic

https://github.com/tianocore/edk2/commit/9b1e378811ff730fe4e5bbbba294ea74ef76a915

https://github.com/tianocore/edk2/commit/bb0f18b0bce682f6780af997de45bcef146c11ca

Mike

From: Kinney, Michael D
Sent: Thursday, November 3, 2016 8:05 PM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

I remember now.  Ovmf uses a different PCD setting for PcdCpuSmmSyncMode.

!if $(SMM_REQUIRE) == TRUE
  gUefiCpuPkgTokenSpaceGuid. PcdCpuSmmSyncMode |0x01
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|100000
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStackSize|0x4000
!endif


  ## Indicates the CPU synchronization method used when processing an SMI.
  #   0x00  - Traditional CPU synchronization method.<BR>
  #   0x01  - Relaxed CPU synchronization method.<BR>
  # @Prompt SMM CPU Synchronization Method.
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode|0x00|UINT8|0x60000014


Try setting is to default 0x0 value and I think all CPUs will come into SMM.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 7:56 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thanks Mike.

I added  "-cpu Nehalem -global ICH9-LPC.disable_s3=1", and get same result. APs are not present.


Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 10:50 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

This is how I launch Qemu on Windows for OvmfPkgIa32X64.dsc.  The flags are not identical to yours.  I have these additional flags: -cpu Nehalem -global ICH9-LPC.disable_s3=1

set QEMU_PATH=C:\Program Files\Qemu

set EDKII_BUILD_OUTPUT=%WORKSPACE%\Build\Ovmf3264\DEBUG_VS2015x86
echo %EDKII_BUILD_OUTPUT%

start "Monitor" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20717 /nossh
start "Console" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20716 /nosh

start "QEMU" /B "%QEMU_PATH%\qemu-system-x86_64w.exe" -machine q35,smm=on,accel=tcg -net none -cpu Nehalem -global ICH9-LPC.disable_s3=1 ^
-drive if=pflash,format=raw,unit=0,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_CODE.fd,readonly=on ^
-drive if=pflash,format=raw,unit=1,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_VARS.fd ^
-monitor tcp:localhost:20717,server ^
-serial tcp:localhost:20716,server ^
-smp 8 ^

Best regards,

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 6:16 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  3:05   ` Kinney, Michael D
  2016-11-04  3:12     ` Kinney, Michael D
@ 2016-11-04  3:14     ` Yao, Jiewen
  2016-11-04 16:28       ` Laszlo Ersek
  1 sibling, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-04  3:14 UTC (permalink / raw)
  To: Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Mike
You are right.
After I turn on this, I saw all APs.

However, the system becomes extremely slow. Intolerable.

I suggest to add more description around below on why it is set to 0x01 by default, and what is the impact if it becomes 0x00.

!if $(SMM_REQUIRE) == TRUE
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode |0x01
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|100000
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStackSize|0x4000
!endif



From: Kinney, Michael D
Sent: Friday, November 4, 2016 11:05 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

I remember now.  Ovmf uses a different PCD setting for PcdCpuSmmSyncMode.

!if $(SMM_REQUIRE) == TRUE
  gUefiCpuPkgTokenSpaceGuid. PcdCpuSmmSyncMode |0x01
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|100000
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStackSize|0x4000
!endif


  ## Indicates the CPU synchronization method used when processing an SMI.
  #   0x00  - Traditional CPU synchronization method.<BR>
  #   0x01  - Relaxed CPU synchronization method.<BR>
  # @Prompt SMM CPU Synchronization Method.
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode|0x00|UINT8|0x60000014


Try setting is to default 0x0 value and I think all CPUs will come into SMM.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 7:56 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thanks Mike.

I added  "-cpu Nehalem -global ICH9-LPC.disable_s3=1", and get same result. APs are not present.


Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 10:50 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

This is how I launch Qemu on Windows for OvmfPkgIa32X64.dsc.  The flags are not identical to yours.  I have these additional flags: -cpu Nehalem -global ICH9-LPC.disable_s3=1

set QEMU_PATH=C:\Program Files\Qemu

set EDKII_BUILD_OUTPUT=%WORKSPACE%\Build\Ovmf3264\DEBUG_VS2015x86
echo %EDKII_BUILD_OUTPUT%

start "Monitor" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20717 /nossh
start "Console" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20716 /nosh

start "QEMU" /B "%QEMU_PATH%\qemu-system-x86_64w.exe" -machine q35,smm=on,accel=tcg -net none -cpu Nehalem -global ICH9-LPC.disable_s3=1 ^
-drive if=pflash,format=raw,unit=0,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_CODE.fd,readonly=on ^
-drive if=pflash,format=raw,unit=1,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_VARS.fd ^
-monitor tcp:localhost:20717,server ^
-serial tcp:localhost:20716,server ^
-smp 8 ^

Best regards,

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 6:16 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  3:12     ` Kinney, Michael D
@ 2016-11-04  3:20       ` Yao, Jiewen
  2016-11-04 11:34         ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-04  3:20 UTC (permalink / raw)
  To: Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

Good info. Thanks!

I do not understand below word. I still see a *huge* performance gap.

I am confused on how is it resolved in previous patch. Or do I need configure something for my QEMU?

In order to mitigate the performance penalty, decrease
PcdCpuSmmApSyncTimeout to one tenth of its default value: 100 ms. (For
comparison, Vlv2TbltDevicePkg sets 1 ms.)


From: Kinney, Michael D
Sent: Friday, November 4, 2016 11:12 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Here is the commits with detailed messages on this topic

https://github.com/tianocore/edk2/commit/9b1e378811ff730fe4e5bbbba294ea74ef76a915

https://github.com/tianocore/edk2/commit/bb0f18b0bce682f6780af997de45bcef146c11ca

Mike

From: Kinney, Michael D
Sent: Thursday, November 3, 2016 8:05 PM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

I remember now.  Ovmf uses a different PCD setting for PcdCpuSmmSyncMode.

!if $(SMM_REQUIRE) == TRUE
  gUefiCpuPkgTokenSpaceGuid. PcdCpuSmmSyncMode |0x01
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|100000
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStackSize|0x4000
!endif


  ## Indicates the CPU synchronization method used when processing an SMI.
  #   0x00  - Traditional CPU synchronization method.<BR>
  #   0x01  - Relaxed CPU synchronization method.<BR>
  # @Prompt SMM CPU Synchronization Method.
  gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode|0x00|UINT8|0x60000014


Try setting is to default 0x0 value and I think all CPUs will come into SMM.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 7:56 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thanks Mike.

I added  "-cpu Nehalem -global ICH9-LPC.disable_s3=1", and get same result. APs are not present.


Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 10:50 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

This is how I launch Qemu on Windows for OvmfPkgIa32X64.dsc.  The flags are not identical to yours.  I have these additional flags: -cpu Nehalem -global ICH9-LPC.disable_s3=1

set QEMU_PATH=C:\Program Files\Qemu

set EDKII_BUILD_OUTPUT=%WORKSPACE%\Build\Ovmf3264\DEBUG_VS2015x86
echo %EDKII_BUILD_OUTPUT%

start "Monitor" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20717 /nossh
start "Console" /B "c:\Program Files (x86)\teraterm\ttermpro.exe" localhost:20716 /nosh

start "QEMU" /B "%QEMU_PATH%\qemu-system-x86_64w.exe" -machine q35,smm=on,accel=tcg -net none -cpu Nehalem -global ICH9-LPC.disable_s3=1 ^
-drive if=pflash,format=raw,unit=0,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_CODE.fd,readonly=on ^
-drive if=pflash,format=raw,unit=1,file=%EDKII_BUILD_OUTPUT%\FV\OVMF_VARS.fd ^
-monitor tcp:localhost:20717,server ^
-serial tcp:localhost:20716,server ^
-smp 8 ^

Best regards,

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 6:16 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Thank you, Mike.
Yes, I reproduced the issue and found out what is wrong. Here is detail:

It seems OVMF never puts AP in SMM mode, even *before my patch series*.
I rollback all my update and use e9d093.

I add some debug message in PerformRemainingTasks(), I found below:
*(mSmmMpSyncData->CpuData[0].Present) = 1
*(mSmmMpSyncData->CpuData[1].Present) = 0
*(mSmmMpSyncData->CpuData[2].Present) = 0
*(mSmmMpSyncData->CpuData[3].Present) = 0
*(mSmmMpSyncData->CpuData[4].Present) = 0
*(mSmmMpSyncData->CpuData[5].Present) = 0
*(mSmmMpSyncData->CpuData[6].Present) = 0
*(mSmmMpSyncData->CpuData[7].Present) = 0
It is never exposed before, because there is no code to call SmmStartupAp.

I assume all APs should be present, so I added ASSERT.
Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.

Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
Anyone can confirm that?

Thank you
Yao Jiewen

From: Kinney, Michael D
Sent: Friday, November 4, 2016 8:10 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Jiewen,

Try setting -smp 8 for 8 CPUs.

Mike

From: Yao, Jiewen
Sent: Thursday, November 3, 2016 4:52 PM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Kr?má? <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: RE: [edk2] [PATCH 0/6] Enable SMM page level protection.

Hi Laszlo
I appreciate your help to validate the patch for me.

Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)

A quick look at the code.


1)      The ASSERT issue
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
is caused by this line:
      Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
      ASSERT_EFI_ERROR(Status);

I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
  if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
      CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
      !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
      gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
    return EFI_INVALID_PARAMETER;
  }

I am using below line to start QEMU:
"qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"

Would you mind let me know what is difference between your environment and mine?

I will add more debug info to see what happened, which of above lines triggers the error.



2)      The unstable case is a headache.
According to "RIP=000000000009f0fd", it seems outside of SMM right?
What does this *KVM internal error. Suberror: 1* mean?

Do you have any clue on what happened?

Thank you
Yao Jiewen

From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Radim Krčmář <rkrcmar@redhat.com<mailto:rkrcmar@redhat.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-03 21:43 ` Laszlo Ersek
  2016-11-03 23:51   ` Yao, Jiewen
@ 2016-11-04  9:35   ` Yao, Jiewen
  2016-11-04 15:23     ` Laszlo Ersek
  1 sibling, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-04  9:35 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Kinney, Michael D, Yao, Jiewen, Paolo Bonzini, Fan, Jeff,
	Zeng, Star

Hi Laszlo
I just send out V2 patch.

The new update resolved OVMF ASSERT issue, and added more information in commit message.

However, there is no update for OVMF stability issue. I have no much idea on what happened.

If you want to try V2 with more tests, that is great.
But if you want to resolve S3 stability issue at first, I am also OK.

At same time, if you have any clue on the S3 stability issue, please let me know.

Thank you
Yao Jiewen


From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Friday, November 4, 2016 5:43 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Krčmář <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.

On 11/03/16 07:53, Jiewen Yao wrote:
> This series patch enables SMM page level protection.
> Features are:
> 1) PiSmmCore reports SMM PE image code/data information
> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
> and set XD for data page and RO for code page.
> 3) PiSmmCpu enables Static Paging for X64 according to
> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
> is used as long as it is supported.
> 4) PiSmmCpu sets importance data structure to be read only,
> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>
> tested platform:
> 1) Intel internal platform (X64).
> 2) EDKII Quark IA32
> 3) EDKII Vlv2  X64
> 4) EDKII OVMF IA32 and IA32X64.

Did you use a Windows QEMU binary for the OVMF tests?

Please find my test results below, all done on KVM
(3.10.0-514.el7.x86_64, if anyone is curious :)).

>
> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>
> Jiewen Yao (6):
>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>   QuarkPlatformPkg/dsc: enable Smm paging protection.

Legend:

- "untested" means the test was not executed because the same test
  failed or proved unreliable in a less demanding configuration already,

- "n/a" means a setting or test case was impossible,

- "fail" and "unreliable" (lower case) are outside the scope of this
  series; they either capture the pre-series status, or are expected
  even with the series applied due to the pre-series status,

- "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
  series.

In all cases, 36 bits were used as address width in the CPU HOB (--> up
to 64GB guest-phys address space).

   series  OVMF                                                              VCPU     boot       S3 resume
 # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
-- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
 1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
 2 no      Ia32     255                              n/a                     52x2x2   pass       untested
 3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
 4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
 5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
 6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
 7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
 8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
 9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested

Notes for the baseline tests (1-6):

* test 3: boot unreliable due to out-of-SMRAM (depends on what we do
  during boot)

* test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
  wakeup is quick, works okay

* test 6: boot fail: out of SMRAM (expected)

Notes for the tests with the series applied:

* test 7: normal boot is regressed even with
  PcdCpuSmmStaticPageTable=FALSE, relative to test 1:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)

  Note that the processor model used in this test has no support for NX.

* test 8: results identical to those of test 7

* test 13: S3 resume is unreliable (regression relative to test 4). End
  of OVMF log (Fedora guest):

> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
> Install PPI: [PeiPostScriptTablePpi]
> Install PPI: [EfiEndOfPeiSignalPpi]
> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
> Transfer to 16bit OS waking vector - 9A1D0

  QEMU log:

> KVM internal error. Suberror: 1
> KVM internal error. Suberror: 1
> emulation failure
> emulation failure
> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f294000 00000047
> IDT=     000000007f294048 00000fff
> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

  This happened 1 time out of the 2 times I ran the test.

* test 14: boot regression relative to test 4. The symptoms are
  identical to those seen in test 7, that is:

> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
> Patch page table start ...
> Patch page table done!
> MemoryAttributesTable - NULL
> SetPageTableAttributes
> Start...
> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)


Summary:

- this series seems to break the boot with the Ia32 build of OVMF,
  regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),

- it seems to break the boot with the Ia32X64 build of OVMF, with
  PcdCpuSmmStaticPageTable=TRUE (case 14),

- for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
  works, but S3 becomes unreliable (case 13),

- because there was no successful boot with
  PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
  increased SMRAM demand would impact the maximum count of bootable
  VCPUs. (For this test, I should likely increase the guest-phys address
  width from 36 bits anyway.)

Thanks
Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  3:20       ` Yao, Jiewen
@ 2016-11-04 11:34         ` Paolo Bonzini
  2016-11-04 13:28           ` Yao, Jiewen
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2016-11-04 11:34 UTC (permalink / raw)
  To: Yao, Jiewen, Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org, Fan, Jeff,
	Zeng, Star



On 04/11/2016 04:20, Yao, Jiewen wrote:
> Good info. Thanks!
> 
> I do not understand below word. I still see a **huge** performance gap.
> 
> I am confused on how is it resolved in previous patch. Or do I need
> configure something for my QEMU?

The delay you're seeing comes from SmmWaitForApArrival.  See this
explanation:

----
Port 0xb2 on QEMU only sends an SMI to the currently executing
processor.  The SMI handler, however, and in particular
SmmWaitForApArrival, currently expects that SmmControl2DxeTrigger
triggers an SMI IPI on all processors rather than just the BSP.  Thus
all SMM invocations loop for a second (the default value of
PcdCpuSmmApSyncTimeout) before SmmWaitForApArrival sends another SMI IPI
to the APs.
----

Can you try calling SendSmiIpiAllExcludingSelf in SmmControl2DxeTrigger
(OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c) before the I/O port writes?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04 11:34         ` Paolo Bonzini
@ 2016-11-04 13:28           ` Yao, Jiewen
  2016-11-04 13:50             ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2016-11-04 13:28 UTC (permalink / raw)
  To: Paolo Bonzini, Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org, Fan, Jeff,
	Zeng, Star

Thank you Paolo.

I tried below way. But it does not help too much. It still takes more than 1 minutes to boot with SMP=8.
  SendSmiIpiAllExcludingSelf ();
  IoWrite8 (ICH9_APM_STS, DataPort    == NULL ? 0 : *DataPort);
  IoWrite8 (ICH9_APM_CNT, CommandPort == NULL ? 0 : *CommandPort);

I also tried to reduce the timeout PCD to:
gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|1000
However, I find CPU-2 is still missing.

Maybe it is caused by QEMU emulate AP in serial mode, not parallel mode.
I think it might be best choice to set PcdCpuSmmSyncMode|0x1
It also helps cover a very corner case in SMM. :)


At same time, would you mind help to take a look at the S3 unstable issue? If you have any clue, please let me know.


Thank you
Yao Jiewen

From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Friday, November 4, 2016 7:34 PM
To: Yao, Jiewen <jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Tian, Feng <feng.tian@intel.com>; Radim Kr?má? <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.



On 04/11/2016 04:20, Yao, Jiewen wrote:
> Good info. Thanks!
>
> I do not understand below word. I still see a **huge** performance gap.
>
> I am confused on how is it resolved in previous patch. Or do I need
> configure something for my QEMU?

The delay you're seeing comes from SmmWaitForApArrival.  See this
explanation:

----
Port 0xb2 on QEMU only sends an SMI to the currently executing
processor.  The SMI handler, however, and in particular
SmmWaitForApArrival, currently expects that SmmControl2DxeTrigger
triggers an SMI IPI on all processors rather than just the BSP.  Thus
all SMM invocations loop for a second (the default value of
PcdCpuSmmApSyncTimeout) before SmmWaitForApArrival sends another SMI IPI
to the APs.
----

Can you try calling SendSmiIpiAllExcludingSelf in SmmControl2DxeTrigger
(OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c) before the I/O port writes?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04 13:28           ` Yao, Jiewen
@ 2016-11-04 13:50             ` Paolo Bonzini
  2016-11-04 16:34               ` Laszlo Ersek
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2016-11-04 13:50 UTC (permalink / raw)
  To: Yao, Jiewen, Kinney, Michael D, Laszlo Ersek
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org, Fan, Jeff,
	Zeng, Star



On 04/11/2016 14:28, Yao, Jiewen wrote:
> I tried below way. But it does not help too much. It still takes more
> than 1 minutes to boot with SMP=8.
> 
>   SendSmiIpiAllExcludingSelf ();
>   IoWrite8 (ICH9_APM_STS, DataPort    == NULL ? 0 : *DataPort);
>   IoWrite8 (ICH9_APM_CNT, CommandPort == NULL ? 0 : *CommandPort);
> 
> I also tried to reduce the timeout PCD to:
> 
>   gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|1000
> 
> However, I find CPU-2 is still missing.
> 
> Maybe it is caused by QEMU emulate AP in serial mode, not parallel mode.

Yeah, that's possible without KVM.  Do you issue a PAUSE instruction in
the spin loops?  That could help.

Paolo

> I think it might be best choice to set PcdCpuSmmSyncMode|0x1
> 
> It also helps cover a very corner case in SMM. J
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-03 23:51   ` Yao, Jiewen
  2016-11-04  0:09     ` Kinney, Michael D
@ 2016-11-04 15:22     ` Laszlo Ersek
  2016-11-04 15:29       ` Paolo Bonzini
  1 sibling, 1 reply; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-04 15:22 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Kinney, Michael D, Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/04/16 00:51, Yao, Jiewen wrote:
> Hi Laszlo
> I appreciate your help to validate the patch for me.

Well, part of me is just plain altruistic :), but another part of me is
responsible for OVMF at Red Hat, and I'd rather find out about any
possible regressions before they are committed ;)

> Yes, I am using a windows QEMU binary. (qemu-system-x86_64.exe)
> 
> A quick look at the code.
> 
> 
> 1)      The ASSERT issue
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
> is caused by this line:
>       Status = SmmBlockingStartupThisAp (FlushTlbOnCurrentProcessor, Index, NULL);
>       ASSERT_EFI_ERROR(Status);
> 
> I am surprise to see it, because I think is an impossible case - InternalSmmStartupThisAp().
>   if (CpuIndex >= gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus ||
>       CpuIndex == gSmmCpuPrivate->SmmCoreEntryContext.CurrentlyExecutingCpu ||
>       !(*(mSmmMpSyncData->CpuData[CpuIndex].Present)) ||
>       gSmmCpuPrivate->Operation[CpuIndex] == SmmCpuRemove) {
>     return EFI_INVALID_PARAMETER;
>   }
> 
> I am using below line to start QEMU:
> "qemu-system-x86_64.exe -machine q35,smm=on,accel=tcg -smp 1 -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd --serial COM8 -boot menu=on,splash-time=7000 fat:rw:Test"
> 
> Would you mind let me know what is difference between your environment and mine?
> 
> I will add more debug info to see what happened, which of above lines triggers the error.

(I will comment on this elsewhere.)

> 2)      The unstable case is a headache.
> According to "RIP=000000000009f0fd", it seems outside of SMM right?

So, this address can be correlated with the final part of the OVMF debug
log. The OVMF debug log says (as I wrote),

> Transfer to 16bit OS waking vector - 9A1D0

and the RIP in the failure info is 9f0fd -- the difference is just
0x4F2D (20269) bytes. So, I must think, something blows up in the resume
vector of the Linux kernel that I used as guest for testing.

(Also, you can confirm we are outside of SMM from the string "SMM=0" in
the register dump.)

> What does this *KVM internal error. Suberror: 1* mean?

The key message is "emulation failure" -- it means that the processor
exits to the hypervisor (KVM) because it finds some code that it cannot
execute in guest mode natively, so the hypervisor needs to emulate it.
And, this emulation fails. The reasons can be:
- the code is valid, but KVM lacks the emulation code for it,
- the code is actually garbage (not code) -- there was some corruption
in the guest (the location used to contain code but it was corrupted, or
the guest jumped to non-code data).

Usually the register dump contains a short hexadecimal snippet from the
instruction stream (near Code=...), pinpointing the byte that caused the
problem. However, in this case, all we have is question marks, and this
is the very first time I see those. That's why I CC'd Paolo and Radim :)

Thanks
Laszlo


> 
> Do you have any clue on what happened?
> 
> Thank you
> Yao Jiewen
> 
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Friday, November 4, 2016 5:43 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; Radim Krčmář <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.
> 
> On 11/03/16 07:53, Jiewen Yao wrote:
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64.
> 
> Did you use a Windows QEMU binary for the OVMF tests?
> 
> Please find my test results below, all done on KVM
> (3.10.0-514.el7.x86_64, if anyone is curious :)).
> 
>>
>> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
>> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
>> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
>> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>>
>> Jiewen Yao (6):
>>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>>   QuarkPlatformPkg/dsc: enable Smm paging protection.
> 
> Legend:
> 
> - "untested" means the test was not executed because the same test
>   failed or proved unreliable in a less demanding configuration already,
> 
> - "n/a" means a setting or test case was impossible,
> 
> - "fail" and "unreliable" (lower case) are outside the scope of this
>   series; they either capture the pre-series status, or are expected
>   even with the series applied due to the pre-series status,
> 
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>   series.
> 
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
> 
>    series  OVMF                                                              VCPU     boot       S3 resume
>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>  1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
>  2 no      Ia32     255                              n/a                     52x2x2   pass       untested
>  3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
>  4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
>  5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
>  6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
>  7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
>  8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
>  9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
> 10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
> 11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
> 12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
> 13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
> 14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
> 15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
> 16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
> 17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
> 18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested
> 
> Notes for the baseline tests (1-6):
> 
> * test 3: boot unreliable due to out-of-SMRAM (depends on what we do
>   during boot)
> 
> * test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
>   wakeup is quick, works okay
> 
> * test 6: boot fail: out of SMRAM (expected)
> 
> Notes for the tests with the series applied:
> 
> * test 7: normal boot is regressed even with
>   PcdCpuSmmStaticPageTable=FALSE, relative to test 1:
> 
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> Patch page table start ...
>> Patch page table done!
>> MemoryAttributesTable - NULL
>> SetPageTableAttributes
>> Start...
>> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>>
>> ASSERT_EFI_ERROR (Status = Invalid Parameter)
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
> 
>   Note that the processor model used in this test has no support for NX.
> 
> * test 8: results identical to those of test 7
> 
> * test 13: S3 resume is unreliable (regression relative to test 4). End
>   of OVMF log (Fedora guest):
> 
>> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
>> S3BootScriptDone - Success
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 9A1D0
> 
>   QEMU log:
> 
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
>> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
>> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> 
>   This happened 1 time out of the 2 times I ran the test.
> 
> * test 14: boot regression relative to test 4. The symptoms are
>   identical to those seen in test 7, that is:
> 
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> Patch page table start ...
>> Patch page table done!
>> MemoryAttributesTable - NULL
>> SetPageTableAttributes
>> Start...
>> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>>
>> ASSERT_EFI_ERROR (Status = Invalid Parameter)
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
> 
> 
> Summary:
> 
> - this series seems to break the boot with the Ia32 build of OVMF,
>   regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),
> 
> - it seems to break the boot with the Ia32X64 build of OVMF, with
>   PcdCpuSmmStaticPageTable=TRUE (case 14),
> 
> - for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
>   works, but S3 becomes unreliable (case 13),
> 
> - because there was no successful boot with
>   PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
>   increased SMRAM demand would impact the maximum count of bootable
>   VCPUs. (For this test, I should likely increase the guest-phys address
>   width from 36 bits anyway.)
> 
> Thanks
> Laszlo
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  9:35   ` Yao, Jiewen
@ 2016-11-04 15:23     ` Laszlo Ersek
  0 siblings, 0 replies; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-04 15:23 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Kinney, Michael D, Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/04/16 10:35, Yao, Jiewen wrote:
> Hi Laszlo
> I just send out V2 patch.
> 
> The new update resolved OVMF ASSERT issue, and added more information in commit message.

Thanks! I'll try to look at v2 (with some more testing) next week.

> 
> However, there is no update for OVMF stability issue. I have no much idea on what happened.
> 
> If you want to try V2 with more tests, that is great.
> But if you want to resolve S3 stability issue at first, I am also OK.
> 
> At same time, if you have any clue on the S3 stability issue, please let me know.

Right... I hope I can ask Paolo and Radim to help us with that.

Thanks
Laszlo

> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Friday, November 4, 2016 5:43 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Tian, Feng <feng.tian@intel.com>; Radim Krčmář <rkrcmar@redhat.com>; edk2-devel@ml01.01.org; Kinney, Michael D <michael.d.kinney@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Fan, Jeff <jeff.fan@intel.com>; Zeng, Star <star.zeng@intel.com>
> Subject: Re: [edk2] [PATCH 0/6] Enable SMM page level protection.
> 
> On 11/03/16 07:53, Jiewen Yao wrote:
>> This series patch enables SMM page level protection.
>> Features are:
>> 1) PiSmmCore reports SMM PE image code/data information
>> in EdkiiPiSmmMemoryAttributeTable, if the SMM image is page aligned.
>> 2) PiSmmCpu consumes EdkiiPiSmmMemoryAttributeTable
>> and set XD for data page and RO for code page.
>> 3) PiSmmCpu enables Static Paging for X64 according to
>> PcdCpuSmmStaticPageTable. If it is true, 1G paging for above 4G
>> is used as long as it is supported.
>> 4) PiSmmCpu sets importance data structure to be read only,
>> such as Gdt, Idt, SmmEntrypoint, and PageTable itself.
>>
>> tested platform:
>> 1) Intel internal platform (X64).
>> 2) EDKII Quark IA32
>> 3) EDKII Vlv2  X64
>> 4) EDKII OVMF IA32 and IA32X64.
> 
> Did you use a Windows QEMU binary for the OVMF tests?
> 
> Please find my test results below, all done on KVM
> (3.10.0-514.el7.x86_64, if anyone is curious :)).
> 
>>
>> Cc: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
>> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
>> Cc: Star Zeng <star.zeng@intel.com<mailto:star.zeng@intel.com>>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
>> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
>>
>> Jiewen Yao (6):
>>   MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h
>>   MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid.
>>   MdeModulePkg/PiSmmCore: Add MemoryAttributes support.
>>   UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable.
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection.
>>   QuarkPlatformPkg/dsc: enable Smm paging protection.
> 
> Legend:
> 
> - "untested" means the test was not executed because the same test
>   failed or proved unreliable in a less demanding configuration already,
> 
> - "n/a" means a setting or test case was impossible,
> 
> - "fail" and "unreliable" (lower case) are outside the scope of this
>   series; they either capture the pre-series status, or are expected
>   even with the series applied due to the pre-series status,
> 
> - "FAIL" and "UNRELIABLE" mean regressions caused (or exposed) by the
>   series.
> 
> In all cases, 36 bits were used as address width in the CPU HOB (--> up
> to 64GB guest-phys address space).
> 
>    series  OVMF                                                              VCPU     boot       S3 resume
>  # applied platform PcdCpuMaxLogicalProcessorNumber PcdCpuSmmStaticPageTable topology result     result
> -- ------- -------- ------------------------------- ------------------------ -------- ------     ---------
>  1 no      Ia32      64                              n/a                     1x2x2    pass       unreliable
>  2 no      Ia32     255                              n/a                     52x2x2   pass       untested
>  3 no      Ia32     255                              n/a                     53x2x2   unreliable untested
>  4 no      Ia32X64   64                              n/a                     1x2x2    pass       pass
>  5 no      Ia32X64  255                              n/a                     52x2x2   pass       pass
>  6 no      Ia32X64  255                              n/a                     54x2x2   fail       n/a
>  7 yes     Ia32      64                              FALSE                   1x2x2    FAIL       n/a
>  8 yes     Ia32      64                              TRUE                    1x2x2    FAIL       n/a
>  9 yes     Ia32     255                              FALSE                   52x2x2   untested   untested
> 10 yes     Ia32     255                              FALSE                   53x2x2   untested   untested
> 11 yes     Ia32     255                              TRUE                    52x2x2   untested   untested
> 12 yes     Ia32     255                              TRUE                    53x2x2   untested   untested
> 13 yes     Ia32X64   64                              FALSE                   1x2x2    pass       UNRELIABLE
> 14 yes     Ia32X64   64                              TRUE                    1x2x2    FAIL       n/a
> 15 yes     Ia32X64  255                              FALSE                   52x2x2   pass       untested
> 16 yes     Ia32X64  255                              FALSE                   54x2x2   fail       untested
> 17 yes     Ia32X64  255                              TRUE                    52x2x2   untested   untested
> 18 yes     Ia32X64  255                              TRUE                    54x2x2   untested   untested
> 
> Notes for the baseline tests (1-6):
> 
> * test 3: boot unreliable due to out-of-SMRAM (depends on what we do
>   during boot)
> 
> * test 5: S3 wakeup: 1 try, suspend takes very long, CPU intensive;
>   wakeup is quick, works okay
> 
> * test 6: boot fail: out of SMRAM (expected)
> 
> Notes for the tests with the series applied:
> 
> * test 7: normal boot is regressed even with
>   PcdCpuSmmStaticPageTable=FALSE, relative to test 1:
> 
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> Patch page table start ...
>> Patch page table done!
>> MemoryAttributesTable - NULL
>> SetPageTableAttributes
>> Start...
>> ConvertPageEntryAttribute 0x7FF9A067->0x7FF9A065
>>
>> ASSERT_EFI_ERROR (Status = Invalid Parameter)
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
> 
>   Note that the processor model used in this test has no support for NX.
> 
> * test 8: results identical to those of test 7
> 
> * test 13: S3 resume is unreliable (regression relative to test 4). End
>   of OVMF log (Fedora guest):
> 
>> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
>> S3BootScriptDone - Success
>> Call AsmDisablePaging64() to return to S3 Resume in PEI Phase
>> Install PPI: [PeiPostScriptTablePpi]
>> Install PPI: [EfiEndOfPeiSignalPpi]
>> Notify: PPI Guid: [EfiEndOfPeiSignalPpi], Peim notify entry point: 857895
>> PeiMpInitLib: CpuMpEndOfPeiCallback () invoked
>> Transfer to 16bit OS waking vector - 9A1D0
> 
>   QEMU log:
> 
>> KVM internal error. Suberror: 1
>> KVM internal error. Suberror: 1
>> emulation failure
>> emulation failure
>> RAX=0000000000000001 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede058
>> RSI=0000000000000004 RDI=000000007fede040 RBP=0000000000000000 RSP=000000007e19f000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>> RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffdbf48 RDX=000000007fede070
>> RSI=0000000000000004 RDI=000000007fede058 RBP=0000000000000000 RSP=000000007e1a7000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000009f0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
>> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
>> GDT=     000000007f294000 00000047
>> IDT=     000000007f294048 00000fff
>> CR0=e0000011 CR2=0000000000000000 CR3=000000007ff80000 CR4=00000220
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000500
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> 
>   This happened 1 time out of the 2 times I ran the test.
> 
> * test 14: boot regression relative to test 4. The symptoms are
>   identical to those seen in test 7, that is:
> 
>> SmmInstallProtocolInterface: [EfiSmmReadyToLockProtocol] 0
>> Patch page table start ...
>> Patch page table done!
>> MemoryAttributesTable - NULL
>> SetPageTableAttributes
>> Start...
>> ConvertPageEntryAttribute 0x7FF98067->0x7FF98065
>>
>> ASSERT_EFI_ERROR (Status = Invalid Parameter)
>> ASSERT UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c(468): !EFI_ERROR (Status)
> 
> 
> Summary:
> 
> - this series seems to break the boot with the Ia32 build of OVMF,
>   regardless of PcdCpuSmmStaticPageTable (cases 7 and 8),
> 
> - it seems to break the boot with the Ia32X64 build of OVMF, with
>   PcdCpuSmmStaticPageTable=TRUE (case 14),
> 
> - for the Ia32X64 build, with PcdCpuSmmStaticPageTable=FALSE, boot
>   works, but S3 becomes unreliable (case 13),
> 
> - because there was no successful boot with
>   PcdCpuSmmStaticPageTable=TRUE, I couldn't check if and how the
>   increased SMRAM demand would impact the maximum count of bootable
>   VCPUs. (For this test, I should likely increase the guest-phys address
>   width from 36 bits anyway.)
> 
> Thanks
> Laszlo
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04 15:22     ` Laszlo Ersek
@ 2016-11-04 15:29       ` Paolo Bonzini
  2016-11-04 17:36         ` Laszlo Ersek
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2016-11-04 15:29 UTC (permalink / raw)
  To: Laszlo Ersek, Yao, Jiewen
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Kinney, Michael D, Fan, Jeff, Zeng, Star



On 04/11/2016 16:22, Laszlo Ersek wrote:
>> > What does this *KVM internal error. Suberror: 1* mean?
> The key message is "emulation failure" -- it means that the processor
> exits to the hypervisor (KVM) because it finds some code that it cannot
> execute in guest mode natively, so the hypervisor needs to emulate it.
> And, this emulation fails. The reasons can be:
> - the code is valid, but KVM lacks the emulation code for it,
> - the code is actually garbage (not code) -- there was some corruption
> in the guest (the location used to contain code but it was corrupted, or
> the guest jumped to non-code data).
> 
> Usually the register dump contains a short hexadecimal snippet from the
> instruction stream (near Code=...), pinpointing the byte that caused the
> problem. However, in this case, all we have is question marks, and this
> is the very first time I see those. That's why I CC'd Paolo and Radim :)

The question marks usually mean that the page tables do not map a page
at that address, but I don't know offhand why KVM would fail emulation
instead of triple-faulting.

Try "info tlb" to dump the page tables (huge output of course, you may
want to use the GTK+ backend which has scrollable consoles).

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  1:15       ` Yao, Jiewen
@ 2016-11-04 16:15         ` Laszlo Ersek
  0 siblings, 0 replies; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-04 16:15 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Kinney, Michael D, Tian, Feng, Radim Kr?má?,
	edk2-devel@ml01.01.org, Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/04/16 02:15, Yao, Jiewen wrote:
> Thank you, Mike.
> Yes, I reproduced the issue and found out what is wrong. Here is detail:
> 
> It seems OVMF never puts AP in SMM mode, even *before my patch series*.
> I rollback all my update and use e9d093.
> 
> I add some debug message in PerformRemainingTasks(), I found below:
> *(mSmmMpSyncData->CpuData[0].Present) = 1
> *(mSmmMpSyncData->CpuData[1].Present) = 0
> *(mSmmMpSyncData->CpuData[2].Present) = 0
> *(mSmmMpSyncData->CpuData[3].Present) = 0
> *(mSmmMpSyncData->CpuData[4].Present) = 0
> *(mSmmMpSyncData->CpuData[5].Present) = 0
> *(mSmmMpSyncData->CpuData[6].Present) = 0
> *(mSmmMpSyncData->CpuData[7].Present) = 0
> It is never exposed before, because there is no code to call SmmStartupAp.
> 
> I assume all APs should be present, so I added ASSERT.
> Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.
> 
> Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
> Anyone can confirm that?

Thank you very much for narrowing this down.

It is a QEMU platform peculiarity that raising an SMI, with
EFI_SMM_CONTROL2_PROTOCOL.Trigger(), puts *only* the processor that is
executing Trigger() into SMM. The other processors are not affected.

In PiSmmCpuDxeSmm, there is BSP code that waits for the APs to show up
in SMM, for a specific amount of time. (See PcdCpuSmmSyncMode and
PcdCpuSmmApSyncTimeout.) If the APs don't show up in the allotted time,
due to the original SMI, then the BSP pulls them in with directed LAPIC
writes, if I remember correctly.

(We discussed this at great length when we were enabling SMM in OVMF --
I think I may have forgotten most of the details by now.)

On the bare metal, the LAPIC writes are just a fallback, and the
timeouts are not expected in practice. On QEMU however, those timeouts
are the norm (because only the processor calling Trigger() enters SMM).
This caused very long waits when using the variable services (with SMM)
from the runtime OS, even if all of the related code (OS and firmware
alike) were executed only on the BSP (VCPU#0).

In order to mitigate this, we set PcdCpuSmmSyncMode to 1 (relaxed), in
the following patch:

commit 9b1e378811ff730fe4e5bbbba294ea74ef76a915
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Nov 30 18:46:46 2015 +0000

    OvmfPkg: use relaxed AP SMM synchronization mode

    Port 0xb2 on QEMU only sends an SMI to the currently executing processor.
    The SMI handler, however, and in particular SmmWaitForApArrival, currently
    expects that SmmControl2DxeTrigger triggers an SMI IPI on all processors
    rather than just the BSP.  Thus all SMM invocations loop for a second (the
    default value of PcdCpuSmmApSyncTimeout) before SmmWaitForApArrival sends
    another SMI IPI to the APs.

    With the default SmmCpuFeaturesLib, 32-bit machines must broadcast SMIs
    because 32-bit machines must reset the MTRRs on each entry to system
    management modes (they have no SMRRs).  However, our virtual platform
    does not have problems with cacheability of SMRAM, so we can use "directed"
    SMIs instead.  To do this, just set gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode
    to 1 (aka SmmCpuSyncModeRelaxedAp).  This fixes SMM on multiprocessor virtual
    machines.

    Contributed-under: TianoCore Contribution Agreement 1.0
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>

    git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19058 6f19259b-4bc3-4df7-8a09-765794883524

Another problem remained though: if the SMI was raised by an AP in the
first place -- which is possible when the runtime OS calls the variable
services --, then the code still waited for the sync timeout to elapse.
This is very-very easy to reproduce with the following command in a
Linux guest:

  taskset -c 1 efibootmgr

The "efibootmgr" command issues a big bunch of variable service calls
(with the help of the guest kernel, of course), and the "taskset -c 1"
command ensures that all of the user space, kernel, and firmware code
involved in "efibootmgr" is executed exclusively on VCPU#1 (that is, an
AP).

(Side remark: invoking efibootmgr without any options causes it to
simply print the BootCurrent, Timeout, BootOrder and Boot#### variables.
Probably BootNext too, if it exists.)

Initially, this command would take up to 35-40 *seconds* to complete on
my laptop, even with Paolo's 9b1e378811ff in place. Count

- 1 second busy-looping per SMI raised,

- possibly several SMIs per variable service call (I don't know how many
  times the unprivileged part of the variable driver calls into the
  privileged part, per external service call),

- and several variable service calls issued by efibootmgr.

This was clearly intolerable, especially because in a multiprocessor
linux guest, the efibootmgr program (and the initial variable service
calls issued by the kernel during boot) can be scheduled to any VCPU --
the "taskset -c 1" command only synthesizes the worst case.

For mitigating *that*, we set PcdCpuSmmApSyncTimeout to 1/10th of a
second (from its original 1 second value), in:

commit bb0f18b0bce682f6780af997de45bcef146c11ca
Author: Laszlo Ersek <lersek@redhat.com>
Date:   Mon Nov 30 18:46:50 2015 +0000

    OvmfPkg: any AP in SMM should not wait for the BSP for more than 100 ms

    This patch complements the previous one, "OvmfPkg: use relaxed AP SMM
    synchronization mode". While that patch focuses on the case when the SMI
    is raised synchronously by the BSP, on the BSP:

      BSPHandler()             [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]
        SmmWaitForApArrival()  [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]
          IsSyncTimerTimeout() [UefiCpuPkg/PiSmmCpuDxeSmm/SyncTimer.c]

    this patch concerns itself with the case when it is one of the APs that
    raises (and sees delivered) the synchronous SMI:

      APHandler()            [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]
        IsSyncTimerTimeout() [UefiCpuPkg/PiSmmCpuDxeSmm/SyncTimer.c]

    Namely, in APHandler() the AP waits for the BSP to enter SMM regardless of
    PcdCpuSmmSyncMode, for PcdCpuSmmApSyncTimeout microseconds (the default
    value is 1 second). If the BSP doesn't show up in SMM within that
    interval, then the AP brings it in with a directed SMI, and waits for the
    BSP again for PcdCpuSmmApSyncTimeout microseconds.

    Although during boot services, SmmControl2DxeTrigger() is only called by
    the BSP, at runtime the OS can invoke runtime services from an AP (it can
    even be forced with "taskset -c 1 efibootmgr"). Because on QEMU
    SmmControl2DxeTrigger() only raises the SMI for the calling processor (BSP
    and AP alike), the first interval above times out invariably in such cases
    -- the BSP never shows up before the AP calls it in.

    In order to mitigate the performance penalty, decrease
    PcdCpuSmmApSyncTimeout to one tenth of its default value: 100 ms. (For
    comparison, Vlv2TbltDevicePkg sets 1 ms.)

    NOTE: once QEMU becomes capable of synchronous broadcast SMIs, this patch
    and the previous one ("OvmfPkg: use relaxed AP SMM synchronization mode")
    should be reverted, and SmmControl2DxeTrigger() should be adjusted
    instead.

    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Jordan Justen <jordan.l.justen@intel.com>
    Cc: Michael Kinney <michael.d.kinney@intel.com>
    Contributed-under: TianoCore Contribution Agreement 1.0
    Signed-off-by: Laszlo Ersek <lersek@redhat.com>
    Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>

    git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19059 6f19259b-4bc3-4df7-8a09-765794883524

With this patch in place, the "taskset -c 1 efibootmgr" command now
"only" takes ~4 seconds when executed in a guest running on my laptop.
(Again, for comparison, Vlv2TbltDevicePkg sets a hundred times stricter
timeout.)

This is the status we have to this day. The NOTE at the end of the
commit message, about QEMU gaining the ability to raise synchronous
broadcast SMIs, has not realized yet.

Prototyping that change for QEMU wasn't hard actually; I did it and
posted the patch for review:

http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html

See especially the second link for a (subjective) summary of the
opinions around the change. There was a lot of discussion on edk2-devel
too (I don't think I'll try to look up the messages now.)

Ultimately, we couldn't agree on a solution, so we continue to have only
unicast SMIs available for Trigger().

Regarding your question whether this is a bug or feature in QEMU... hard
to say. As long as only SeaBIOS was using QEMU's SMI, I guess it
qualified as a platform feature. For edk2's purposes, we could call it a
bug. The behavior could be changed in QEMU without regressing SeaBIOS's
usage pattern (for example by setting a specific APM_STS value from OVMF
that is known not to be used by SeaBIOS, before writing to APM_CNT), but
we couldn't agree on a solution.

I do see that I ultimately withdrew my QEMU patch:

http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html

so maybe Paolo or someone else convinced me that the approach was wrong.
I don't remember.

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04  3:14     ` Yao, Jiewen
@ 2016-11-04 16:28       ` Laszlo Ersek
  0 siblings, 0 replies; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-04 16:28 UTC (permalink / raw)
  To: Yao, Jiewen, Kinney, Michael D
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Paolo Bonzini, Fan, Jeff, Zeng, Star

On 11/04/16 04:14, Yao, Jiewen wrote:
> Mike
> 
> You are right.
> 
> After I turn on this, I saw all APs.
> 
>  
> 
> However, the system becomes extremely slow. Intolerable.
> 
>  
> 
> I suggest to add more description around below on why it is set to 0x01
> by default, and what is the impact if it becomes 0x00.

Well, the commit messages document the settings in detail -- you can
easily find them by running "git blame" on the DSC file. (I assume
that's how Mike found the commits too.)

(If someone submitted a DSC patch with comments, I wouldn't NACK it, of
course.)

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04 13:50             ` Paolo Bonzini
@ 2016-11-04 16:34               ` Laszlo Ersek
  0 siblings, 0 replies; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-04 16:34 UTC (permalink / raw)
  To: Paolo Bonzini, Yao, Jiewen, Kinney, Michael D
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org, Fan, Jeff,
	Zeng, Star

On 11/04/16 14:50, Paolo Bonzini wrote:
> 
> 
> On 04/11/2016 14:28, Yao, Jiewen wrote:
>> I tried below way. But it does not help too much. It still takes more
>> than 1 minutes to boot with SMP=8.
>>
>>   SendSmiIpiAllExcludingSelf ();
>>   IoWrite8 (ICH9_APM_STS, DataPort    == NULL ? 0 : *DataPort);
>>   IoWrite8 (ICH9_APM_CNT, CommandPort == NULL ? 0 : *CommandPort);
>>
>> I also tried to reduce the timeout PCD to:
>>
>>   gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmApSyncTimeout|1000
>>
>> However, I find CPU-2 is still missing.
>>
>> Maybe it is caused by QEMU emulate AP in serial mode, not parallel mode.
> 
> Yeah, that's possible without KVM.  Do you issue a PAUSE instruction in
> the spin loops?  That could help.

There's a BaseLib helper function for that BTW:

/**
  Requests CPU to pause for a short period of time.

  Requests CPU to pause for a short period of time. Typically used in MP
  systems to prevent memory starvation while waiting for a spin lock.

**/
VOID
EFIAPI
CpuPause (
  VOID
  );


The implementation in "MdePkg/Library/BaseLib/X64/GccInline.c" is

  __asm__ __volatile__ ("pause");

Thanks
Laszlo


>> I think it might be best choice to set PcdCpuSmmSyncMode|0x1
>>
>> It also helps cover a very corner case in SMM. J
>>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Enable SMM page level protection.
  2016-11-04 15:29       ` Paolo Bonzini
@ 2016-11-04 17:36         ` Laszlo Ersek
  0 siblings, 0 replies; 23+ messages in thread
From: Laszlo Ersek @ 2016-11-04 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Yao, Jiewen
  Cc: Tian, Feng, Radim Kr?má?, edk2-devel@ml01.01.org,
	Kinney, Michael D, Fan, Jeff, Zeng, Star

On 11/04/16 16:29, Paolo Bonzini wrote:
> 
> 
> On 04/11/2016 16:22, Laszlo Ersek wrote:
>>>> What does this *KVM internal error. Suberror: 1* mean?
>> The key message is "emulation failure" -- it means that the processor
>> exits to the hypervisor (KVM) because it finds some code that it cannot
>> execute in guest mode natively, so the hypervisor needs to emulate it.
>> And, this emulation fails. The reasons can be:
>> - the code is valid, but KVM lacks the emulation code for it,
>> - the code is actually garbage (not code) -- there was some corruption
>> in the guest (the location used to contain code but it was corrupted, or
>> the guest jumped to non-code data).
>>
>> Usually the register dump contains a short hexadecimal snippet from the
>> instruction stream (near Code=...), pinpointing the byte that caused the
>> problem. However, in this case, all we have is question marks, and this
>> is the very first time I see those. That's why I CC'd Paolo and Radim :)
> 
> The question marks usually mean that the page tables do not map a page
> at that address, but I don't know offhand why KVM would fail emulation
> instead of triple-faulting.
> 
> Try "info tlb" to dump the page tables (huge output of course, you may
> want to use the GTK+ backend which has scrollable consoles).

Thanks, I'll try "info tlb" later. (I generally use virsh
qemu-monitor-command --hmp, whose output is easy to redirect to a file.)

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-11-04 17:36 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-04  2:49 [PATCH 0/6] Enable SMM page level protection Kinney, Michael D
2016-11-04  2:55 ` Yao, Jiewen
2016-11-04  3:05   ` Kinney, Michael D
2016-11-04  3:12     ` Kinney, Michael D
2016-11-04  3:20       ` Yao, Jiewen
2016-11-04 11:34         ` Paolo Bonzini
2016-11-04 13:28           ` Yao, Jiewen
2016-11-04 13:50             ` Paolo Bonzini
2016-11-04 16:34               ` Laszlo Ersek
2016-11-04  3:14     ` Yao, Jiewen
2016-11-04 16:28       ` Laszlo Ersek
  -- strict thread matches above, loose matches on Subject: below --
2016-11-03  6:53 Jiewen Yao
2016-11-03  6:55 ` Yao, Jiewen
2016-11-03 21:43 ` Laszlo Ersek
2016-11-03 23:51   ` Yao, Jiewen
2016-11-04  0:09     ` Kinney, Michael D
2016-11-04  1:15       ` Yao, Jiewen
2016-11-04 16:15         ` Laszlo Ersek
2016-11-04 15:22     ` Laszlo Ersek
2016-11-04 15:29       ` Paolo Bonzini
2016-11-04 17:36         ` Laszlo Ersek
2016-11-04  9:35   ` Yao, Jiewen
2016-11-04 15:23     ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox