From mboxrd@z Thu Jan 1 00:00:00 1970
To: Jiewen Yao, edk2-devel@ml01.01.org
References: <1479815264-26252-1-git-send-email-jiewen.yao@intel.com>
Cc: Michael D Kinney, Jeff Fan
From: Laszlo Ersek
Message-ID: <5b1cc5b2-6dde-e315-4387-11ad8e19776c@redhat.com>
Date: Tue, 22 Nov 2016 22:17:17 +0100
In-Reply-To: <1479815264-26252-1-git-send-email-jiewen.yao@intel.com>
Subject: Re: [PATCH] UefiCpuPkg/PiSmmCpu: Correct exception message.
List-Id: EDK II Development
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 11/22/16 12:47, Jiewen Yao wrote:
> This patch fixes the first part of
> https://bugzilla.tianocore.org/show_bug.cgi?id=242
>
> Previously, when SMM exception happens, "stack overflow" is misreported.
> This patch checked the PF address to see it is stack overflow, or
> it is caused by SMM page protection.
>
> Cc: Laszlo Ersek
> Cc: Jeff Fan
> Cc: Michael D Kinney
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jiewen Yao
> ---
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c | 28 +++++++++++++++++---
>  UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h   |  9 +++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c  | 27 ++++++++++++++++---
>  3 files changed, 57 insertions(+), 7 deletions(-)
>
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
> index 5033bc5..feca142 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/PageTbl.c
> @@ -91,6 +91,8 @@ SmiPFHandler (
>    )
>  {
>    UINTN PFAddress;
> +  UINTN GuardPageAddress;
> +  UINTN CpuIndex;
>
>    ASSERT (InterruptType == EXCEPT_IA32_PAGE_FAULT);
>
> @@ -98,10 +100,30 @@ SmiPFHandler (
>
>    PFAddress = AsmReadCr2 ();
>
> -  if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
> -      (PFAddress >= mCpuHotPlugData.SmrrBase) &&
> +  //
> +  // If a page fault occurs in SMRAM range, it might be in a SMM stack guard page,
> +  // or SMM page protection violation.
> +  //
> +  if ((PFAddress >= mCpuHotPlugData.SmrrBase) &&
>        (PFAddress < (mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize))) {
> -    DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
> +    CpuIndex = GetCpuIndex ();
> +    GuardPageAddress = (mSmmStackArrayBase + EFI_PAGE_SIZE + CpuIndex * mSmmStackSize);
> +    if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
> +        (PFAddress >= GuardPageAddress) &&
> +        (PFAddress < (GuardPageAddress + EFI_PAGE_SIZE))) {
> +      DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
> +    }
> +    if ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0) {
> +      DEBUG ((DEBUG_ERROR, "SMM exception at execution (0x%lx)\n", PFAddress));
> +      DEBUG_CODE (
> +        DumpModuleInfoByIp (*(UINTN *)(UINTN)SystemContext.SystemContextIa32->Esp);
> +      );
> +    } else {
> +      DEBUG ((DEBUG_ERROR, "SMM exception at write (0x%lx)\n", PFAddress));
> +      DEBUG_CODE (
> +        DumpModuleInfoByIp ((UINTN)SystemContext.SystemContextIa32->Eip);
> +      );
> +    }
>      CpuDeadLoop ();
>    }
>
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
> index b6fb5cf..04a3dfb 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/SmmProfile.h
> @@ -105,6 +105,15 @@ InitPaging (
>    VOID
>    );
>
> +/**
> +  Get CPU Index from APIC ID.
> +
> +**/
> +UINTN
> +GetCpuIndex (
> +  VOID
> +  );
> +
>  //
>  // The flag indicates if execute-disable is supported by processor.
>  //
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
> index 531e188..ec8eab7 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c
> @@ -804,6 +804,8 @@ SmiPFHandler (
>    )
>  {
>    UINTN PFAddress;
> +  UINTN GuardPageAddress;
> +  UINTN CpuIndex;
>
>    ASSERT (InterruptType == EXCEPT_IA32_PAGE_FAULT);
>
> @@ -817,12 +819,29 @@ SmiPFHandler (
>    }
>
>    //
> -  // If a page fault occurs in SMRAM range, it should be in a SMM stack guard page.
> +  // If a page fault occurs in SMRAM range, it might be in a SMM stack guard page,
> +  // or SMM page protection violation.
>    //
> -  if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
> -      (PFAddress >= mCpuHotPlugData.SmrrBase) &&
> +  if ((PFAddress >= mCpuHotPlugData.SmrrBase) &&
>        (PFAddress < (mCpuHotPlugData.SmrrBase + mCpuHotPlugData.SmrrSize))) {
> -    DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
> +    CpuIndex = GetCpuIndex ();
> +    GuardPageAddress = (mSmmStackArrayBase + EFI_PAGE_SIZE + CpuIndex * mSmmStackSize);
> +    if ((FeaturePcdGet (PcdCpuSmmStackGuard)) &&
> +        (PFAddress >= GuardPageAddress) &&
> +        (PFAddress < (GuardPageAddress + EFI_PAGE_SIZE))) {
> +      DEBUG ((DEBUG_ERROR, "SMM stack overflow!\n"));
> +    }
> +    if ((SystemContext.SystemContextX64->ExceptionData & IA32_PF_EC_ID) != 0) {
> +      DEBUG ((DEBUG_ERROR, "SMM exception at execution (0x%lx)\n", PFAddress));
> +      DEBUG_CODE (
> +        DumpModuleInfoByIp (*(UINTN *)(UINTN)SystemContext.SystemContextX64->Rsp);
> +      );
> +    } else {
> +      DEBUG ((DEBUG_ERROR, "SMM exception at write (0x%lx)\n", PFAddress));
> +      DEBUG_CODE (
> +        DumpModuleInfoByIp ((UINTN)SystemContext.SystemContextX64->Rip);
> +      );
> +    }
>      CpuDeadLoop ();
>    }
>
>

(1) The "PFAddress" variable is UINTN in both variants. Printing UINTN
with %lx is incorrect in the Ia32 case, because %lx takes an UINT64.

I suggest the following pattern for printing UINTN values portably:

  DEBUG ((level, "%lx", (UINT64)Value));

That is, always use %lx and add an explicit cast. The cast is a no-op on
X64, and does the right conversion on Ia32. The %lx conversion
specification matches the result in both cases.

(2) I tested the X64 stack overflow branch as follows: I temporarily
reverted (on top of your present patch) commit 0d0c245dfb147 ("OvmfPkg:
set SMM stack size to 16KB"), and then ran the certificate enrollment
application that originally triggered the stack overflow.

This is the debug output I got:

> SMM stack overflow!
> SMM exception at write (0x7FF9CFEC)
> It is invoked from the instruction before IP(0x7FFE4CA3) in module (.../Build/Ovmf3264/NOOPT_GCC48/X64/MdeModulePkg/Core/PiSmmCore/PiSmmCore/DEBUG/PiSmmCore.dll)

Shouldn't you change the IA32_PF_EC_ID check into an "else if"? I think
that once you have determined the stack overflow, we shouldn't look for
any other kind of exception.

(3) I tested the Ia32 execution / write fault branch as follows: I
temporarily reverted (on top of your present patch) commit 750ec4cabd07
("UefiCpuPkg/PiSmmCpu: Check XdSupport before set NX."). Then, under the
circumstances I reported in , I get:

> ConvertPageEntryAttribute 0x7FEA4067->0x800000007FEA4067
> SMM exception at write (0x7FEA4890)
> It is invoked from the instruction before IP(0x7FFB879A) in module (.../Build/OvmfIa32/NOOPT_GCC48/IA32/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm/DEBUG/PiSmmCpuDxeSmm.dll)

Here the page fault is explained by the fact that we set the unsupported
NX bit in the PTE that maps the page, and then we try to read from the
page (not write to it), at least if I remember correctly.

Is it possible to distinguish "read" from "write" in the fault symptoms?
If it is, then I suggest customizing the error message. If it is not
possible, then I suggest replacing the word "write" with "access".

Thanks!
Laszlo
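
The ordering suggested in (2) and the read/write distinction asked about
in (3) can be sketched as follows. This is a minimal standalone example
in hosted C, not edk2 code and not the patch author's implementation.
The PF_EC_* macros, FaultKind, classify_fault() and report_fault() are
hypothetical names made up for the illustration; the bit positions are
the architectural x86 page-fault error code bits (W/R is bit 1, I/D is
bit 4). The address is printed through an explicit 64-bit cast in the
spirit of (1); in hosted C that is PRIX64 with a uint64_t cast, whereas
in edk2 it is the DEBUG format %lx that takes a UINT64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits; names are local stand-ins. */
#define PF_EC_WR (UINT32_C(1) << 1)  /* set: write access, clear: read access */
#define PF_EC_ID (UINT32_C(1) << 4)  /* set: fault on an instruction fetch */

typedef enum {
  FAULT_STACK_OVERFLOW,
  FAULT_EXECUTE,
  FAULT_WRITE,
  FAULT_READ
} FaultKind;

/*
 * The guard page check comes first. The error code bits are consulted
 * only if the faulting address is outside the guard page, so exactly
 * one kind is reported per fault.
 */
FaultKind
classify_fault (uint64_t pf_addr, uint64_t guard_base, uint64_t page_size,
                uint32_t error_code)
{
  assert (page_size != 0);

  if (pf_addr >= guard_base && pf_addr - guard_base < page_size) {
    return FAULT_STACK_OVERFLOW;
  } else if ((error_code & PF_EC_ID) != 0) {
    return FAULT_EXECUTE;
  } else if ((error_code & PF_EC_WR) != 0) {
    return FAULT_WRITE;
  }
  return FAULT_READ;
}

/*
 * Prints one message for the fault. The pointer-sized address is cast
 * to a fixed 64-bit type so that one format string works on both
 * 32-bit and 64-bit builds. Returns fprintf's result.
 */
int
report_fault (FILE *out, FaultKind kind, uintptr_t pf_addr)
{
  static const char *const what[] = {
    "stack overflow", "execution", "write", "read"
  };

  return fprintf (out, "SMM exception at %s (0x%" PRIX64 ")\n",
                  what[kind], (uint64_t)pf_addr);
}
```

With made-up numbers, classify_fault (0x1800, 0x1000, 0x1000, PF_EC_WR)
yields FAULT_STACK_OVERFLOW even though the W/R bit is set, so only one
message would be logged for such a fault.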