From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.115; helo=mga14.intel.com; envelope-from=jiewen.yao@intel.com; receiver=edk2-devel@lists.01.org Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id B57262215BD94 for ; Tue, 30 Jan 2018 21:51:04 -0800 (PST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 30 Jan 2018 21:56:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.46,438,1511856000"; d="scan'208";a="30818260" Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202]) by orsmga002.jf.intel.com with ESMTP; 30 Jan 2018 21:56:39 -0800 Received: from fmsmsx115.amr.corp.intel.com (10.18.116.19) by fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS) id 14.3.319.2; Tue, 30 Jan 2018 21:56:39 -0800 Received: from shsmsx151.ccr.corp.intel.com (10.239.6.50) by fmsmsx115.amr.corp.intel.com (10.18.116.19) with Microsoft SMTP Server (TLS) id 14.3.319.2; Tue, 30 Jan 2018 21:56:38 -0800 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.124]) by SHSMSX151.ccr.corp.intel.com ([169.254.3.116]) with mapi id 14.03.0319.002; Wed, 31 Jan 2018 13:56:37 +0800 From: "Yao, Jiewen" To: Paulo Alcantara , "edk2-devel@lists.01.org" CC: Rick Bramley , "Dong, Eric" , Kimon Berlin , Andrew Fish , "Diego Medaglia" , Laszlo Ersek Thread-Topic: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling Thread-Index: AQHTjZc4uDNA9nHm1kuOPH2nB41jpqN3+5WwgBJv7YCAAyfKYA== Date: Wed, 31 Jan 2018 05:56:36 +0000 Message-ID: <74D8A39837DF1E4DA445A8C0B3885C503AAA0A80@shsmsx102.ccr.corp.intel.com> References: <74D8A39837DF1E4DA445A8C0B3885C503AA8057C@shsmsx102.ccr.corp.intel.com> <9b5d7218-2d30-1c06-451c-ec9d86161aa6@paulo.ac> In-Reply-To: <9b5d7218-2d30-1c06-451c-ec9d86161aa6@paulo.ac> Accept-Language: zh-CN, en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiZDZiNjgyMmYtNGZkMS00ZGU4LTllNGEtYTQ5YzI0NzA4MjY4IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjIuNS4xOCIsIlRydXN0ZWRMYWJlbEhhc2giOiJNZFZjZE44OWVEeDkrT3lnNEpkcERaT1wvZkhueUk4emFSTlwveUlsd1FybWxWZmd0a0FETEwyTWNkWlgyOU1yUFMifQ== x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [10.239.127.40] MIME-Version: 1.0 Subject: Re: [RFC v5 0/8] Stack trace support in X64 exception handling X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Jan 2018 05:51:05 -0000 Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Thanks Paulo. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D OVMF IA32: DXE worked, but SMM not. Since the page fault is only occurring in IA32 with no paging enabled=20 (default case), I suspect that when we don't have paging, we are unable=20 to successfully validate the memory address when it's i.e. outside SMRAM=20 - that is, we don't know when to stop unwinding the stack. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D For IA32 SMM, I am a little confused. We unconditionally setup page table for SMM, no matter it is IA32 or X64. If you find a SMM driver running in a page-disable environment, it means, t= he SMM CORE load the driver in SMRAM, but the SMM CPU driver has not rebase= d the CPU yet. SMM driver is still using the PageTable/GDT/IDT setup by DXE CPU driver, no= t SMM CPU driver. Would you please double confirm what you have observed? You can just check the boot log file to see if PiSmmCpu driver has run or n= ot. Thank you Yao Jiewen > -----Original Message----- > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Pa= ulo > Alcantara > Sent: Monday, January 29, 2018 9:38 PM > To: Yao, Jiewen ; edk2-devel@lists.01.org > Cc: Rick Bramley ; Dong, Eric > ; Kimon Berlin ; Andrew Fish > ; Diego Medaglia ; Laszlo Ersek > > Subject: Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception han= dling >=20 > Hi Jiewen, >=20 > On 1/17/2018 10:57 AM, Yao, Jiewen wrote: > > Thanks Paulo. > > > > The series looks really good. I give it thumb up. :-) > > > > The 8 patches keep updating 4 files, so I squash them when I review. Th= e > comment below is for the final code, not for a specific patch. > > > > 1) CpuExceptionCommon.c: IsLinearAddressRangeValid(). > > // > > // Check for valid input parameters > > // > > if (LinearAddressStart =3D=3D 0 || LinearAddressEnd =3D=3D 0 || > > LinearAddressStart > LinearAddressEnd) { > > return FALSE; > > } > > > > I think LinearAddressStart is a valid case. In BIOS, we do update IVT i= n 0 > address when CSM is enabled. > > I do not think we need exclude it here. If we enabled ZeroPointer prote= ction, it > can be excluded in page table parsing. > > > > 2) I found below logic appears in 2 functions - DumpImageModuleNames() = and > DumpStacktrace(). Is that possible to merge them? > > We can calculate in DumpImageAndCpuContent() and use Eip as parameter. > Just in case there is 3rd function need use EIP, it won't miss the calcul= ation. > > (It is a bug fix we did recently.) > > > > // > > // Set current EIP address > > // > > if ((ExceptionType =3D=3D EXCEPT_IA32_PAGE_FAULT) && > > ((SystemContext.SystemContextIa32->ExceptionData & > IA32_PF_EC_ID) !=3D 0)) { > > // > > // The EIP in SystemContext could not be used > > // if it is page fault with I/D set. > > // > > Eip =3D *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp; > > } else { > > Eip =3D SystemContext.SystemContextIa32->Eip; > > } > > > > 3) I am a little surprised on PeCoffSearchImageBase() issue. > > > > We have 4 PeCoffSearchImageBase() call in each arch. > DumpImageModuleNames() calls twice and DumpStacktrace() calls twice. > > Do you know which specific one triggers the zero address #PF issue? > > > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch > ExceptionHandler.c(547): ImageBase =3D PeCoffSearchImageBase (Eip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch > ExceptionHandler.c(613): ImageBase =3D PeCoffSearchImageBase (Eip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch > ExceptionHandler.c(682): ImageBase =3D PeCoffSearchImageBase (Eip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch > ExceptionHandler.c(741): ImageBase =3D PeCoffSearchImageBase (Eip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch > ExceptionHandler.c(540): ImageBase =3D PeCoffSearchImageBase (Rip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch > ExceptionHandler.c(613): ImageBase =3D PeCoffSearchImageBase (Rip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch > ExceptionHandler.c(710): ImageBase =3D PeCoffSearchImageBase (Rip); > > > C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch > ExceptionHandler.c(779): ImageBase =3D PeCoffSearchImageBase (Rip); > > > > The EIP from SystemContext seems good. I assume there is not a problem = here. > > > > if ((ExceptionType =3D=3D EXCEPT_IA32_PAGE_FAULT) && > > ((SystemContext.SystemContextIa32->ExceptionData & > IA32_PF_EC_ID) !=3D 0)) { > > // > > // The EIP in SystemContext could not be used > > // if it is page fault with I/D set. > > // > > Eip =3D *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp; > > } else { > > Eip =3D SystemContext.SystemContextIa32->Eip; > > } > > > > // > > // Get initial PE/COFF image base address from current EIP > > // > > ImageBase =3D PeCoffSearchImageBase (Eip); > > > > But the EIP from stack frame is unknown and risky. Especially for the c= ode in > mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM > entrypoint->SMM C code. > > Should we add a check for EIP here? > > > > // > > // Set EIP with return address from current stack frame > > // > > Eip =3D *(UINT32 *)((UINTN)Ebp + 4); > > > > // > > // If EIP is zero, then stop unwinding the stack > > // > > if (Eip =3D=3D 0) { > > break; > > } > > > > // > > // Search for the respective PE/COFF image based on EIP > > // > > ImageBase =3D PeCoffSearchImageBase (Eip); > > > > If you can help us do some more investigation on the root-cause and nar= row > down the issue, I will appreciate that. > > > > We can decide how to fix once it is root-caused. > > > > Maybe we just do a simple IsLogicalAddressValid () check before we call > PeCoffSearchImageBase(). :-) >=20 > I was able to work a little bit on stack trace support yesterday, and > here's what I came up with by following your suggestions: >=20 > 1. Centralized the calculation of initial EIP in DumpImageAndCpuContent() >=20 > 2. LinearAddressStart 0 is now a valid case >=20 > 3. Validate all return addresses (EIP/RIP) from every frame pointer. >=20 > Additionally, I run a few quick tests by generating exceptions from DXE > and SMM and here's my result: >=20 > OVMF X64: DXE & SMM worked. > OVMF X64 IA32: DXE & SMM worked. > OVMF IA32: DXE worked, but SMM not. >=20 > SMM was failing in both X64 and IA32 - however when I started validating > the return addresses from the frame pointers and, when we find an > invalid return address (e.g. last valid frame), we stop the trace. With > that, SMM is now working in X64. >=20 > Since the page fault is only occurring in IA32 with no paging enabled > (default case), I suspect that when we don't have paging, we are unable > to successfully validate the memory address when it's i.e. outside SMRAM > - that is, we don't know when to stop unwinding the stack. >=20 > Any ideas on what might be causing that? >=20 > Other than that, I'll try to do some more tests this weekend and come > back with the results. (That's why I didn't send out the v6 yet) >=20 > Repo: https://git.paulo.ac/pub/edk2.git > Branch: stacktrace_v6 >=20 > Thanks > Paulo >=20 > > > > > > Thank you > > Yao Jiewen > > > >> -----Original Message----- > >> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of > Paulo > >> Alcantara > >> Sent: Monday, January 15, 2018 8:23 AM > >> To: edk2-devel@lists.01.org > >> Cc: Rick Bramley ; Dong, Eric > >> ; Kimon Berlin ; Andrew Fish > >> ; Yao, Jiewen ; Diego Medaglia > >> ; Laszlo Ersek > >> Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception hand= ling > >> > >> Hi, > >> > >> This series adds stack trace support during IA32 and X64 CPU exception= s. > >> > >> Informations like back trace, stack contents and image module names > >> (that were part of the call stack) will be dumped out. > >> > >> The current limitation is that it relies on available frame pointers > >> (GCC only) in order to successfully unwind the stack. > >> > >> Jiewen, > >> > >> Thank you very much for your time on this. I've applied the changes yo= u > >> suggested, as well as tested it on IA32 PAE paging mode - it worked as > >> expected. > >> > >> Other than that, I also tested the stack trace in SMM code by manually > >> calling CpuBreakPoint() and then it broke with another exception > >> (page fault). I didn't have much time to look into that, but what I've > >> observed is that the page fault ocurred during the search of PE/COFF > >> image base address (in PeCoffSearchImageBase). The function attempts t= o > >> search for the image base from "Address" through 0, so any of those > >> dereferenced addresses triggers the page fault. > >> > >> Do you know how we could fix that issue? Perhaps introducing a > >> AddressValidationLib (as Brian suggested previously) and use it within > >> PeCoffSearchImageBase()? > >> > >> I'd also like to thank Brian & Jeff for all the support! > >> > >> Thanks > >> Paulo > >> > >> Repo: https://github.com/pcacjr/edk2.git > >> Branch: stacktrace_v5 > >> > >> Cc: Rick Bramley > >> Cc: Kimon Berlin > >> Cc: Diego Medaglia > >> Cc: Andrew Fish > >> Cc: Eric Dong > >> Cc: Laszlo Ersek > >> Cc: Brian Johnson > >> Cc: Jeff Fan > >> Cc: Jiewen Yao > >> Cc: Paulo Alcantara > >> Contributed-under: TianoCore Contribution Agreement 1.1 > >> Signed-off-by: Paulo Alcantara > >> --- > >> > >> v1 -> v2: > >> * Add IA32 arch support (GCC toolchain only) > >> * Replace hard-coded stack alignment value (16) with > >> CPU_STACK_ALIGNMENT. > >> * Check for proper stack and frame pointer alignments. > >> * Fix initialization of UnwoundStacksCount to 1. > >> * Move GetPdbFileName() to common code since it will be used by bot= h > >> IA32 and X64 implementations. > >> > >> v2 -> v3: > >> * Fixed wrong assumption about "RIP < ImageBase" to start searching > >> for another PE/COFF image. That is, RIP may point to lower and > >> higher addresses for any other PE/COFF images. Both IA32 & X64. > >> (Thanks Andrew & Jiewen) > >> * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian) > >> > >> v3 -> v4: > >> * Validate all frame/stack pointer addresses before dereferencing t= hem > >> as requested by Brian & Jiewen. > >> * Correctly print out IP addresses during the stack traces (by Jeff= ) > >> > >> v4 -> v5: > >> * Fixed address calculations and improved code as suggested by Jiew= en. > >> * Fixed parameter validation as suggested by Brian. > >> * Tested stack stack with IA32 PAE paging mode. > >> > >> Paulo Alcantara (8): > >> UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support > >> UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName() > >> UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support > >> UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory > >> addresses > >> UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointer= s > >> UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses > >> UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges > >> UefiCpuPkg/CpuExceptionHandlerLib: Add early check in > >> DumpStackContents > >> > >> UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c > >> | 537 ++++++++++++++++++-- > >> UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h > >> | 59 ++- > >> > UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c | > >> 483 +++++++++++++++++- > >> UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c > | > >> 426 +++++++++++++++- > >> 4 files changed, 1435 insertions(+), 70 deletions(-) > >> > >> -- > >> 2.14.3 > >> > >> _______________________________________________ > >> edk2-devel mailing list > >> edk2-devel@lists.01.org > >> https://lists.01.org/mailman/listinfo/edk2-devel >=20 > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel