Date: Tue, 29 Aug 2023 16:36:57 +0200
From: "Laszlo Ersek"
To: Ard Biesheuvel, devel@edk2.groups.io
Cc: quic_llindhol@quicinc.com
Subject: Re: [edk2-devel] [PATCH 1/1] ArmPkg/ExceptionSupport: Support backtrace through an exception
References: <20230829132921.123407-1-ardb@kernel.org>
In-Reply-To: <20230829132921.123407-1-ardb@kernel.org>

On 8/29/23 15:29, Ard Biesheuvel wrote:
> Laszlo reports that the efi_gdb.py script fails to produce a full
> backtrace when attaching it to an ARM firmware build that has halted on
> an unhandled exception.
>
> The reason is that the asm code that processes the exception was not
> implemented with this in mind, and therefore lacks any handling of it.
>
> So let's add this: create a dummy frame record suitable for chasing the
> frame pointer, and add the CFI metadata to describe where the return
> value can be found on the stack.
>
> When using a GCC5 build, this produces a stack trace such as
>
>   (gdb) bt
>   #0  0x000000007fd4537c in CpuDeadLoop () at /home/ardb/build/edk2/MdePkg/Library/BaseLib/CpuDeadLoop.c:30
>   #1  0x000000007fd454f8 in DebugAssert (
>       FileName=FileName@entry=0x7fd4a8a8 "/home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c",
>       LineNumber=LineNumber@entry=343, Description=Description@entry=0x7fd4a896 "((BOOLEAN)(0==1))")
>       at /home/ardb/build/edk2/MdePkg/Library/BaseDebugLibSerialPort/DebugLib.c:235
>   #2  0x000000007fd479ec in DefaultExceptionHandler (ExceptionType=, SystemContext=...)
>       at /home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c:343
>   #3  0x000000007fd48eb8 in ExceptionHandlersEnd ()
>   #4  0x000000007fcde944 in QemuLoadKernelImage (ImageHandle=) at /home/ardb/build/edk2/OvmfPkg/Library/GenericQemuLoadImageLib/GenericQemuLoadImageLib.c:201
>   #5  TryRunningQemuKernel () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/QemuKernel.c:46
>   #6  PlatformBootManagerAfterConsole () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c:1139
>   #7  BdsEntry (This=) at /home/ardb/build/edk2/MdeModulePkg/Universal/BdsDxe/BdsEntry.c:931
>   #8  0x000000007ffd0018 in ?? ()
>   Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> when QemuLoadKernelImage() has been tweaked to trigger an exception, as
> is shown by GDB when walking the call stack:
>
>   | 0x7fcde938  b.ne 0x7fcdf134  // b.any
>   | 0x7fcde93c  mov x0, #0x40  // #64
>   | 0x7fcde940  bl 0x7fcd7aec
>   | > 0x7fcde944  brk #0x4d2
>   | 0x7fcde948  bl 0x7fce4354
>   | 0x7fcde94c  tbz x0, #63, 0x7fcde954
>   | 0x7fcde950  bl 0x7fcd844c
>   | 0x7fcde954  bl 0x7fcd990c
>
> Unfortunately, CLANGDWARF does not seem entirely happy with this
> arrangement: it identifies the call frame where the exception
> originated, but does not show any frames above that. (This could be
> related to the fact that the exception code uses a separate exception
> stack for handling synchronous exceptions)

First of all, thanks for writing this patch so incredibly quickly. :)

Second, something must be off with my gdb. Before your patch, I kept
experimenting with manually resetting FP, SP, and LR to the values
printed in the register dump, using gdb "set" commands. Strangely, that
did result in complete pre-exception stack traces, but *only sometimes*.
Most of the time gdb complains about "corrupted stack". And I just can't
figure out what distinguishes the broken from the functional "bt"
commands -- I did walk the allegedly corrupt stack manually, and there
is nothing corrupt in the FP and LR parts of the stack frames. They all
chain nicely and point to valid instructions, respectively. I don't know
what it is that gdb doesn't like.

Third, when I test your patch, I seem to experience precisely what you
describe under CLANGDWARF -- it shows the faulting frame (the frame just
before the exception), but nothing before it!
And I'm not building with clang :(

Thanks,
Laszlo

>
> Signed-off-by: Ard Biesheuvel
> ---
>  ArmPkg/Library/ArmExceptionLib/AArch64/ExceptionSupport.S | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/ArmPkg/Library/ArmExceptionLib/AArch64/ExceptionSupport.S b/ArmPkg/Library/ArmExceptionLib/AArch64/ExceptionSupport.S
> index cd9437b6aab8..345b566932bb 100644
> --- a/ArmPkg/Library/ArmExceptionLib/AArch64/ExceptionSupport.S
> +++ b/ArmPkg/Library/ArmExceptionLib/AArch64/ExceptionSupport.S
> @@ -259,6 +259,8 @@ ASM_PFX(ExceptionHandlersEnd):
>
>
>  ASM_PFX(CommonExceptionEntry):
> +  .cfi_sections .debug_frame
> +  .cfi_startproc
>
>    EL1_OR_EL2_OR_EL3(x1)
>  1:mrs x2, elr_el1 // Exception Link Register
> @@ -280,6 +282,13 @@ ASM_PFX(CommonExceptionEntry):
>
>  4:mrs x4, fpsr // Floating point Status Register 32bit
>
> +  // Create a dummy frame record using the ELR as the return address
> +  stp x29, x2, [sp, #-16]!
> +  .cfi_def_cfa_offset (GP_CONTEXT_SIZE + FP_CONTEXT_SIZE + SYS_CONTEXT_SIZE + 16)
> +  .cfi_rel_offset x29, 0
> +  .cfi_rel_offset x30, 8
> +  mov x29, sp
> +
>    // Save the SYS regs
>    stp x2, x3, [x28, #-SYS_CONTEXT_SIZE]!
>    stp x4, x5, [x28, #0x10]
> @@ -305,7 +314,7 @@ ASM_PFX(CommonExceptionEntry):
>
>    // x0 still holds the exception type.
>    // Set x1 to point to the top of our struct on the Stack
> -  mov x1, sp
> +  add x1, sp, #16
>
>    // CommonCExceptionHandler (
>    // IN EFI_EXCEPTION_TYPE ExceptionType, R0
> @@ -318,6 +327,9 @@ ASM_PFX(CommonExceptionEntry):
>    // We do not try to recover.
>    bl ASM_PFX(CommonCExceptionHandler) // Call exception handler
>
> +  // Pop dummy frame record
> +  add sp, sp, #16
> +
>    // Pop as many GP regs as we can before entering the critical section below
>    ldp x2, x3, [sp, #0x10]
>    ldp x4, x5, [sp, #0x20]
> @@ -378,13 +390,17 @@ ASM_PFX(CommonExceptionEntry):
>
>    // pop remaining GP regs and return from exception.
>    ldr x30, [sp, #0xf0 - 0xe0]
> +  .cfi_restore 30
>    ldp x28, x29, [sp], #GP_CONTEXT_SIZE - 0xe0
> +  .cfi_restore 29
>
>    // Adjust SP to be where we started from when we came into the handler.
>    // The handler can not change the SP.
>    add sp, sp, #FP_CONTEXT_SIZE + SYS_CONTEXT_SIZE
> +  .cfi_def_cfa_offset 0
>
>    eret
> +  .cfi_endproc
>
>  ASM_FUNC(RegisterEl0Stack)
>    msr sp_el0, x0

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#108096): https://edk2.groups.io/g/devel/message/108096
Group Owner: devel+owner@edk2.groups.io
-=-=-=-=-=-=-=-=-=-=-=-
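
For readers following the thread: the manual walk of the FP and LR parts of the stack frames mentioned in the reply can be sketched in Python, the language of efi_gdb.py. This is only an illustration, not code from the patch or from efi_gdb.py. It assumes the usual AArch64 frame record layout (the caller's x29 at [fp], the return address at [fp + 8]) and models memory as a plain dict. The function name walk_frame_records, the stack addresses, and the require_ascending check are hypothetical; that check is only a guess at the kind of sanity test behind a "previous frame inner to this frame" message, for example when the chain crosses from a separate exception stack to the original one.

```python
from typing import Dict, List, Tuple


def walk_frame_records(
    memory: Dict[int, int],
    fp: int,
    max_frames: int = 64,
    require_ascending: bool = True,
) -> Tuple[List[int], str]:
    """Follow a chain of AArch64 frame records starting at fp.

    memory maps an address to the 64-bit word stored there (a stand-in
    for reading target memory). Returns (return_addresses, stop_reason).
    """
    return_addresses: List[int] = []
    for _ in range(max_frames):
        if fp == 0:
            return return_addresses, "end of chain"
        if fp not in memory or fp + 8 not in memory:
            return return_addresses, "unreadable frame record"
        next_fp = memory[fp]                     # saved x29 of the caller
        return_addresses.append(memory[fp + 8])  # saved return address
        # Assumed sanity check: the stack grows down, so a caller's record
        # is expected at a higher address than the callee's.
        if require_ascending and next_fp != 0 and next_fp <= fp:
            return return_addresses, "previous frame inner to this frame"
        fp = next_fp
    return return_addresses, "frame limit reached"


# Hypothetical stack addresses; the return addresses are frames #2 to #4
# of the backtrace quoted in the message.
stack = {
    0x1000: 0x1020, 0x1008: 0x7FD479EC,
    0x1020: 0x1040, 0x1028: 0x7FD48EB8,
    0x1040: 0x0,    0x1048: 0x7FCDE944,
}
addresses, reason = walk_frame_records(stack, 0x1000)
print([hex(a) for a in addresses], reason)
```

With require_ascending=False the walker follows a chain that hops to a lower address, which is one way to compare a purely manual walk against a stricter unwinder.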