From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20230105162528.1430368-1-ardb@kernel.org> <20230105162528.1430368-2-ardb@kernel.org>
From: "Ard Biesheuvel"
Date: Thu, 19 Jan 2023 12:11:34 +0100
Subject: Re: [edk2-devel] [PATCH v2 2/2] ArmVirtPkg/ArmVirtQemu: Avoid early ID map on ThunderX
To: Oliver Steffen, Marc Zyngier
Cc: devel@edk2.groups.io, dann.frazier@canonical.com, kraxel@redhat.com
Content-Type: text/plain; charset="UTF-8"

(cc Marc)

Context:
- on my TX2 (with the S1PTW r/o memslot fix applied), the new version
  of ArmVirtQemu that uses an initial ID map in emulated NOR flash
  works fine.
- in Oliver's case (which is a slightly different flavor of TX2), it
  crashes extremely early, presumably at the point where this ID map is
  activated.

More details at the end.

On Thu, 19 Jan 2023 at 12:03, Oliver Steffen wrote:
>
> Quoting Ard Biesheuvel (2023-01-18 10:22:12)
> > On Wed, 18 Jan 2023 at 09:48, Ard Biesheuvel wrote:
> > >
> > > On Wed, 18 Jan 2023 at 09:28, Oliver Steffen wrote:
> > > >
> > > > Quoting Ard Biesheuvel (2023-01-18 08:34:32)
> > > > > On Wed, 18 Jan 2023 at 07:37, Oliver Steffen wrote:
> > > > > >
> > > > > > On Tue, Jan 17, 2023 at 3:57 PM Ard Biesheuvel wrote:
> > > > > >>
> > > > > >> On Tue, 17 Jan 2023 at 13:48, Oliver Steffen wrote:
> > > > > >> >
> > > > > >> > Hi Ard, Hi everyone,
> > > > > >> >
> > > > > >> > Thanks for the work!
> > > > > >> >
> > > > > >> > But somehow this patch (as it was merged into master branch) does not
> > > > > >> > work for me on the ThunderX box we have.
> > > > > >> >
> > > > > >> > Any idea what could be wrong?
> > > > > >>
> > > > > >> I'm not sure I understand the question. The patch targets ThunderX,
> > > > > >> and you are using a ThunderX2.
> > > > > >>
> > > > > >> What were you expecting to happen, and what is happening instead?
> > > > > >
> > > > > >
> > > > > > Firmware does not start at all when using KVM.
> > > > > >
> > > > > > Please excuse my limited knowledge of Arm processor variants.
> > > > > > I assumed that ThunderX and ThunderX2 are very similar and hoped
> > > > > > the fix would also work for this case.
> > > > > >
> > > > > > The issue was introduced by the same commit that Dann
> > > > > > reported (07be1d34d95460a238fcd0f6693efb747c28b329):
> > > > > > "ArmVirtPkg/ArmVirtQemu: enable initial ID map at early boot".
> > > > > >
> > > > >
> > > > > Can you share the QEMU command line that you are using? I use a
> > > > > ThunderX2 basically 24/7 to do all my Linux and EDK2 development, so
> > > > > this change was developed on ThunderX2 and so I'm surprised you are
> > > > > seeing this issue.
> > > > >
> > > > > Did you try the DEBUG build as well?
> > > > Yes, debug is on.
> > > >
> > > > Here is what I have, trying with the master branch from just now
> > > > (998ebe5ca0ae5c449e83ede533bee872f97d63af):
> > > >
> > > > # make -C BaseTools && \
> > > >   . ./edksetup.sh && \
> > > >   build -t GCC5 -a AARCH64 \
> > > >   -p ArmVirtPkg/ArmVirtQemu.dsc \
> > > >   -DCAVIUM_ERRATUM_27456 \
> > > >   -b DEBUG
> > > >
> > > > # /usr/libexec/qemu-kvm \
> > > >   -machine accel=kvm -m 1G -boot menu=on \
> > > >   -blockdev node-name=code,driver=file,filename="${FW_CODE_RESIZED}",read-only=on \
> > > >   -blockdev node-name=vars,driver=file,filename="${FW_VARS}" \
> > > >   -machine pflash0=code \
> > > >   -machine pflash1=vars \
> > > >   -cpu max \
> > > >   -net none \
> > > >   -serial stdio
> > > >
> > >
> > > My distro does not have qemu-kvm, and using the command line above
> > > results in the following if i try it with qemu-system-aarch64
> > >
> > > """
> > > qemu-system-aarch64: No machine specified, and there is no default
> > > Use -machine help to list supported machines
> > > """
> > >
> > > unless i change it to
> > >
> > > qemu-system-aarch64 -machine virt,accel=kvm -m 1G -boot menu=on \
> > >   -blockdev node-name=code,driver=file,filename=$HOME/bin/flash0.img,read-only=on \
> > >   -blockdev node-name=vars,driver=file,filename=$HOME/bin/flash1.img \
> > >   -machine pflash0=code \
> > >   -machine pflash1=vars \
> > >   -cpu max \
> > >   -net none \
> > >   -nographic
> > >
> > > and that works fine with my firmware build.
> > >
> > >
> > > > # /usr/libexec/qemu-kvm --version
> > > > QEMU emulator version 7.2.0 (qemu-kvm-7.2.0-3.el9)
> > > >
> > > > # uname -r
> > > > 5.14.0-234.el9.aarch64
> > > >
> > >
> > > Yeah, that is quite old. One potential issue that comes to mind here
> > > is the one addressed by the patch below
> > >
> > > >
> > > > Since you have the same CPU... Might this be a bug in KVM?
> > > >
> > >
> > > Indeed. Could you try applying this patch?
> > >
> > > commit 406504c7b0405d74d74c15a667cd4c4620c3e7a9
> > > Author: Marc Zyngier
> > > Date: Tue Dec 20 14:03:52 2022 +0000
> > >
> > >     KVM: arm64: Fix S1PTW handling on RO memslots
> > >
> > > Or check whether this is generally reproducible with newer kernels?
> >
> > Another thing you might try:
> >
> > - build the firmware with the following hunk applied
> >
> > """
> > diff --git a/ArmVirtPkg/Library/ArmPlatformLibQemu/AArch64/ArmPlatformHelper.S
> > b/ArmVirtPkg/Library/ArmPlatformLibQemu/AArch64/ArmPlatformHelper.S
> > index 5ac7c732f6ec..f4e1285beefc 100644
> > --- a/ArmVirtPkg/Library/ArmPlatformLibQemu/AArch64/ArmPlatformHelper.S
> > +++ b/ArmVirtPkg/Library/ArmPlatformLibQemu/AArch64/ArmPlatformHelper.S
> > @@ -40,6 +40,12 @@
> >    .set    sctlrval, SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA |
> > SCTLR_EL1_ITD | SCTLR_EL1_SED
> >    .set    sctlrval, sctlrval | SCTLR_ELx_I | SCTLR_EL1_SPAN | SCTLR_EL1_RES1
> >
> > +  .align  11
> > +.Lvectors:
> > +  .rept   16
> > +  .align  7
> > +  b       .
> > +  .endr
> >
> >  ASM_FUNC(ArmPlatformPeiBootAction)
> >  #ifdef CAVIUM_ERRATUM_27456
> > @@ -90,6 +96,8 @@ ASM_FUNC(ArmPlatformPeiBootAction)
> >    msr     mair_el1, x0      // set up the 1:1 mapping
> >    msr     tcr_el1, x1
> >    msr     ttbr0_el1, x2
> > +  adr     x0, .Lvectors
> > +  msr     vbar_el1, x0
> >    isb
> >
> >    tlbi    vmalle1           // invalidate any cached translations
> > """
> >
> > - run qemu with the -s option and let it crash
> >
> > - connect with gdb and dump the exception context
> >
> > target remote:1234
> > set radix 16
> > p $FAR_EL1
> > p $ESR_EL1
> > p $ELR_EL1
> >
> > That should at least tell us why the crash is occurring.
> >
>
> I tried the most recent Qemu master (v7.2.50) and also v7.0.0,
> on the 5.14 (RHEL) kernel and on 6.1.6-200.fc37.aarch64 (from Fedora).
> No luck.
>

Does that include a backport of commit 406504c7b0405d74d74c15a667cd4c4620c3e7a9?

> I applied the patch and attached gdb, as described (Qemu 7.2.50):
>
> p $ELR_EL1
> (gdb) p $FAR_EL1
> $1 = 0x6200
> (gdb) p $ESR_EL1
> $2 = 0x86000010
> (gdb) p $ELR_EL1
> $3 = 0x6200
>
> There is no sign of any crash. It seems like it does not even start
> running.
>

So 0x6200 is the sync exception vector, which is both the code
location of the crash and the faulting address. This means that
fetching the instructions to handle the original exception failed, and
so the original exception reason (ESR) is lost.

However, the synchronous external abort
(https://esr.arm64.dev/?#0x86000010) that you are seeing might point
to an issue similar to (or the same as) the one that Marc recently
fixed in KVM.

It is quite odd that this does not reproduce *at all* on my TX2.
Fedora kernels don't use 64k pages, right?
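
For anyone who wants to check the reading of those three registers by
hand, here is a minimal Python sketch. It is not part of EDK2, QEMU or
KVM, and the names decode_esr and decode_vector are made up for
illustration. It assumes the ESR_ELx layout from the Arm ARM (EC in
bits [31:26], IL in bit 25, ISS in bits [24:0], fault status code in
ISS[5:0] for aborts) and a vector table that is 2 KiB aligned with 16
entries of 0x80 bytes each, as produced by the .align 11 / .align 7
directives in the hunk above. The lookup tables are deliberately
partial.

```python
# Partial tables: only the encodings relevant to abort exceptions.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

FSC_NAMES = {
    0x10: "Synchronous External abort, not on translation table walk",
}

# AArch64 vector table layout: 4 sources x 4 kinds, 0x80 bytes per entry.
VECTOR_SOURCES = [
    "Current EL with SP0",
    "Current EL with SPx",
    "Lower EL using AArch64",
    "Lower EL using AArch32",
]
VECTOR_KINDS = ["Synchronous", "IRQ", "FIQ", "SError"]


def decode_esr(esr):
    """Split an ESR_ELx value into EC, IL, ISS and (for aborts) the FSC."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fsc = iss & 0x3F  # only meaningful for instruction/data aborts
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this partial table"),
        "il": il,
        "iss": iss,
        "fsc": fsc,
        "fsc_name": FSC_NAMES.get(fsc, "not in this partial table"),
    }


def decode_vector(addr, table_align=0x800):
    """Map an address inside a vector table to (base, offset, entry name).

    Assumes the table starts on a table_align boundary (2 KiB here).
    """
    base = addr & ~(table_align - 1)
    offset = addr - base
    entry = offset // 0x80
    name = f"{VECTOR_SOURCES[entry // 4]}, {VECTOR_KINDS[entry % 4]}"
    return base, offset, name


if __name__ == "__main__":
    info = decode_esr(0x86000010)
    print(f"EC=0x{info['ec']:02x} ({info['ec_name']}), IL={info['il']}")
    print(f"FSC=0x{info['fsc']:02x} ({info['fsc_name']})")
    base, offset, name = decode_vector(0x6200)
    print(f"vector base=0x{base:x}, offset=0x{offset:x}: {name}")
```

Fed the values from the gdb session, this gives EC 0x21 with FSC 0x10
for the ESR, and places 0x6200 at offset 0x200 from a table base of
0x6000, i.e. the synchronous entry for the current EL with SPx, which
is consistent with the interpretation above.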