From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
 by mx.groups.io with SMTP id smtpd.web10.81494.1675771021672921434
 for ; Tue, 07 Feb 2023 03:57:01 -0800
Authentication-Results: mx.groups.io;
 dkim=pass header.i=@kernel.org header.s=k20201202 header.b=YM5IsAtd;
 spf=pass (domain: kernel.org, ip: 139.178.84.217, mailfrom: ardb@kernel.org)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dfw.source.kernel.org (Postfix) with ESMTPS id 1E9C861354
 for ; Tue, 7 Feb 2023 11:57:01 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 83660C433D2
 for ; Tue, 7 Feb 2023 11:57:00 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1675771020;
 bh=f2rvbH6WERHj5YzsuXyKhKjSOor6eVRyDSLQiYDedT8=;
 h=References:In-Reply-To:From:Date:Subject:To:Cc:From;
 b=YM5IsAtdzYvNbCOjPYt1/nhwGA52JshEAFuSOUe3rzCAIfycjXO9TmVx15b0aBDpS
 LGfgCBj6zVDUIKGIOexHjwynhJGQnnC80Hy77j0LYiwMDkJM5t5/0JikU0wYozGwGN
 hlOpl1Mjy5dfZ0dU/s6F6q8EfiPjopf8xWOyi91CV+Lue9ILNZ/fz3dgEL9Utwt1cD
 ubaERLfaWhO+xz6i9jcBSy1qpjkahQJKGBcgcGsgCgXd+Nw6whWEVs3XtUdu3eFGO6
 P4oo2uc7XNlshBwM04JxRTxAQk0mOjOBbjeW+eRzRr0QjSgHenTcmBfYwcQF1mY2T+
 Qh+LCGp5Te/LA==
Received: by mail-lf1-f46.google.com with SMTP id f34so21970205lfv.10
 for ; Tue, 07 Feb 2023 03:57:00 -0800 (PST)
X-Gm-Message-State: AO0yUKXB8RGTR1XJNVJZsSUZP+QfAYwNbKj+jJtEbi3FVES3hP3vRthM
 fyxisG5Fz5RAA27gIbml2if9TuXI/Xa5CW89dPI=
X-Google-Smtp-Source: AK7set+igslYX4KRoNdBg+n2CetFjMBs352DbHhBK910cqr0pZbsKudpvScVspFglu/1isNpVjrAAgKmydiMSNneiKA=
X-Received: by 2002:ac2:5550:0:b0:4b6:e197:3aeb with SMTP id
 l16-20020ac25550000000b004b6e1973aebmr459301lfk.233.1675771018394;
 Tue, 07 Feb 2023 03:56:58 -0800 (PST)
MIME-Version: 1.0
References: <20230119120021.4yohqindvj3ghwky@sirius.home.kraxel.org>
 <173FFD60429C89C3.3213@groups.io>
In-Reply-To:
From: "Ard
 Biesheuvel"
Date: Tue, 7 Feb 2023 12:56:46 +0100
X-Gmail-Original-Message-ID:
Message-ID:
Subject: Re: [edk2-devel] [PATCH v2 2/2] ArmVirtPkg/ArmVirtQemu: Avoid early ID map on ThunderX
To: Oliver Steffen
Cc: devel@edk2.groups.io, Gerd Hoffmann, Marc Zyngier, dann.frazier@canonical.com
Content-Type: text/plain; charset="UTF-8"

On Tue, 7 Feb 2023 at 11:51, Oliver Steffen wrote:
>
> On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen wrote:
>>
>>
>> On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel wrote:
>>>
>>> On Wed, 1 Feb 2023 at 13:59, Oliver Steffen wrote:
>>> >
>>> > On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel wrote:
>>> >>
>>> >> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen wrote:
>>> >> >
>> >> [...]
>>>
>>> >> > I am sorry, this story does not seem to be over yet.
>>> >> >
>>> >> > We are using the Erratum patch and also included the commit 406504c7 in
>>> >> > the kernel.
>>> >> > Now the firmware crashes sometimes (10 out of 89 tests).
>>> >> >
>>> >>
>>> >> Thanks for the report. Is this still on ThunderX2?
>>> >>
>>> >> > Any hints are very welcome!
>>> >> >
>>> >>
>>> >> Do you have access to those build artifacts?
>>> >
>>> >
>>> > https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
>>> >
>>> > and/or here:
>>> >
>>> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
>>> >
>>> > Source for reference:
>>> > https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
>>> >
>>>
>>> Any chance the .dll files (which are actually ELF executables) have
>>> been preserved somewhere?
>>
>> Here is the build folder (~90MB):
>> https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/armvirt-thunderx2-issue.tar.xz
>>
>> I am waiting for the tests with the additional debug output to run.
>
>
> We reran the test suite with the Erratum and the additional debug
> output enabled. Strangely, the problem does not occur anymore, the
> firmware boots up normally.
>
> We retried the tests without the additional debug output.
> RHEL ships two firmware flavors for AARCH64: a silent and a verbose
> version.

Are these RELEASE vs DEBUG builds?

> Both were tried. We see no problems with the verbose
> one. The silent one fails noticeably more often if a software TPM device
> is present.
>

This smells like some missing cache or TLB maintenance - the verbose
one exits to the host much more often, and likely relies on cache/TLB
maintenance occurring in the hypervisor.

So the build always includes TPM support but the issue only occurs
when the sw TPM is actually exposed by QEMU?

> Could this be related to how much stuff is going on in the early phase
> of the firmware (when logging is enabled: formatting of messages and
> sending to serial port...) ?
>

I'll try to see if I can rig something up that logs into a buffer
rather than straight to the serial, and dump it all out when handling
the crash
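
For illustration only, the buffer-logging idea could look roughly like
the sketch below. This is plain C, not EDK2 code: log_to_buffer(),
log_dump() and LOG_BUF_SIZE are hypothetical names, and the buffer size
is an arbitrary assumption. The point is that appending a message
touches only memory, so logging itself causes no serial I/O (and hence
no exits to the host) until the crash handler asks for the contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names and size; a sketch of the idea, not EDK2 code. */
#define LOG_BUF_SIZE 4096u

static char   log_buf[LOG_BUF_SIZE];
static size_t log_head;              /* total bytes written so far */

/* Append a message to the ring buffer. Only memory is touched here;
 * once the buffer is full, the oldest bytes are overwritten. */
static void log_to_buffer(const char *msg)
{
    for (; *msg != '\0'; msg++) {
        log_buf[log_head % LOG_BUF_SIZE] = *msg;
        log_head++;
    }
}

/* Copy the retained log, oldest byte first, into out as a
 * NUL-terminated string. Returns the number of bytes copied,
 * which is limited by out_size - 1. */
static size_t log_dump(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    size_t avail = log_head < LOG_BUF_SIZE ? log_head : LOG_BUF_SIZE;
    size_t start = log_head - avail;
    size_t n = 0;

    while (n < avail && n + 1 < out_size) {
        out[n] = log_buf[(start + n) % LOG_BUF_SIZE];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

A crash handler would then call log_dump() once and write the result to
the serial port in one go, so the timing of the early boot phase stays
close to that of the silent build.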