From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20230119120021.4yohqindvj3ghwky@sirius.home.kraxel.org> <173FFD60429C89C3.3213@groups.io>
From: "Ard Biesheuvel"
Date: Thu, 2 Mar 2023 11:50:37 +0100
Subject: Re: [edk2-devel] [PATCH v2 2/2] ArmVirtPkg/ArmVirtQemu: Avoid early ID map on ThunderX
To: Oliver Steffen
Cc: devel@edk2.groups.io, Gerd Hoffmann, Marc Zyngier, dann.frazier@canonical.com

On Thu, 9 Feb 2023 at 16:15, Ard Biesheuvel wrote:
>
> On Tue, 7 Feb 2023 at 13:58, Oliver Steffen wrote:
> >
> > On Tue, Feb 7, 2023 at 12:57 PM Ard Biesheuvel wrote:
> >>
> >> On Tue, 7 Feb 2023 at 11:51, Oliver Steffen wrote:
> >> >
> >> > On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen wrote:
> >> >>
> >> >>
> >> >> On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel wrote:
> >> >>>
> >> >>> On Wed, 1 Feb 2023 at 13:59, Oliver Steffen wrote:
> >> >>> >
> >> >>> > On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel wrote:
> >> >>> >>
> >> >>> >> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen wrote:
> >> >>> >> >
> >> >>
> >> >> [...]
> >> >>>
> >> >>> >> > I am sorry, this story does not seem to be over yet.
> >> >>> >> >
> >> >>> >> > We are using the Erratum patch and also included the commit 406504c7 in
> >> >>> >> > the kernel.
> >> >>> >> > Now the firmware crashes sometimes (10 out of 89 tests).
> >> >>> >> >
> >> >>> >>
> >> >>> >> Thanks for the report. Is this still on ThunderX2?
> >> >>> >>
> >> >>> >> > Any hints are very welcome!
> >> >>> >> >
> >> >>> >>
> >> >>> >> Do you have access to those build artifacts?
> >> >>> >
> >> >>> >
> >> >>> > https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
> >> >>> >
> >> >>> > and/or here:
> >> >>> >
> >> >>> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
> >> >>> >
> >> >>> > Source for reference:
> >> >>> > https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
> >> >>> >
> >> >>>
> >> >>> Any chance the .dll files (which are actually ELF executables) have
> >> >>> been preserved somewhere?
> >> >>
> >> >> Here is the build folder (~90MB):
> >> >> https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/armvirt-thunderx2-issue.tar.xz
> >> >>
> >> >> I am waiting for the tests with the additional debug output to run.
> >> >
> >> >
> >> > We reran the test suite with the Erratum and the additional debug
> >> > output enabled. Strangely, the problem does not occur anymore, the
> >> > firmware boots up normally.
> >> >
> >> > We retried the tests without the additional debug output.
> >> > RHEL ships two firmware flavors for AARCH64: a silent and a verbose
> >> > version.
> >>
> >> Are these RELEASE vs DEBUG builds?
> >
> >
> > All builds are DEBUG, just the amount of information printed on
> > the serial is different (almost zero for the "silent" one.)
> >
> >>
> >> > Both were tried. We see no problems with the verbose
> >> > one. The silent one fails noticeably more often if a software TPM device
> >> > is present.
> >> >
> >>
> >> This smells like some missing cache or TLB maintenance - the verbose
> >> one exits to the host much more often, and likely relies on cache/TLB
> >> maintenance occurring in the hypervisor.
> >>
> >> So the build always includes TPM support but the issue only occurs
> >> when the sw TPM is actually exposed by QEMU?
> >
> >
> > Yes.
> > All builds include support for TPM, but the issue occurs more frequently
> > if a sw TPM is exposed by QEMU.
> >
>
> Any chance you could provide a specific command line for launching
> QEMU? I am trying to reproduce this, but I am not making any progress.
>
> >>
> >> > Could this be related to how much stuff is going on in the early phase
> >> > of the firmware (when logging is enabled: formatting of messages and
> >> > sending to serial port...) ?
> >> >
> >>
> >> I'll try to see if I can rig something up that logs into a buffer
> >> rather than straight to the serial, and dump it all out when handling
> >> the crash
> >>
>
> This takes a bit more time than I can afford to spend on this atm, and
> I'd like to be able to reproduce before I go down this rabbit hole.

Have there been any developments regarding this issue?