From: "Oliver Steffen" <osteffen@redhat.com>
Date: Tue, 7 Feb 2023 13:58:28 +0100
Subject: Re: [edk2-devel] [PATCH v2 2/2] ArmVirtPkg/ArmVirtQemu: Avoid early ID map on ThunderX
To: Ard Biesheuvel <ardb@kernel.org>
Cc: devel@edk2.groups.io, Gerd Hoffmann, Marc Zyngier, dann.frazier@canonical.com
References: <20230119120021.4yohqindvj3ghwky@sirius.home.kraxel.org> <173FFD60429C89C3.3213@groups.io>

On Tue, Feb 7, 2023 at 12:57 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> On Tue, 7 Feb 2023 at 11:51, Oliver Steffen <osteffen@redhat.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen <osteffen@redhat.com> wrote:
> >>
> >>
> >> On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>>
> >>> On Wed, 1 Feb 2023 at 13:59, Oliver Steffen <osteffen@redhat.com> wrote:
> >>> >
> >>> > On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>> >>
> >>> >> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen <osteffen@redhat.com> wrote:
> >>> >> >
> >>
> >> [...]
> >>>
> >>> >> > I am sorry, this story does not seem to be over yet.
> >>> >> >
> >>> >> > We are using the Erratum patch and also included the commit 406504c7 in
> >>> >> > the kernel.
> >>> >> > Now the firmware crashes sometimes (10 out of 89 tests).
> >>> >> >
> >>> >>
> >>> >> Thanks for the report. Is this still on ThunderX2?
> >>> >>
> >>> >> > Any hints are very welcome!
> >>> >> >
> >>> >>
> >>> >> Do you have access to those build artifacts?
> >>> >
> >>> >
> >>> > https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
> >>> >
> >>> > and/or here:
> >>> >
> >>> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
> >>> >
> >>> > Source for reference:
> >>> > https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
> >>> >
> >>>
> >>> Any chance the .dll files (which are actually ELF executables) have
> >>> been preserved somewhere?
> >>
> >> Here is the build folder (~90MB):
> >> https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/armvirt-thunderx2-issue.tar.xz
> >>
> >> I am waiting for the tests with the additional debug output to run.
> >
> >
> > We reran the test suite with the Erratum and the additional debug
> > output enabled. Strangely, the problem does not occur anymore, the
> > firmware boots up normally.
> >
> > We retried the tests without the additional debug output.
> > RHEL ships two firmware flavors for AARCH64: a silent and a verbose
> > version.
>
> Are these RELEASE vs DEBUG builds?
>

All builds are DEBUG builds; only the amount of information printed on
the serial console differs (almost nothing for the "silent" one).

> > Both were tried. We see no problems with the verbose
> > one. The silent one fails noticeably more often if a software TPM device
> > is present.
> >
>
> This smells like some missing cache or TLB maintenance - the verbose
> one exits to the host much more often, and likely relies on cache/TLB
> maintenance occurring in the hypervisor.
>
> So the build always includes TPM support but the issue only occurs
> when the sw TPM is actually exposed by QEMU?
>

Yes. All builds include TPM support, but the issue occurs more
frequently when a sw TPM is exposed by QEMU.

> > Could this be related to how much stuff is going on in the early phase
> > of the firmware (when logging is enabled: formatting of messages and
> > sending to serial port...) ?
> >
>
> I'll try to see if I can rig something up that logs into a buffer
> rather than straight to the serial, and dump it all out when handling
> the crash
>

Awesome.

Thanks,
Oliver
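The buffer-logging idea Ard describes can be modeled roughly as follows. This is an illustrative Python sketch, not edk2 code: `RingLog`, `crash_handler` and the log messages are hypothetical. Messages are appended to a fixed-size in-memory ring buffer instead of being written to the serial port one by one, and the buffer is flushed in a single pass only when the crash handler runs. The point is that each serial write is typically an MMIO access that exits to the host, and, per Ard's hypothesis above, those extra exits may be what hides the bug in the verbose build.

```python
# Illustrative model of deferred logging (hypothetical names, not edk2 code):
# messages go into a fixed-size in-memory ring buffer instead of the serial
# port, and the buffer is emitted in one go only when the crash handler runs.
from typing import Callable


class RingLog:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)  # storage reserved once, up front
        self._capacity = capacity
        self._head = 0  # index of the next byte to write
        self._size = 0  # number of valid bytes currently held

    def write(self, msg: str) -> None:
        """Append a message; once full, the oldest bytes are overwritten."""
        for b in msg.encode("utf-8"):
            self._buf[self._head] = b
            self._head = (self._head + 1) % self._capacity
            if self._size < self._capacity:
                self._size += 1

    def dump(self) -> str:
        """Return the retained bytes, oldest first."""
        start = (self._head - self._size) % self._capacity
        data = bytes(self._buf[(start + i) % self._capacity] for i in range(self._size))
        # Wraparound may cut a multi-byte character; replace it instead of failing.
        return data.decode("utf-8", errors="replace")


def crash_handler(log: RingLog, serial_write: Callable[[str], object]) -> None:
    """Hypothetical exception hook: flush everything buffered so far at once."""
    serial_write(log.dump())


if __name__ == "__main__":
    import sys

    log = RingLog(64)
    log.write("early init: message 1\n")  # made-up messages
    log.write("early init: message 2\n")
    crash_handler(log, sys.stdout.write)
```

Overwriting the oldest bytes keeps the memory footprint fixed, and the messages closest to the crash are the ones that survive, which is usually what matters when debugging an early boot failure.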