From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20230119120021.4yohqindvj3ghwky@sirius.home.kraxel.org> <173FFD60429C89C3.3213@groups.io>
From: "Oliver Steffen" <osteffen@redhat.com>
Date: Thu, 2 Mar 2023 14:29:43 +0100
Subject: Re: [edk2-devel] [PATCH v2 2/2] ArmVirtPkg/ArmVirtQemu: Avoid early ID map on ThunderX
To: devel@edk2.groups.io, ardb@kernel.org
Cc: Gerd Hoffmann, Marc Zyngier, dann.frazier@canonical.com
Content-Type: multipart/alternative; boundary="0000000000000f663105f5ead34b"

--0000000000000f663105f5ead34b
Content-Type: text/plain; charset="UTF-8"

On Thu, Mar 2, 2023 at 11:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:

> On Thu, 9 Feb 2023 at 16:15, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Tue, 7 Feb 2023 at 13:58, Oliver Steffen <osteffen@redhat.com> wrote:
> > >
> > > On Tue, Feb 7, 2023 at 12:57 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >>
> > >> On Tue, 7 Feb 2023 at 11:51, Oliver Steffen <osteffen@redhat.com> wrote:
> > >> >
> > >> > On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen <osteffen@redhat.com> wrote:
> > >> >>
> > >> >>
> > >> >> On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >> >>>
> > >> >>> On Wed, 1 Feb 2023 at 13:59, Oliver Steffen <osteffen@redhat.com> wrote:
> > >> >>> >
> > >> >>> > On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >> >>> >>
> > >> >>> >> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen <osteffen@redhat.com> wrote:
> > >> >>> >> >
> > >> >>
> > >> >> [...]
> > >> >>>
> > >> >>> >> > I am sorry, this story does not seem to be over yet.
> > >> >>> >> >
> > >> >>> >> > We are using the Erratum patch and also included the commit 406504c7 in
> > >> >>> >> > the kernel.
> > >> >>> >> > Now the firmware crashes sometimes (10 out of 89 tests).
> > >> >>> >> >
> > >> >>> >>
> > >> >>> >> Thanks for the report. Is this still on ThunderX2?
> > >> >>> >>
> > >> >>> >> > Any hints are very welcome!
> > >> >>> >> >
> > >> >>> >>
> > >> >>> >> Do you have access to those build artifacts?
> > >> >>> >
> > >> >>> >
> > >> >>> > https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
> > >> >>> >
> > >> >>> > and/or here:
> > >> >>> >
> > >> >>> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
> > >> >>> >
> > >> >>> > Source for reference:
> > >> >>> > https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
> > >> >>> >
> > >> >>>
> > >> >>> Any chance the .dll files (which are actually ELF executables) have
> > >> >>> been preserved somewhere?
> > >> >>
> > >> >> Here is the build folder (~90MB):
> > >> >> https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/armvirt-thunderx2-issue.tar.xz
> > >> >>
> > >> >> I am waiting for the tests with the additional debug output to run.
> > >> >
> > >> >
> > >> > We reran the test suite with the Erratum and the additional debug
> > >> > output enabled. Strangely, the problem does not occur anymore, the
> > >> > firmware boots up normally.
> > >> >
> > >> > We retried the tests without the additional debug output.
> > >> > RHEL ships two firmware flavors for AARCH64: a silent and a verbose
> > >> > version.
> > >>
> > >> Are these RELEASE vs DEBUG builds?
> > >
> > >
> > > All builds are DEBUG, just the amount of information printed on
> > > the serial is different (almost zero for the "silent" one.)
> > >
> > >>
> > >> > Both were tried. We see no problems with the verbose
> > >> > one. The silent one fails noticeably more often if a software TPM device
> > >> > is present.
> > >> >
> > >>
> > >> This smells like some missing cache or TLB maintenance - the verbose
> > >> one exits to the host much more often, and likely relies on cache/TLB
> > >> maintenance occurring in the hypervisor.
> > >>
> > >> So the build always includes TPM support but the issue only occurs
> > >> when the sw TPM is actually exposed by QEMU?
> > >
> > >
> > > Yes.
> > > All builds include support for TPM, but the issue occurs more frequently
> > > if a sw TPM is exposed by QEMU.
> > >
> >
> > Any chance you could provide a specific command line for launching
> > QEMU? I am trying to reproduce this, but I am not making any progress.
> >
> > >>
> > >> > Could this be related to how much stuff is going on in the early phase
> > >> > of the firmware (when logging is enabled: formatting of messages and
> > >> > sending to serial port...) ?
> > >> >
> > >>
> > >> I'll try to see if I can rig something up that logs into a buffer
> > >> rather than straight to the serial, and dump it all out when handling
> > >> the crash
> > >>
> >
> > This takes a bit more time than I can afford to spend on this atm, and
> > I'd like to be able to reproduce before I go down this rabbit hole.
>
> Have there been any developments regarding this issue?
>

Nothing from my side. I tried to come up with a more reliable/faster reproducer
but then stopped because of other stuff.

If you have any idea what I could try next let me know.

-Oliver

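For reference, the idea raised in the thread of logging into a buffer rather than straight to the serial port, and dumping it all out when handling the crash, could look roughly like the sketch below. This is only an illustration under stated assumptions, not code from the patch series or from edk2: the names LogBuf, log_buf_init, log_buf_write and log_buf_dump and the 32-byte capacity are hypothetical, it uses the hosted C library instead of the firmware's debug output path, and wiring it into the crash handler is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity, kept tiny so the wrap-around is easy to observe. */
#define LOG_BUF_SIZE 32u

/* Fixed-size ring buffer: once full, new bytes overwrite the oldest ones. */
typedef struct {
    char   data[LOG_BUF_SIZE];
    size_t head;   /* index of the next byte to write */
    size_t used;   /* number of valid bytes, at most LOG_BUF_SIZE */
} LogBuf;

static void log_buf_init(LogBuf *lb)
{
    assert(lb != NULL);
    lb->head = 0;
    lb->used = 0;
}

/* Append a message to memory only; nothing is sent to any device here. */
static void log_buf_write(LogBuf *lb, const char *msg)
{
    assert(lb != NULL && msg != NULL);
    for (; *msg != '\0'; msg++) {
        lb->data[lb->head] = *msg;
        lb->head = (lb->head + 1) % LOG_BUF_SIZE;
        if (lb->used < LOG_BUF_SIZE) {
            lb->used++;
        }
    }
}

/*
 * Copy the most recent buffered bytes into out, oldest first, and
 * NUL-terminate the result. If out is too small, only the newest bytes
 * that fit are kept. Returns the number of bytes copied.
 */
static size_t log_buf_dump(const LogBuf *lb, char *out, size_t out_size)
{
    assert(lb != NULL && out != NULL && out_size > 0);
    size_t n = lb->used < out_size - 1 ? lb->used : out_size - 1;
    size_t start = (lb->head + LOG_BUF_SIZE - n) % LOG_BUF_SIZE;
    for (size_t i = 0; i < n; i++) {
        out[i] = lb->data[(start + i) % LOG_BUF_SIZE];
    }
    out[n] = '\0';
    return n;
}
```

A crash handler would call log_buf_dump once and only then write the result to the serial port, so the silent build's timing during normal boot stays close to what it is today.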
--0000000000000f663105f5ead34b--