On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen <osteffen@redhat.com> wrote:

On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel <ardb@kernel.org> wrote:
On Wed, 1 Feb 2023 at 13:59, Oliver Steffen <osteffen@redhat.com> wrote:
>
> On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen <osteffen@redhat.com> wrote:
>> >
[...] 
>> > I am sorry, this story does not seem to be over yet.
>> >
>> > We are using the Erratum patch and also included the commit 406504c7 in
>> > the kernel.
>> > Now the firmware crashes sometimes (10 out of 89 tests).
>> >
>>
>> Thanks for the report. Is this still on ThunderX2?
>>
>> > Any hints are very welcome!
>> >
>>
>> Do you have access to those build artifacts?
>
>
> https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
>
> and/or here:
>
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
>
> Source for reference:
> https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
>

Any chance the .dll files (which are actually ELF executables) have
been preserved somewhere?

I am waiting for the tests with the additional debug output to run.
 
We reran the test suite with the Erratum patch and the additional debug
output enabled.  Strangely, the problem no longer occurs; the firmware
boots up normally.

We retried the tests without the additional debug output.
RHEL ships two firmware flavors for AARCH64: a silent and a verbose
version. Both were tried. We see no problems with the verbose
one. The silent one fails noticeably more often if a software TPM device
is present.

Could this be related to how much is going on in the early phase of the
firmware (with logging enabled: formatting messages, sending them to
the serial port, ...)?

Thanks,
  Oliver