public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Daniil Egranov <daniil.egranov@arm.com>
To: Ryan Harkin <ryan.harkin@linaro.org>
Cc: "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	Leif Lindholm <leif.lindholm@linaro.org>
Subject: Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0
Date: Mon, 23 Jan 2017 20:19:31 -0600	[thread overview]
Message-ID: <63991be9-816a-98b2-c3f8-e3cb3e86419c@arm.com> (raw)
In-Reply-To: <CAD0U-h+Z1040kobVGP_Myx+0GSkroTNMdaxPwcz9kX=Qt02chg@mail.gmail.com>

Hi Ryan,


On 01/23/2017 06:56 AM, Ryan Harkin wrote:
> On 20 January 2017 at 20:57, Daniil Egranov <daniil.egranov@arm.com> wrote:
>> Hi Ryan,
>>
>>
>> On 01/20/2017 04:30 AM, Ryan Harkin wrote:
>>
>> On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> wrote:
>>
>> Hi Leif, Ryan
>>
>>
>> On 01/19/2017 09:13 AM, Leif Lindholm wrote:
>>
>> On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote:
>>
>> On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote:
>>
>> The Marvell Yukon MAC address load supported only on Juno R1 and R2.
>> It disabled for Juno R0 due to PCI issues on this board.
>>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Daniil Egranov <daniil.egranov@arm.com>
>>
>> Tested-by: Ryan Harkin <ryan.harkin@linaro.org>
>>
>> ---
>>   ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c
>> b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c
>> index 47ff587..e9e6990 100644
>> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c
>> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c
>> @@ -378,6 +378,7 @@ OnEndOfDxe (
>>     EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath;
>>     EFI_HANDLE                Handle;
>>     EFI_STATUS                Status;
>> +  UINT32                    JunoRevision;
>>
>>     //
>>     // PCI Root Complex initialization
>> @@ -393,8 +394,12 @@ OnEndOfDxe (
>>     Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath,
>> FALSE);
>>     ASSERT_EFI_ERROR (Status);
>>
>> -  Status = ArmJunoSetNicMacAddress ();
>> -  ASSERT_EFI_ERROR (Status);
>> +  GetJunoRevision (JunoRevision);
>> +
>> +  if (JunoRevision != JUNO_REVISION_R0) {
>> +    Status = ArmJunoSetNicMacAddress ();
>> +    ASSERT_EFI_ERROR (Status);
>>
>> This is just an FYI, but I stacked your patch on top of mainline, like this:
>>
>> 5f81f61  2017-01-18  ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe:
>> Fixed crash on Juno R0       [Daniil Egranov]
>> 19ca06b  2017-01-19  OvmfPkg: Remove superfluous return statements.
>>   [Thomas Huth]
>>
>> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the
>> assert triggered:
>>
>> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017)
>> [snip]
>> ASSERT_EFI_ERROR (Status = Not Found)
>> ASSERT [ArmJunoDxe]
>> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401):
>> !EFI_ERROR (Status)
>>
>> I worked out what is happening. And it's not to do with this patch.
>> It's another fall-out from the re-work you did to the previous patch.
>> It's also ultimately due to a bug the firmware.
>>
>> With the initial version of your "Set Marvell Yukon MAC address"
>> patch, this hang didn't happen. I suspect that was because your error
>> checking was weaker and certain PCIe failures didn't trigger the
>> assert.
>>
>> To reproduce the error with this commit:
>> 1) power on and boot R1 or R2 into Shell
>>    I do this by interrupting the boot by pressing ESCAPE and using the boot
>> menu
>> 2) At the Shell prompt, run "reset -s" to shutdown
>> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot"
>> 4) the board will hang while booting UEFI, assuming the board firmware
>> doesn't die with constant messages like this:
>>
>>      ERROR: PCIe CSR read failed to respond
>>      ERROR: SMBus transaction not claimed
>>
>> Assuming the problem is firmware, not EDK2, what should we do about it?
>>
>> OK, so instinctively, my reaction was that "the reset -s bug is a
>> system controller firmware bug and we shouldn't work around
>> it". However, since it is actually disrupting Ryan's workflow, which
>> frequently doesn't touch PCI at all, I think downgrading the ASSERT to
>> an error message is a good idea short-term.
>>
>> Daniil - could you make that change please?
>>
>> /
>>      Leif
>>
>>
>> I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus
>> transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon
>> driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error
>> messages during the initial boot.
>>
>> Testing motherboard interfaces (FPGA build 118)...
>> SRAM 32MB test: PASSED
>> LAN9118   test: PASSED
>> KMI1/2    test: PASSED
>> MMC       test: PASSED
>> PB/LEDs   test: PASSED
>> FPGA UART test: PASSED
>> ERROR: PCIe CSR read failed to respond
>> ERROR: SMBus transaction not claimed
>> ERROR: PCIe CSR read failed to respond
>> ...
>>
>> Once it went through reporting these errors, the UEFI starts loading but
>> still fails in OnEndOfDxe():
>> ASSERT_EFI_ERROR (Status = Not Found)
>> ASSERT [ArmJunoDxe]
>> /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110):
>> !EFI_ERROR (Status)
>>
>> This is the original ArmJunoDxe code:
>>    Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath,
>> FALSE);
>>    ASSERT_EFI_ERROR (Status); <---- line 110
>>
>> I'm confused at which point in the commit history this is line 110.
>>
>> Just before your MAC commit, it's line 108. After your commit, and in
>> the current mainline, it's line 394.
>>
>> Then, I have to go back to Jeremy's commit
>> 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see
>> another change to ArmJunoDxe.c. Before that commit, the assert appears
>> on line 116.
>>
>> On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10
>> on line 125.
>>
>> So I'm unsure what code you are running.
>>
>>
>> This shows the code where it fails for me after MAC address code was removed
>> from ArmJunoDxe.c. It's part of my development branch and lines may not
>> matching any public commits
> You should consider switching branch to mainline when trying to
> compare and report problems with other people, then we have a common
> baseline to work with.
>
>
>>   ... but i pulled 16.06 release for a test and
> 16.06? Why so old? What's wrong with 16.12? You are missing firmware
> updates and fixes by using such an old release.

I  wanted to check how long this problem existed. Just picked up one of 
the old builds.

>
>> see the ASSERT on the same code in case of  "PCIe .." and "SMBus .." errors.
>> The line number should be corresponding to
>> 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit:
>>
>> ASSERT_EFI_ERROR (Status = Not Found)
>> ASSERT [ArmJunoDxe]
>> /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124):
>> !EFI_ERROR (Status)
>>
>>
>> but it actually fails here first:
>>    Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid,
>>                                    &PciRootComplexDevicePath,
>>                                    &Handle);
>>
>> Ryan - could you try to remove Marvell patches and check if you also
>> catching "PCIe .." and "SMBus .." errors without them and your build still
>> fails with other ASSERTs related to PCI.
>>
>> Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI
>> enumeration is completely corrupted.
>>
>> These errors do not appear if board was reset with the nPBRESET button. It
>> never fails with "reset -w" and "reset -c". I also loaded Debian and used
>> the shutdown command from there and got the same "PCIe .." and "SMBus .."
>> errors after "reboot" command from the "Cmd>" prompt.  Possibly, the
>> "reboot" command from the board shell prompt does not reset the board
>> correctly so it looks like a firmware issue.
>>
>> Yes, that's pretty much what I've been saying. There appears to be a
>> firmware bug that causes PCIe to be in an unstable state after
>> software shutdown of the AP by the motherboard, followed by reset.
>>
>> The "PCIe CSR read failed to respond" error is nothing to do with EDK2
>> and happens before EDK2 runs.
>>
>> However, unlike you, if I don't get the "PCIe CSR read failed to
>> respond" error, upstream EDK2 before your MAC address commit will boot
>> without ASSERTing on R0/1/2.
>>
>>
>> Do you mean, when you do have the "PCIe .." error ? As i understand, if you
>> power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC
>> address code during the first boot and can start UEFI Shell?
>>
> Correct.
>
>
>> So I've never seen the assert at line 110 as you pointed out above.
>> That's curious. I can see why that would/should assert, but it doesn't
>> for me. Perhaps it's something to do with the tree you are building
>> from? Or the devices you have connected? I have no additional SATA or
>> PCIe devices connected to my board.
>>
>> Cheers,
>> Ryan.
>>
>>
>> I am using R2 for the tests. The tests were done without SATA or PCI devices
>> connected to the board. Unfortunately, I do not have R1 board at this moment
>> to check it.
>>
> That's fine, R2 is good enough.
>
>
>> This is my R2 firmware versions:
>>
>> ARM V2M-Juno Boot loader v1.0.0
>> HBI0262 build 1844
>>
>> ARM V2M_Juno r2 Firmware v1.4.3
> v1.4.4 is the latest Juno firmware. You should update. Although it
> won't fix these problems as it wasn't intended to.  And as you used
> 16.06 for your baseline testing, I assume you probably also have old
> SCP firmware in your environment too.
>
>
>> Build Date: Apr 12 2016
>>
>> MIC RAM configuration (pms_v105.bin)...
>>
>> Configuring motherboard (rev D, var A)...
>> IOFPGA image \MB\HBI0262D\io_b118.bit
>>
>> As Leif suggested, i will replace the ASSERT with an error message and send
>> another patch.
>>
> Thanks, that's all we need to do, I think.
>

Done.

>> Could you please check one thing for me on your boards? Use a build which
>> worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot
>> -> see "PCIe .." and "SMBus .." errors -> boot UEFI.
> After I get the "PCIe .." and "SMBus .." errors, the board will never
> boot into UEFI. It hangs forever and I have to power-cycle the board.
> This rarely happens for me on R2.
>
> Sometimes, I do UEFI shell -> reset -s -> reboot and the board will
> give the "PCIe .." and "SMBus .." error, but most of the time, it does
> not and UEFI continues to boot.

Interesting, if i do reset -s, i am getting "PCIe ..." error  and stop 
with ASSERT error in UEFI all the time.

>
>
>> As i understand, you do
>> not have the ASSERT with your old builds so you can start the UEFI shell.
>> Use the "pci" shell command to list all PCI devices. Do you see them
>> correctly enumerated? I'd like to make sure that by ignoring the ASSERT
>> errors we will not face any further problems with PCI.
>>
> In the case where my board continues to boot after software shutdown,
> the pci shell command shows this:
>
> Shell> pci
>     Seg  Bus  Dev  Func
>     ---  ---  ---  ----
>      00   00   00    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 1556 Device 1100 Prog Interface 0
>      00   01   00    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   01    00 ==> Pre 2.0 device - All devices other than VGA
>               Vendor 0000 Device 0000 Prog Interface 0
>      00   02   02    00 ==> Pre 2.0 device - All devices other than VGA
>               Vendor 0000 Device 0000 Prog Interface 0
>      00   02   03    00 ==> Pre 2.0 device - All devices other than VGA
>               Vendor 0000 Device 0000 Prog Interface 0
>      00   02   0C    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   10    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   1F    00 ==> Pre 2.0 device - All devices other than VGA
>               Vendor 0000 Device 0000 Prog Interface 0
>
> Which is not the same as when the board is power cycled. On power
> cycle, I see this:
>
>     Seg  Bus  Dev  Func
>     ---  ---  ---  ----
>      00   00   00    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 1556 Device 1100 Prog Interface 0
>      00   01   00    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   01    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   02    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   03    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   0C    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   10    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   02   1F    00 ==> Bridge Device - PCI/PCI bridge
>               Vendor 111D Device 8090 Prog Interface 0
>      00   03   00    00 ==> Mass Storage Controller - Other mass storage controll
> er
>               Vendor 1095 Device 3132 Prog Interface 0
>      00   08   00    00 ==> Network Controller - Ethernet controller
>               Vendor 11AB Device 4380 Prog Interface 0
>
> Cheers,
> Ryan.
>
>

If I let UEFI boot to the shell after "PCIe ..." error, the pci shell 
command shows me a long list of unrecognized PCI devices with all 
combinations of BDFs.

Thanks,
Daniil

>> Thanks,
>> Daniil
>>
>>
>> Thanks,
>> Daniil
>>
>>
>> Prior to your "Set Marvell Yukon MAC address" patch, or with the
>> earlier version, the board would boot anyway, but the Yukon device
>> would be missing.
>>
>> Now it dies.
>>
>> I don't know which is worse, but I think hanging is worse than an
>> ethernet port dropping out. Although hanging is a bit more obvious
>> that there's a problem...
>>
>>
>> +  }
>>   }
>>
>>   STATIC
>> --
>> 2.7.4
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel



  reply	other threads:[~2017-01-24  2:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-18 23:27 [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 Daniil Egranov
2017-01-19 13:49 ` Ryan Harkin
2017-01-19 15:13   ` Leif Lindholm
2017-01-20  1:34     ` Daniil Egranov
2017-01-20 10:30       ` Ryan Harkin
2017-01-20 20:57         ` Daniil Egranov
2017-01-23 11:26           ` Ryan Harkin
2017-01-23 12:56           ` Ryan Harkin
2017-01-24  2:19             ` Daniil Egranov [this message]
2017-01-24 11:05               ` Ryan Harkin
2017-01-24 12:34                 ` Ryan Harkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=63991be9-816a-98b2-c3f8-e3cb3e86419c@arm.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox