* [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 @ 2017-01-18 23:27 Daniil Egranov 2017-01-19 13:49 ` Ryan Harkin 0 siblings, 1 reply; 11+ messages in thread From: Daniil Egranov @ 2017-01-18 23:27 UTC (permalink / raw) To: edk2-devel; +Cc: leif.lindholm, ryan.harkin, Daniil Egranov The Marvell Yukon MAC address load supported only on Juno R1 and R2. It disabled for Juno R0 due to PCI issues on this board. Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> --- ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c index 47ff587..e9e6990 100644 --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c @@ -378,6 +378,7 @@ OnEndOfDxe ( EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; EFI_HANDLE Handle; EFI_STATUS Status; + UINT32 JunoRevision; // // PCI Root Complex initialization @@ -393,8 +394,12 @@ OnEndOfDxe ( Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, FALSE); ASSERT_EFI_ERROR (Status); - Status = ArmJunoSetNicMacAddress (); - ASSERT_EFI_ERROR (Status); + GetJunoRevision (JunoRevision); + + if (JunoRevision != JUNO_REVISION_R0) { + Status = ArmJunoSetNicMacAddress (); + ASSERT_EFI_ERROR (Status); + } } STATIC -- 2.7.4 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-18 23:27 [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 Daniil Egranov @ 2017-01-19 13:49 ` Ryan Harkin 2017-01-19 15:13 ` Leif Lindholm 0 siblings, 1 reply; 11+ messages in thread From: Ryan Harkin @ 2017-01-19 13:49 UTC (permalink / raw) To: Daniil Egranov; +Cc: edk2-devel@lists.01.org, Leif Lindholm On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: > The Marvell Yukon MAC address load supported only on Juno R1 and R2. > It disabled for Juno R0 due to PCI issues on this board. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> Tested-by: Ryan Harkin <ryan.harkin@linaro.org> > --- > ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > index 47ff587..e9e6990 100644 > --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > @@ -378,6 +378,7 @@ OnEndOfDxe ( > EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; > EFI_HANDLE Handle; > EFI_STATUS Status; > + UINT32 JunoRevision; > > // > // PCI Root Complex initialization > @@ -393,8 +394,12 @@ OnEndOfDxe ( > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, FALSE); > ASSERT_EFI_ERROR (Status); > > - Status = ArmJunoSetNicMacAddress (); > - ASSERT_EFI_ERROR (Status); > + GetJunoRevision (JunoRevision); > + > + if (JunoRevision != JUNO_REVISION_R0) { > + Status = ArmJunoSetNicMacAddress (); > + ASSERT_EFI_ERROR (Status); This is just an FYI, but I stacked your patch on top of mainline, like this: 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 [Daniil Egranov] 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. [Thomas Huth] The first time I ran this, Juno R0 worked fine, but on R1 and R2, the assert triggered: UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) [snip] ASSERT_EFI_ERROR (Status = Not Found) ASSERT [ArmJunoDxe] /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): !EFI_ERROR (Status) I worked out what is happening. And it's not to do with this patch. It's another fall-out from the re-work you did to the previous patch. It's also ultimately due to a bug the firmware. With the initial version of your "Set Marvell Yukon MAC address" patch, this hang didn't happen. I suspect that was because your error checking was weaker and certain PCIe failures didn't trigger the assert. To reproduce the error with this commit: 1) power on and boot R1 or R2 into Shell I do this by interrupting the boot by pressing ESCAPE and using the boot menu 2) At the Shell prompt, run "reset -s" to shutdown 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" 4) the board will hang while booting UEFI, assuming the board firmware doesn't die with constant messages like this: ERROR: PCIe CSR read failed to respond ERROR: SMBus transaction not claimed Assuming the problem is firmware, not EDK2, what should we do about it? Prior to your "Set Marvell Yukon MAC address" patch, or with the earlier version, the board would boot anyway, but the Yukon device would be missing. Now it dies. I don't know which is worse, but I think hanging is worse than an ethernet port dropping out. Although hanging is a bit more obvious that there's a problem... > + } > } > > STATIC > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-19 13:49 ` Ryan Harkin @ 2017-01-19 15:13 ` Leif Lindholm 2017-01-20 1:34 ` Daniil Egranov 0 siblings, 1 reply; 11+ messages in thread From: Leif Lindholm @ 2017-01-19 15:13 UTC (permalink / raw) To: Ryan Harkin, Daniil Egranov; +Cc: edk2-devel@lists.01.org On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: > On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: > > The Marvell Yukon MAC address load supported only on Juno R1 and R2. > > It disabled for Juno R0 due to PCI issues on this board. > > > > Contributed-under: TianoCore Contribution Agreement 1.0 > > Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> > > Tested-by: Ryan Harkin <ryan.harkin@linaro.org> > > > --- > > ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > > index 47ff587..e9e6990 100644 > > --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > > +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > > @@ -378,6 +378,7 @@ OnEndOfDxe ( > > EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; > > EFI_HANDLE Handle; > > EFI_STATUS Status; > > + UINT32 JunoRevision; > > > > // > > // PCI Root Complex initialization > > @@ -393,8 +394,12 @@ OnEndOfDxe ( > > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, FALSE); > > ASSERT_EFI_ERROR (Status); > > > > - Status = ArmJunoSetNicMacAddress (); > > - ASSERT_EFI_ERROR (Status); > > + GetJunoRevision (JunoRevision); > > + > > + if (JunoRevision != JUNO_REVISION_R0) { > > + Status = ArmJunoSetNicMacAddress (); > > + ASSERT_EFI_ERROR (Status); > > This is just an FYI, but I stacked your patch on top of mainline, like this: > > 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: > Fixed crash on Juno R0 [Daniil Egranov] > 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. > [Thomas Huth] > > The first time I ran this, Juno R0 worked fine, but on R1 and R2, the > assert triggered: > > UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) > [snip] > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): > !EFI_ERROR (Status) > > I worked out what is happening. And it's not to do with this patch. > It's another fall-out from the re-work you did to the previous patch. > It's also ultimately due to a bug the firmware. > > With the initial version of your "Set Marvell Yukon MAC address" > patch, this hang didn't happen. I suspect that was because your error > checking was weaker and certain PCIe failures didn't trigger the > assert. > > To reproduce the error with this commit: > 1) power on and boot R1 or R2 into Shell > I do this by interrupting the boot by pressing ESCAPE and using the boot menu > 2) At the Shell prompt, run "reset -s" to shutdown > 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" > 4) the board will hang while booting UEFI, assuming the board firmware > doesn't die with constant messages like this: > > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > > Assuming the problem is firmware, not EDK2, what should we do about it? OK, so instinctively, my reaction was that "the reset -s bug is a system controller firmware bug and we shouldn't work around it". However, since it is actually disrupting Ryan's workflow, which frequently doesn't touch PCI at all, I think downgrading the ASSERT to an error message is a good idea short-term. Daniil - could you make that change please? / Leif > Prior to your "Set Marvell Yukon MAC address" patch, or with the > earlier version, the board would boot anyway, but the Yukon device > would be missing. > > Now it dies. > > I don't know which is worse, but I think hanging is worse than an > ethernet port dropping out. Although hanging is a bit more obvious > that there's a problem... > > > > + } > > } > > > > STATIC > > -- > > 2.7.4 > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-19 15:13 ` Leif Lindholm @ 2017-01-20 1:34 ` Daniil Egranov 2017-01-20 10:30 ` Ryan Harkin 0 siblings, 1 reply; 11+ messages in thread From: Daniil Egranov @ 2017-01-20 1:34 UTC (permalink / raw) To: Leif Lindholm, Ryan Harkin; +Cc: edk2-devel@lists.01.org Hi Leif, Ryan On 01/19/2017 09:13 AM, Leif Lindholm wrote: > On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: >> On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: >>> The Marvell Yukon MAC address load supported only on Juno R1 and R2. >>> It disabled for Juno R0 due to PCI issues on this board. >>> >>> Contributed-under: TianoCore Contribution Agreement 1.0 >>> Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> >> Tested-by: Ryan Harkin <ryan.harkin@linaro.org> >> >>> --- >>> ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> index 47ff587..e9e6990 100644 >>> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> @@ -378,6 +378,7 @@ OnEndOfDxe ( >>> EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; >>> EFI_HANDLE Handle; >>> EFI_STATUS Status; >>> + UINT32 JunoRevision; >>> >>> // >>> // PCI Root Complex initialization >>> @@ -393,8 +394,12 @@ OnEndOfDxe ( >>> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, FALSE); >>> ASSERT_EFI_ERROR (Status); >>> >>> - Status = ArmJunoSetNicMacAddress (); >>> - ASSERT_EFI_ERROR (Status); >>> + GetJunoRevision (JunoRevision); >>> + >>> + if (JunoRevision != JUNO_REVISION_R0) { >>> + Status = ArmJunoSetNicMacAddress (); >>> + ASSERT_EFI_ERROR (Status); >> This is just an FYI, but I stacked your patch on top of mainline, like this: >> >> 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: >> Fixed crash on Juno R0 [Daniil Egranov] >> 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. >> [Thomas Huth] >> >> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the >> assert triggered: >> >> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) >> [snip] >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): >> !EFI_ERROR (Status) >> >> I worked out what is happening. And it's not to do with this patch. >> It's another fall-out from the re-work you did to the previous patch. >> It's also ultimately due to a bug the firmware. >> >> With the initial version of your "Set Marvell Yukon MAC address" >> patch, this hang didn't happen. I suspect that was because your error >> checking was weaker and certain PCIe failures didn't trigger the >> assert. >> >> To reproduce the error with this commit: >> 1) power on and boot R1 or R2 into Shell >> I do this by interrupting the boot by pressing ESCAPE and using the boot menu >> 2) At the Shell prompt, run "reset -s" to shutdown >> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" >> 4) the board will hang while booting UEFI, assuming the board firmware >> doesn't die with constant messages like this: >> >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> >> Assuming the problem is firmware, not EDK2, what should we do about it? > OK, so instinctively, my reaction was that "the reset -s bug is a > system controller firmware bug and we shouldn't work around > it". However, since it is actually disrupting Ryan's workflow, which > frequently doesn't touch PCI at all, I think downgrading the ASSERT to > an error message is a good idea short-term. > > Daniil - could you make that change please? > > / > Leif I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error messages during the initial boot. Testing motherboard interfaces (FPGA build 118)... SRAM 32MB test: PASSED LAN9118 test: PASSED KMI1/2 test: PASSED MMC test: PASSED PB/LEDs test: PASSED FPGA UART test: PASSED ERROR: PCIe CSR read failed to respond ERROR: SMBus transaction not claimed ERROR: PCIe CSR read failed to respond ... Once it went through reporting these errors, the UEFI starts loading but still fails in OnEndOfDxe(): ASSERT_EFI_ERROR (Status = Not Found) ASSERT [ArmJunoDxe] /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): !EFI_ERROR (Status) This is the original ArmJunoDxe code: Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, FALSE); ASSERT_EFI_ERROR (Status); <---- line 110 but it actually fails here first: Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, &PciRootComplexDevicePath, &Handle); Ryan - could you try to remove Marvell patches and check if you also catching "PCIe .." and "SMBus .." errors without them and your build still fails with other ASSERTs related to PCI. Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI enumeration is completely corrupted. These errors do not appear if board was reset with the nPBRESET button. It never fails with "reset -w" and "reset -c". I also loaded Debian and used the shutdown command from there and got the same "PCIe .." and "SMBus .." errors after "reboot" command from the "Cmd>" prompt. Possibly, the "reboot" command from the board shell prompt doesnot reset the board correctly so it looks like a firmware issue. Thanks, Daniil >> Prior to your "Set Marvell Yukon MAC address" patch, or with the >> earlier version, the board would boot anyway, but the Yukon device >> would be missing. >> >> Now it dies. >> >> I don't know which is worse, but I think hanging is worse than an >> ethernet port dropping out. Although hanging is a bit more obvious >> that there's a problem... >> >> >>> + } >>> } >>> >>> STATIC >>> -- >>> 2.7.4 >>> > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-20 1:34 ` Daniil Egranov @ 2017-01-20 10:30 ` Ryan Harkin 2017-01-20 20:57 ` Daniil Egranov 0 siblings, 1 reply; 11+ messages in thread From: Ryan Harkin @ 2017-01-20 10:30 UTC (permalink / raw) To: Daniil Egranov; +Cc: Leif Lindholm, edk2-devel@lists.01.org On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> wrote: > Hi Leif, Ryan > > > On 01/19/2017 09:13 AM, Leif Lindholm wrote: > > On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: > > On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: > > The Marvell Yukon MAC address load supported only on Juno R1 and R2. > It disabled for Juno R0 due to PCI issues on this board. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> > > Tested-by: Ryan Harkin <ryan.harkin@linaro.org> > > --- > ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > index 47ff587..e9e6990 100644 > --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > @@ -378,6 +378,7 @@ OnEndOfDxe ( > EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; > EFI_HANDLE Handle; > EFI_STATUS Status; > + UINT32 JunoRevision; > > // > // PCI Root Complex initialization > @@ -393,8 +394,12 @@ OnEndOfDxe ( > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); > > - Status = ArmJunoSetNicMacAddress (); > - ASSERT_EFI_ERROR (Status); > + GetJunoRevision (JunoRevision); > + > + if (JunoRevision != JUNO_REVISION_R0) { > + Status = ArmJunoSetNicMacAddress (); > + ASSERT_EFI_ERROR (Status); > > This is just an FYI, but I stacked your patch on top of mainline, like this: > > 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: > Fixed crash on Juno R0 [Daniil Egranov] > 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. > [Thomas Huth] > > The first time I ran this, Juno R0 worked fine, but on R1 and R2, the > assert triggered: > > UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) > [snip] > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): > !EFI_ERROR (Status) > > I worked out what is happening. And it's not to do with this patch. > It's another fall-out from the re-work you did to the previous patch. > It's also ultimately due to a bug the firmware. > > With the initial version of your "Set Marvell Yukon MAC address" > patch, this hang didn't happen. I suspect that was because your error > checking was weaker and certain PCIe failures didn't trigger the > assert. > > To reproduce the error with this commit: > 1) power on and boot R1 or R2 into Shell > I do this by interrupting the boot by pressing ESCAPE and using the boot > menu > 2) At the Shell prompt, run "reset -s" to shutdown > 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" > 4) the board will hang while booting UEFI, assuming the board firmware > doesn't die with constant messages like this: > > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > > Assuming the problem is firmware, not EDK2, what should we do about it? > > OK, so instinctively, my reaction was that "the reset -s bug is a > system controller firmware bug and we shouldn't work around > it". However, since it is actually disrupting Ryan's workflow, which > frequently doesn't touch PCI at all, I think downgrading the ASSERT to > an error message is a good idea short-term. > > Daniil - could you make that change please? > > / > Leif > > > I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus > transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon > driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error > messages during the initial boot. > > Testing motherboard interfaces (FPGA build 118)... > SRAM 32MB test: PASSED > LAN9118 test: PASSED > KMI1/2 test: PASSED > MMC test: PASSED > PB/LEDs test: PASSED > FPGA UART test: PASSED > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > ERROR: PCIe CSR read failed to respond > ... > > Once it went through reporting these errors, the UEFI starts loading but > still fails in OnEndOfDxe(): > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): > !EFI_ERROR (Status) > > This is the original ArmJunoDxe code: > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); <---- line 110 I'm confused at which point in the commit history this is line 110. Just before your MAC commit, it's line 108. After your commit, and in the current mainline, it's line 394. Then, I have to go back to Jeremy's commit 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see another change to ArmJunoDxe.c. Before that commit, the assert appears on line 116. On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 on line 125. So I'm unsure what code you are running. > > but it actually fails here first: > Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, > &PciRootComplexDevicePath, > &Handle); > > Ryan - could you try to remove Marvell patches and check if you also > catching "PCIe .." and "SMBus .." errors without them and your build still > fails with other ASSERTs related to PCI. > > Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI > enumeration is completely corrupted. > > These errors do not appear if board was reset with the nPBRESET button. It > never fails with "reset -w" and "reset -c". I also loaded Debian and used > the shutdown command from there and got the same "PCIe .." and "SMBus .." > errors after "reboot" command from the "Cmd>" prompt. Possibly, the > "reboot" command from the board shell prompt does not reset the board > correctly so it looks like a firmware issue. > Yes, that's pretty much what I've been saying. There appears to be a firmware bug that causes PCIe to be in an unstable state after software shutdown of the AP by the motherboard, followed by reset. The "PCIe CSR read failed to respond" error is nothing to do with EDK2 and happens before EDK2 runs. However, unlike you, if I don't get the "PCIe CSR read failed to respond" error, upstream EDK2 before your MAC address commit will boot without ASSERTing on R0/1/2. So I've never seen the assert at line 110 as you pointed out above. That's curious. I can see why that would/should assert, but it doesn't for me. Perhaps it's something to do with the tree you are building from? Or the devices you have connected? I have no additional SATA or PCIe devices connected to my board. Cheers, Ryan. > Thanks, > Daniil > > > Prior to your "Set Marvell Yukon MAC address" patch, or with the > earlier version, the board would boot anyway, but the Yukon device > would be missing. > > Now it dies. > > I don't know which is worse, but I think hanging is worse than an > ethernet port dropping out. Although hanging is a bit more obvious > that there's a problem... > > > + } > } > > STATIC > -- > 2.7.4 > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-20 10:30 ` Ryan Harkin @ 2017-01-20 20:57 ` Daniil Egranov 2017-01-23 11:26 ` Ryan Harkin 2017-01-23 12:56 ` Ryan Harkin 0 siblings, 2 replies; 11+ messages in thread From: Daniil Egranov @ 2017-01-20 20:57 UTC (permalink / raw) To: Ryan Harkin; +Cc: edk2-devel@lists.01.org, Leif Lindholm Hi Ryan, On 01/20/2017 04:30 AM, Ryan Harkin wrote: > On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> wrote: >> Hi Leif, Ryan >> >> >> On 01/19/2017 09:13 AM, Leif Lindholm wrote: >> >> On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: >> >> On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: >> >> The Marvell Yukon MAC address load supported only on Juno R1 and R2. >> It disabled for Juno R0 due to PCI issues on this board. >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> >> >> Tested-by: Ryan Harkin <ryan.harkin@linaro.org> >> >> --- >> ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> index 47ff587..e9e6990 100644 >> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> @@ -378,6 +378,7 @@ OnEndOfDxe ( >> EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; >> EFI_HANDLE Handle; >> EFI_STATUS Status; >> + UINT32 JunoRevision; >> >> // >> // PCI Root Complex initialization >> @@ -393,8 +394,12 @@ OnEndOfDxe ( >> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, >> FALSE); >> ASSERT_EFI_ERROR (Status); >> >> - Status = ArmJunoSetNicMacAddress (); >> - ASSERT_EFI_ERROR (Status); >> + GetJunoRevision (JunoRevision); >> + >> + if (JunoRevision != JUNO_REVISION_R0) { >> + Status = ArmJunoSetNicMacAddress (); >> + ASSERT_EFI_ERROR (Status); >> >> This is just an FYI, but I stacked your patch on top of mainline, like this: >> >> 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: >> Fixed crash on Juno R0 [Daniil Egranov] >> 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. >> [Thomas Huth] >> >> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the >> assert triggered: >> >> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) >> [snip] >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): >> !EFI_ERROR (Status) >> >> I worked out what is happening. And it's not to do with this patch. >> It's another fall-out from the re-work you did to the previous patch. >> It's also ultimately due to a bug the firmware. >> >> With the initial version of your "Set Marvell Yukon MAC address" >> patch, this hang didn't happen. I suspect that was because your error >> checking was weaker and certain PCIe failures didn't trigger the >> assert. >> >> To reproduce the error with this commit: >> 1) power on and boot R1 or R2 into Shell >> I do this by interrupting the boot by pressing ESCAPE and using the boot >> menu >> 2) At the Shell prompt, run "reset -s" to shutdown >> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" >> 4) the board will hang while booting UEFI, assuming the board firmware >> doesn't die with constant messages like this: >> >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> >> Assuming the problem is firmware, not EDK2, what should we do about it? >> >> OK, so instinctively, my reaction was that "the reset -s bug is a >> system controller firmware bug and we shouldn't work around >> it". However, since it is actually disrupting Ryan's workflow, which >> frequently doesn't touch PCI at all, I think downgrading the ASSERT to >> an error message is a good idea short-term. >> >> Daniil - could you make that change please? >> >> / >> Leif >> >> >> I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus >> transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon >> driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error >> messages during the initial boot. >> >> Testing motherboard interfaces (FPGA build 118)... >> SRAM 32MB test: PASSED >> LAN9118 test: PASSED >> KMI1/2 test: PASSED >> MMC test: PASSED >> PB/LEDs test: PASSED >> FPGA UART test: PASSED >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> ERROR: PCIe CSR read failed to respond >> ... >> >> Once it went through reporting these errors, the UEFI starts loading but >> still fails in OnEndOfDxe(): >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): >> !EFI_ERROR (Status) >> >> This is the original ArmJunoDxe code: >> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, >> FALSE); >> ASSERT_EFI_ERROR (Status); <---- line 110 > I'm confused at which point in the commit history this is line 110. > > Just before your MAC commit, it's line 108. After your commit, and in > the current mainline, it's line 394. > > Then, I have to go back to Jeremy's commit > 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see > another change to ArmJunoDxe.c. Before that commit, the assert appears > on line 116. > > On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 > on line 125. > > So I'm unsure what code you are running. > This shows the code where it fails for me after MAC address code was removed from ArmJunoDxe.c. It's part of my development branch and lines may not matching any public commits ... but i pulled 16.06 release for a test and see the ASSERT on the same code in case of "PCIe .." and "SMBus .." errors. The line number should be corresponding to 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: ASSERT_EFI_ERROR (Status = Not Found) ASSERT [ArmJunoDxe] /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): !EFI_ERROR (Status) > >> but it actually fails here first: >> Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, >> &PciRootComplexDevicePath, >> &Handle); >> >> Ryan - could you try to remove Marvell patches and check if you also >> catching "PCIe .." and "SMBus .." errors without them and your build still >> fails with other ASSERTs related to PCI. >> >> Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI >> enumeration is completely corrupted. >> >> These errors do not appear if board was reset with the nPBRESET button. It >> never fails with "reset -w" and "reset -c". I also loaded Debian and used >> the shutdown command from there and got the same "PCIe .." and "SMBus .." >> errors after "reboot" command from the "Cmd>" prompt. Possibly, the >> "reboot" command from the board shell prompt does not reset the board >> correctly so it looks like a firmware issue. >> > Yes, that's pretty much what I've been saying. There appears to be a > firmware bug that causes PCIe to be in an unstable state after > software shutdown of the AP by the motherboard, followed by reset. > > The "PCIe CSR read failed to respond" error is nothing to do with EDK2 > and happens before EDK2 runs. > > However, unlike you, if I don't get the "PCIe CSR read failed to > respond" error, upstream EDK2 before your MAC address commit will boot > without ASSERTing on R0/1/2. Do you mean, when you do have the "PCIe .." error ? As i understand, if you power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC address code during the first boot and can start UEFI Shell? > So I've never seen the assert at line 110 as you pointed out above. > That's curious. I can see why that would/should assert, but it doesn't > for me. Perhaps it's something to do with the tree you are building > from? Or the devices you have connected? I have no additional SATA or > PCIe devices connected to my board. > > Cheers, > Ryan. I am using R2 for the tests. The tests were done without SATA or PCI devices connected to the board. Unfortunately, I do not have R1 board at this moment to check it. This is my R2 firmware versions: ARM V2M-Juno Boot loader v1.0.0 HBI0262 build 1844 ARM V2M_Juno r2 Firmware v1.4.3 Build Date: Apr 12 2016 MIC RAM configuration (pms_v105.bin)... Configuring motherboard (rev D, var A)... IOFPGA image \MB\HBI0262D\io_b118.bit As Leif suggested, i will replace the ASSERT with an error message and send another patch. Could you please check one thing for me on your boards? Use a build which worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot -> see "PCIe .." and "SMBus .." errors -> boot UEFI. As i understand, you do not have the ASSERT with your old builds so you can start the UEFI shell. Use the "pci" shell command to list all PCI devices. Do you see them correctly enumerated? I'd like to make sure that by ignoring the ASSERT errors we will not face any further problems with PCI. Thanks, Daniil >> Thanks, >> Daniil >> >> >> Prior to your "Set Marvell Yukon MAC address" patch, or with the >> earlier version, the board would boot anyway, but the Yukon device >> would be missing. >> >> Now it dies. >> >> I don't know which is worse, but I think hanging is worse than an >> ethernet port dropping out. Although hanging is a bit more obvious >> that there's a problem... >> >> >> + } >> } >> >> STATIC >> -- >> 2.7.4 >> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel >> >> > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-20 20:57 ` Daniil Egranov @ 2017-01-23 11:26 ` Ryan Harkin 2017-01-23 12:56 ` Ryan Harkin 1 sibling, 0 replies; 11+ messages in thread From: Ryan Harkin @ 2017-01-23 11:26 UTC (permalink / raw) To: Daniil Egranov; +Cc: edk2-devel@lists.01.org, Leif Lindholm On 20 January 2017 at 20:57, Daniil Egranov <daniil.egranov@arm.com> wrote: > Hi Ryan, > > > On 01/20/2017 04:30 AM, Ryan Harkin wrote: > > On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> wrote: > > Hi Leif, Ryan > > > On 01/19/2017 09:13 AM, Leif Lindholm wrote: > > On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: > > On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: > > The Marvell Yukon MAC address load supported only on Juno R1 and R2. > It disabled for Juno R0 due to PCI issues on this board. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> > > Tested-by: Ryan Harkin <ryan.harkin@linaro.org> > > --- > ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > index 47ff587..e9e6990 100644 > --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > @@ -378,6 +378,7 @@ OnEndOfDxe ( > EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; > EFI_HANDLE Handle; > EFI_STATUS Status; > + UINT32 JunoRevision; > > // > // PCI Root Complex initialization > @@ -393,8 +394,12 @@ OnEndOfDxe ( > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); > > - Status = ArmJunoSetNicMacAddress (); > - ASSERT_EFI_ERROR (Status); > + GetJunoRevision (JunoRevision); > + > + if (JunoRevision != JUNO_REVISION_R0) { > + Status = ArmJunoSetNicMacAddress (); > + ASSERT_EFI_ERROR (Status); > > This is just an FYI, but I stacked your patch on top of mainline, like this: > > 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: > Fixed crash on Juno R0 [Daniil Egranov] > 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. > [Thomas Huth] > > The first time I ran this, Juno R0 worked fine, but on R1 and R2, the > assert triggered: > > UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) > [snip] > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): > !EFI_ERROR (Status) > > I worked out what is happening. And it's not to do with this patch. > It's another fall-out from the re-work you did to the previous patch. > It's also ultimately due to a bug the firmware. > > With the initial version of your "Set Marvell Yukon MAC address" > patch, this hang didn't happen. I suspect that was because your error > checking was weaker and certain PCIe failures didn't trigger the > assert. > > To reproduce the error with this commit: > 1) power on and boot R1 or R2 into Shell > I do this by interrupting the boot by pressing ESCAPE and using the boot > menu > 2) At the Shell prompt, run "reset -s" to shutdown > 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" > 4) the board will hang while booting UEFI, assuming the board firmware > doesn't die with constant messages like this: > > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > > Assuming the problem is firmware, not EDK2, what should we do about it? > > OK, so instinctively, my reaction was that "the reset -s bug is a > system controller firmware bug and we shouldn't work around > it". However, since it is actually disrupting Ryan's workflow, which > frequently doesn't touch PCI at all, I think downgrading the ASSERT to > an error message is a good idea short-term. > > Daniil - could you make that change please? > > / > Leif > > > I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus > transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon > driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error > messages during the initial boot. > > Testing motherboard interfaces (FPGA build 118)... > SRAM 32MB test: PASSED > LAN9118 test: PASSED > KMI1/2 test: PASSED > MMC test: PASSED > PB/LEDs test: PASSED > FPGA UART test: PASSED > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > ERROR: PCIe CSR read failed to respond > ... > > Once it went through reporting these errors, the UEFI starts loading but > still fails in OnEndOfDxe(): > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): > !EFI_ERROR (Status) > > This is the original ArmJunoDxe code: > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); <---- line 110 > > I'm confused at which point in the commit history this is line 110. > > Just before your MAC commit, it's line 108. After your commit, and in > the current mainline, it's line 394. > > Then, I have to go back to Jeremy's commit > 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see > another change to ArmJunoDxe.c. Before that commit, the assert appears > on line 116. > > On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 > on line 125. > > So I'm unsure what code you are running. > > > This shows the code where it fails for me after MAC address code was removed > from ArmJunoDxe.c. It's part of my development branch and lines may not > matching any public commits You should consider switching branch to mainline when trying to compare and report problems with other people, then we have a common baseline to work with. > ... but i pulled 16.06 release for a test and 16.06? Why so old? What's wrong with 16.12? You are missing firmware updates and fixes by using such an old release. > see the ASSERT on the same code in case of "PCIe .." and "SMBus .." errors. > The line number should be corresponding to > 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: > > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): > !EFI_ERROR (Status) > > > but it actually fails here first: > Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, > &PciRootComplexDevicePath, > &Handle); > > Ryan - could you try to remove Marvell patches and check if you also > catching "PCIe .." and "SMBus .." errors without them and your build still > fails with other ASSERTs related to PCI. > > Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI > enumeration is completely corrupted. > > These errors do not appear if board was reset with the nPBRESET button. It > never fails with "reset -w" and "reset -c". I also loaded Debian and used > the shutdown command from there and got the same "PCIe .." and "SMBus .." > errors after "reboot" command from the "Cmd>" prompt. Possibly, the > "reboot" command from the board shell prompt does not reset the board > correctly so it looks like a firmware issue. > > Yes, that's pretty much what I've been saying. There appears to be a > firmware bug that causes PCIe to be in an unstable state after > software shutdown of the AP by the motherboard, followed by reset. > > The "PCIe CSR read failed to respond" error is nothing to do with EDK2 > and happens before EDK2 runs. > > However, unlike you, if I don't get the "PCIe CSR read failed to > respond" error, upstream EDK2 before your MAC address commit will boot > without ASSERTing on R0/1/2. > > > Do you mean, when you do have the "PCIe .." error ? As i understand, if you > power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC > address code during the first boot and can start UEFI Shell? > Correct. > So I've never seen the assert at line 110 as you pointed out above. > That's curious. I can see why that would/should assert, but it doesn't > for me. Perhaps it's something to do with the tree you are building > from? Or the devices you have connected? I have no additional SATA or > PCIe devices connected to my board. > > Cheers, > Ryan. > > > I am using R2 for the tests. The tests were done without SATA or PCI devices > connected to the board. Unfortunately, I do not have R1 board at this moment > to check it. > That's fine, R2 is good enough. > This is my R2 firmware versions: > > ARM V2M-Juno Boot loader v1.0.0 > HBI0262 build 1844 > > ARM V2M_Juno r2 Firmware v1.4.3 v1.4.4 is the latest Juno firmware. You should update. Although it won't fix these problems as it wasn't intended to. And as you used 16.06 for your baseline testing, I assume you also have old SCP firmware in your environment too. > Build Date: Apr 12 2016 > > MIC RAM configuration (pms_v105.bin)... > > Configuring motherboard (rev D, var A)... > IOFPGA image \MB\HBI0262D\io_b118.bit > > As Leif suggested, i will replace the ASSERT with an error message and send > another patch. > Thanks, that's all we need to do, I think. > Could you please check one thing for me on your boards? Use a build which > worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot > -> see "PCIe .." and "SMBus .." errors -> boot UEFI. After I get the "PCIe .." and "SMBus .." errors, the board will never boot into UEFI. It hangs forever and I have to power-cycle the board. This rarely happens for me on R2. Sometimes, I do UEFI shell -> reset -s -> reboot and the board will give the "PCIe .." and "SMBus .." error, but most of the time, it does not and UEFI continues to boot. > As i understand, you do > not have the ASSERT with your old builds so you can start the UEFI shell. > Use the "pci" shell command to list all PCI devices. Do you see them > correctly enumerated? I'd like to make sure that by ignoring the ASSERT > errors we will not face any further problems with PCI. > In the case where my board continues to boot after software shutdown, the pci shell command shows this: Shell> pci Seg Bus Dev Func --- --- --- ---- 00 00 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 1556 Device 1100 Prog Interface 0 00 01 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 10 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 Which is not the same as when the board is power cycled. On power cycle, I see this: Seg Bus Dev Func --- --- --- ---- 00 00 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 1556 Device 1100 Prog Interface 0 00 01 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 01 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 02 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 03 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 10 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 03 00 00 ==> Mass Storage Controller - Other mass storage controll er Vendor 1095 Device 3132 Prog Interface 0 00 08 00 00 ==> Network Controller - Ethernet controller Vendor 11AB Device 4380 Prog Interface 0 Cheers, Ryan. > Thanks, > Daniil > > > Thanks, > Daniil > > > Prior to your "Set Marvell Yukon MAC address" patch, or with the > earlier version, the board would boot anyway, but the Yukon device > would be missing. > > Now it dies. > > I don't know which is worse, but I think hanging is worse than an > ethernet port dropping out. Although hanging is a bit more obvious > that there's a problem... > > > + } > } > > STATIC > -- > 2.7.4 > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-20 20:57 ` Daniil Egranov 2017-01-23 11:26 ` Ryan Harkin @ 2017-01-23 12:56 ` Ryan Harkin 2017-01-24 2:19 ` Daniil Egranov 1 sibling, 1 reply; 11+ messages in thread From: Ryan Harkin @ 2017-01-23 12:56 UTC (permalink / raw) To: Daniil Egranov; +Cc: edk2-devel@lists.01.org, Leif Lindholm On 20 January 2017 at 20:57, Daniil Egranov <daniil.egranov@arm.com> wrote: > Hi Ryan, > > > On 01/20/2017 04:30 AM, Ryan Harkin wrote: > > On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> wrote: > > Hi Leif, Ryan > > > On 01/19/2017 09:13 AM, Leif Lindholm wrote: > > On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: > > On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: > > The Marvell Yukon MAC address load supported only on Juno R1 and R2. > It disabled for Juno R0 due to PCI issues on this board. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> > > Tested-by: Ryan Harkin <ryan.harkin@linaro.org> > > --- > ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > index 47ff587..e9e6990 100644 > --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > @@ -378,6 +378,7 @@ OnEndOfDxe ( > EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; > EFI_HANDLE Handle; > EFI_STATUS Status; > + UINT32 JunoRevision; > > // > // PCI Root Complex initialization > @@ -393,8 +394,12 @@ OnEndOfDxe ( > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); > > - Status = ArmJunoSetNicMacAddress (); > - ASSERT_EFI_ERROR (Status); > + GetJunoRevision (JunoRevision); > + > + if (JunoRevision != JUNO_REVISION_R0) { > + Status = ArmJunoSetNicMacAddress (); > + ASSERT_EFI_ERROR (Status); > > This is just an FYI, but I stacked your patch on top of mainline, like this: > > 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: > Fixed crash on Juno R0 [Daniil Egranov] > 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. > [Thomas Huth] > > The first time I ran this, Juno R0 worked fine, but on R1 and R2, the > assert triggered: > > UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) > [snip] > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): > !EFI_ERROR (Status) > > I worked out what is happening. And it's not to do with this patch. > It's another fall-out from the re-work you did to the previous patch. > It's also ultimately due to a bug the firmware. > > With the initial version of your "Set Marvell Yukon MAC address" > patch, this hang didn't happen. I suspect that was because your error > checking was weaker and certain PCIe failures didn't trigger the > assert. > > To reproduce the error with this commit: > 1) power on and boot R1 or R2 into Shell > I do this by interrupting the boot by pressing ESCAPE and using the boot > menu > 2) At the Shell prompt, run "reset -s" to shutdown > 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" > 4) the board will hang while booting UEFI, assuming the board firmware > doesn't die with constant messages like this: > > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > > Assuming the problem is firmware, not EDK2, what should we do about it? > > OK, so instinctively, my reaction was that "the reset -s bug is a > system controller firmware bug and we shouldn't work around > it". However, since it is actually disrupting Ryan's workflow, which > frequently doesn't touch PCI at all, I think downgrading the ASSERT to > an error message is a good idea short-term. > > Daniil - could you make that change please? > > / > Leif > > > I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus > transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon > driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error > messages during the initial boot. > > Testing motherboard interfaces (FPGA build 118)... > SRAM 32MB test: PASSED > LAN9118 test: PASSED > KMI1/2 test: PASSED > MMC test: PASSED > PB/LEDs test: PASSED > FPGA UART test: PASSED > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > ERROR: PCIe CSR read failed to respond > ... > > Once it went through reporting these errors, the UEFI starts loading but > still fails in OnEndOfDxe(): > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): > !EFI_ERROR (Status) > > This is the original ArmJunoDxe code: > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); <---- line 110 > > I'm confused at which point in the commit history this is line 110. > > Just before your MAC commit, it's line 108. After your commit, and in > the current mainline, it's line 394. > > Then, I have to go back to Jeremy's commit > 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see > another change to ArmJunoDxe.c. Before that commit, the assert appears > on line 116. > > On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 > on line 125. > > So I'm unsure what code you are running. > > > This shows the code where it fails for me after MAC address code was removed > from ArmJunoDxe.c. It's part of my development branch and lines may not > matching any public commits You should consider switching branch to mainline when trying to compare and report problems with other people, then we have a common baseline to work with. > ... but i pulled 16.06 release for a test and 16.06? Why so old? What's wrong with 16.12? You are missing firmware updates and fixes by using such an old release. > see the ASSERT on the same code in case of "PCIe .." and "SMBus .." errors. > The line number should be corresponding to > 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: > > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): > !EFI_ERROR (Status) > > > but it actually fails here first: > Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, > &PciRootComplexDevicePath, > &Handle); > > Ryan - could you try to remove Marvell patches and check if you also > catching "PCIe .." and "SMBus .." errors without them and your build still > fails with other ASSERTs related to PCI. > > Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI > enumeration is completely corrupted. > > These errors do not appear if board was reset with the nPBRESET button. It > never fails with "reset -w" and "reset -c". I also loaded Debian and used > the shutdown command from there and got the same "PCIe .." and "SMBus .." > errors after "reboot" command from the "Cmd>" prompt. Possibly, the > "reboot" command from the board shell prompt does not reset the board > correctly so it looks like a firmware issue. > > Yes, that's pretty much what I've been saying. There appears to be a > firmware bug that causes PCIe to be in an unstable state after > software shutdown of the AP by the motherboard, followed by reset. > > The "PCIe CSR read failed to respond" error is nothing to do with EDK2 > and happens before EDK2 runs. > > However, unlike you, if I don't get the "PCIe CSR read failed to > respond" error, upstream EDK2 before your MAC address commit will boot > without ASSERTing on R0/1/2. > > > Do you mean, when you do have the "PCIe .." error ? As i understand, if you > power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC > address code during the first boot and can start UEFI Shell? > Correct. > So I've never seen the assert at line 110 as you pointed out above. > That's curious. I can see why that would/should assert, but it doesn't > for me. Perhaps it's something to do with the tree you are building > from? Or the devices you have connected? I have no additional SATA or > PCIe devices connected to my board. > > Cheers, > Ryan. > > > I am using R2 for the tests. The tests were done without SATA or PCI devices > connected to the board. Unfortunately, I do not have R1 board at this moment > to check it. > That's fine, R2 is good enough. > This is my R2 firmware versions: > > ARM V2M-Juno Boot loader v1.0.0 > HBI0262 build 1844 > > ARM V2M_Juno r2 Firmware v1.4.3 v1.4.4 is the latest Juno firmware. You should update. Although it won't fix these problems as it wasn't intended to. And as you used 16.06 for your baseline testing, I assume you probably also have old SCP firmware in your environment too. > Build Date: Apr 12 2016 > > MIC RAM configuration (pms_v105.bin)... > > Configuring motherboard (rev D, var A)... > IOFPGA image \MB\HBI0262D\io_b118.bit > > As Leif suggested, i will replace the ASSERT with an error message and send > another patch. > Thanks, that's all we need to do, I think. > Could you please check one thing for me on your boards? Use a build which > worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot > -> see "PCIe .." and "SMBus .." errors -> boot UEFI. After I get the "PCIe .." and "SMBus .." errors, the board will never boot into UEFI. It hangs forever and I have to power-cycle the board. This rarely happens for me on R2. Sometimes, I do UEFI shell -> reset -s -> reboot and the board will give the "PCIe .." and "SMBus .." error, but most of the time, it does not and UEFI continues to boot. > As i understand, you do > not have the ASSERT with your old builds so you can start the UEFI shell. > Use the "pci" shell command to list all PCI devices. Do you see them > correctly enumerated? I'd like to make sure that by ignoring the ASSERT > errors we will not face any further problems with PCI. > In the case where my board continues to boot after software shutdown, the pci shell command shows this: Shell> pci Seg Bus Dev Func --- --- --- ---- 00 00 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 1556 Device 1100 Prog Interface 0 00 01 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 10 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 Which is not the same as when the board is power cycled. On power cycle, I see this: Seg Bus Dev Func --- --- --- ---- 00 00 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 1556 Device 1100 Prog Interface 0 00 01 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 01 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 02 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 03 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 10 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 03 00 00 ==> Mass Storage Controller - Other mass storage controll er Vendor 1095 Device 3132 Prog Interface 0 00 08 00 00 ==> Network Controller - Ethernet controller Vendor 11AB Device 4380 Prog Interface 0 Cheers, Ryan. > Thanks, > Daniil > > > Thanks, > Daniil > > > Prior to your "Set Marvell Yukon MAC address" patch, or with the > earlier version, the board would boot anyway, but the Yukon device > would be missing. > > Now it dies. > > I don't know which is worse, but I think hanging is worse than an > ethernet port dropping out. Although hanging is a bit more obvious > that there's a problem... > > > + } > } > > STATIC > -- > 2.7.4 > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-23 12:56 ` Ryan Harkin @ 2017-01-24 2:19 ` Daniil Egranov 2017-01-24 11:05 ` Ryan Harkin 0 siblings, 1 reply; 11+ messages in thread From: Daniil Egranov @ 2017-01-24 2:19 UTC (permalink / raw) To: Ryan Harkin; +Cc: edk2-devel@lists.01.org, Leif Lindholm Hi Ryan, On 01/23/2017 06:56 AM, Ryan Harkin wrote: > On 20 January 2017 at 20:57, Daniil Egranov <daniil.egranov@arm.com> wrote: >> Hi Ryan, >> >> >> On 01/20/2017 04:30 AM, Ryan Harkin wrote: >> >> On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> wrote: >> >> Hi Leif, Ryan >> >> >> On 01/19/2017 09:13 AM, Leif Lindholm wrote: >> >> On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: >> >> On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> wrote: >> >> The Marvell Yukon MAC address load supported only on Juno R1 and R2. >> It disabled for Juno R0 due to PCI issues on this board. >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> >> >> Tested-by: Ryan Harkin <ryan.harkin@linaro.org> >> >> --- >> ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> index 47ff587..e9e6990 100644 >> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> @@ -378,6 +378,7 @@ OnEndOfDxe ( >> EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; >> EFI_HANDLE Handle; >> EFI_STATUS Status; >> + UINT32 JunoRevision; >> >> // >> // PCI Root Complex initialization >> @@ -393,8 +394,12 @@ OnEndOfDxe ( >> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, >> FALSE); >> ASSERT_EFI_ERROR (Status); >> >> - Status = ArmJunoSetNicMacAddress (); >> - ASSERT_EFI_ERROR (Status); >> + GetJunoRevision (JunoRevision); >> + >> + if (JunoRevision != JUNO_REVISION_R0) { >> + Status = ArmJunoSetNicMacAddress (); >> + ASSERT_EFI_ERROR (Status); >> >> This is just an FYI, but I stacked your patch on top of mainline, like this: >> >> 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: >> Fixed crash on Juno R0 [Daniil Egranov] >> 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. >> [Thomas Huth] >> >> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the >> assert triggered: >> >> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) >> [snip] >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): >> !EFI_ERROR (Status) >> >> I worked out what is happening. And it's not to do with this patch. >> It's another fall-out from the re-work you did to the previous patch. >> It's also ultimately due to a bug the firmware. >> >> With the initial version of your "Set Marvell Yukon MAC address" >> patch, this hang didn't happen. I suspect that was because your error >> checking was weaker and certain PCIe failures didn't trigger the >> assert. >> >> To reproduce the error with this commit: >> 1) power on and boot R1 or R2 into Shell >> I do this by interrupting the boot by pressing ESCAPE and using the boot >> menu >> 2) At the Shell prompt, run "reset -s" to shutdown >> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" >> 4) the board will hang while booting UEFI, assuming the board firmware >> doesn't die with constant messages like this: >> >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> >> Assuming the problem is firmware, not EDK2, what should we do about it? >> >> OK, so instinctively, my reaction was that "the reset -s bug is a >> system controller firmware bug and we shouldn't work around >> it". However, since it is actually disrupting Ryan's workflow, which >> frequently doesn't touch PCI at all, I think downgrading the ASSERT to >> an error message is a good idea short-term. >> >> Daniil - could you make that change please? >> >> / >> Leif >> >> >> I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus >> transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon >> driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error >> messages during the initial boot. >> >> Testing motherboard interfaces (FPGA build 118)... >> SRAM 32MB test: PASSED >> LAN9118 test: PASSED >> KMI1/2 test: PASSED >> MMC test: PASSED >> PB/LEDs test: PASSED >> FPGA UART test: PASSED >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> ERROR: PCIe CSR read failed to respond >> ... >> >> Once it went through reporting these errors, the UEFI starts loading but >> still fails in OnEndOfDxe(): >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): >> !EFI_ERROR (Status) >> >> This is the original ArmJunoDxe code: >> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, >> FALSE); >> ASSERT_EFI_ERROR (Status); <---- line 110 >> >> I'm confused at which point in the commit history this is line 110. >> >> Just before your MAC commit, it's line 108. After your commit, and in >> the current mainline, it's line 394. >> >> Then, I have to go back to Jeremy's commit >> 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see >> another change to ArmJunoDxe.c. Before that commit, the assert appears >> on line 116. >> >> On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 >> on line 125. >> >> So I'm unsure what code you are running. >> >> >> This shows the code where it fails for me after MAC address code was removed >> from ArmJunoDxe.c. It's part of my development branch and lines may not >> matching any public commits > You should consider switching branch to mainline when trying to > compare and report problems with other people, then we have a common > baseline to work with. > > >> ... but i pulled 16.06 release for a test and > 16.06? Why so old? What's wrong with 16.12? You are missing firmware > updates and fixes by using such an old release. I wanted to check how long this problem existed. Just picked up one of the old builds. > >> see the ASSERT on the same code in case of "PCIe .." and "SMBus .." errors. >> The line number should be corresponding to >> 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: >> >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): >> !EFI_ERROR (Status) >> >> >> but it actually fails here first: >> Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, >> &PciRootComplexDevicePath, >> &Handle); >> >> Ryan - could you try to remove Marvell patches and check if you also >> catching "PCIe .." and "SMBus .." errors without them and your build still >> fails with other ASSERTs related to PCI. >> >> Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI >> enumeration is completely corrupted. >> >> These errors do not appear if board was reset with the nPBRESET button. It >> never fails with "reset -w" and "reset -c". I also loaded Debian and used >> the shutdown command from there and got the same "PCIe .." and "SMBus .." >> errors after "reboot" command from the "Cmd>" prompt. Possibly, the >> "reboot" command from the board shell prompt does not reset the board >> correctly so it looks like a firmware issue. >> >> Yes, that's pretty much what I've been saying. There appears to be a >> firmware bug that causes PCIe to be in an unstable state after >> software shutdown of the AP by the motherboard, followed by reset. >> >> The "PCIe CSR read failed to respond" error is nothing to do with EDK2 >> and happens before EDK2 runs. >> >> However, unlike you, if I don't get the "PCIe CSR read failed to >> respond" error, upstream EDK2 before your MAC address commit will boot >> without ASSERTing on R0/1/2. >> >> >> Do you mean, when you do have the "PCIe .." error ? As i understand, if you >> power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC >> address code during the first boot and can start UEFI Shell? >> > Correct. > > >> So I've never seen the assert at line 110 as you pointed out above. >> That's curious. I can see why that would/should assert, but it doesn't >> for me. Perhaps it's something to do with the tree you are building >> from? Or the devices you have connected? I have no additional SATA or >> PCIe devices connected to my board. >> >> Cheers, >> Ryan. >> >> >> I am using R2 for the tests. The tests were done without SATA or PCI devices >> connected to the board. Unfortunately, I do not have R1 board at this moment >> to check it. >> > That's fine, R2 is good enough. > > >> This is my R2 firmware versions: >> >> ARM V2M-Juno Boot loader v1.0.0 >> HBI0262 build 1844 >> >> ARM V2M_Juno r2 Firmware v1.4.3 > v1.4.4 is the latest Juno firmware. You should update. Although it > won't fix these problems as it wasn't intended to. And as you used > 16.06 for your baseline testing, I assume you probably also have old > SCP firmware in your environment too. > > >> Build Date: Apr 12 2016 >> >> MIC RAM configuration (pms_v105.bin)... >> >> Configuring motherboard (rev D, var A)... >> IOFPGA image \MB\HBI0262D\io_b118.bit >> >> As Leif suggested, i will replace the ASSERT with an error message and send >> another patch. >> > Thanks, that's all we need to do, I think. > Done. >> Could you please check one thing for me on your boards? Use a build which >> worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot >> -> see "PCIe .." and "SMBus .." errors -> boot UEFI. > After I get the "PCIe .." and "SMBus .." errors, the board will never > boot into UEFI. It hangs forever and I have to power-cycle the board. > This rarely happens for me on R2. > > Sometimes, I do UEFI shell -> reset -s -> reboot and the board will > give the "PCIe .." and "SMBus .." error, but most of the time, it does > not and UEFI continues to boot. Interesting, if i do reset -s, i am getting "PCIe ..." error and stop with ASSERT error in UEFI all the time. > > >> As i understand, you do >> not have the ASSERT with your old builds so you can start the UEFI shell. >> Use the "pci" shell command to list all PCI devices. Do you see them >> correctly enumerated? I'd like to make sure that by ignoring the ASSERT >> errors we will not face any further problems with PCI. >> > In the case where my board continues to boot after software shutdown, > the pci shell command shows this: > > Shell> pci > Seg Bus Dev Func > --- --- --- ---- > 00 00 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 1556 Device 1100 Prog Interface 0 > 00 01 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 10 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > > Which is not the same as when the board is power cycled. On power > cycle, I see this: > > Seg Bus Dev Func > --- --- --- ---- > 00 00 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 1556 Device 1100 Prog Interface 0 > 00 01 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 01 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 02 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 03 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 10 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 03 00 00 ==> Mass Storage Controller - Other mass storage controll > er > Vendor 1095 Device 3132 Prog Interface 0 > 00 08 00 00 ==> Network Controller - Ethernet controller > Vendor 11AB Device 4380 Prog Interface 0 > > Cheers, > Ryan. > > If I let UEFI boot to the shell after "PCIe ..." error, the pci shell command shows me a long list of unrecognized PCI devices with all combinations of BDFs. Thanks, Daniil >> Thanks, >> Daniil >> >> >> Thanks, >> Daniil >> >> >> Prior to your "Set Marvell Yukon MAC address" patch, or with the >> earlier version, the board would boot anyway, but the Yukon device >> would be missing. >> >> Now it dies. >> >> I don't know which is worse, but I think hanging is worse than an >> ethernet port dropping out. Although hanging is a bit more obvious >> that there's a problem... >> >> >> + } >> } >> >> STATIC >> -- >> 2.7.4 >> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel >> >> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel >> >> > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-24 2:19 ` Daniil Egranov @ 2017-01-24 11:05 ` Ryan Harkin 2017-01-24 12:34 ` Ryan Harkin 0 siblings, 1 reply; 11+ messages in thread From: Ryan Harkin @ 2017-01-24 11:05 UTC (permalink / raw) To: Daniil Egranov; +Cc: edk2-devel@lists.01.org, Leif Lindholm On 24 January 2017 at 02:19, Daniil Egranov <daniil.egranov@arm.com> wrote: > Hi Ryan, > > > > On 01/23/2017 06:56 AM, Ryan Harkin wrote: >> >> On 20 January 2017 at 20:57, Daniil Egranov <daniil.egranov@arm.com> >> wrote: >>> >>> Hi Ryan, >>> >>> >>> On 01/20/2017 04:30 AM, Ryan Harkin wrote: >>> >>> On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> >>> wrote: >>> >>> Hi Leif, Ryan >>> >>> >>> On 01/19/2017 09:13 AM, Leif Lindholm wrote: >>> >>> On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: >>> >>> On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> >>> wrote: >>> >>> The Marvell Yukon MAC address load supported only on Juno R1 and R2. >>> It disabled for Juno R0 due to PCI issues on this board. >>> >>> Contributed-under: TianoCore Contribution Agreement 1.0 >>> Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> >>> >>> Tested-by: Ryan Harkin <ryan.harkin@linaro.org> >>> >>> --- >>> ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> index 47ff587..e9e6990 100644 >>> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>> @@ -378,6 +378,7 @@ OnEndOfDxe ( >>> EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; >>> EFI_HANDLE Handle; >>> EFI_STATUS Status; >>> + UINT32 JunoRevision; >>> >>> // >>> // PCI Root Complex initialization >>> @@ -393,8 +394,12 @@ OnEndOfDxe ( >>> Status = gBS->ConnectController (Handle, NULL, >>> PciRootComplexDevicePath, >>> FALSE); >>> ASSERT_EFI_ERROR (Status); >>> >>> - Status = ArmJunoSetNicMacAddress (); >>> - ASSERT_EFI_ERROR (Status); >>> + GetJunoRevision (JunoRevision); >>> + >>> + if (JunoRevision != JUNO_REVISION_R0) { >>> + Status = ArmJunoSetNicMacAddress (); >>> + ASSERT_EFI_ERROR (Status); >>> >>> This is just an FYI, but I stacked your patch on top of mainline, like >>> this: >>> >>> 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: >>> Fixed crash on Juno R0 [Daniil Egranov] >>> 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. >>> [Thomas Huth] >>> >>> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the >>> assert triggered: >>> >>> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) >>> [snip] >>> ASSERT_EFI_ERROR (Status = Not Found) >>> ASSERT [ArmJunoDxe] >>> >>> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): >>> !EFI_ERROR (Status) >>> >>> I worked out what is happening. And it's not to do with this patch. >>> It's another fall-out from the re-work you did to the previous patch. >>> It's also ultimately due to a bug the firmware. >>> >>> With the initial version of your "Set Marvell Yukon MAC address" >>> patch, this hang didn't happen. I suspect that was because your error >>> checking was weaker and certain PCIe failures didn't trigger the >>> assert. >>> >>> To reproduce the error with this commit: >>> 1) power on and boot R1 or R2 into Shell >>> I do this by interrupting the boot by pressing ESCAPE and using the >>> boot >>> menu >>> 2) At the Shell prompt, run "reset -s" to shutdown >>> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" >>> 4) the board will hang while booting UEFI, assuming the board firmware >>> doesn't die with constant messages like this: >>> >>> ERROR: PCIe CSR read failed to respond >>> ERROR: SMBus transaction not claimed >>> >>> Assuming the problem is firmware, not EDK2, what should we do about it? >>> >>> OK, so instinctively, my reaction was that "the reset -s bug is a >>> system controller firmware bug and we shouldn't work around >>> it". However, since it is actually disrupting Ryan's workflow, which >>> frequently doesn't touch PCI at all, I think downgrading the ASSERT to >>> an error message is a good idea short-term. >>> >>> Daniil - could you make that change please? >>> >>> / >>> Leif >>> >>> >>> I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus >>> transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon >>> driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same >>> error >>> messages during the initial boot. >>> >>> Testing motherboard interfaces (FPGA build 118)... >>> SRAM 32MB test: PASSED >>> LAN9118 test: PASSED >>> KMI1/2 test: PASSED >>> MMC test: PASSED >>> PB/LEDs test: PASSED >>> FPGA UART test: PASSED >>> ERROR: PCIe CSR read failed to respond >>> ERROR: SMBus transaction not claimed >>> ERROR: PCIe CSR read failed to respond >>> ... >>> >>> Once it went through reporting these errors, the UEFI starts loading but >>> still fails in OnEndOfDxe(): >>> ASSERT_EFI_ERROR (Status = Not Found) >>> ASSERT [ArmJunoDxe] >>> >>> /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): >>> !EFI_ERROR (Status) >>> >>> This is the original ArmJunoDxe code: >>> Status = gBS->ConnectController (Handle, NULL, >>> PciRootComplexDevicePath, >>> FALSE); >>> ASSERT_EFI_ERROR (Status); <---- line 110 >>> >>> I'm confused at which point in the commit history this is line 110. >>> >>> Just before your MAC commit, it's line 108. After your commit, and in >>> the current mainline, it's line 394. >>> >>> Then, I have to go back to Jeremy's commit >>> 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see >>> another change to ArmJunoDxe.c. Before that commit, the assert appears >>> on line 116. >>> >>> On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 >>> on line 125. >>> >>> So I'm unsure what code you are running. >>> >>> >>> This shows the code where it fails for me after MAC address code was >>> removed >>> from ArmJunoDxe.c. It's part of my development branch and lines may not >>> matching any public commits >> >> You should consider switching branch to mainline when trying to >> compare and report problems with other people, then we have a common >> baseline to work with. >> >> >>> ... but i pulled 16.06 release for a test and >> >> 16.06? Why so old? What's wrong with 16.12? You are missing firmware >> updates and fixes by using such an old release. > > > I wanted to check how long this problem existed. Just picked up one of the > old builds. > > >> >>> see the ASSERT on the same code in case of "PCIe .." and "SMBus .." >>> errors. >>> The line number should be corresponding to >>> 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: >>> >>> ASSERT_EFI_ERROR (Status = Not Found) >>> ASSERT [ArmJunoDxe] >>> >>> /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): >>> !EFI_ERROR (Status) >>> >>> >>> but it actually fails here first: >>> Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, >>> &PciRootComplexDevicePath, >>> &Handle); >>> >>> Ryan - could you try to remove Marvell patches and check if you also >>> catching "PCIe .." and "SMBus .." errors without them and your build >>> still >>> fails with other ASSERTs related to PCI. >>> >>> Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI >>> PCI >>> enumeration is completely corrupted. >>> >>> These errors do not appear if board was reset with the nPBRESET button. >>> It >>> never fails with "reset -w" and "reset -c". I also loaded Debian and used >>> the shutdown command from there and got the same "PCIe .." and "SMBus .." >>> errors after "reboot" command from the "Cmd>" prompt. Possibly, the >>> "reboot" command from the board shell prompt does not reset the board >>> correctly so it looks like a firmware issue. >>> >>> Yes, that's pretty much what I've been saying. There appears to be a >>> firmware bug that causes PCIe to be in an unstable state after >>> software shutdown of the AP by the motherboard, followed by reset. >>> >>> The "PCIe CSR read failed to respond" error is nothing to do with EDK2 >>> and happens before EDK2 runs. >>> >>> However, unlike you, if I don't get the "PCIe CSR read failed to >>> respond" error, upstream EDK2 before your MAC address commit will boot >>> without ASSERTing on R0/1/2. >>> >>> >>> Do you mean, when you do have the "PCIe .." error ? As i understand, if >>> you >>> power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the >>> MAC >>> address code during the first boot and can start UEFI Shell? >>> >> Correct. >> >> >>> So I've never seen the assert at line 110 as you pointed out above. >>> That's curious. I can see why that would/should assert, but it doesn't >>> for me. Perhaps it's something to do with the tree you are building >>> from? Or the devices you have connected? I have no additional SATA or >>> PCIe devices connected to my board. >>> >>> Cheers, >>> Ryan. >>> >>> >>> I am using R2 for the tests. The tests were done without SATA or PCI >>> devices >>> connected to the board. Unfortunately, I do not have R1 board at this >>> moment >>> to check it. >>> >> That's fine, R2 is good enough. >> >> >>> This is my R2 firmware versions: >>> >>> ARM V2M-Juno Boot loader v1.0.0 >>> HBI0262 build 1844 >>> >>> ARM V2M_Juno r2 Firmware v1.4.3 >> >> v1.4.4 is the latest Juno firmware. You should update. Although it >> won't fix these problems as it wasn't intended to. And as you used >> 16.06 for your baseline testing, I assume you probably also have old >> SCP firmware in your environment too. >> >> >>> Build Date: Apr 12 2016 >>> >>> MIC RAM configuration (pms_v105.bin)... >>> >>> Configuring motherboard (rev D, var A)... >>> IOFPGA image \MB\HBI0262D\io_b118.bit >>> >>> As Leif suggested, i will replace the ASSERT with an error message and >>> send >>> another patch. >>> >> Thanks, that's all we need to do, I think. >> > > Done. > >>> Could you please check one thing for me on your boards? Use a build which >>> worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> >>> reboot >>> -> see "PCIe .." and "SMBus .." errors -> boot UEFI. >> >> After I get the "PCIe .." and "SMBus .." errors, the board will never >> boot into UEFI. It hangs forever and I have to power-cycle the board. >> This rarely happens for me on R2. >> >> Sometimes, I do UEFI shell -> reset -s -> reboot and the board will >> give the "PCIe .." and "SMBus .." error, but most of the time, it does >> not and UEFI continues to boot. > > > Interesting, if i do reset -s, i am getting "PCIe ..." error and stop with > ASSERT error in UEFI all the time. > Funny, but while testing mainline [1] before testing your v2 patch, R1 and R2 both crashed after software shutdown with: ASSERT [ArmJunoDxe] /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(397): !EFI_ERROR (Status) But even worse than that, at one point, R2 did this: Installation of the FDT using the device path <VenHw(E7223039-5836-41E1-B542-D7EC736C5E59)/board.dtb> completed. PCI: The size 0x7F000000 of the region 0x80000000 has been increased to be a power of two for the AXI translation table. PCI: The size 0x180000000 of the region 0x880000000 has been increased to be a power of two for the AXI translation table. Synchronous Exception at 0x00000000FDEC712C PC 0x0000FDEC712C (0x0000FDEC5000+0x0000212C) [ 0] CpuIo2Dxe.dll PC 0x0000FDEC6570 (0x0000FDEC5000+0x00001570) [ 0] CpuIo2Dxe.dll PC 0x0000FDE449CC (0x0000FDE41000+0x000039CC) [ 1] PciHostBridge.dll PC 0x0000F886A6B4 (0x0000F8864000+0x000066B4) [ 2] PciBusDxe.dll PC 0x0000F8867C54 (0x0000F8864000+0x00003C54) [ 2] PciBusDxe.dll PC 0x0000F8867C24 (0x0000F8864000+0x00003C24) [ 2] PciBusDxe.dll PC 0x0000F886CA08 (0x0000F8864000+0x00008A08) [ 2] PciBusDxe.dll PC 0x0000F88677A8 (0x0000F8864000+0x000037A8) [ 2] PciBusDxe.dll PC 0x0000F8865920 (0x0000F8864000+0x00001920) [ 2] PciBusDxe.dll PC 0x0000FE005F8C (0x0000FDFFA000+0x0000BF8C) [ 3] DxeCore.dll PC 0x0000FE005358 (0x0000FDFFA000+0x0000B358) [ 3] DxeCore.dll PC 0x0000F8A29D6C (0x0000F8A28000+0x00001D6C) [ 4] ArmJunoDxe.dll PC 0x0000FE0177B8 (0x0000FDFFA000+0x0001D7B8) [ 5] DxeCore.dll PC 0x0000FE016F38 (0x0000FDFFA000+0x0001CF38) [ 5] DxeCore.dll PC 0x0000FE005134 (0x0000FDFFA000+0x0000B134) [ 5] DxeCore.dll PC 0x0000FE0175C4 (0x0000FDFFA000+0x0001D5C4) [ 5] DxeCore.dll PC 0x0000FE0179C0 (0x0000FDFFA000+0x0001D9C0) [ 5] DxeCore.dll PC 0x0000FE017E68 (0x0000FDFFA000+0x0001DE68) [ 5] DxeCore.dll PC 0x0000FDDEDA34 (0x0000FDDE1000+0x0000CA34) [ 6] BdsDxe.dll PC 0x0000FDDFD534 (0x0000FDDE1000+0x0001C534) [ 6] BdsDxe.dll PC 0x0000FDDE3E70 (0x0000FDDE1000+0x00002E70) [ 6] BdsDxe.dll PC 0x0000FDFFC3DC (0x0000FDFFA000+0x000023DC) [ 7] DxeCore.dll PC 0x0000FDFFB4A0 (0x0000FDFFA000+0x000014A0) [ 7] DxeCore.dll PC 0x0000FDFFB024 (0x0000FDFFA000+0x00001024) [ 7] DxeCore.dll [ 0] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe/DEBUG/CpuIo2Dxe.dll [ 1] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/ArmPlatformPkg/ArmJunoPkg/Drivers/PciHostBridgeDxe/PciHostBridgeDxe/DEBUG/PciHostBridge.dll [ 2] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Bus/Pci/PciBusDxe/PciBusDxe/DEBUG/PciBusDxe.dll [ 3] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll [ 4] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe/DEBUG/ArmJunoDxe.dll [ 5] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll [ 6] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll [ 7] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll X0 0x0000000057F00000 X1 0x00000000FDECC030 X2 0x0000000057F00000 X3 0x000000000000001A X4 0x00000000FD434E98 X5 0x00000000FDEC64D0 X6 0x00000000F887A080 X7 0x00000000FDE4A4B8 X8 0x00000000FE02D690 X9 0x0000004100000000 X10 0x0000000057EFFFFF X11 0xC000000000000000 X12 0x0000000000000000 X13 0x0000000000000003 X14 0x0000000000000000 X15 0x0000000000000003 X16 0x00000000FEFFFBF0 X17 0x0000000000000000 X18 0x0000000000000000 X19 0x0000000000000013 X20 0x0000000000000000 X21 0x0000000000000000 X22 0x0000000000000000 X23 0x0000000000000000 X24 0x0000000000000000 X25 0x0000000000000000 X26 0x0000000000000000 X27 0x0000000000000000 X28 0x0000000000000000 FP 0x00000000FEFFF2F0 LR 0x00000000FDEC6570 V0 0x0000000000000000 0000000000000000 V1 0xAB1C5ED5923F82A4 59F111F13956C25B V2 0x550C7DC3243185BE 12835B01D807AA98 V3 0xC19BF1749BDC06A7 80DEB1FE72BE5D74 V4 0x240CA1CC0FC19DC6 EFBE4786E49B69C1 V5 0x76F988DA5CB0A9DC 4A7484AA2DE92C6F V6 0xBF597FC7B00327C8 A831C66D983E5152 V7 0x1429296706CA6351 D5A79147C6E00BF3 V8 0x0000000000000000 2E1B213827B70A85 V9 0x0000000000000000 766A0ABB650A7354 V10 0x0000000000000000 A81A664BA2BFE8A1 V11 0x0000000000000000 D6990624D192E819 V12 0x0000000000000000 1E376C0819A4C116 V13 0x0000000000000000 4ED8AA4A391C0CB3 V14 0x0000000000000000 78A5636F748F82EE V15 0x0000000000000000 A4506CEB90BEFFFA V16 0x3C83709547545153 BC990E9F897D4FA8 V17 0xBEBEF297C5EC0578 E1D99AE8C2B92607 V18 0x13242833C5F05BAB 101C24C521387481 V19 0x0A50ABCAD0BDDA12 996865483E880969 V20 0x6C4ABAA53A9BE1CB 4416D9F479B08221 V21 0x60952147C3A68574 55B8AE51435C1A1A V22 0x9FEB2A3B4AB8D3BF 88C1883495C7F76F V23 0xD0C224BC8FB77E09 3DB8D233CF470963 V24 0x0E72963E02D21C93 C95B757DA62B3A12 V25 0x2A77DBA1E4EE5D5C E08479A1B557DFA8 V26 0xB593F86668CA8129 927A0019CDEC1A76 V27 0xD000200000E00002 A6201000E0008C04 V28 0x8020000081000200 892080400880A022 V29 0x2085000000000136 801489A520002304 V30 0x0801200082040060 0384E05702949200 V31 0x4064084112940202 1000458804200800 SP 0x00000000FEFFF2D0 ELR 0x00000000FDEC712C SPSR 0x60000209 FPSR 0x00000000 ESR 0x96000210 FAR 0x0000000057F00000 ESR : EC 0x25 IL 0x1 ISS 0x00000210 Data abort: Synchronous external abort Stack dump: 00000FEFFF1D0: 996865483E880969 0A50ABCAD0BDDA12 4416D9F479B08221 6C4ABAA53A9BE1CB 00000FEFFF1F0: 55B8AE51435C1A1A 60952147C3A68574 88C1883495C7F76F 9FEB2A3B4AB8D3BF 00000FEFFF210: 3DB8D233CF470963 D0C224BC8FB77E09 C95B757DA62B3A12 0E72963E02D21C93 00000FEFFF230: E08479A1B557DFA8 2A77DBA1E4EE5D5C 927A0019CDEC1A76 B593F86668CA8129 00000FEFFF250: A6201000E0008C04 D000200000E00002 892080400880A022 8020000081000200 00000FEFFF270: 801489A520002304 2085000000000136 0384E05702949200 0801200082040060 00000FEFFF290: 1000458804200800 4064084112940202 00000000FDEC712C 0000000060000209 00000FEFFF2B0: 0000000000000000 0000000096000210 0000000057F00000 000000000000001A > 00000FEFFF2D0: 0000000057F00000 0000000057F00000 FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 00000FEFFF2F0: 00000000FEFFF350 00000000FDE449CC A831C66D983E5152 00000000FD434E98 00000FEFFF310: 000000000000001A 0000000057F00000 0000000057F00000 00000000FDECC000 00000FEFFF330: 00000000FD434E98 0101000000000000 0000000000000000 00000000FD434E98 00000FEFFF350: 00000000FEFFF3A0 00000000F886A6B4 00000000FEFFF3A0 00000000FD434E98 00000FEFFF370: 000000000000001A 0000000057F00000 00000000FEFFF3A0 00000000FD46C860 00000FEFFF390: 00000000FDECC000 00000000FD46C798 00000000FEFFF420 00000000F8867C54 00000FEFFF3B0: 0000000057F00000 00000000FDDB8018 00000000FEFFF400 57F00000F8879090 ASSERT [ArmCpuDxe] /linaro/platforms/uefi/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(265): ((BOOLEAN)(0==1)) Anyway, these problems are all due to dodgy lower level firmware, which should be fixed eventually. And your v2 patch removing the assert is out best way forward. Cheers, Ryan. > >> >> >>> As i understand, you do >>> not have the ASSERT with your old builds so you can start the UEFI shell. >>> Use the "pci" shell command to list all PCI devices. Do you see them >>> correctly enumerated? I'd like to make sure that by ignoring the ASSERT >>> errors we will not face any further problems with PCI. >>> >> In the case where my board continues to boot after software shutdown, >> the pci shell command shows this: >> >> Shell> pci >> Seg Bus Dev Func >> --- --- --- ---- >> 00 00 00 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 1556 Device 1100 Prog Interface 0 >> 00 01 00 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA >> Vendor 0000 Device 0000 Prog Interface 0 >> 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA >> Vendor 0000 Device 0000 Prog Interface 0 >> 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA >> Vendor 0000 Device 0000 Prog Interface 0 >> 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 10 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA >> Vendor 0000 Device 0000 Prog Interface 0 >> >> Which is not the same as when the board is power cycled. On power >> cycle, I see this: >> >> Seg Bus Dev Func >> --- --- --- ---- >> 00 00 00 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 1556 Device 1100 Prog Interface 0 >> 00 01 00 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 01 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 02 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 03 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 10 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge >> Vendor 111D Device 8090 Prog Interface 0 >> 00 03 00 00 ==> Mass Storage Controller - Other mass storage >> controll >> er >> Vendor 1095 Device 3132 Prog Interface 0 >> 00 08 00 00 ==> Network Controller - Ethernet controller >> Vendor 11AB Device 4380 Prog Interface 0 >> >> Cheers, >> Ryan. >> >> > > If I let UEFI boot to the shell after "PCIe ..." error, the pci shell > command shows me a long list of unrecognized PCI devices with all > combinations of BDFs. > > Thanks, > Daniil > > >>> Thanks, >>> Daniil >>> >>> >>> Thanks, >>> Daniil >>> >>> >>> Prior to your "Set Marvell Yukon MAC address" patch, or with the >>> earlier version, the board would boot anyway, but the Yukon device >>> would be missing. >>> >>> Now it dies. >>> >>> I don't know which is worse, but I think hanging is worse than an >>> ethernet port dropping out. Although hanging is a bit more obvious >>> that there's a problem... >>> >>> >>> + } >>> } >>> >>> STATIC >>> -- >>> 2.7.4 >>> >>> _______________________________________________ >>> edk2-devel mailing list >>> edk2-devel@lists.01.org >>> https://lists.01.org/mailman/listinfo/edk2-devel >>> >>> >>> _______________________________________________ >>> edk2-devel mailing list >>> edk2-devel@lists.01.org >>> https://lists.01.org/mailman/listinfo/edk2-devel >>> >>> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel > > [1] mainline is this commit at the moment: 5734d48 2017-01-22 ShellPkg SmbiosView: Add decoding of SMBIOS spec 3.1.1 [Star Zeng] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 2017-01-24 11:05 ` Ryan Harkin @ 2017-01-24 12:34 ` Ryan Harkin 0 siblings, 0 replies; 11+ messages in thread From: Ryan Harkin @ 2017-01-24 12:34 UTC (permalink / raw) To: Daniil Egranov; +Cc: edk2-devel@lists.01.org, Leif Lindholm On 24 January 2017 at 11:05, Ryan Harkin <ryan.harkin@linaro.org> wrote: > On 24 January 2017 at 02:19, Daniil Egranov <daniil.egranov@arm.com> wrote: >> Hi Ryan, >> >> >> >> On 01/23/2017 06:56 AM, Ryan Harkin wrote: >>> >>> On 20 January 2017 at 20:57, Daniil Egranov <daniil.egranov@arm.com> >>> wrote: >>>> >>>> Hi Ryan, >>>> >>>> >>>> On 01/20/2017 04:30 AM, Ryan Harkin wrote: >>>> >>>> On 20 January 2017 at 01:34, Daniil Egranov <daniil.egranov@arm.com> >>>> wrote: >>>> >>>> Hi Leif, Ryan >>>> >>>> >>>> On 01/19/2017 09:13 AM, Leif Lindholm wrote: >>>> >>>> On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: >>>> >>>> On 18 January 2017 at 23:27, Daniil Egranov <daniil.egranov@arm.com> >>>> wrote: >>>> >>>> The Marvell Yukon MAC address load supported only on Juno R1 and R2. >>>> It disabled for Juno R0 due to PCI issues on this board. >>>> >>>> Contributed-under: TianoCore Contribution Agreement 1.0 >>>> Signed-off-by: Daniil Egranov <daniil.egranov@arm.com> >>>> >>>> Tested-by: Ryan Harkin <ryan.harkin@linaro.org> >>>> >>>> --- >>>> ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- >>>> 1 file changed, 7 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>>> b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>>> index 47ff587..e9e6990 100644 >>>> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>>> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >>>> @@ -378,6 +378,7 @@ OnEndOfDxe ( >>>> EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; >>>> EFI_HANDLE Handle; >>>> EFI_STATUS Status; >>>> + UINT32 JunoRevision; >>>> >>>> // >>>> // PCI Root Complex initialization >>>> @@ -393,8 +394,12 @@ OnEndOfDxe ( >>>> Status = gBS->ConnectController (Handle, NULL, >>>> PciRootComplexDevicePath, >>>> FALSE); >>>> ASSERT_EFI_ERROR (Status); >>>> >>>> - Status = ArmJunoSetNicMacAddress (); >>>> - ASSERT_EFI_ERROR (Status); >>>> + GetJunoRevision (JunoRevision); >>>> + >>>> + if (JunoRevision != JUNO_REVISION_R0) { >>>> + Status = ArmJunoSetNicMacAddress (); >>>> + ASSERT_EFI_ERROR (Status); >>>> >>>> This is just an FYI, but I stacked your patch on top of mainline, like >>>> this: >>>> >>>> 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: >>>> Fixed crash on Juno R0 [Daniil Egranov] >>>> 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. >>>> [Thomas Huth] >>>> >>>> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the >>>> assert triggered: >>>> >>>> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) >>>> [snip] >>>> ASSERT_EFI_ERROR (Status = Not Found) >>>> ASSERT [ArmJunoDxe] >>>> >>>> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): >>>> !EFI_ERROR (Status) >>>> >>>> I worked out what is happening. And it's not to do with this patch. >>>> It's another fall-out from the re-work you did to the previous patch. >>>> It's also ultimately due to a bug the firmware. >>>> >>>> With the initial version of your "Set Marvell Yukon MAC address" >>>> patch, this hang didn't happen. I suspect that was because your error >>>> checking was weaker and certain PCIe failures didn't trigger the >>>> assert. >>>> >>>> To reproduce the error with this commit: >>>> 1) power on and boot R1 or R2 into Shell >>>> I do this by interrupting the boot by pressing ESCAPE and using the >>>> boot >>>> menu >>>> 2) At the Shell prompt, run "reset -s" to shutdown >>>> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" >>>> 4) the board will hang while booting UEFI, assuming the board firmware >>>> doesn't die with constant messages like this: >>>> >>>> ERROR: PCIe CSR read failed to respond >>>> ERROR: SMBus transaction not claimed >>>> >>>> Assuming the problem is firmware, not EDK2, what should we do about it? >>>> >>>> OK, so instinctively, my reaction was that "the reset -s bug is a >>>> system controller firmware bug and we shouldn't work around >>>> it". However, since it is actually disrupting Ryan's workflow, which >>>> frequently doesn't touch PCI at all, I think downgrading the ASSERT to >>>> an error message is a good idea short-term. >>>> >>>> Daniil - could you make that change please? >>>> >>>> / >>>> Leif >>>> >>>> >>>> I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus >>>> transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon >>>> driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same >>>> error >>>> messages during the initial boot. >>>> >>>> Testing motherboard interfaces (FPGA build 118)... >>>> SRAM 32MB test: PASSED >>>> LAN9118 test: PASSED >>>> KMI1/2 test: PASSED >>>> MMC test: PASSED >>>> PB/LEDs test: PASSED >>>> FPGA UART test: PASSED >>>> ERROR: PCIe CSR read failed to respond >>>> ERROR: SMBus transaction not claimed >>>> ERROR: PCIe CSR read failed to respond >>>> ... >>>> >>>> Once it went through reporting these errors, the UEFI starts loading but >>>> still fails in OnEndOfDxe(): >>>> ASSERT_EFI_ERROR (Status = Not Found) >>>> ASSERT [ArmJunoDxe] >>>> >>>> /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): >>>> !EFI_ERROR (Status) >>>> >>>> This is the original ArmJunoDxe code: >>>> Status = gBS->ConnectController (Handle, NULL, >>>> PciRootComplexDevicePath, >>>> FALSE); >>>> ASSERT_EFI_ERROR (Status); <---- line 110 >>>> >>>> I'm confused at which point in the commit history this is line 110. >>>> >>>> Just before your MAC commit, it's line 108. After your commit, and in >>>> the current mainline, it's line 394. >>>> >>>> Then, I have to go back to Jeremy's commit >>>> 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see >>>> another change to ArmJunoDxe.c. Before that commit, the assert appears >>>> on line 116. >>>> >>>> On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 >>>> on line 125. >>>> >>>> So I'm unsure what code you are running. >>>> >>>> >>>> This shows the code where it fails for me after MAC address code was >>>> removed >>>> from ArmJunoDxe.c. It's part of my development branch and lines may not >>>> matching any public commits >>> >>> You should consider switching branch to mainline when trying to >>> compare and report problems with other people, then we have a common >>> baseline to work with. >>> >>> >>>> ... but i pulled 16.06 release for a test and >>> >>> 16.06? Why so old? What's wrong with 16.12? You are missing firmware >>> updates and fixes by using such an old release. >> >> >> I wanted to check how long this problem existed. Just picked up one of the >> old builds. >> >> >>> >>>> see the ASSERT on the same code in case of "PCIe .." and "SMBus .." >>>> errors. >>>> The line number should be corresponding to >>>> 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: >>>> >>>> ASSERT_EFI_ERROR (Status = Not Found) >>>> ASSERT [ArmJunoDxe] >>>> >>>> /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): >>>> !EFI_ERROR (Status) >>>> >>>> >>>> but it actually fails here first: >>>> Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, >>>> &PciRootComplexDevicePath, >>>> &Handle); >>>> >>>> Ryan - could you try to remove Marvell patches and check if you also >>>> catching "PCIe .." and "SMBus .." errors without them and your build >>>> still >>>> fails with other ASSERTs related to PCI. >>>> >>>> Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI >>>> PCI >>>> enumeration is completely corrupted. >>>> >>>> These errors do not appear if board was reset with the nPBRESET button. >>>> It >>>> never fails with "reset -w" and "reset -c". I also loaded Debian and used >>>> the shutdown command from there and got the same "PCIe .." and "SMBus .." >>>> errors after "reboot" command from the "Cmd>" prompt. Possibly, the >>>> "reboot" command from the board shell prompt does not reset the board >>>> correctly so it looks like a firmware issue. >>>> >>>> Yes, that's pretty much what I've been saying. There appears to be a >>>> firmware bug that causes PCIe to be in an unstable state after >>>> software shutdown of the AP by the motherboard, followed by reset. >>>> >>>> The "PCIe CSR read failed to respond" error is nothing to do with EDK2 >>>> and happens before EDK2 runs. >>>> >>>> However, unlike you, if I don't get the "PCIe CSR read failed to >>>> respond" error, upstream EDK2 before your MAC address commit will boot >>>> without ASSERTing on R0/1/2. >>>> >>>> >>>> Do you mean, when you do have the "PCIe .." error ? As i understand, if >>>> you >>>> power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the >>>> MAC >>>> address code during the first boot and can start UEFI Shell? >>>> >>> Correct. >>> >>> >>>> So I've never seen the assert at line 110 as you pointed out above. >>>> That's curious. I can see why that would/should assert, but it doesn't >>>> for me. Perhaps it's something to do with the tree you are building >>>> from? Or the devices you have connected? I have no additional SATA or >>>> PCIe devices connected to my board. >>>> >>>> Cheers, >>>> Ryan. >>>> >>>> >>>> I am using R2 for the tests. The tests were done without SATA or PCI >>>> devices >>>> connected to the board. Unfortunately, I do not have R1 board at this >>>> moment >>>> to check it. >>>> >>> That's fine, R2 is good enough. >>> >>> >>>> This is my R2 firmware versions: >>>> >>>> ARM V2M-Juno Boot loader v1.0.0 >>>> HBI0262 build 1844 >>>> >>>> ARM V2M_Juno r2 Firmware v1.4.3 >>> >>> v1.4.4 is the latest Juno firmware. You should update. Although it >>> won't fix these problems as it wasn't intended to. And as you used >>> 16.06 for your baseline testing, I assume you probably also have old >>> SCP firmware in your environment too. >>> >>> >>>> Build Date: Apr 12 2016 >>>> >>>> MIC RAM configuration (pms_v105.bin)... >>>> >>>> Configuring motherboard (rev D, var A)... >>>> IOFPGA image \MB\HBI0262D\io_b118.bit >>>> >>>> As Leif suggested, i will replace the ASSERT with an error message and >>>> send >>>> another patch. >>>> >>> Thanks, that's all we need to do, I think. >>> >> >> Done. >> >>>> Could you please check one thing for me on your boards? Use a build which >>>> worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> >>>> reboot >>>> -> see "PCIe .." and "SMBus .." errors -> boot UEFI. >>> >>> After I get the "PCIe .." and "SMBus .." errors, the board will never >>> boot into UEFI. It hangs forever and I have to power-cycle the board. >>> This rarely happens for me on R2. >>> >>> Sometimes, I do UEFI shell -> reset -s -> reboot and the board will >>> give the "PCIe .." and "SMBus .." error, but most of the time, it does >>> not and UEFI continues to boot. >> >> >> Interesting, if i do reset -s, i am getting "PCIe ..." error and stop with >> ASSERT error in UEFI all the time. >> > > Funny, but while testing mainline [1] before testing your v2 patch, R1 > and R2 both crashed after software shutdown with: > > ASSERT [ArmJunoDxe] > /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(397): > !EFI_ERROR (Status) > > > But even worse than that, at one point, R2 did this: > > Installation of the FDT using the device path > <VenHw(E7223039-5836-41E1-B542-D7EC736C5E59)/board.dtb> completed. > PCI: The size 0x7F000000 of the region 0x80000000 has been increased > to be a power of two for the AXI translation table. > PCI: The size 0x180000000 of the region 0x880000000 has been increased > to be a power of two for the AXI translation table. > > > Synchronous Exception at 0x00000000FDEC712C > PC 0x0000FDEC712C (0x0000FDEC5000+0x0000212C) [ 0] CpuIo2Dxe.dll > PC 0x0000FDEC6570 (0x0000FDEC5000+0x00001570) [ 0] CpuIo2Dxe.dll > PC 0x0000FDE449CC (0x0000FDE41000+0x000039CC) [ 1] PciHostBridge.dll > PC 0x0000F886A6B4 (0x0000F8864000+0x000066B4) [ 2] PciBusDxe.dll > PC 0x0000F8867C54 (0x0000F8864000+0x00003C54) [ 2] PciBusDxe.dll > PC 0x0000F8867C24 (0x0000F8864000+0x00003C24) [ 2] PciBusDxe.dll > PC 0x0000F886CA08 (0x0000F8864000+0x00008A08) [ 2] PciBusDxe.dll > PC 0x0000F88677A8 (0x0000F8864000+0x000037A8) [ 2] PciBusDxe.dll > PC 0x0000F8865920 (0x0000F8864000+0x00001920) [ 2] PciBusDxe.dll > PC 0x0000FE005F8C (0x0000FDFFA000+0x0000BF8C) [ 3] DxeCore.dll > PC 0x0000FE005358 (0x0000FDFFA000+0x0000B358) [ 3] DxeCore.dll > PC 0x0000F8A29D6C (0x0000F8A28000+0x00001D6C) [ 4] ArmJunoDxe.dll > PC 0x0000FE0177B8 (0x0000FDFFA000+0x0001D7B8) [ 5] DxeCore.dll > PC 0x0000FE016F38 (0x0000FDFFA000+0x0001CF38) [ 5] DxeCore.dll > PC 0x0000FE005134 (0x0000FDFFA000+0x0000B134) [ 5] DxeCore.dll > PC 0x0000FE0175C4 (0x0000FDFFA000+0x0001D5C4) [ 5] DxeCore.dll > PC 0x0000FE0179C0 (0x0000FDFFA000+0x0001D9C0) [ 5] DxeCore.dll > PC 0x0000FE017E68 (0x0000FDFFA000+0x0001DE68) [ 5] DxeCore.dll > PC 0x0000FDDEDA34 (0x0000FDDE1000+0x0000CA34) [ 6] BdsDxe.dll > PC 0x0000FDDFD534 (0x0000FDDE1000+0x0001C534) [ 6] BdsDxe.dll > PC 0x0000FDDE3E70 (0x0000FDDE1000+0x00002E70) [ 6] BdsDxe.dll > PC 0x0000FDFFC3DC (0x0000FDFFA000+0x000023DC) [ 7] DxeCore.dll > PC 0x0000FDFFB4A0 (0x0000FDFFA000+0x000014A0) [ 7] DxeCore.dll > PC 0x0000FDFFB024 (0x0000FDFFA000+0x00001024) [ 7] DxeCore.dll > > [ 0] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe/DEBUG/CpuIo2Dxe.dll > [ 1] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/ArmPlatformPkg/ArmJunoPkg/Drivers/PciHostBridgeDxe/PciHostBridgeDxe/DEBUG/PciHostBridge.dll > [ 2] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Bus/Pci/PciBusDxe/PciBusDxe/DEBUG/PciBusDxe.dll > [ 3] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll > [ 4] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe/DEBUG/ArmJunoDxe.dll > [ 5] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll > [ 6] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll > [ 7] /linaro/platforms/uefi/edk2/Build/ArmJuno/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll > > X0 0x0000000057F00000 X1 0x00000000FDECC030 X2 > 0x0000000057F00000 X3 0x000000000000001A > X4 0x00000000FD434E98 X5 0x00000000FDEC64D0 X6 > 0x00000000F887A080 X7 0x00000000FDE4A4B8 > X8 0x00000000FE02D690 X9 0x0000004100000000 X10 > 0x0000000057EFFFFF X11 0xC000000000000000 > X12 0x0000000000000000 X13 0x0000000000000003 X14 > 0x0000000000000000 X15 0x0000000000000003 > X16 0x00000000FEFFFBF0 X17 0x0000000000000000 X18 > 0x0000000000000000 X19 0x0000000000000013 > X20 0x0000000000000000 X21 0x0000000000000000 X22 > 0x0000000000000000 X23 0x0000000000000000 > X24 0x0000000000000000 X25 0x0000000000000000 X26 > 0x0000000000000000 X27 0x0000000000000000 > X28 0x0000000000000000 FP 0x00000000FEFFF2F0 LR 0x00000000FDEC6570 > > V0 0x0000000000000000 0000000000000000 V1 0xAB1C5ED5923F82A4 > 59F111F13956C25B > V2 0x550C7DC3243185BE 12835B01D807AA98 V3 0xC19BF1749BDC06A7 > 80DEB1FE72BE5D74 > V4 0x240CA1CC0FC19DC6 EFBE4786E49B69C1 V5 0x76F988DA5CB0A9DC > 4A7484AA2DE92C6F > V6 0xBF597FC7B00327C8 A831C66D983E5152 V7 0x1429296706CA6351 > D5A79147C6E00BF3 > V8 0x0000000000000000 2E1B213827B70A85 V9 0x0000000000000000 > 766A0ABB650A7354 > V10 0x0000000000000000 A81A664BA2BFE8A1 V11 0x0000000000000000 > D6990624D192E819 > V12 0x0000000000000000 1E376C0819A4C116 V13 0x0000000000000000 > 4ED8AA4A391C0CB3 > V14 0x0000000000000000 78A5636F748F82EE V15 0x0000000000000000 > A4506CEB90BEFFFA > V16 0x3C83709547545153 BC990E9F897D4FA8 V17 0xBEBEF297C5EC0578 > E1D99AE8C2B92607 > V18 0x13242833C5F05BAB 101C24C521387481 V19 0x0A50ABCAD0BDDA12 > 996865483E880969 > V20 0x6C4ABAA53A9BE1CB 4416D9F479B08221 V21 0x60952147C3A68574 > 55B8AE51435C1A1A > V22 0x9FEB2A3B4AB8D3BF 88C1883495C7F76F V23 0xD0C224BC8FB77E09 > 3DB8D233CF470963 > V24 0x0E72963E02D21C93 C95B757DA62B3A12 V25 0x2A77DBA1E4EE5D5C > E08479A1B557DFA8 > V26 0xB593F86668CA8129 927A0019CDEC1A76 V27 0xD000200000E00002 > A6201000E0008C04 > V28 0x8020000081000200 892080400880A022 V29 0x2085000000000136 > 801489A520002304 > V30 0x0801200082040060 0384E05702949200 V31 0x4064084112940202 > 1000458804200800 > > SP 0x00000000FEFFF2D0 ELR 0x00000000FDEC712C SPSR 0x60000209 FPSR > 0x00000000 > ESR 0x96000210 FAR 0x0000000057F00000 > > ESR : EC 0x25 IL 0x1 ISS 0x00000210 > > Data abort: Synchronous external abort > > Stack dump: > 00000FEFFF1D0: 996865483E880969 0A50ABCAD0BDDA12 4416D9F479B08221 > 6C4ABAA53A9BE1CB > 00000FEFFF1F0: 55B8AE51435C1A1A 60952147C3A68574 88C1883495C7F76F > 9FEB2A3B4AB8D3BF > 00000FEFFF210: 3DB8D233CF470963 D0C224BC8FB77E09 C95B757DA62B3A12 > 0E72963E02D21C93 > 00000FEFFF230: E08479A1B557DFA8 2A77DBA1E4EE5D5C 927A0019CDEC1A76 > B593F86668CA8129 > 00000FEFFF250: A6201000E0008C04 D000200000E00002 892080400880A022 > 8020000081000200 > 00000FEFFF270: 801489A520002304 2085000000000136 0384E05702949200 > 0801200082040060 > 00000FEFFF290: 1000458804200800 4064084112940202 00000000FDEC712C > 0000000060000209 > 00000FEFFF2B0: 0000000000000000 0000000096000210 0000000057F00000 > 000000000000001A >> 00000FEFFF2D0: 0000000057F00000 0000000057F00000 FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF > 00000FEFFF2F0: 00000000FEFFF350 00000000FDE449CC A831C66D983E5152 > 00000000FD434E98 > 00000FEFFF310: 000000000000001A 0000000057F00000 0000000057F00000 > 00000000FDECC000 > 00000FEFFF330: 00000000FD434E98 0101000000000000 0000000000000000 > 00000000FD434E98 > 00000FEFFF350: 00000000FEFFF3A0 00000000F886A6B4 00000000FEFFF3A0 > 00000000FD434E98 > 00000FEFFF370: 000000000000001A 0000000057F00000 00000000FEFFF3A0 > 00000000FD46C860 > 00000FEFFF390: 00000000FDECC000 00000000FD46C798 00000000FEFFF420 > 00000000F8867C54 > 00000FEFFF3B0: 0000000057F00000 00000000FDDB8018 00000000FEFFF400 > 57F00000F8879090 > ASSERT [ArmCpuDxe] > /linaro/platforms/uefi/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(265): > ((BOOLEAN)(0==1)) > > I just wanted to add to this problem report in case we start tracking it again. I've been testing software shutdown and reboot using upstream commit 5734d48 with Daniil's v2 patch applied and rebooting (with "reset -s") in a loop. Eventually after many reboots, I got Juno R1 to report the same error as R0 on boot: ERROR: PCIe CSR read failed to respond ERROR: SMBus transaction not claimed However, unlike R0 which reports this forever in a loop, R1 reports those two error messages 100 times, then reports this before continuing to boot the board: ERROR: reading PCIE Switch VID PCIe MAC addresss 0002-F700-60BB MAC addrs test: PASSED SMC MAC address 0002-F700-60B9 Setting HDMI0 mode for SVGA. Setting HDMI1 mode for SVGA. SoC SMB clock enabled. Testing SMB clock... SMB clock running Releasing system resets... UART0 set to SoC UART0 UART1 set to SoC UART1 NOTICE: Booting Trusted Firmware [snip] I've not seen boot continue after the "PCIe CSR ..." errors before. At this point, as Daniil reported, EDK2 hangs detecting the controller: ASSERT [ArmJunoDxe] /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(395): !EFI_ERROR (Status) Where, in my tree, I see ArmJunoDxe.c has these lines on code at line 394/395: 394: Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, FALSE); 395: ASSERT_EFI_ERROR (Status); I'm going to stop looking into this issue now until I get new motherboard firmware with a proposed fix. > Anyway, these problems are all due to dodgy lower level firmware, > which should be fixed eventually. And your v2 patch removing the > assert is out best way forward. > > Cheers, > Ryan. > > >> >>> >>> >>>> As i understand, you do >>>> not have the ASSERT with your old builds so you can start the UEFI shell. >>>> Use the "pci" shell command to list all PCI devices. Do you see them >>>> correctly enumerated? I'd like to make sure that by ignoring the ASSERT >>>> errors we will not face any further problems with PCI. >>>> >>> In the case where my board continues to boot after software shutdown, >>> the pci shell command shows this: >>> >>> Shell> pci >>> Seg Bus Dev Func >>> --- --- --- ---- >>> 00 00 00 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 1556 Device 1100 Prog Interface 0 >>> 00 01 00 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA >>> Vendor 0000 Device 0000 Prog Interface 0 >>> 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA >>> Vendor 0000 Device 0000 Prog Interface 0 >>> 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA >>> Vendor 0000 Device 0000 Prog Interface 0 >>> 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 10 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA >>> Vendor 0000 Device 0000 Prog Interface 0 >>> >>> Which is not the same as when the board is power cycled. On power >>> cycle, I see this: >>> >>> Seg Bus Dev Func >>> --- --- --- ---- >>> 00 00 00 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 1556 Device 1100 Prog Interface 0 >>> 00 01 00 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 01 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 02 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 03 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 10 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge >>> Vendor 111D Device 8090 Prog Interface 0 >>> 00 03 00 00 ==> Mass Storage Controller - Other mass storage >>> controll >>> er >>> Vendor 1095 Device 3132 Prog Interface 0 >>> 00 08 00 00 ==> Network Controller - Ethernet controller >>> Vendor 11AB Device 4380 Prog Interface 0 >>> >>> Cheers, >>> Ryan. >>> >>> >> >> If I let UEFI boot to the shell after "PCIe ..." error, the pci shell >> command shows me a long list of unrecognized PCI devices with all >> combinations of BDFs. >> >> Thanks, >> Daniil >> >> >>>> Thanks, >>>> Daniil >>>> >>>> >>>> Thanks, >>>> Daniil >>>> >>>> >>>> Prior to your "Set Marvell Yukon MAC address" patch, or with the >>>> earlier version, the board would boot anyway, but the Yukon device >>>> would be missing. >>>> >>>> Now it dies. >>>> >>>> I don't know which is worse, but I think hanging is worse than an >>>> ethernet port dropping out. Although hanging is a bit more obvious >>>> that there's a problem... >>>> >>>> >>>> + } >>>> } >>>> >>>> STATIC >>>> -- >>>> 2.7.4 >>>> >>>> _______________________________________________ >>>> edk2-devel mailing list >>>> edk2-devel@lists.01.org >>>> https://lists.01.org/mailman/listinfo/edk2-devel >>>> >>>> >>>> _______________________________________________ >>>> edk2-devel mailing list >>>> edk2-devel@lists.01.org >>>> https://lists.01.org/mailman/listinfo/edk2-devel >>>> >>>> >>> _______________________________________________ >>> edk2-devel mailing list >>> edk2-devel@lists.01.org >>> https://lists.01.org/mailman/listinfo/edk2-devel >> >> > > [1] mainline is this commit at the moment: > 5734d48 2017-01-22 ShellPkg SmbiosView: Add decoding of SMBIOS spec > 3.1.1 [Star Zeng] ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-01-24 12:34 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-18 23:27 [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 Daniil Egranov 2017-01-19 13:49 ` Ryan Harkin 2017-01-19 15:13 ` Leif Lindholm 2017-01-20 1:34 ` Daniil Egranov 2017-01-20 10:30 ` Ryan Harkin 2017-01-20 20:57 ` Daniil Egranov 2017-01-23 11:26 ` Ryan Harkin 2017-01-23 12:56 ` Ryan Harkin 2017-01-24 2:19 ` Daniil Egranov 2017-01-24 11:05 ` Ryan Harkin 2017-01-24 12:34 ` Ryan Harkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox