From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.101.70]) by ml01.01.org (Postfix) with ESMTP id BDA9E81EE6 for ; Mon, 23 Jan 2017 18:19:32 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 56647AD7; Mon, 23 Jan 2017 18:19:32 -0800 (PST) Received: from [192.168.236.128] (unknown [10.119.48.63]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D6C1E3F3D6; Mon, 23 Jan 2017 18:19:31 -0800 (PST) To: Ryan Harkin References: <1484782032-82342-1-git-send-email-daniil.egranov@arm.com> <20170119151307.GZ25883@bivouac.eciton.net> <6f842667-253f-57ce-cf76-a47ad464aa97@arm.com> Cc: "edk2-devel@lists.01.org" , Leif Lindholm From: Daniil Egranov Message-ID: <63991be9-816a-98b2-c3f8-e3cb3e86419c@arm.com> Date: Mon, 23 Jan 2017 20:19:31 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: Subject: Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Jan 2017 02:19:32 -0000 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Hi Ryan, On 01/23/2017 06:56 AM, Ryan Harkin wrote: > On 20 January 2017 at 20:57, Daniil Egranov wrote: >> Hi Ryan, >> >> >> On 01/20/2017 04:30 AM, Ryan Harkin wrote: >> >> On 20 January 2017 at 01:34, Daniil Egranov wrote: >> >> Hi Leif, Ryan >> >> >> On 01/19/2017 09:13 AM, Leif Lindholm wrote: >> >> On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: >> >> On 18 January 2017 at 23:27, Daniil Egranov wrote: >> >> The Marvell Yukon MAC address load supported only on Juno R1 and R2. >> It disabled for Juno R0 due to PCI issues on this board. >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Daniil Egranov >> >> Tested-by: Ryan Harkin >> >> --- >> ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> index 47ff587..e9e6990 100644 >> --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c >> @@ -378,6 +378,7 @@ OnEndOfDxe ( >> EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; >> EFI_HANDLE Handle; >> EFI_STATUS Status; >> + UINT32 JunoRevision; >> >> // >> // PCI Root Complex initialization >> @@ -393,8 +394,12 @@ OnEndOfDxe ( >> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, >> FALSE); >> ASSERT_EFI_ERROR (Status); >> >> - Status = ArmJunoSetNicMacAddress (); >> - ASSERT_EFI_ERROR (Status); >> + GetJunoRevision (JunoRevision); >> + >> + if (JunoRevision != JUNO_REVISION_R0) { >> + Status = ArmJunoSetNicMacAddress (); >> + ASSERT_EFI_ERROR (Status); >> >> This is just an FYI, but I stacked your patch on top of mainline, like this: >> >> 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: >> Fixed crash on Juno R0 [Daniil Egranov] >> 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. >> [Thomas Huth] >> >> The first time I ran this, Juno R0 worked fine, but on R1 and R2, the >> assert triggered: >> >> UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) >> [snip] >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): >> !EFI_ERROR (Status) >> >> I worked out what is happening. And it's not to do with this patch. >> It's another fall-out from the re-work you did to the previous patch. >> It's also ultimately due to a bug the firmware. >> >> With the initial version of your "Set Marvell Yukon MAC address" >> patch, this hang didn't happen. I suspect that was because your error >> checking was weaker and certain PCIe failures didn't trigger the >> assert. >> >> To reproduce the error with this commit: >> 1) power on and boot R1 or R2 into Shell >> I do this by interrupting the boot by pressing ESCAPE and using the boot >> menu >> 2) At the Shell prompt, run "reset -s" to shutdown >> 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" >> 4) the board will hang while booting UEFI, assuming the board firmware >> doesn't die with constant messages like this: >> >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> >> Assuming the problem is firmware, not EDK2, what should we do about it? >> >> OK, so instinctively, my reaction was that "the reset -s bug is a >> system controller firmware bug and we shouldn't work around >> it". However, since it is actually disrupting Ryan's workflow, which >> frequently doesn't touch PCI at all, I think downgrading the ASSERT to >> an error message is a good idea short-term. >> >> Daniil - could you make that change please? >> >> / >> Leif >> >> >> I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus >> transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon >> driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error >> messages during the initial boot. >> >> Testing motherboard interfaces (FPGA build 118)... >> SRAM 32MB test: PASSED >> LAN9118 test: PASSED >> KMI1/2 test: PASSED >> MMC test: PASSED >> PB/LEDs test: PASSED >> FPGA UART test: PASSED >> ERROR: PCIe CSR read failed to respond >> ERROR: SMBus transaction not claimed >> ERROR: PCIe CSR read failed to respond >> ... >> >> Once it went through reporting these errors, the UEFI starts loading but >> still fails in OnEndOfDxe(): >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): >> !EFI_ERROR (Status) >> >> This is the original ArmJunoDxe code: >> Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, >> FALSE); >> ASSERT_EFI_ERROR (Status); <---- line 110 >> >> I'm confused at which point in the commit history this is line 110. >> >> Just before your MAC commit, it's line 108. After your commit, and in >> the current mainline, it's line 394. >> >> Then, I have to go back to Jeremy's commit >> 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see >> another change to ArmJunoDxe.c. Before that commit, the assert appears >> on line 116. >> >> On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 >> on line 125. >> >> So I'm unsure what code you are running. >> >> >> This shows the code where it fails for me after MAC address code was removed >> from ArmJunoDxe.c. It's part of my development branch and lines may not >> matching any public commits > You should consider switching branch to mainline when trying to > compare and report problems with other people, then we have a common > baseline to work with. > > >> ... but i pulled 16.06 release for a test and > 16.06? Why so old? What's wrong with 16.12? You are missing firmware > updates and fixes by using such an old release. I wanted to check how long this problem existed. Just picked up one of the old builds. > >> see the ASSERT on the same code in case of "PCIe .." and "SMBus .." errors. >> The line number should be corresponding to >> 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: >> >> ASSERT_EFI_ERROR (Status = Not Found) >> ASSERT [ArmJunoDxe] >> /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): >> !EFI_ERROR (Status) >> >> >> but it actually fails here first: >> Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, >> &PciRootComplexDevicePath, >> &Handle); >> >> Ryan - could you try to remove Marvell patches and check if you also >> catching "PCIe .." and "SMBus .." errors without them and your build still >> fails with other ASSERTs related to PCI. >> >> Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI >> enumeration is completely corrupted. >> >> These errors do not appear if board was reset with the nPBRESET button. It >> never fails with "reset -w" and "reset -c". I also loaded Debian and used >> the shutdown command from there and got the same "PCIe .." and "SMBus .." >> errors after "reboot" command from the "Cmd>" prompt. Possibly, the >> "reboot" command from the board shell prompt does not reset the board >> correctly so it looks like a firmware issue. >> >> Yes, that's pretty much what I've been saying. There appears to be a >> firmware bug that causes PCIe to be in an unstable state after >> software shutdown of the AP by the motherboard, followed by reset. >> >> The "PCIe CSR read failed to respond" error is nothing to do with EDK2 >> and happens before EDK2 runs. >> >> However, unlike you, if I don't get the "PCIe CSR read failed to >> respond" error, upstream EDK2 before your MAC address commit will boot >> without ASSERTing on R0/1/2. >> >> >> Do you mean, when you do have the "PCIe .." error ? As i understand, if you >> power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC >> address code during the first boot and can start UEFI Shell? >> > Correct. > > >> So I've never seen the assert at line 110 as you pointed out above. >> That's curious. I can see why that would/should assert, but it doesn't >> for me. Perhaps it's something to do with the tree you are building >> from? Or the devices you have connected? I have no additional SATA or >> PCIe devices connected to my board. >> >> Cheers, >> Ryan. >> >> >> I am using R2 for the tests. The tests were done without SATA or PCI devices >> connected to the board. Unfortunately, I do not have R1 board at this moment >> to check it. >> > That's fine, R2 is good enough. > > >> This is my R2 firmware versions: >> >> ARM V2M-Juno Boot loader v1.0.0 >> HBI0262 build 1844 >> >> ARM V2M_Juno r2 Firmware v1.4.3 > v1.4.4 is the latest Juno firmware. You should update. Although it > won't fix these problems as it wasn't intended to. And as you used > 16.06 for your baseline testing, I assume you probably also have old > SCP firmware in your environment too. > > >> Build Date: Apr 12 2016 >> >> MIC RAM configuration (pms_v105.bin)... >> >> Configuring motherboard (rev D, var A)... >> IOFPGA image \MB\HBI0262D\io_b118.bit >> >> As Leif suggested, i will replace the ASSERT with an error message and send >> another patch. >> > Thanks, that's all we need to do, I think. > Done. >> Could you please check one thing for me on your boards? Use a build which >> worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot >> -> see "PCIe .." and "SMBus .." errors -> boot UEFI. > After I get the "PCIe .." and "SMBus .." errors, the board will never > boot into UEFI. It hangs forever and I have to power-cycle the board. > This rarely happens for me on R2. > > Sometimes, I do UEFI shell -> reset -s -> reboot and the board will > give the "PCIe .." and "SMBus .." error, but most of the time, it does > not and UEFI continues to boot. Interesting, if i do reset -s, i am getting "PCIe ..." error and stop with ASSERT error in UEFI all the time. > > >> As i understand, you do >> not have the ASSERT with your old builds so you can start the UEFI shell. >> Use the "pci" shell command to list all PCI devices. Do you see them >> correctly enumerated? I'd like to make sure that by ignoring the ASSERT >> errors we will not face any further problems with PCI. >> > In the case where my board continues to boot after software shutdown, > the pci shell command shows this: > > Shell> pci > Seg Bus Dev Func > --- --- --- ---- > 00 00 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 1556 Device 1100 Prog Interface 0 > 00 01 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 10 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA > Vendor 0000 Device 0000 Prog Interface 0 > > Which is not the same as when the board is power cycled. On power > cycle, I see this: > > Seg Bus Dev Func > --- --- --- ---- > 00 00 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 1556 Device 1100 Prog Interface 0 > 00 01 00 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 01 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 02 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 03 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 10 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge > Vendor 111D Device 8090 Prog Interface 0 > 00 03 00 00 ==> Mass Storage Controller - Other mass storage controll > er > Vendor 1095 Device 3132 Prog Interface 0 > 00 08 00 00 ==> Network Controller - Ethernet controller > Vendor 11AB Device 4380 Prog Interface 0 > > Cheers, > Ryan. > > If I let UEFI boot to the shell after "PCIe ..." error, the pci shell command shows me a long list of unrecognized PCI devices with all combinations of BDFs. Thanks, Daniil >> Thanks, >> Daniil >> >> >> Thanks, >> Daniil >> >> >> Prior to your "Set Marvell Yukon MAC address" patch, or with the >> earlier version, the board would boot anyway, but the Yukon device >> would be missing. >> >> Now it dies. >> >> I don't know which is worse, but I think hanging is worse than an >> ethernet port dropping out. Although hanging is a bit more obvious >> that there's a problem... >> >> >> + } >> } >> >> STATIC >> -- >> 2.7.4 >> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel >> >> >> _______________________________________________ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel >> >> > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel