From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-x230.google.com (mail-lf0-x230.google.com [IPv6:2a00:1450:4010:c07::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 45D7681EDF for ; Mon, 23 Jan 2017 04:56:12 -0800 (PST) Received: by mail-lf0-x230.google.com with SMTP id z134so92872340lff.3 for ; Mon, 23 Jan 2017 04:56:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=Si+b87sKbAXdw6XfyCG8tdnpBfN46LrU3tRHqvbMyoo=; b=W0wvPsEq01tlyn5Al3rM3JMTVu/RQ3lS6eaOLtNffhg+4WOuKpcsjP6iccPf16/nZD EHK7hmyH1L7PjmDDejBrVnSFYi1fVb5DP3IrVgGosrStG45fGHpggF4ZT/5Wv+K5Ld6X DEvykoMpLdpD0ihyNS7/16Mec2R4646NrSVvc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=Si+b87sKbAXdw6XfyCG8tdnpBfN46LrU3tRHqvbMyoo=; b=VBUyKwKs6vGlHOkhX6IHxa7qTRuidj3h6n9dJckVCj8T4PkAkvVz0jO2+j8hxNKB9o RH0orkBpqturx5K9YqxhCqg4dqwgITmgttbhYkGSPrMv6IWQRtVX44NNIFNcdCDlNQmB 19tc1NYhJWke05cuWXDXS+VNkaS8UvHZKCe4quHErRn92++Kgd6n/21LrOlaG9q9SqUs jV0Jz/2r3TlakL5spH1g11p8fcNUjZj/9LrSd80uCUtuRETI6ph6MgicIskqbqXyOQER 0Xe9na7X+M4o3gwjk6CN8LVZHCb0V5HL7R4x55b7LNq5EkgWvrWrXctX6LnwiDaLizCR sLKg== X-Gm-Message-State: AIkVDXL57hONcAFGYbjZDi/msRInUqOiCL9TU2obv6xzv2vLI5f7C7u02HTbfguswzB1dybSxgF3O/2M+6Ku/7r7 X-Received: by 10.25.29.195 with SMTP id d186mr9164292lfd.72.1485176170284; Mon, 23 Jan 2017 04:56:10 -0800 (PST) MIME-Version: 1.0 Received: by 10.25.207.72 with HTTP; Mon, 23 Jan 2017 04:56:09 -0800 (PST) In-Reply-To: <6f842667-253f-57ce-cf76-a47ad464aa97@arm.com> References: <1484782032-82342-1-git-send-email-daniil.egranov@arm.com> <20170119151307.GZ25883@bivouac.eciton.net> <6f842667-253f-57ce-cf76-a47ad464aa97@arm.com> From: Ryan Harkin Date: Mon, 23 Jan 2017 12:56:09 +0000 Message-ID: To: Daniil Egranov Cc: "edk2-devel@lists.01.org" , Leif Lindholm Subject: Re: [PATCH] ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: Fixed crash on Juno R0 X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jan 2017 12:56:12 -0000 Content-Type: text/plain; charset=UTF-8 On 20 January 2017 at 20:57, Daniil Egranov wrote: > Hi Ryan, > > > On 01/20/2017 04:30 AM, Ryan Harkin wrote: > > On 20 January 2017 at 01:34, Daniil Egranov wrote: > > Hi Leif, Ryan > > > On 01/19/2017 09:13 AM, Leif Lindholm wrote: > > On Thu, Jan 19, 2017 at 01:49:04PM +0000, Ryan Harkin wrote: > > On 18 January 2017 at 23:27, Daniil Egranov wrote: > > The Marvell Yukon MAC address load supported only on Juno R1 and R2. > It disabled for Juno R0 due to PCI issues on this board. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Daniil Egranov > > Tested-by: Ryan Harkin > > --- > ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > index 47ff587..e9e6990 100644 > --- a/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > +++ b/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c > @@ -378,6 +378,7 @@ OnEndOfDxe ( > EFI_DEVICE_PATH_PROTOCOL* PciRootComplexDevicePath; > EFI_HANDLE Handle; > EFI_STATUS Status; > + UINT32 JunoRevision; > > // > // PCI Root Complex initialization > @@ -393,8 +394,12 @@ OnEndOfDxe ( > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); > > - Status = ArmJunoSetNicMacAddress (); > - ASSERT_EFI_ERROR (Status); > + GetJunoRevision (JunoRevision); > + > + if (JunoRevision != JUNO_REVISION_R0) { > + Status = ArmJunoSetNicMacAddress (); > + ASSERT_EFI_ERROR (Status); > > This is just an FYI, but I stacked your patch on top of mainline, like this: > > 5f81f61 2017-01-18 ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe: > Fixed crash on Juno R0 [Daniil Egranov] > 19ca06b 2017-01-19 OvmfPkg: Remove superfluous return statements. > [Thomas Huth] > > The first time I ran this, Juno R0 worked fine, but on R1 and R2, the > assert triggered: > > UEFI firmware (version 5f81f61 built at 11:56:52 on Jan 19 2017) > [snip] > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /linaro/platforms/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(401): > !EFI_ERROR (Status) > > I worked out what is happening. And it's not to do with this patch. > It's another fall-out from the re-work you did to the previous patch. > It's also ultimately due to a bug the firmware. > > With the initial version of your "Set Marvell Yukon MAC address" > patch, this hang didn't happen. I suspect that was because your error > checking was weaker and certain PCIe failures didn't trigger the > assert. > > To reproduce the error with this commit: > 1) power on and boot R1 or R2 into Shell > I do this by interrupting the boot by pressing ESCAPE and using the boot > menu > 2) At the Shell prompt, run "reset -s" to shutdown > 3) At the ARM Boot Loader "Cmd>" prompt, run "reboot" > 4) the board will hang while booting UEFI, assuming the board firmware > doesn't die with constant messages like this: > > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > > Assuming the problem is firmware, not EDK2, what should we do about it? > > OK, so instinctively, my reaction was that "the reset -s bug is a > system controller firmware bug and we shouldn't work around > it". However, since it is actually disrupting Ryan's workflow, which > frequently doesn't touch PCI at all, I think downgrading the ASSERT to > an error message is a good idea short-term. > > Daniil - could you make that change please? > > / > Leif > > > I've been able to reproduce "PCIe CSR read failed to respond" and "SMBus > transaction not claimed" errors on my Juno R2. I disabled Marvell Yukon > driver (.dsc/.fdf) and removed ArmJunoDxe patch but still see the same error > messages during the initial boot. > > Testing motherboard interfaces (FPGA build 118)... > SRAM 32MB test: PASSED > LAN9118 test: PASSED > KMI1/2 test: PASSED > MMC test: PASSED > PB/LEDs test: PASSED > FPGA UART test: PASSED > ERROR: PCIe CSR read failed to respond > ERROR: SMBus transaction not claimed > ERROR: PCIe CSR read failed to respond > ... > > Once it went through reporting these errors, the UEFI starts loading but > still fails in OnEndOfDxe(): > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(110): > !EFI_ERROR (Status) > > This is the original ArmJunoDxe code: > Status = gBS->ConnectController (Handle, NULL, PciRootComplexDevicePath, > FALSE); > ASSERT_EFI_ERROR (Status); <---- line 110 > > I'm confused at which point in the commit history this is line 110. > > Just before your MAC commit, it's line 108. After your commit, and in > the current mainline, it's line 394. > > Then, I have to go back to Jeremy's commit > 58a4bff071832e587d18f97597b5d571dcebc9d7 on 27 July 2016 before I see > another change to ArmJunoDxe.c. Before that commit, the assert appears > on line 116. > > On my tree, 17.01, 16.12 and 16.11 has the assert on line 246, 16.10 > on line 125. > > So I'm unsure what code you are running. > > > This shows the code where it fails for me after MAC address code was removed > from ArmJunoDxe.c. It's part of my development branch and lines may not > matching any public commits You should consider switching branch to mainline when trying to compare and report problems with other people, then we have a common baseline to work with. > ... but i pulled 16.06 release for a test and 16.06? Why so old? What's wrong with 16.12? You are missing firmware updates and fixes by using such an old release. > see the ASSERT on the same code in case of "PCIe .." and "SMBus .." errors. > The line number should be corresponding to > 032082bf9906f82b77c161c1c2efbddfd4bb3751 commit: > > ASSERT_EFI_ERROR (Status = Not Found) > ASSERT [ArmJunoDxe] > /home/user/workspace/juno_16.06_orig/uefi/edk2/ArmPlatformPkg/ArmJunoPkg/Drivers/ArmJunoDxe/ArmJunoDxe.c(124): > !EFI_ERROR (Status) > > > but it actually fails here first: > Status = gBS->LocateDevicePath (&gEfiPciRootBridgeIoProtocolGuid, > &PciRootComplexDevicePath, > &Handle); > > Ryan - could you try to remove Marvell patches and check if you also > catching "PCIe .." and "SMBus .." errors without them and your build still > fails with other ASSERTs related to PCI. > > Leif - in my tests, if it fails with "PCIe .." and "SMBus ..", the UEFI PCI > enumeration is completely corrupted. > > These errors do not appear if board was reset with the nPBRESET button. It > never fails with "reset -w" and "reset -c". I also loaded Debian and used > the shutdown command from there and got the same "PCIe .." and "SMBus .." > errors after "reboot" command from the "Cmd>" prompt. Possibly, the > "reboot" command from the board shell prompt does not reset the board > correctly so it looks like a firmware issue. > > Yes, that's pretty much what I've been saying. There appears to be a > firmware bug that causes PCIe to be in an unstable state after > software shutdown of the AP by the motherboard, followed by reset. > > The "PCIe CSR read failed to respond" error is nothing to do with EDK2 > and happens before EDK2 runs. > > However, unlike you, if I don't get the "PCIe CSR read failed to > respond" error, upstream EDK2 before your MAC address commit will boot > without ASSERTing on R0/1/2. > > > Do you mean, when you do have the "PCIe .." error ? As i understand, if you > power-cycle the board (no "PCIe .." errors) you do not see ASSERT at the MAC > address code during the first boot and can start UEFI Shell? > Correct. > So I've never seen the assert at line 110 as you pointed out above. > That's curious. I can see why that would/should assert, but it doesn't > for me. Perhaps it's something to do with the tree you are building > from? Or the devices you have connected? I have no additional SATA or > PCIe devices connected to my board. > > Cheers, > Ryan. > > > I am using R2 for the tests. The tests were done without SATA or PCI devices > connected to the board. Unfortunately, I do not have R1 board at this moment > to check it. > That's fine, R2 is good enough. > This is my R2 firmware versions: > > ARM V2M-Juno Boot loader v1.0.0 > HBI0262 build 1844 > > ARM V2M_Juno r2 Firmware v1.4.3 v1.4.4 is the latest Juno firmware. You should update. Although it won't fix these problems as it wasn't intended to. And as you used 16.06 for your baseline testing, I assume you probably also have old SCP firmware in your environment too. > Build Date: Apr 12 2016 > > MIC RAM configuration (pms_v105.bin)... > > Configuring motherboard (rev D, var A)... > IOFPGA image \MB\HBI0262D\io_b118.bit > > As Leif suggested, i will replace the ASSERT with an error message and send > another patch. > Thanks, that's all we need to do, I think. > Could you please check one thing for me on your boards? Use a build which > worked for you (no ASSERT) and your steps: UEFI shell -> reset -s -> reboot > -> see "PCIe .." and "SMBus .." errors -> boot UEFI. After I get the "PCIe .." and "SMBus .." errors, the board will never boot into UEFI. It hangs forever and I have to power-cycle the board. This rarely happens for me on R2. Sometimes, I do UEFI shell -> reset -s -> reboot and the board will give the "PCIe .." and "SMBus .." error, but most of the time, it does not and UEFI continues to boot. > As i understand, you do > not have the ASSERT with your old builds so you can start the UEFI shell. > Use the "pci" shell command to list all PCI devices. Do you see them > correctly enumerated? I'd like to make sure that by ignoring the ASSERT > errors we will not face any further problems with PCI. > In the case where my board continues to boot after software shutdown, the pci shell command shows this: Shell> pci Seg Bus Dev Func --- --- --- ---- 00 00 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 1556 Device 1100 Prog Interface 0 00 01 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 01 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 02 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 03 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 10 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 1F 00 ==> Pre 2.0 device - All devices other than VGA Vendor 0000 Device 0000 Prog Interface 0 Which is not the same as when the board is power cycled. On power cycle, I see this: Seg Bus Dev Func --- --- --- ---- 00 00 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 1556 Device 1100 Prog Interface 0 00 01 00 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 01 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 02 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 03 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 0C 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 10 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 02 1F 00 ==> Bridge Device - PCI/PCI bridge Vendor 111D Device 8090 Prog Interface 0 00 03 00 00 ==> Mass Storage Controller - Other mass storage controll er Vendor 1095 Device 3132 Prog Interface 0 00 08 00 00 ==> Network Controller - Ethernet controller Vendor 11AB Device 4380 Prog Interface 0 Cheers, Ryan. > Thanks, > Daniil > > > Thanks, > Daniil > > > Prior to your "Set Marvell Yukon MAC address" patch, or with the > earlier version, the board would boot anyway, but the Yukon device > would be missing. > > Now it dies. > > I don't know which is worse, but I think hanging is worse than an > ethernet port dropping out. Although hanging is a bit more obvious > that there's a problem... > > > + } > } > > STATIC > -- > 2.7.4 > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > >