From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web12.412.1588041095709148397 for ; Mon, 27 Apr 2020 19:31:36 -0700 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: jeremy.linton@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 451CA30E; Mon, 27 Apr 2020 19:31:34 -0700 (PDT) Received: from [192.168.122.166] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EF7253F305; Mon, 27 Apr 2020 19:31:33 -0700 (PDT) Subject: Re: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: Rebuild the description table after Reset Device To: "Jiang, Guomin" , "Wu, Hao A" , "devel@edk2.groups.io" Cc: "Wang, Jian J" , "Ni, Ray" , "ard.biesheuvel@arm.com" , "Tian, Feng" , "Kinney, Michael D" References: <20200425013620.1159-1-guomin.jiang@intel.com> From: "Jeremy Linton" Message-ID: Date: Mon, 27 Apr 2020 21:31:29 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Hi, On 4/27/20 8:14 PM, Jiang, Guomin wrote: > Add stakeholder and link https://edk2.groups.io/g/devel/message/58193 > > Add discussion history and add comment. > >>> Hi Jeremy, >>> >>> You can refer https://edk2.groups.io/g/devel/message/58125 for discussion >> about this solution. >> >> Oh fun, odd how a bug can exist in a code base for years and then this >> happens.. I will move the discussion there. >> >> Thanks, >> >>> >>> Two issue I found: >>> 1. Memory leakage may occur if doing so and I am investigating it. >> >> It seems our solutions differ a bit? > > The memory leakage may occur when invoke UsbBuildDescTable(), because you allocate new buffer but haven't freed the old allocated buffer ye. What tool are you using to detect memory leaks? > >> >>> 2. It test pass with OVMF but fail in real platform, and I am figuring out the >> flow. >> >> Hmm, I've been seeing this on a RPI with an attached USB3 hub and 5 bay USB >> JBOD. (there is another problem but the reset crash is keeps the machine >> from booting). > > Can you share detail information with us. Sure, this is all off the shelf hardware, its an rpi4+edk2. What you you like? The device that causes the ASSERT() is a JMicron Controller (152d:0562) in one of these: http://www.yottamaster.com/index.php?route=product/product&path=63_65&product_id=91 behind a VIA based usb3 hub (2109:8110). There is a fpaste of the log here: https://paste.centos.org/view/1860e620 Grab anything you want quick, it expires in 1 day. The last few lines are: UsbBotDataTransfer: DataIn Stall UsbBootExecCmd: Success to Exec 0x88 Cmd (Result = 1) UsbBootRequestSense: (Invalid Parameter) with error code (70) sense key 5/24/0 XhcCheckUrbResult: STALL_ERROR! Completecode = 6 Recovery Halted Slot = 6,Dci = 3 XhcResetEndpoint: Slot = 0x6, Dci = 0x3 XhcSetTrDequeuePointer: Slot = 0x6, Dci = 0x3, Urb = 0x3597C998 XhcBulkTransfer: error - Device Error, transfer - 2 UsbBotDataTransfer: (Device Error) UsbBotDataTransfer: DataIn Stall UsbBootExecCmd: Success to Exec 0x88 Cmd (Result = 1) UsbBootRequestSense: (Invalid Parameter) with error code (70) sense key 5/24/0 UsbMassReadBlocks: UsbBootReadBlocks (Invalid Parameter) -> Reset Disable device slot 6! Enable Slot Successfully, The Slot ID = 0x6 Address 6 assigned successfully UsbIoPortReset: device is now ADDRESSED at 6 ASSERT [XhciDxe] /home/jlinton/rpi2/edk2/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched) BTW: I'm pretty sure I've seen this with various other pieces of hardware in the past as well. Usually in an unreproducible manner. This one was easy because the EP is just 5/2400, following a linux reboot (where the device is working). With the patch I suggested it gets those 5/2400's and continues the boot. I was a bit unable to decide if I should attempt to manually request the device descriptors in the XHCI driver rather than modifying the USB core. OTOH, that solution ends but being a bit ugly, because presumably one would do it in the initial set address/etc by querying the device descriptors at that point. > >> -----Original Message----- >> From: Wu, Hao A >> Sent: Monday, April 27, 2020 9:36 AM >> To: Jiang, Guomin ; devel@edk2.groups.io >> Cc: Wang, Jian J ; Ni, Ray >> Subject: RE: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: Rebuild the >> description table after Reset Device >> >>> -----Original Message----- >>> From: Jiang, Guomin >>> Sent: Sunday, April 26, 2020 9:05 PM >>> To: Wu, Hao A; devel@edk2.groups.io >>> Cc: Wang, Jian J; Ni, Ray >>> Subject: RE: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: Rebuild the >>> description table after Reset Device >>> >>> Add comment inline. >>> >>>> -----Original Message----- >>>> From: Wu, Hao A >>>> Sent: Sunday, April 26, 2020 4:04 PM >>>> To: Jiang, Guomin ; devel@edk2.groups.io >>>> Cc: Wang, Jian J ; Ni, Ray >>>> Subject: RE: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: Rebuild >>>> the description table after Reset Device >>>> >>>>> -----Original Message----- >>>>> From: Jiang, Guomin >>>>> Sent: Sunday, April 26, 2020 2:44 PM >>>>> To: Wu, Hao A; devel@edk2.groups.io >>>>> Cc: Wang, Jian J; Ni, Ray >>>>> Subject: RE: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: Rebuild >>>> the >>>>> description table after Reset Device >>>>> >>>>> Hi Hao, >>>>> >>>>> Add comments inline. >>>>> >>>>>> -----Original Message----- >>>>>> From: Wu, Hao A >>>>>> Sent: Sunday, April 26, 2020 1:12 PM >>>>>> To: devel@edk2.groups.io; Wu, Hao A ; Jiang, >>>>> Guomin >>>>>> >>>>>> Cc: Wang, Jian J ; Ni, Ray >>>>>> >>>>>> Subject: RE: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: >>>>>> Rebuild the description table after Reset Device >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On >>> Behalf >>>>> Of >>>>>>> Wu, Hao A >>>>>>> Sent: Sunday, April 26, 2020 1:10 PM >>>>>>> To: Jiang, Guomin; devel@edk2.groups.io >>>>>>> Cc: Wang, Jian J; Ni, Ray >>>>>>> Subject: Re: [edk2-devel] [PATCH] MdeModulePkg/UsbBusDxe: >>>> Rebuild >>>>>> the >>>>>>> description table after Reset Device >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Jiang, Guomin >>>>>>>> Sent: Saturday, April 25, 2020 9:36 AM >>>>>>>> To: devel@edk2.groups.io >>>>>>>> Cc: Wang, Jian J; Wu, Hao A; Ni, Ray >>>>>>>> Subject: [PATCH] MdeModulePkg/UsbBusDxe: Rebuild the >>>> description >>>>>>>> table after Reset Device >>>>>>>> >>>>>>>> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2694 >>>>>>>> >>>>>>>> When the USB fail and then Reset Device, it should rebuild >>> description. >>>>>>>> >>>>>>>> Signed-off-by: Guomin Jiang >>>>>>>> Cc: Jian J Wang >>>>>>>> Cc: Hao A Wu >>>>>>>> Cc: Ray Ni >>>>>>>> --- >>>>>>>> MdeModulePkg/Bus/Usb/UsbBusDxe/UsbBus.c | 5 +++++ >>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>> >>>>>>>> diff --git a/MdeModulePkg/Bus/Usb/UsbBusDxe/UsbBus.c >>>>>>>> b/MdeModulePkg/Bus/Usb/UsbBusDxe/UsbBus.c >>>>>>>> index 4b4915c019..9f2d2cc87f 100644 >>>>>>>> --- a/MdeModulePkg/Bus/Usb/UsbBusDxe/UsbBus.c >>>>>>>> +++ b/MdeModulePkg/Bus/Usb/UsbBusDxe/UsbBus.c >>>>>>>> @@ -869,6 +869,11 @@ UsbIoPortReset ( >>>>>>>> >>>>>>>> >>>>>>>> DEBUG (( EFI_D_INFO, "UsbIoPortReset: device is now >>> ADDRESSED >>>>>>>> at %d\n", Dev->Address)); >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> + // >>>>>>>> >>>>>>>> + // The description will be invalid after reset, should >>>>>>>> + rebuild it as >>> well. >>>>>>>> >>>>>>>> + // >>>>>>>> >>>>>>>> + UsbBuildDescTable (Dev); >>>>>>> >>>>>>> >>>>>>> Hello Guomin, >>>>>>> >>>>>>> Thanks for the proposed patch. >>>>>>> >>>>>>> Could you help to explain in more detail for the above fix >>>>>>> with regard to the transfer ring not being set properly in the XHCI >> driver? >>>> Thanks. >>>>> >>>>> When I debug, I dump the below data structure and found that >>>>> before UsbMassReset, the corresponding slot transfer ring is >>>>> normal and invalid after UsbMassReset. >>>>> USB_DEV_CONTEXT = 0x148 >>>>> +0x0 Enabled >>>>> +0x1 SlotId >>>>> +0x4 RouteString >>>>> +0x8 ParentRouteString >>>>> +0xC XhciDevAddr >>>>> +0xD BusDevAddr >>>>> +0x10 InputContext >>>>> +0x18 OutputContext >>>>> +0x20 EndpointTransferRing[31] >>>>> +0x118 DevDesc >>>>> +0x130 ConfDesc >>>>> +0x138 ActiveConfiguration >>>>> +0x140 ActiveAlternateSetting >>>>> >>>>> Before UsbMassReset >>>>> 000002D0: -01 01 00 00 00 00 10 10 >>>>> 000002E0: 00 00 00 00 01 01 00 00-80 02 FA 06 00 00 00 00 >>>>> 000002F0: C0 16 FA 06 00 00 00 00-98 27 FC 06 00 00 00 00 >>>>> 00000300: 00 00 00 00 00 00 00 00-98 26 FC 06 00 00 00 00 >>>>> 00000310: 98 BE F9 06 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000320: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000330: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000340: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000350: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000360: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000370: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000380: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000390: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003A0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003B0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003C0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003D0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003E0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003F0: 12 01 00 03 00 00 00 09-F4 46 01 00 00 00 01 02 >>>>> 00000400: 03 01 00 00 00 00 00 00-98 CD F9 06 00 00 00 00 >>>>> 00000410: 01 00 00 00 00 00 00 00-18 24 FC 06 00 00 00 00 >>>>> >>>>> After UsbMassReset >>>>> 000002D0: -01 01 00 00 00 00 10 10 >>>>> 000002E0: 00 00 00 00 01 01 00 00-80 02 FA 06 00 00 00 00 >>>>> 000002F0: C0 16 FA 06 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000300: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000310: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000320: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000330: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000340: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000350: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000360: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000370: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000380: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000390: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003A0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003B0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003C0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003D0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003E0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 000003F0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000400: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00000410: 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> >>>>> After Reset Finished, it will try to get data again by sending >>>>> command to Usb Mass Device and it will use USB_DEV_CONTEXT. >>>> EndpointTransferRing[4]. >>>>> But it haven't been initialized after reset, and ASSERT will >>>>> trigger when access the USB_DEV_CONTEXT. EndpointTransferRing and >>>>> will >>> show >>>>> ASSERT >>>>> >>>> >>> >> c:\users\guominji\dcg10nm\edk2\MdeModulePkg\Bus\Pci\XhciDxe\XhciSch >>>> e >>>>> d.c(1909): TrsRing != ((void *) 0) >>>>> >>>>> The EndpointTranferRing should be initialized by >>>>> UsbBuildDescTable() but seem that it is omitted in Reset branch, so I add >> it. >>>> >>>> >>>> Sorry Guomin, I still do not fully understand the link between >>>> calling the >>>> UsbBuildDescTable() in UsbBusDxe and the re-initialization of the >>>> 'EndpointTranferRing' in the XhciDxe. Could you help to provide more >> detail? >>> >>> I am sorry that make confusion to you, I will explain it in detail now. >>> 1. The critical call flow as below >>> UsbMassReset() >>> --> UsbMass->Transport->Reset() == UsbBotResetDevice() >>> --> UsbBot->UsbIo->UsbPortReset() == UsbIoPortReset() >>> --> UsbIf->Device->ParentIf->HubApi->ResetPort() == >>> UsbRootHubResetPort() >>> --> UsbHcGetRootHubPortStatus() >>> --> UsbBus->Usb2Hc-GetRootHubPortStatus() == >>> XhcGetRootHubPortStatus() >>> --> XhcPollPortStatusChange() >>> --> XhcDisableSlotCmd() or >>> XhcDisableSlotCmd64() [Very very critical routine 1] >>> --> XhcInitializeDeviceSlot() or >>> XhcInitializeDeviceSlot64() [Very very critical routine 2] >>> --> UsbSetAddress() >>> --> UsbSetConfig() >>> --> UsbBus->Usb2Hc->ControlTransfer() == XhcControlTransfer() >>> --> XhcSetConfigCmd() or XhcSetConfigCmd64() >>> [Very very critical routine 3] >>> >>> There are three very very critical routine, 1. XhcDisableSlotCmd() or >>> XhcDisableSlotCmd64() will disable the slot context and free the >>> allocated memory. >>> 2. XhcInitializeDeviceSlot() or XhcInitializeDeviceSlot() will >>> initialize the basic control and slot endpoint transfer ring. >>> 3. XhcSetConfigCmd() or XhcSetConfigCmd64() will configure the other >>> transfer ring. It depend on Xhc- >>>> UsbDevContext[Slot].DevDesc.NumConfigurations, unfortunately, the >>> DevDesc haven't benn initialized by UsbBuildDescTable and it is 0. So >>> XhcSetConfigCmd() or XhcSetConfigCmd64() haven't run. >>> >>> After run below 3 critical routine, the device is initialize but >>> EndpointTransferRing of Input and Output haven't been initialized and >>> still keep NULL. >>> >>> When UsbMassReadBlocks, it will use uninitialized EndpointTransferRing >>> and will ASSERT. >> >> >> Thanks a lot Guomin, >> >> Now I understand the reason behind for the proposed fix. >> >> I would suggest to add an abstract of the above analysis in the commit >> message, such information will be helpful for others who might work on >> USB/Xhci in the future. >> >> >>> >>>> Another thing I found (if current proposed fix is a proper solution) >>>> is that in >>>> UsbBuildDescTable() , several memory allocations will be made to >>>> store the device descriptor/config descriptor. Could you help to >>>> double check if their old values are properly freed in order to avoid >> memory leak? >>>> >>> >>> I will investigate it and give you feedback. >> >> >> Thanks for this. >> >> Best Regards, >> Hao Wu >> >> >>> >>>> Best Regards, >>>> Hao Wu >>>> >>>> >>>>> >>>>>>> >>>>>>> Also, judging from the function description comments in >>>>>> UsbBuildDescTable(): >>>>>>> |> /** >>>>>>> |> Build the whole array of descriptors. This function must >>>>>>> |> be called after UsbGetMaxPacketSize0 returns the max packet >>>>>>> |> size correctly for endpoint 0. >>>>>>> |> ... >>>>>>> |> **/ >>>>>>> |> EFI_STATUS >>>>>>> |> UsbBuildDescTable ( >>>>>>> |> IN USB_DEVICE *UsbDev >>>>>>> |> ) >>>>>>> >>>>>>> Does function UsbGetMaxPacketSize0() need to be called before >>>>>>> UsbBuildDescTable() in the proposed fix? >>>>> >>>>> I ignored it and will double check it. >>>>> >>>>>> >>>>>> >>>>>> One more thing, could you help to add the information for what >>>>>> tests have been done for the proposed patch as well? >>>>>> >>>>>> Thanks in advance. >>>>>> >>>>> >>>>> Have test OVMF and test pass, detail as below I use >>>>> qemu-w64-2020020, you should install it first and add it to PATH >>>>> environment, the procedure as below 1. Import the test patch from >>>>> https://bugzilla.tianocore.org/attachment.cgi?id=508 >>>>> 2. type ```build -p OvmfPkg\OvmfPkgX64.dsc -b DEBUG -a X64``` to >>>>> generate OVMF.fd 3. type ```qemu-img create file.img 1G``` to >>>>> create test image. >>>>> 4. type ```qemu-system-x86_64 -bios OVMF.fd -drive >>>>> if=none,id=stick,file=file.img -device nec-usb-xhci,id=xhci, >>>>> -device >>>>> usb- storage,bus=xhci.0,drive=stick -global >>>>> isa-debugcon.iobase=0x402 >>>>> - debugcon file:xhci.log``` 5. you will see hang currently and >>>>> will see ASSERT when check xhci.log. >>>>> 6. import proposal patch, the symptom disappear. >>>>> >>>>>> >>>>>>> >>>>>>> Best Regards, >>>>>>> Hao Wu >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> + >>>>>>>> >>>>>>>> // >>>>>>>> >>>>>>>> // Reset the current active configure, after this device >>>>>>>> >>>>>>>> // is in CONFIGURED state. >>>>>>>> >>>>>>>> -- >>>>>>>> 2.25.1.windows.1 >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >