From: aaron.young@oracle.com
Organization: Oracle Corporation
To: Laszlo Ersek, Leif Lindholm
Cc: Ard Biesheuvel, "edk2-devel@lists.01.org", Brijesh Singh, pjones@redhat.com, Aaron Young
Subject: Re: Regression with PXE boot on OvmfPkg/VirtioNetDxe driver on aarch64 system
Date: Mon, 15 Oct 2018 13:52:45 -0700
Message-ID: <85d38be2-fb2f-669d-cc39-3fb6a8d09842@oracle.com>
References: <3b4a4980-42d0-51a0-eccd-ceba83b6c78f@oracle.com> <20181004092411.4bgio3ewvmwkfuyq@bivouac.eciton.net>

On 10/04/18 06:06, Laszlo Ersek wrote:
> On 10/04/18 11:24, Leif Lindholm wrote:
>> +Peter
>>
>> On Wed, Oct 03, 2018 at 04:59:54PM -0700, aaron.young@oracle.com wrote:
>>> I am suspecting that this patch to GRUB is the cause of a Buffer being
>>> re-transmitted before reaping the Buffer via SNP->GetStatus():
>>>
>>> https://git.centos.org/blob/rpms!grub2.git/1065bd29e776aef83f927747882140dcb6fd5cde/SOURCES!0183-efinet-retransmit-if-our-device-is-busy.patch
>>>
>>> So, to reproduce the issue, the GRUB used via PXE boot needs to include
>>> this patch.
>> So the issue cannot be reproduced with upstream GRUB?
>>
>> Does Fedora/Red Hat include the same patch?
> Here's what I can see.
>
> (1) In upstream grub, at
> commit 8ada906031d9 ("msdos: Fix overflow in converting partition start
> and length into 512B blocks", 2018-09-27), on the master branch, the
> patch is not present.
>
> (2) In "rhboot" grub2, where the
> master branch seems to track upstream grub, the patch is present on at
> least the "fedora-28" and "rhel-7.5" branches. Commit hashes,
> respectively: c2b126f52143, 1b9767c136082.
>
> (3) In the commit message, Josef wrote, "When I fixed the txbuf handling
> I ripped out the retransmission code". I think he referred to his
> earlier commit 4fe8e6d4a127 ("efinet: handle get_status() on buggy
> firmware properly", 2015-08-09). That commit is upstream.
>
> In my opinion, commit 4fe8e6d4a127, the chronologically first, and
> upstream, tweak, was right (assuming the comment it added was true,
> about grub).
>
> On the other hand, the downstream-only (chronologically 2nd) commit was
> wrong. Not only did it break the spec, it broke even grub's own internal
> invariant, described in the comment that was added in upstream commit
> 4fe8e6d4a127. The comment states, "we only transmit one packet at a
> time". With the downstream-only tweak applied, that's no longer true.
> Namely, SNP.Transmit() is called while we know another transmission is
> pending on the egress queue. That's the definition of sending more than
> one packet at a time.
>
> I'm curious why the patch in question is not upstream. Was it submitted
> and rejected? Submitted and ignored? Not submitted at all?
>
> I'm not a fan of the hard-liner "spec above everything" approach. In
> this case though, after the downstream-only tweak, grub is inconsistent
> not only with the spec, but with itself too.
>
> IMO, downstream(s) should revert the downstream-only patch.
>
> Thanks,
> Laszlo

I have confirmed that reverting this GRUB patch indeed fixes the issue
(i.e. with the VirtioNetDxe/SnpSharedHelpers.c(184) ASSERT).

Thanks for the help/info in resolving this issue.

As a follow-up question: it seems that the VirtioNetDxe driver is fragile
in that it can get into a broken state if (for whatever reason) a tx buffer
is not successfully transmitted and thus never shows back up on the Used
Ring. That is, if an SNP client has repeatedly called SNP->GetStatus() and,
after a certain amount of time, fails to get back the buffer, what should
it do? If it attempts to re-transmit the buffer, it'll hit the ASSERT.

Perhaps it should shut down/re-init the interface in this case (to free up
the buffer mapping in Dev->TxBufCollection)?

Or, are we confident this condition can _never_ happen?

Thanks again,

-Aaron
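
To make the follow-up question concrete, here is a rough toy model in Python (not firmware code) of the client-side policy being asked about: transmit one buffer, poll for it a bounded number of times, and re-initialize instead of re-transmitting if it never comes back. Everything in it is hypothetical: `ToySnp`, `send_with_recovery`, `device_completes`, and `max_polls` are illustration-only names, the `pending` set merely stands in for Dev->TxBufCollection, and the assumption that a shutdown/re-init drops the stale mapping is exactly the open question, not confirmed driver behavior.

```python
class ToySnp:
    """Hypothetical stand-in for an SNP instance; not the real driver."""

    def __init__(self, device_completes=True):
        self.device_completes = device_completes
        self.pending = set()  # buffers submitted but not yet reaped
        self.used = []        # buffers the toy "device" has handed back

    def transmit(self, buf):
        # Re-submitting a buffer that is still pending is the misuse that
        # trips the ASSERT in the real driver; the toy raises instead.
        if buf in self.pending:
            raise RuntimeError("buffer is still pending; reap it first")
        self.pending.add(buf)
        if self.device_completes:
            self.used.append(buf)

    def get_status(self):
        # Return one recycled buffer, or None if nothing has completed.
        if not self.used:
            return None
        buf = self.used.pop(0)
        self.pending.discard(buf)
        return buf

    def reinit(self):
        # Assumption: shutdown + re-init forgets every outstanding buffer.
        self.pending.clear()
        self.used.clear()


def send_with_recovery(snp, buf, max_polls=1000):
    """Send one buffer; never re-transmit while it is still pending."""
    snp.transmit(buf)
    for _ in range(max_polls):
        if snp.get_status() == buf:
            return "sent"
    # The buffer never came back: reset the interface rather than
    # calling transmit() again with the same buffer.
    snp.reinit()
    return "reinitialized"
```

With `device_completes=False` the toy device never returns the buffer, so `send_with_recovery()` ends in the re-init path and the same buffer can be submitted again afterwards. Whether the real driver guarantees that is what I am asking.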