From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=2607:f8b0:4864:20::12a; helo=mail-it1-x12a.google.com; envelope-from=ard.biesheuvel@linaro.org; receiver=edk2-devel@lists.01.org Received: from mail-it1-x12a.google.com (mail-it1-x12a.google.com [IPv6:2607:f8b0:4864:20::12a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 5FEAF21164899 for ; Mon, 15 Oct 2018 18:55:23 -0700 (PDT) Received: by mail-it1-x12a.google.com with SMTP id i76-v6so30318879ita.3 for ; Mon, 15 Oct 2018 18:55:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=mYKFjUHbcOsd1weJpZvWzImSB6hQY0ec47dGkl9jKmA=; b=JXLfW5G2dXF66erxcnqqmVPte5W8q5bBOxrH/vieip9l5lzvlvVUtKUIP/AmuzyPMW OFDNeMWZy7/hmNts57g+rLaDyk/CdblNAM2y7QJ1Hpp5NT9F7Jd87MYcBPotMerQWBlT ktPZ6bfGviYSyDBv+kMxZALkjoHIeFIWgbbUw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=mYKFjUHbcOsd1weJpZvWzImSB6hQY0ec47dGkl9jKmA=; b=UDdXDMIp9Hq39p8GSKJ2OiUFuKL7bAzPgyd0QdAwWg+4cgQyxrqvQlCNG/Ny3jqXmf DNUNHY8I/F3nrfd2N3cujs3/Moh3tb55TzFG6bIbchhZ2SObtePMAtOO4SHLQrcxOOuM kVjrC/nU5F8KQ2Qlxu3qCCTl6sP8+zC49hTA1z58SzIvfNfAQJofT3mvStDLinkJk8BH kGQN8l0+KDXzNC9I3NBlDK6x39IFaUsh6AoU1d5QmJYqiDLGzzSRLaril16Am+QItrzV HaBBeWIjlJKemLKVduSNwd3mmCJC082tTBspto3/LFi8dEyHaJaty7mB9CJtIKVLpWwM A2CA== X-Gm-Message-State: ABuFfohqrUwiEA+hQmddeZqFcF7V41dlROHir5ISIEds7x2czkNZOWJ+ 4YZ/gogLCYPTCBIFoxYYJmEuenTH+A2+aWlrzq0h7g== X-Google-Smtp-Source: ACcGV620ov9yq7aH0vnCmPBFjyHbCK9SDBqi1KwYKq5tZ+vBr9pWe1dAPE3zHiROfCR6lYAKW1dHsDSTA33f63k3C+c= X-Received: by 2002:a05:660c:383:: with SMTP id x3mr12967260itj.121.1539654922817; Mon, 15 Oct 2018 18:55:22 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a6b:5910:0:0:0:0:0 with HTTP; Mon, 15 Oct 2018 18:55:22 -0700 (PDT) In-Reply-To: <85d38be2-fb2f-669d-cc39-3fb6a8d09842@oracle.com> References: <3b4a4980-42d0-51a0-eccd-ceba83b6c78f@oracle.com> <20181004092411.4bgio3ewvmwkfuyq@bivouac.eciton.net> <85d38be2-fb2f-669d-cc39-3fb6a8d09842@oracle.com> From: Ard Biesheuvel Date: Tue, 16 Oct 2018 03:55:22 +0200 Message-ID: To: aaron.young@oracle.com Cc: Laszlo Ersek , Leif Lindholm , "edk2-devel@lists.01.org" , Brijesh Singh , Peter Jones Subject: Re: Regression with PXE boot on OvmfPkg/VirtioNetDxe driver on aarch64 system X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Oct 2018 01:55:24 -0000 Content-Type: text/plain; charset="UTF-8" On 15 October 2018 at 22:52, wrote: > > > On 10/04/18 06:06, Laszlo Ersek wrote: >> >> On 10/04/18 11:24, Leif Lindholm wrote: >>> >>> +Peter >>> >>> On Wed, Oct 03, 2018 at 04:59:54PM -0700, aaron.young@oracle.com wrote: >>>> >>>> I am suspecting that this patch to GRUB is the cause of a Buffer being >>>> re-transmitted before reaping the Buffer via SNP->GetStatus(): >>>> >>>> >>>> https://git.centos.org/blob/rpms!grub2.git/1065bd29e776aef83f927747882140dcb6fd5cde/SOURCES!0183-efinet-retransmit-if-our-device-is-busy.patch >>>> >>>> So, to reproduce the issue, the GRUB used via PXE boot needs to >>>> include >>>> this patch. >>> >>> So the issue cannot be reproduced with upstream GRUB? >>> >>> Does Fedora/Red Hat include the same patch? >> >> Here's what I can see. >> >> (1) In upstream grub , at >> commit 8ada906031d9 ("msdos: Fix overflow in converting partition start >> and length into 512B blocks", 2018-09-27), on the master branch, the >> patch is not present. >> >> (2) In "rhboot" grub2 , where the >> master branch seems to track upstream grub, the patch is present on at >> least the "fedora-28" and "rhel-7.5" branches. Commit hashes, >> respectively: c2b126f52143, 1b9767c136082. >> >> (3) In the commit message, Josef wrote, "When I fixed the txbuf handling >> I ripped out the retransmission code". I think he referred to his >> earlier commit 4fe8e6d4a127 ("efinet: handle get_status() on buggy >> firmware properly", 2015-08-09). That commit is upstream. >> >> In my opinion, commit 4fe8e6d4a127, the chronologically first, and >> upstream, tweak, was right (assuming the comment it added was true, >> about grub). >> >> On the other hand, the downstream-only (chronologically 2nd) commit was >> wrong. Not only did it break the spec, it broke even grub's own internal >> invariant, described in the comment that was added in upstream commit >> 4fe8e6d4a127. The comment states, "we only transmit one packet at a >> time". With the downstream-only tweak applied, that's no longer true. >> Namely, SNP.Transmit() is called while we know another transmission is >> pending on the egress queue. That's the definition of sending more than >> one packet at a time. >> >> I'm curious why the patch in question is not upstream. Was it submitted >> and rejected? Submitted and ignored? Not submitted at all? >> >> I'm not a fan of the hard-liner "spec above everything" approach. In >> this case though, after the downstream-only tweak, grub is inconsistent >> not only with the spec, but with itself too. >> >> IMO, downstream(s) should revert the downstream-only patch. >> >> Thanks, >> Laszlo > > > I have confirmed that reverting this GRUB patch indeed fixes the issue (i.e. > with the VirtioNetDxe/SnpSharedHelpers.c(184) ASSERT). > > Thanks for the help/info in resolving this issue. > > As a follow up question - it seems that the VirtioNetDxe driver is fragile > in that it can get into broken state if (for whatever reason) a tx buffer is > not successfully transmitted and thus never shows back up on the Used Ring. > i.e. If a SNP client has repeatedly called SNP-GetStatus() and, after a > certain amount of time, fails gets back the buffer, what should it do? If it > attempts to re-transmit the buffer, it'll hit the ASSERT. Perhaps it should > shutdown/re-init the interface in this case (to free up the buffer mapping > in Dev->TxBufCollection)? Or, are we confident this condition can _never_ > happen? > It is an implementation detail of GRUB that it only ever uses a single TX buffer: the protocol supports an arbitrary number of buffers, and repeated calls to GetStatus() will return each one once the network stack is done with them. So the answer is simply to allocate another buffer, and use that.