public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: aaron.young@oracle.com
To: Laszlo Ersek <lersek@redhat.com>,
	Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	Brijesh Singh <brijesh.singh@amd.com>,
	pjones@redhat.com, Aaron Young <aaron.young@oracle.com>
Subject: Re: Regression with PXE boot on OvmfPkg/VirtioNetDxe driver on aarch64 system
Date: Mon, 15 Oct 2018 13:52:45 -0700	[thread overview]
Message-ID: <85d38be2-fb2f-669d-cc39-3fb6a8d09842@oracle.com> (raw)
In-Reply-To: <dcc618fa-a9bf-3d57-5f57-5ae9e9c3148a@redhat.com>



On 10/04/18 06:06, Laszlo Ersek wrote:
> On 10/04/18 11:24, Leif Lindholm wrote:
>> +Peter
>>
>> On Wed, Oct 03, 2018 at 04:59:54PM -0700, aaron.young@oracle.com wrote:
>>>   I am suspecting that this patch to GRUB is the cause of a Buffer being
>>> re-transmitted before reaping the Buffer via SNP->GetStatus():
>>>
>>> https://git.centos.org/blob/rpms!grub2.git/1065bd29e776aef83f927747882140dcb6fd5cde/SOURCES!0183-efinet-retransmit-if-our-device-is-busy.patch
>>>
>>>   So, to reproduce the issue, the GRUB used via PXE boot needs to include
>>> this patch.
>> So the issue cannot be reproduced with upstream GRUB?
>>
>> Does Fedora/Red Hat include the same patch?
> Here's what I can see.
>
> (1) In upstream grub <https://git.savannah.gnu.org/git/grub.git>, at
> commit 8ada906031d9 ("msdos: Fix overflow in converting partition start
> and length into 512B blocks", 2018-09-27), on the master branch, the
> patch is not present.
>
> (2) In "rhboot" grub2 <https://github.com/rhboot/grub2>, where the
> master branch seems to track upstream grub, the patch is present on at
> least the "fedora-28" and "rhel-7.5" branches. Commit hashes,
> respectively: c2b126f52143, 1b9767c136082.
>
> (3) In the commit message, Josef wrote, "When I fixed the txbuf handling
> I ripped out the retransmission code". I think he referred to his
> earlier commit 4fe8e6d4a127 ("efinet: handle get_status() on buggy
> firmware properly", 2015-08-09). That commit is upstream.
>
> In my opinion, commit 4fe8e6d4a127, the chronologically first, and
> upstream, tweak, was right (assuming the comment it added was true,
> about grub).
>
> On the other hand, the downstream-only (chronologically 2nd) commit was
> wrong. Not only did it break the spec, it broke even grub's own internal
> invariant, described in the comment that was added in upstream commit
> 4fe8e6d4a127. The comment states, "we only transmit one packet at a
> time". With the downstream-only tweak applied, that's no longer true.
> Namely, SNP.Transmit() is called while we know another transmission is
> pending on the egress queue. That's the definition of sending more than
> one packet at a time.
>
> I'm curious why the patch in question is not upstream. Was it submitted
> and rejected? Submitted and ignored? Not submitted at all?
>
> I'm not a fan of the hard-liner "spec above everything" approach. In
> this case though, after the downstream-only tweak, grub is inconsistent
> not only with the spec, but with itself too.
>
> IMO, downstream(s) should revert the downstream-only patch.
>
> Thanks,
> Laszlo

I have confirmed that reverting this GRUB patch indeed fixes the issue 
(i.e. with the VirtioNetDxe/SnpSharedHelpers.c(184) ASSERT).

   Thanks for the help/info in resolving this issue.

   As a follow up question - it seems that the VirtioNetDxe driver is 
fragile in that it can get into broken state if (for whatever reason) a 
tx buffer is not successfully transmitted and thus never shows back up 
on the Used Ring. i.e. If a SNP client has repeatedly called 
SNP-GetStatus() and, after a certain amount of time, fails gets back the 
buffer, what should it do? If it attempts to re-transmit the buffer, 
it'll hit the ASSERT. Perhaps it should shutdown/re-init the interface 
in this case (to free up the buffer mapping in Dev->TxBufCollection)? 
Or, are we confident this condition can _never_ happen?

   Thanks again,

   -Aaron






  reply	other threads:[~2018-10-15 20:52 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30 20:17 Regression with PXE boot on OvmfPkg/VirtioNetDxe driver on aarch64 system aaron.young
2018-08-31 13:17 ` Laszlo Ersek
2018-10-03 16:56   ` aaron.young
2018-10-03 17:35     ` Ard Biesheuvel
2018-10-03 23:59       ` aaron.young
2018-10-04  9:24         ` Leif Lindholm
2018-10-04 13:06           ` Laszlo Ersek
2018-10-15 20:52             ` aaron.young [this message]
2018-10-16  1:55               ` Ard Biesheuvel
2018-10-17  9:50                 ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85d38be2-fb2f-669d-cc39-3fb6a8d09842@oracle.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox