public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Fu, Siyuan" <siyuan.fu@intel.com>
To: "Tomas Pilar (tpilar)" <tpilar@solarflare.com>,
	"Wu, Jiaxin" <jiaxin.wu@intel.com>,
	Laszlo Ersek <lersek@redhat.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "Ye, Ting" <ting.ye@intel.com>
Subject: Re: Network Stack Budgeting
Date: Tue, 29 Jan 2019 13:06:25 +0000	[thread overview]
Message-ID: <B1FF2E9001CE9041BD10B825821D5BC58B6E155B@SHSMSX103.ccr.corp.intel.com> (raw)
In-Reply-To: <197303c9-8af5-6092-afe1-404cf86dd800@solarflare.com>

Hi, Tomas

> -----Original Message-----
> From: Tomas Pilar (tpilar) [mailto:tpilar@solarflare.com]
> Sent: Tuesday, January 29, 2019 6:54 PM
> To: Fu, Siyuan <siyuan.fu@intel.com>; Wu, Jiaxin <jiaxin.wu@intel.com>; Laszlo
> Ersek <lersek@redhat.com>; edk2-devel@lists.01.org
> Cc: Ye, Ting <ting.ye@intel.com>
> Subject: Re: [edk2] Network Stack Budgeting
> 
> 
> 
> On 29/01/2019 03:20, Fu, Siyuan wrote:
> > Hi, Tomas
> >
> >> -----Original Message-----
> >> From: Tomas Pilar (tpilar) [mailto:tpilar@solarflare.com]
> >> Sent: Monday, January 28, 2019 7:24 PM
> >> To: Fu, Siyuan <siyuan.fu@intel.com>; Wu, Jiaxin <jiaxin.wu@intel.com>;
> Laszlo
> >> Ersek <lersek@redhat.com>; edk2-devel@lists.01.org
> >> Cc: Ye, Ting <ting.ye@intel.com>
> >> Subject: Re: [edk2] Network Stack Budgeting
> >>
> >> Hi Siyuan,
> >>
> >> I had to do some tweaks due to the fact that people in the wild want to PXE
> >> boot whole WDS installations which are upwards of 300MB.
> >>
> >> I have an internal set of queues where the NIC receives into. Snp.Receive()
> >> will get a packet from this store. I had to implement a separate queue for
> >> unicast/multicast/broadcast traffic so that I could reprioritise and
> deliver
> >> unicast traffic while under ARP spam. I tied internal poll to receive and
> >> transmit, so Snp.Receive() will poll the NIC and then dequeue a packet
> while
> >> Snp.Transmit() will transmit the packet and then poll the NIC to get a
> >> completion. I also implemented a busy poll for 100us or until the next
> packet
> >> comes in when the NIC is polled.
> > I see some problems here. Both Snp.Receive() and Snp.Transmit() should be
> non-blocking
> > interface in current design and UEFI spec, but it's not true in your driver.
> > For Receive(), it could poll the NIC only once and return either a received
> packet
> > or an error, it should not do any blocking poll to wait for a packet.
> On some platforms (tftp clients), if the Snp.Receive() returns EFI_NOT_READY
> they do not poll the NIC until the next MnpSystemPoll(). Hence the delay
> before returning EFI_NOT_READY in case the packet is still incoming.

OK, that's really not good practice.

> > For Transmit(), it just need to put the packet into the TX queue and return
> directly,
> > without checking if the TX is completed.
> > Instead of that, Snp.GetStatus() should be
> > used to check if the packet has really been sent out.
> I've seen quite a few platforms where GetStatus was only called by the
> MnpSystemPoll() and the application would not call it. This means our hardware
> rings are stuffed with completions and we stop being able to transmit. Yes,
> you could say that we should solve the application, but unfortunately I can't
> do that.

The application doesn't need to call Snp.GetStatus() to recycle the TX buffer, the
Mnp.Transmit() will do this automatically when it found the SNP's TX buffer is full.
See MnpSyncSendPacket() and MnpRecycleTxBuf(). 

> >
> > May I know how did you control the 100us timeout of the busy poll (by which
> API?),
> > and what's the TPL of this busy poll?
> >
> Create a timeout timer with value of 1000. Then busy poll in a loop until
> packet is received or a timer is triggered.
> This runs at TPL_CALLBACK while the EfxPoll function itself runs at TPL_NOTIFY.
> 
> --
>   /* Set the poll moderation timeout to 100us */
>   gBS->SetTimer(NetDev->PollModeratorTimer, TimerRelative, 1000);
> 
>   while (!InterruptStatus) {
>     EfxPoll(NULL, NetDev->Efx);
>     InterruptStatus = EfxStatus(NetDev->Efx);
> 
>     if (gBS->CheckEvent(NetDev->PollModeratorTimer))
>       return InterruptStatus;
>   }
> 
>   return InterruptStatus;
> --

This is exactly the case what I worried about. Some platform has a minimum timer
interval (like 1ms or 10ms), so if you set a timer interval shorter than that,
it will actually be signaled after the minimum value. Please check if your
platform could support a such a short timeout (100us).

> 
> >> Otherwise I have seen a pathological case where the TFTP client would pull
> one
> >> packet per second but the TFTP_RETRANSMIT interval on the server was also
> one
> >> second and everything was awful.
> > Why TFTP client just pull one packet per second? I think it’s not correct
> and it
> > could use the poll() function to accelerate the receive. Why you trying to
> solve a
> > TFTP layer problem in SNP layer? It breaks the design principle of network
> layer model.
> Yes, I appreciate the principle. However, in practice we don't get to sell
> adapters that do not PXE boot and it's pointless to argue with customers that
> they have a rubbish TFTP client implementation. So we put in workarounds into
> the driver.

OK I understand your situation now. Technically it will be better to fix the
problem in TFTP client, so I still suggest you to confirm this with your customer
or the TFTP application owner. This workaround in your SNP driver will put you
in the risk that a normal application which follow UEFI spec may not work correctly
with you, so I think you should only consider this method if you can confirm
the TFTP client is the only application layer client you need to serve.

Best Regards,
Siyuan

> >> In theory the busy poll moderator might cause an issue in a pathological
> case,
> >> if it stalls the machine for 100us for each packet - with 100 packets per
> >> second this would eat only 10ms.
> > Not only 10ms, it's at least 10ms, and in theory. If you have multiple NICs
> in the
> > system, this time will double. If someone tries to poll the NIC, the time
> will increase
> > significantly. It also depends on who you measure the 100us timeout, I did
> see some
> > platform which has a minimum timer interval limitation, that you thought you
> are using
> > a 100us timeout but it’s actually 1ms or even 10ms.
> >
> > So my suggestion is do NOT add any busy wait in SNP layer API, keep it as
> unblocking,
> > and the try to solve the application layer problem in application layer.
> I'll try dropping the busy poll and see whether that helps
> 
> Cheers,
> Tom
> >
> > Best Regards,
> > Siyuan
> >
> >> Cheers,
> >> Tom
> >>
> >> On 27/01/2019 14:28, Fu, Siyuan wrote:
> >>> Hi, Tom
> >>>
> >>>> -----Original Message-----
> >>>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> >> Tomas
> >>>> Pilar (tpilar)
> >>>> Sent: Friday, January 25, 2019 8:09 PM
> >>>> To: Wu, Jiaxin <jiaxin.wu@intel.com>; Laszlo Ersek <lersek@redhat.com>;
> >> edk2-
> >>>> devel@lists.01.org
> >>>> Cc: Ye, Ting <ting.ye@intel.com>
> >>>> Subject: Re: [edk2] Network Stack Budgeting
> >>>>
> >>>> Yeah, that makes sense I suppose. The end result is however that the
> >> network
> >>>> device is 'opened' as soon as ConnectController() is called rather than
> >> when
> >>>> someone wants to do networking. This seems wrong and directly leads to a
> >> DoS
> >>>> of a platform in case of heavy network load (unless we implement
> budgeting
> >> in
> >>>> the network driver, which is a really not what it should be doing).
> >>> It's true that a configured MNP will start a background system poll to
> >> receive
> >>> network packets, but normally it won't lead to a DoS, because the receive
> >> frequency
> >>> is actually very low. The MNP driver only tries to receive 1 packet from
> SNP
> >> for every
> >>> 10 milliseconds (see MNP_SYS_POLL_INTERVAL, MnpSystemPoll() and related
> >> timer event).
> >>> So no matter how heavy the network is, MNP driver (and upper layer stack)
> >> will
> >>> only process at most 100 packets per second, all other packets should be
> >> dropped
> >>> by UNDI driver directly.
> >>>
> >>> So if there is nobody doing busy network receiving, the network stack cost
> >> will be
> >>> at most 100 packets per NIC device and per second.
> >>>
> >>> In your first email you mentioned "some performance improvements to my
> >> network driver",
> >>> may I know what's the improvement method you are using?
> >>>
> >>> Best Regards,
> >>> Siyuan
> >>>
> >>>> How do you suggest we solve this problem?
> >>>>
> >>>> Cheers,
> >>>> Tom
> >>>>
> >>>> On 25/01/2019 08:44, Wu, Jiaxin wrote:
> >>>>> Hi Tom,
> >>>>>
> >>>>> One thing I want to highlight is that our design of network stack is not
> >>>> only for the PXE/HTTP boot trigged in BootManager, but also to make sure
> >> it's
> >>>> workable once there is any MNP instance configured by upper drivers
> >>>> (ARP/IPv4/IPv6).
> >>>>> Take ARP/IP as an example, once ARP/IP are started, we need a heartbeat
> to
> >>>> process any ARP requests, which is required by ARP functionality. In such
> a
> >>>> case, SNP must be started to make ARP/IP drivers works well.
> >>>>> Thanks,
> >>>>> Jiaxin
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> >>>>>> Tomas Pilar (tpilar)
> >>>>>> Sent: Friday, January 25, 2019 1:43 AM
> >>>>>> To: Laszlo Ersek <lersek@redhat.com>; edk2-devel@lists.01.org
> >>>>>> Subject: Re: [edk2] Network Stack Budgeting
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 24/01/2019 16:49, Laszlo Ersek wrote:
> >>>>>>> On 01/24/19 14:25, Tomas Pilar (tpilar) wrote:
> >>>>>>>> Hmm,
> >>>>>>>>
> >>>>>>>> Mnp->Configure() will eventually call MnpStartSnp().
> >>>>>>>>
> >>>>>>>> A grep for Mnp->Configure shows that:
> >>>>>>>> * ArpDxe performs this on DriverBinding.Start()
> >>>>>>>> * Ip6Dxe performs this on DriverBinding.Start()
> >>>>>>>>
> >>>>>>>> Ipv4 and DnsDhcp do this as a part of their Configure() they expose
> in
> >>>> the
> >>>>>> API.
> >>>>>>> Yes, that makes sense. All of the above drivers are UEFI drivers that
> >>>>>>> follow the UEFI driver model, AIUI. As long as the controller is not
> >>>>>>> connected from BDS, no BindingStart() function should be called in
> these.
> >>>>>> Ah, but I would expect the BDS to call ConnectController() on the NIC,
> >> but
> >>>> I
> >>>>>> would not expect the network stack to be started unless the device is
> >>>>>> actually chosen for PXE booting. In other words, the above protocols
> >> should
> >>>>>> follow the example of EFI_DNS4_PROTOCOL that binds against the device
> >>>>>> during BDS but .Configure() is not automatically called by
> >>>>>> DriverBinding.Start().
> >>>>>>
> >>>>>> .Configure() should be called by the BootManager if networking is
> >> actually
> >>>> to
> >>>>>> be done. That in turn will configure Mnp and start Snp.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Tom
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> edk2-devel mailing list
> >>>>>> edk2-devel@lists.01.org
> >>>>>> https://lists.01.org/mailman/listinfo/edk2-devel
> >>>> _______________________________________________
> >>>> edk2-devel mailing list
> >>>> edk2-devel@lists.01.org
> >>>> https://lists.01.org/mailman/listinfo/edk2-devel


  reply	other threads:[~2019-01-29 13:06 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-23 10:55 Network Stack Budgeting Tomas Pilar (tpilar)
2019-01-23 14:14 ` Ye, Ting
2019-01-23 14:51   ` Tomas Pilar (tpilar)
2019-01-23 16:08 ` Laszlo Ersek
2019-01-23 16:27   ` Tomas Pilar (tpilar)
2019-01-23 17:47     ` Andrew Fish
2019-01-23 22:18     ` Laszlo Ersek
2019-01-24 11:37       ` Tomas Pilar (tpilar)
2019-01-24 12:25         ` Laszlo Ersek
2019-01-24 12:58           ` Tomas Pilar (tpilar)
2019-01-24 13:25             ` Tomas Pilar (tpilar)
2019-01-24 16:49               ` Laszlo Ersek
2019-01-24 17:43                 ` Tomas Pilar (tpilar)
2019-01-25  8:44                   ` Wu, Jiaxin
2019-01-25 12:08                     ` Tomas Pilar (tpilar)
2019-01-27 14:28                       ` Fu, Siyuan
2019-01-28 11:24                         ` Tomas Pilar (tpilar)
2019-01-29  3:20                           ` Fu, Siyuan
2019-01-29 10:54                             ` Tomas Pilar (tpilar)
2019-01-29 13:06                               ` Fu, Siyuan [this message]
2019-01-29 13:12                                 ` Tomas Pilar (tpilar)
2019-01-29 13:42                               ` Laszlo Ersek
2019-01-29 13:52                                 ` Tomas Pilar (tpilar)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B1FF2E9001CE9041BD10B825821D5BC58B6E155B@SHSMSX103.ccr.corp.intel.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox