[gPXE] [Etherboot-developers] [PATCH] [tftp] Kick off TFTP RRQ from a process to avoid losing first packet

Tue Jan 12 11:27:27 EST 2010

2010/1/8 Stefan Hajnoczi <stefanha at gmail.com>:
> Thanks for the debugging you have done Thomas.  I looked into this in
> order to understand what dependencies there are between tftp, timer,
> and networking code.  Here is what I've found:
>
> 1. When tftp.c opens a UDP connection to the server, the name resolver
> first interposes itself.  The xfer interface is not connected to a UDP
> socket yet, instead it is connected to the name resolver.
>
> The idea is that name resolution (DNS) occurs and when the IP address
> of the host is determined, the name resolver redirects the xfer to the
> UDP connection with an IP address.
>
> Two things about the name resolver:
> a. It keeps the socket (xfer) window size at zero so tftp.c or any
> other user knows data cannot be sent yet.  Your patch takes advantage
> of this.
> b. Even numeric IP addresses like (10.0.2.1) require the resolver step
> function to be called once - therefore the round-robin process.c
> dispatcher needs to happen at least once before the UDP connection
> settles.

Maybe it would be good to do the first step from the open call itself?
Then at least for IP addresses it wouldn't introduce any delays. But I
don't think it really matters because the other protocols like HTTP
are already implemented with the same pattern as my TFTP patch.
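
To make the pattern concrete, here is a minimal sketch of "wait in a
process until the window opens, then send the RRQ once". All names
(sketch_xfer, sketch_tftp, sketch_tftp_step) are made up for
illustration and are not the actual gPXE structures or the code in my
patch; the window field just models the resolver holding the window at
zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a data transfer interface (not gPXE's). */
struct sketch_xfer {
	size_t window;     /* 0 while the resolver still owns the connection */
	unsigned int sent; /* packets handed to the transport so far */
};

/* Hypothetical TFTP request state. */
struct sketch_tftp {
	struct sketch_xfer *socket;
	bool rrq_sent;
};

/*
 * Called once per trip around the process dispatcher.
 * Returns true once the RRQ has been handed to the transport.
 */
bool sketch_tftp_step(struct sketch_tftp *tftp)
{
	assert(tftp != NULL && tftp->socket != NULL);

	if (tftp->rrq_sent)
		return true;          /* nothing left to do */
	if (tftp->socket->window == 0)
		return false;         /* resolver not finished; retry next trip */

	tftp->socket->sent++;         /* stands in for transmitting the RRQ */
	tftp->rrq_sent = true;
	return true;
}
```

The point is only that the RRQ is never attempted while the window is
zero, so it cannot be lost before the UDP connection has settled.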

> 2. Transmitting a UDP packet from tftp.c requires an IP-to-MAC address
> lookup from the ARP cache.  If there is a cache miss, the packet is
> dropped with an error and an ARP request is sent out.  There is no
> xfer transmit queue to retransmit packets once there is an ARP reply -
> instead user code is expected to reconstruct the packet and transmit
> again.  The tftp.c code does this using the retry timer and a simple
> state model so it knows which packet to send when.

I've never seen this happen in the context of a TFTP request, but it's
probably because my DHCP server and TFTP server are the same machine
and its MAC address is already cached from the DHCP request. I guess
it would be best if UDP could cache at least one packet for this case,
since it's easier to handle it in one place rather than everywhere UDP
packets are transmitted. If the user code doesn't know that the packet
was lost due to the ARP issue, it can't retransmit immediately and if
it has a backoff timer it will introduce an unnecessary delay.
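
Roughly what I have in mind, as a sketch only: the transmit path parks
one packet while the ARP lookup is outstanding and flushes it when the
reply arrives. Everything here (sketch_arp, sketch_udp_tx,
sketch_arp_reply, the 512-byte limit, the single-neighbour cache) is an
assumption for illustration, not existing gPXE code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PKT 512 /* arbitrary size for the parked packet */

/* Hypothetical ARP state for one neighbour, with one parked packet. */
struct sketch_arp {
	bool resolved;             /* MAC address known? */
	bool pending;              /* is a packet parked? */
	size_t pending_len;
	unsigned char pending_data[SKETCH_MAX_PKT];
	unsigned int wire_count;   /* packets actually put on the wire */
};

enum sketch_tx_result {
	SKETCH_TX_SENT,    /* went out immediately */
	SKETCH_TX_QUEUED,  /* parked until the ARP reply arrives */
	SKETCH_TX_DROPPED  /* slot already in use, or packet too large */
};

enum sketch_tx_result sketch_udp_tx(struct sketch_arp *arp,
				    const void *data, size_t len)
{
	assert(arp != NULL && data != NULL);

	if (arp->resolved) {
		arp->wire_count++;
		return SKETCH_TX_SENT;
	}
	if (arp->pending || len > SKETCH_MAX_PKT)
		return SKETCH_TX_DROPPED;

	/* Cache miss: keep a copy; a real stack would send the ARP request here. */
	memcpy(arp->pending_data, data, len);
	arp->pending_len = len;
	arp->pending = true;
	return SKETCH_TX_QUEUED;
}

/* Called when the ARP reply is received: flush the parked packet, if any. */
void sketch_arp_reply(struct sketch_arp *arp)
{
	assert(arp != NULL);

	arp->resolved = true;
	if (arp->pending) {
		arp->wire_count++; /* stands in for transmitting pending_data */
		arp->pending = false;
	}
}
```

With something like this the caller gets a distinct result for "parked"
versus "dropped", and the first packet goes out as soon as the reply is
in, without waiting for any retry timer.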

> When fetching the first file over TFTP, the ARP cache will be empty
> and our UDP packet is dropped due to an ARP cache miss.  At least one
> trip around the round-robin dispatcher is required to receive the ARP
> reply and get TFTP to retransmit the RRQ packet, this time successfully
> on the wire because of an ARP cache hit.

Again, I guess that will only happen if there hasn't been any previous
traffic (e.g. DHCP) to the same host.

> 3. All this is complicated by the use of an exponential backoff timer
> in tftp.c.  This is not a simple interval timer.  It will change
> duration depending on whether it expires or is told to stop before
> expiring.

That sounds to me like another reason for the UDP layer to handle the
retransmit when it knows the packet can't be transmitted because of the
ARP issue. (Reporting an error back would at least be better than
failing silently, but even in that case the timer code becomes more
complicated for every UDP user.)
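
A small sketch of why the distinction matters to a caller with a
backoff timer. The constants, the doubling rule and the reset-on-stop
rule are assumptions chosen for illustration; they are not taken from
gPXE's retry timer, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MIN_TIMEOUT 1UL  /* assumed minimum, in timer ticks */
#define SKETCH_MAX_TIMEOUT 64UL /* assumed cap, in timer ticks */

/* Hypothetical retry timer holding only the current timeout. */
struct sketch_retry_timer {
	unsigned long timeout;
};

void sketch_timer_init(struct sketch_retry_timer *t)
{
	t->timeout = SKETCH_MIN_TIMEOUT;
}

/* Timer ran out with no response: double the timeout, up to the cap. */
void sketch_timer_expired(struct sketch_retry_timer *t)
{
	t->timeout *= 2;
	if (t->timeout > SKETCH_MAX_TIMEOUT)
		t->timeout = SKETCH_MAX_TIMEOUT;
}

/* A response arrived before expiry: assumed here to reset the timeout. */
void sketch_timer_stopped(struct sketch_retry_timer *t)
{
	t->timeout = SKETCH_MIN_TIMEOUT;
}

/*
 * Delay before the next attempt after a failed one. If the lower layer
 * reported that the packet never left the machine, retry at the minimum
 * delay and leave the backoff state untouched; otherwise back off.
 */
unsigned long sketch_retry_delay(struct sketch_retry_timer *t,
				 bool deferred_locally)
{
	assert(t != NULL);

	if (deferred_locally)
		return SKETCH_MIN_TIMEOUT;
	sketch_timer_expired(t);
	return t->timeout;
}
```

Note that every UDP user would need the equivalent of the
deferred_locally branch, which is the extra complexity I mean.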

> I think the timer needs to be used very carefully since it can
> introduce delays where there shouldn't be any.

> Hope this is useful.

Very. It has added a lot to my understanding of what's going on. With
this in mind I think my patch is the best solution currently - can it
be applied?

Next step could be looking into the architecture of the resolver and
UDP, to see if they can return an error back if the request couldn't
be completed right away, then we can reschedule after step()ing. But
even this is suboptimal. It would be far better to introduce some kind
of wait object (semaphore) which owns a separate run queue, and
instead of step() make it possible to call sleep_on(semaphore). Then
the caller would not be scheduled again until that object was signalled
(e.g. because the DNS request was completed). Apart from stopping
aimless looping, this would make it possible to detect idle situations
and go to sleep, which could be useful for laptops or other
power-sensitive applications.
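
A very rough sketch of the wait-object idea, just to show the shape of
it. None of this exists in gPXE; sketch_sleep_on, sketch_wake_all and
the queue types are invented names, and a real version would call each
process's own step function instead of bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process: only what is needed to sit on a queue. */
struct sketch_process {
	struct sketch_process *next;
	unsigned int runs; /* how many times it has been scheduled */
};

/* Simple FIFO of processes. */
struct sketch_queue {
	struct sketch_process *head;
	struct sketch_process *tail;
};

/* A wait object owns its own queue of sleeping processes. */
struct sketch_waitobj {
	struct sketch_queue sleepers;
};

/* The scheduler only ever looks at the runnable queue. */
struct sketch_sched {
	struct sketch_queue runnable;
};

void sketch_enqueue(struct sketch_queue *q, struct sketch_process *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

struct sketch_process *sketch_dequeue(struct sketch_queue *q)
{
	struct sketch_process *p = q->head;

	if (p) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
		p->next = NULL;
	}
	return p;
}

/* Park a process (which must not be on the run queue) on a wait object. */
void sketch_sleep_on(struct sketch_waitobj *w, struct sketch_process *p)
{
	assert(w != NULL && p != NULL);
	sketch_enqueue(&w->sleepers, p);
}

/* Signal the wait object: every sleeper becomes runnable again. */
void sketch_wake_all(struct sketch_sched *s, struct sketch_waitobj *w)
{
	struct sketch_process *p;

	while ((p = sketch_dequeue(&w->sleepers)) != NULL)
		sketch_enqueue(&s->runnable, p);
}

/*
 * Run one runnable process. Returns 0 when nothing is runnable, which
 * is the idle condition a caller could use to put the CPU to sleep.
 * A process that has run is not requeued; it must sleep on something
 * or be enqueued again explicitly.
 */
int sketch_run_one(struct sketch_sched *s)
{
	struct sketch_process *p = sketch_dequeue(&s->runnable);

	if (!p)
		return 0;
	p->runs++; /* stands in for calling the process's step function */
	return 1;
}
```

For example, the TFTP process would sleep on the resolver's wait object
and only reappear on the run queue when name resolution completes.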

I'd like to look into this more, but right now I don't have much time;
hopefully in a few weeks I will.

Thomas