Michael Decker: Driver Development

Week 3


10 June

Continued respinning my code today. Most of the functionality has been respun now, and a few purely formatting commits will follow soon (comments, tabs, naming, reordering, splitting declarations out to a .h file).

11 June

Ran the code through the compiler this morning; cleaned up some things from the output.

Finally got to test her out. Tests were run via PXE chainloading from the onboard NIC to the PCI card. The DHCP server handed the gPXE image to the onboard PXE ROM, then handed gPXE an HTTP URL pointing to a gPXE script, which in turn loads a kernel from an external HTTP server.

It wasn't responding past open() at first, so I built a debug image of the original driver to record the output for comparison. Fixed a few mistakes, but I believe the big one was the rx status not being checked in eepro100_poll().

Now she makes it a little way, though she still appears to have problems:

gPXE 0.9.3 -- Open Source Boot Firmware -- http://etherboot.org
Features: HTTP DNS TFTP iSCSI AoE bzImage Multiboot PXE PXEXT

net0: 00:90:27:43:84:4b on PCI01:00.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Waiting for link-up on net0... ok
DHCP (net0 00:90:27:43:84:4b)... ok
net0: 192.168.1.19/255.255.255.0 gw 192.168.1.1
Booting from filename "http://192.168.1.9/gtest.gpxe"
http://192.168.1.9/gtest.gpxe... ok
http://rom.etherboot.org/gtest/bz2bzImage... Input/output error (0x1d0c6039)
Could not fetch http://rom.etherboot.org/gtest/bz2bzImage: Input/output error ()
Could not boot http://192.168.1.9/gtest.gpxe: Input/output error (0x1d0c6039)
No more network devices

Next up, more debugging. Once she runs well I'll make formatting adjustments, proper commenting, and consistent naming. After that, transmits will be expanded to multiple TxFDs.

12 June

Today, more debugging ensued. I changed eepro100_transmit and eepro100_poll to check the descriptors for any set bits rather than specific ones, as the original driver did. I ensured that priv->rxfd was incremented in eepro100_poll.

I then found the driver communicating successfully without any errors, although it was painfully slow. I stripped down speedo_soft_rx_reset, removing the abort command in particular. I also modified eepro100_poll to loop until all received packets have been processed, rather than handling only one and returning. This, along with removing the goto check_suspension line, ensured that a soft_reset didn't occur until all rxfds were available.

This increased transfer speeds by roughly two orders of magnitude. Now I can load a kernel & initrd file in less than a second instead of two minutes.

I had my weekly meeting today instead of tomorrow because of birthday plans. Highlights of the meeting included the need to dynamically allocate all tx & rx descriptors and data buffers. This entails finding any alignment requirements that may exist.

Things are starting to pick up now with a live, breathing driver to play with! :-D

14 June

I spent some time today reading up on how to transition to dynamic allocation for the tx & rx descriptors. I started with the Intel datasheet, wherein I recalled some mention of simple and flexible memory modes, which allow separation of the data buffer from the descriptor.

From the beginning of Chapter 6:

Note: Although references are made to both simplified and flexible memory modes for transmit and
receive commands, only the simplified mode is supported. All bit settings and silicon
configurations only refer to the simplified memory mode.

So, is flexible memory mode supported or not?

Section 6.4.3.1.2 details the RFD (receive frame descriptor) format. The SF bit:

SF (Bit 19) The SF bit equals 0 for simplified mode.

So then, does setting it to 1 select flexible mode?

Then I found this blurb on the H bit:

H (Bit 20) The H bit indicates if the current RFD is a header RFD. If it equals 1, the current RFD is
a header RFD, and if it is 0, it is not a header RFD.
NOTE: If a load HDS command was not previously issued, the device disregards this
bit.

It appears, if you request 'early interrupts', an interrupt occurs when the header portion of an Ethernet packet is received. This early data goes into a header RFD. The load HDS command:

101 Load Header Data Size (HDS). After a load HDS command is issued, the
device expects to only find header RFDs or to be used in Receive DMA mode
until it is reset. This value defines the size of the header portion of the RFDs or
receive buffers. The HDS value is defined by the lower 14 bits of the SCB
General Pointer; thus, bits 15 through 31 should always be set to zeros when
using this command. The value of HDS should be an even non-zero number

So now, what is 'Receive DMA mode'? I found a mention of it:

011 Receive DMA Redirect. This command is only valid for the 82558 and later
devices. The buffers are indicated by an RBD chain, which is pointed to by an
offset stored in the general pointer register (in the RU base).

Aha! So it seems this 'Receive DMA mode' is what allows your rx data to be separate from the rx descriptor. This blurb seems to indicate such functionality is only available on the 82558 and later.

So then I looked at how the Linux drivers handle their rx buffers. There are actually two Linux drivers for these NICs: eepro100.c and e100.c. The former is what the gPXE driver is based on; the latter is written by Intel.

Both Linux drivers appear to use the simplified rx mode, and store both the RFD header and the rx data within their buffers. Prior to handing off the rx data to the network subsystem, the buffer's internal data pointer is adjusted to point at the packet data.

Looking at gPXE's io_buffer structure, it appears I can do the same:

/**
 * A persistent I/O buffer
 *
 * This data structure encapsulates a long-lived I/O buffer.  The
 * buffer may be passed between multiple owners, queued for possible
 * retransmission, etc.
 */
struct io_buffer {
	/** List of which this buffer is a member
	 *
	 * The list must belong to the current owner of the buffer.
	 * Different owners may maintain different lists (e.g. a
	 * retransmission list for TCP).
	 */
	struct list_head list;
 
	/** Start of the buffer */
	void *head;
	/** Start of data */
	void *data;
	/** End of data */
	void *tail;
	/** End of the buffer */
	void *end;
};

I then decided it'd be best to make my planned formatting changes now, rather than keep putting off the inevitable. Thus, I spent some time making purely formatting changes; that is, nothing about what the code actually does changed, except for a few debug statements. The diff probably won't be of use, as it's all different :-)

I removed the description of operation as it's all changing. I'll write a new description once the final driver operation is set. The new naming uses IFEC = Intel Fast Ethernet Controller. From what I can find, it seems only these controllers are referred to as Fast Ethernet Controllers by Intel. It's short and simple.

15 June

I rewrote ifec_scb_cmd_wait(). Now it returns an error code if the command unit doesn't become ready within a timeout interval. The timeout is a configurable #define CU_CMD_TIMEOUT. The return code is propagated through ifec_scb_cmd(), so the caller can check that if they want.

I rearranged the operations in ifec_net_open() into a seemingly more logical order. It seems to work the same, but I wonder whether the ordering makes a difference.

After the last commit, it occurred to me that there may be some nuance to the hardware design that makes the original ordering always work and the new ordering occasionally fail. If so, I might not notice it for a while.

This got me wondering if I should strive to keep the driver as close to the original Linux version as possible. That would let it benefit from any workarounds for subtle hardware problems that their code has evolved over time.

Although I may just be over-thinking things. One would hope any hardware defects would be clearly identified. Would the Intel engineers releasing open-source drivers for their hardware spell out how they worked around hardware bugs..? I wonder if my work over the past few days hasn't been a waste.

I haven't spoken to Marty since making these changes, so we'll see what he says.

Tomorrow I hope to integrate dynamic allocation for the tx & rx, as well as expand tx to multiple descriptors.