Michael Decker: Driver Development

Week 6

2 July

Today, I'll be doing some testing with the eepro100. I've been using one PCI card for testing thus far, so I'll toss in another and run more tests.

Testing revealed a problem on the 82559. A series of TX overflows occurred. Testing with the original legacy driver did not show this behavior, so faulty hardware is not likely the root cause.

3 July

I modified the open() routine, which issues two commands, ias and config, to match the behavior of the legacy driver. The ias command now links to the config command, which then suspends the device. A single start command executes both commands in sequence.
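To make the chaining concrete, here is a minimal user-space sketch of the idea, not the driver code itself. The struct layout, the opcode and flag values, and the names chain_open_cmds() and cu_start() are assumptions for illustration. The link field holds an array index in place of a real bus address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed opcode and flag values, modelled on the 8255x command block. */
enum {
    CMD_IAS      = 0x0001, /* individual address setup */
    CMD_CONFIG   = 0x0002, /* configure */
    CMD_SUSPEND  = 0x4000, /* suspend the command unit after this block */
    STAT_DONE_OK = 0xA000, /* complete + OK status bits (assumed) */
};

/* Simplified command block: real ones also carry command-specific data. */
struct cmd_block {
    uint16_t status;
    uint16_t command;
    uint32_t link; /* index of the next block, standing in for a bus address */
};

/* Link ias -> config and make config the suspend point. */
void chain_open_cmds(struct cmd_block *pool, uint32_t ias, uint32_t cfg)
{
    assert(ias != cfg);
    pool[ias] = (struct cmd_block){ 0, CMD_IAS, cfg };
    /* The link of the last block is never followed once the unit suspends. */
    pool[cfg] = (struct cmd_block){ 0, CMD_CONFIG | CMD_SUSPEND, cfg };
}

/* Toy command unit: one start walks the chain until a suspend bit is seen.
 * Returns the number of blocks executed. */
int cu_start(struct cmd_block *pool, size_t pool_len, uint32_t first)
{
    int executed = 0;
    uint32_t cur = first;

    while (cur < pool_len && (size_t)executed < pool_len) {
        pool[cur].status = STAT_DONE_OK;
        executed++;
        if (pool[cur].command & CMD_SUSPEND)
            break;
        cur = pool[cur].link;
    }
    return executed;
}
```

With a two-entry pool, chain_open_cmds(pool, 0, 1) followed by one cu_start(pool, 2, 0) marks both blocks complete, which mirrors the "one start, two commands" behavior described above.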

Additionally, I ensured the configure command is aligned to an even physical address by allocating it through malloc_dma(). The allocation and initialization of the ias command were moved to just before the configure command, as the two are now executed together.
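For reference, the "even address" requirement is just a 2-byte alignment check. The helpers below are a hypothetical sketch of that arithmetic in plain C. They are not part of gPXE, and they say nothing about how malloc_dma() works internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of align (align must be a power of two). */
int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~((uintptr_t)align - 1);
}
```

An even address is one for which is_aligned(addr, 2) is nonzero. An odd address such as 0x1001 would be rounded up to 0x1002 by align_up().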

I also added the legacy code that tests whether the device is plugged in. Depending on the result, the driver calls either netdev_link_up() or netdev_link_down().
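The shape of that test is roughly as follows. This is a user-space sketch with stand-ins: struct mock_netdev, mock_link_up() and mock_link_down() are hypothetical replacements for the gPXE net device and its link calls, and the status value is passed in rather than read over MDIO. The bit tested is the link-status bit (bit 2) of MII register 1, the same bit the legacy driver checks.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMSR         1        /* MII basic status register */
#define BMSR_LINK_STATUS (1 << 2) /* set while a valid link is established */

/* Hypothetical stand-in for the gPXE net device structure. */
struct mock_netdev {
    int link_ok;
};

/* Stand-ins for netdev_link_up() and netdev_link_down(). */
void mock_link_up(struct mock_netdev *n)   { n->link_ok = 1; }
void mock_link_down(struct mock_netdev *n) { n->link_ok = 0; }

/* Translate a status register value into a link-up or link-down call. */
void check_link(struct mock_netdev *netdev, uint16_t bmsr)
{
    assert(netdev != 0);
    if (bmsr & BMSR_LINK_STATUS)
        mock_link_up(netdev);
    else
        mock_link_down(netdev);
}
```

In the real driver, the bmsr argument would come from an MDIO read of register MII_BMSR on the PHY. That read is omitted here.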

The 82559 now appears to operate properly under normal conditions. However, I still have a problem if I boot the system with both cards installed and the first card fails to boot: the second card causes a triple fault around the time the HTTP GET is performed. I haven't tested the legacy driver in such a double-booting situation, so I'm not sure whether there are any special hardware issues to consider in this setup.

5 July

The debugging continues.

I tested the legacy eepro100 driver with two 8255x PCI cards installed in my target system. Only one card had the network cable attached at a time. gPXE acted as if only one card was installed, a behavior which differed from the new driver. Near the end of the probe routine of the legacy driver:

 765         if (!(mdio_read(eeprom[6] & 0x1f, 1) & (1 << 2))) {
 766                 printf("Valid link not established\n");
 767                 eepro100_disable(nic);
 768                 return 0;
 769         }

I commented out lines 767 & 768 and re-tested. This time, gPXE did not attempt to boot both cards. In any case, the legacy driver keeps its state in shared global data, so two-card tests with it are fruitless.

It seems likely there is something wrong with the driver code which is causing a failure after the first boot attempt. In a previous journal entry, I alluded to a possible hardware issue because all state data for each driver instance is located within a struct ifec_private, created with alloc_etherdev() in ifec_pci_probe(). The only globally defined data is ifec_cfg, which is only read from when initializing the malloc_dma()ed configure command.

ifec_pci_probe() is called twice in sequence when gPXE boots, once for each card as it is probed. This allocation should completely separate the state of each card. The only difference between the second card's boot sequence and a boot without the first attempt is a time delay.
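The isolation argument can be illustrated with a small user-space model. Everything here is a hypothetical stand-in: cfg_template plays the role of ifec_cfg, struct mock_priv the role of struct ifec_private, and mock_probe() the role of ifec_pci_probe(). The field names, sizes and byte values are invented for the sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CFG_LEN 4 /* arbitrary length for the sketch */

/* Shared, read-only template (plays the role of ifec_cfg). */
const uint8_t cfg_template[CFG_LEN] = { 0x16, 0x08, 0x00, 0x00 };

/* Hypothetical per-device state (plays the role of struct ifec_private). */
struct mock_priv {
    uint8_t  cfg[CFG_LEN]; /* private copy of the configure bytes */
    unsigned tx_count;     /* example of mutable per-card state */
};

/* Toy probe: each call allocates fresh state and only reads the template. */
struct mock_priv *mock_probe(void)
{
    struct mock_priv *priv = calloc(1, sizeof(*priv));

    if (!priv)
        return NULL;
    memcpy(priv->cfg, cfg_template, sizeof(priv->cfg));
    return priv;
}
```

Two calls to mock_probe() return distinct objects, and writes through one pointer cannot reach the other card's state or the template. If the real driver follows this pattern, any cross-card interference would have to come from somewhere other than the private state.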

There is a short delay while the first card attempts to network boot. After this period, gPXE attempts the second card. Any other changes in the state of the machine should depend on parts of gPXE outside the driver code.

Perhaps some GDB debugging is in order?
