====== Michael Decker: Driver Development ======

==== Week 5 ====

----

=== 23 & 24 June ===

OK, so I rewrote the tx path again, using the suspend bit as necessary. To give you an idea of what I'm working with, here is a **short overview**.

The 8255x processes //command blocks// to perform actions, such as configuration and transmit.

{{:soc:2008:mdeck:journal:memarch.png|}}

The gPXE driver executes an //individual address setup// command and then a //configure// command in ''ifec_net_open()''. In addition, a ring of //transmit control blocks// (TCBs) is initialized. The configure command's ''link'' member points to the first TCB in the ring, and the suspend bit is set.

When ''ifec_net_transmit()'' is called, the next TCB in the ring is configured to transmit and suspend, then the card is issued a //resume// command. The card will fetch the next command block, the address of which was cached from the ''link'' field of the previous command block. In our case, the ''link'' fields never change, and they form the ring of TCBs. The transmit command will be processed, and the card will suspend.

If multiple ''ifec_net_transmit()''s are issued quickly, they may form a chain of TCBs without intermediate suspends occurring. This is enabled by clearing the suspend bit in the previous TCB when preparing a TCB for transmission.

The code to do this is rather straightforward, so why did it take two days to rewrite this? In a word, **bugs**. There were a few mistakes in my code, such as calling ''ifec_scb_cmd_wait ( ioaddr )'' instead of ''ifec_scb_cmd_wait ( ioaddr + SCBCmd )'', or assigning ''tcb->link = ptr'' instead of ''tcb->link = virt_to_bus ( ptr )''. However, there was a momma-jomma bug that eluded me for some time. This bug caused the machine to triple fault or simply freeze at seemingly random moments, although the moment was the same for any given compile.
The bug was located in a loop in ''ifec_net_poll()'':

<file>
/* Check status of transmitted packets */
while ( ( status = tcb->status ) && tcb->iob ) {
        if ( status & TCB_U ) {
                netdev_tx_complete_err ( netdev, tcb->iob, -ENOMEM );
        } else {
                netdev_tx_complete ( netdev, tcb->iob );
        }
        DBGIO ( "tx completion\n" );

        tcb->iob = NULL;
        tcb->status = 0;
        tcb->command &= ~CmdSuspend;     /* Allow controller to resume. */
        tcb = a->tcb_tail = tcb->next;   /* Next TCB */
}
</file>

The above is the proper code. The bug was one extra line, [[http://git.etherboot.org/?p=people/mdeck/gpxe.git;a=blob;f=src/drivers/net/eepro100.c;h=5a3fa83c81a550f2d4624401c3909c356350c2be;hb=1a2f25aa714f8f4c0adbaf2c186d69f986000dc5#l453|line 453 in this commit]]. The line ''free_iob ( tcb->iob );'' is not only unnecessary, it is wrong. The ''netdev_tx_complete()'' functions already free this io_buffer, so the extra call freed it a second time.

After finally fixing this bug, I tested it some more and it was working just fine. //Except// that, a few times after it successfully downloaded the kernel and initrd over http, the system froze. I haven't been able to duplicate this behavior since; perhaps it's the same [[:soc:2008:dverkamp:journal:week5|bug that DrV encountered]].
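
For anyone who wants to play with the ring logic without hardware, here is a minimal host-side sketch of the scheme described in this entry: transmit sets the suspend bit on the new TCB and clears it on the previous one, and the poll loop hands each completed buffer to exactly one completion call. It is not the driver code. Every ''sim_'' name, the bit values, the ring size, and the "card" model are assumptions made up for illustration; the last ring slot stands in for the configure command, and ''sim_complete()'' stands in for ''netdev_tx_complete()''.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_RING_SIZE   4        /* assumed ring size */
#define SIM_CMD_TX      0x0004u  /* illustrative bit values only */
#define SIM_CMD_SUSPEND 0x4000u
#define SIM_STATUS_DONE 0x8000u

struct sim_tcb {
    uint16_t status;
    uint16_t command;
    struct sim_tcb *next;   /* plays the role of the fixed 'link' field */
    void *buf;              /* plays the role of tcb->iob; NULL = free slot */
};

struct sim_ring {
    struct sim_tcb tcb[SIM_RING_SIZE];
    struct sim_tcb *head;   /* last block handed to the card */
    struct sim_tcb *tail;   /* oldest block not yet completed */
    struct sim_tcb *card;   /* block the simulated card fetches next */
};

static int sim_completed;   /* number of buffers handed back so far */

/* Stand-in for netdev_tx_complete(): takes ownership of buf. */
static void sim_complete(void *buf) { (void)buf; sim_completed++; }

static void sim_ring_init(struct sim_ring *r) {
    for (int i = 0; i < SIM_RING_SIZE; i++) {
        r->tcb[i].status = 0;
        r->tcb[i].command = 0;
        r->tcb[i].buf = NULL;
        r->tcb[i].next = &r->tcb[(i + 1) % SIM_RING_SIZE];
    }
    /* The last slot stands in for the configure command: suspended,
     * with its link pointing at the first TCB. */
    r->head = &r->tcb[SIM_RING_SIZE - 1];
    r->head->command = SIM_CMD_SUSPEND;
    r->tail = r->card = &r->tcb[0];
}

/* Queue one buffer; returns 0 on success, -1 if the ring is full. */
static int sim_transmit(struct sim_ring *r, void *buf) {
    struct sim_tcb *tcb = r->head->next;
    assert(buf != NULL);            /* NULL is reserved for "slot free" */
    if (tcb->buf != NULL)
        return -1;
    tcb->buf = buf;
    tcb->status = 0;
    tcb->command = SIM_CMD_TX | SIM_CMD_SUSPEND;
    r->head->command &= ~SIM_CMD_SUSPEND;  /* chain onto previous block */
    r->head = tcb;
    return 0;
}

/* Simulated resume: process queued blocks until one with suspend set. */
static int sim_card_resume(struct sim_ring *r) {
    int processed = 0;
    for (;;) {
        struct sim_tcb *tcb = r->card;
        if (tcb->buf == NULL || tcb->status != 0)
            break;                  /* simulation guard: nothing queued */
        tcb->status = SIM_STATUS_DONE;
        r->card = tcb->next;
        processed++;
        if (tcb->command & SIM_CMD_SUSPEND)
            break;
    }
    return processed;
}

/* Same shape as the poll loop: one completion call per buffer, and no
 * separate free afterwards. Returns the number of completions. */
static int sim_poll(struct sim_ring *r) {
    int n = 0;
    struct sim_tcb *tcb = r->tail;
    while (tcb->status && tcb->buf) {
        sim_complete(tcb->buf);     /* ownership passes here */
        tcb->buf = NULL;
        tcb->status = 0;
        tcb = r->tail = tcb->next;
        n++;
    }
    return n;
}
```

Queueing two buffers back to back leaves the suspend bit set only on the second TCB, so a single simulated resume processes both, and the poll loop then reports two completions. The real card's resume behavior is more subtle than this model, so treat it as a way to reason about ring bookkeeping and buffer ownership, not about timing.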