This shows you the differences between two versions of the page.

soc:2010:andreif:journal:week7 [2010/07/07 09:51]
soc:2010:andreif:journal:week7 [2010/07/10 09:46]
Line 30: Line 30:
 In other news, Piotr reported a problem with the pcnet32 driver which, fortunately, was easy to fix. Thanks Piotr!
 +==== Day 4 [ Thu 8 Jul 2010 ] ====
 +Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=f4ae3fafb3291254e99c378f909599e8930c9431|f4ae3fafb3291254e99c378f909599e8930c9431]]
 +I did a lot of debugging today, which turned out to be a real PITA for the following reasons:
 +  * The router I am using causes link failures every now and then. I got tired of it, so I'm back to USB sticks
 +  * I have no serial port, so more than a couple of DBG messages are difficult to handle ( Pause/Break ftw )
 +  * Too many RX rings ( there were 32, now set to 16 )
 +  * Forgot to call ''netdev_rx()'', so iobufs were never passed to the upper layers. This was the reason for ''alloc_iob()'' failing: the iobufs were never freed
 +  * I was using the wrong kind of descriptor in .transmit, so the flaglen field never got set. Thus, the NIC was not sending any packets
 +  * Wrote = instead of ==. I know some people prefer to write if ( 1 == var ) to avoid these issues, but IMHO it seriously hurts readability.
 +  * Misinterpretation of the flag field ( I'm still not sure about NV_RX_AVAIL )
 +  * I wasn't setting up the descriptor rings' physical address correctly
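The link between the missing ''netdev_rx()'' call and the ''alloc_iob()'' failures can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions: it does not use the gPXE API, and ''pool_alloc()'', ''pool_free()'', ''simulate_rx()'' and ''POOL_SIZE'' are hypothetical names standing in for the real allocator and the upper layers that free buffers.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical number of buffers in the pool */

struct buf {
    int in_use;
};

static struct buf pool[POOL_SIZE];

/* Stand-in for an allocator: returns NULL once every buffer is taken. */
static struct buf *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Stand-in for the upper layers consuming a buffer and releasing it. */
static void pool_free(struct buf *b)
{
    assert(b != NULL);
    b->in_use = 0;
}

/*
 * Simulate receiving `packets` frames. When hand_up is zero, the driver
 * "forgets" to pass buffers on, so nothing ever releases them.
 * Returns the number of failed allocations.
 */
static int simulate_rx(int packets, int hand_up)
{
    int failures = 0;

    /* Start every run from an empty pool. */
    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i].in_use = 0;

    for (int n = 0; n < packets; n++) {
        struct buf *b = pool_alloc();

        if (b == NULL) {
            failures++;
            continue;
        }
        if (hand_up)
            pool_free(b);
    }
    return failures;
}
```

With buffers handed up, ''simulate_rx(10, 1)'' never fails to allocate. With ''hand_up'' set to zero, the pool is exhausted after ''POOL_SIZE'' packets and every later allocation fails.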
 +Finding these bugs involved a lot of printf debugging. This took a lot of time especially because I constantly had to limit the number of DBGs.
 +After all of this, a minor victory occurred: gPXE managed to send its first DHCP DISCOVER packet (actually two; I'm not sure whether this comes from gPXE or whether something is wrong with the driver), _but_ the DHCP server was not replying at all. I forgot to mention that I was using Wireshark all this time to see if there was any traffic on the wire. I looked at the packet sent by gPXE and saw that it contained a BOOTP section. I wasn't sure whether DHCP servers are automatically configured to reply to BOOTP packets, so I went back to my dhcp.conf file.
 +Extract from the .conf:
 +# Fixed IP addresses can also be specified for hosts.   These addresses
 +# should not also be listed as being available for dynamic assignment.
 +# Hosts for which fixed IP addresses have been specified can boot using
 +# BOOTP or DHCP.   Hosts for which no fixed address is specified can only
 +# be booted with DHCP, unless there is an address range on the subnet
 +# to which a BOOTP client is connected which has the dynamic-bootp flag
 +# set.
 +After adding a host configuration in the .conf, the server finally replied with a DHCP OFFER packet. Still, the NIC remained silent, so now there probably are issues in RX.
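For reference, a host declaration of the kind the quoted comment describes might look like the fragment below, in ISC dhcpd syntax. The host name, MAC address and IP address are placeholders, not the values actually used here.

```
# Placeholder values: replace with the NIC's real MAC address and a free IP.
host gpxe-test {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 192.168.1.50;
}
```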
 +==== Day 5 [ Fri 9 Jul 2010 ] ====
 +==== Day 6 [ Sat 10 Jul 2010 ] ====
 +Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=7b189803f76097d4b6b47bb5b90146259b7c8834|7b189803f76097d4b6b47bb5b90146259b7c8834]]
 +Yet Another Debugging Session. Managed to fix more bugs, still not working properly.
 +  * When allocating RX descriptors, I was using the original descriptor format instead of the extended one
 +  * I was not sending received packets to the upper layers correctly. Used iob_put and fixed a bug where a NULL iobuf would be netdev_rx-ed
 +  * This one was the most difficult to fix. Last time, I left off with the server replying with a DHCP OFFER packet and the NIC remaining silent after that. I thought it was an RX issue, but the problem remained even after I found and fixed some RX bugs. Running out of ideas, I tried sending a different kind of packet to the NIC (i.e., not a DHCP OFFER). Lo and behold, an ARP packet was received properly by the NIC. The destination address of the ARP packet was [...]. The DHCP OFFER destination address was a unicast address. Clearly, the NIC was doing unwanted filtering. Digging through the registers, I set the NIC to promiscuous mode and it replied to the DHCP server. {{:soc:2010:andreif:journal:forcedet-buggy1.png?1000|forcedeth driver first packets}}
 +  * Finally got NV_RX_AVAIL and NV_TX_VALID right
 +Eventually, the above process stops with a TX overflow error. The NIC is not sending out packets properly, even though the flaglen field is marked with NV_TX_VALID. I fiddled with the code some more, and right now it mostly does not send packets at all (sometimes it does; sometimes it sends the first two DISCOVER packets but the server does not reply). Also, notice in the above picture that there are some malformed packets (the white ones). I have yet to figure out what is causing these issues.
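The unicast filtering symptom described in the list above can be modelled with a small destination-address filter. This is a sketch only: ''struct rx_filter'' and ''filter_accepts()'' are hypothetical names, multicast is ignored, and the idea that the NIC's station address was not programmed correctly is an assumption used for illustration, not something the journal confirms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6   /* length of an Ethernet MAC address in bytes */

static const uint8_t eth_broadcast[ETH_ALEN] =
    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

/* Hypothetical model of a NIC's receive filter state. */
struct rx_filter {
    uint8_t station[ETH_ALEN];  /* address the NIC believes is its own */
    int promiscuous;            /* non-zero: accept every frame */
};

/* Return 1 if a frame with destination `dest` passes the filter. */
static int filter_accepts(const struct rx_filter *f, const uint8_t *dest)
{
    assert(f != NULL && dest != NULL);

    if (f->promiscuous)
        return 1;
    if (memcmp(dest, eth_broadcast, ETH_ALEN) == 0)
        return 1;
    /* Unicast frames pass only if they match the station address. */
    return memcmp(dest, f->station, ETH_ALEN) == 0;
}
```

In this model, a filter whose ''station'' field does not match the host's real MAC address still accepts broadcast frames but drops unicast ones, and setting ''promiscuous'' bypasses the comparison entirely. Promiscuous mode therefore hides such a problem rather than fixing it.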
