Differences

This shows you the differences between two versions of the page.

Link to this comparison view

debug [2006/06/10 00:49]
mdc created
debug [2006/06/10 00:49] (current)
mdc created
Line 1: Line 1:
 +====== gPXE debugging ======
 +
 +=== Sat Jun 10 07:40:52 UTC 2006 ===
 +
 +I've started to work on getting more drivers working in gPXE 0.5, and have come across an interesting problem.
 +
 +I am not sure what is causing the problem, so I need to debug it.  It occurred to me that this is a perfect opportunity to show how the process works, and get more people involved.
 +
 +So, let's debug this together. ​ I am doing this on real hardware. ​ I have a client machine (our StUdly LinuxWorld demo PC, no less :), a server machine, and a serial cable from the client to another machine to capture gPXE output.
 +
 +So, let's debug an actual problem. ​ I have no idea what the problem is, and what I'd like is for you (yes, you!) to help me to figure out what is going on.
 +
 +Now, note that I actually _enjoy_ debugging :)  I enjoy it the way some people enjoy reading mystery novels. ​ Rather than figuring out "who done it?", I'm figuring out "what is actually going on?"​. ​ That to me is the secret and essence of good debugging. ​ Rather than trying to "fix the bug", you understand what the code is doing, and you will begin to know how to make it do something better.
 +
 +So, let's get started.
 +
 +I ran Michael'​s AoE demo, see [[http://​sourceforge.net/​mailarchive/​forum.php?​thread_id=11136042&​forum_id=48022]] which boots a Fedora image which lives on a server. ​ It works fine, boots FC5, and uses the rtl8139 driver. ​ When playing with drivers, it's good to have a case that works, because there are a lot of variables in the hardware/​software equation.
 +
 +So, I decided to compile the same driver using gPXE from CVS, and run the test with my compiled driver. ​ I enabled the serial port in src/​config.h:​
 +
 +  #define CONSOLE_FIRMWARE ​       /* Default BIOS console */
 +  #define CONSOLE_SERIAL ​         /* Serial port */
 +
 +so that I could log Etherboot output to my laptop.
 +
 +So I have three machines in the mix, in an arrangement that looks something like this:
 +
 +      +-Serial-Cable--9600-8N1-+
 +      |                        |
 +  +--------+ ​ +--------+ ​ +--------+
 +  | client |  | server |  | laptop |
 +  +--------+ ​ +--------+ ​ +--------+
 +       ​\ ​          ​| ​          /
 +        \          |          /
 +         ​\ ​        ​| ​        /
 +          \        |        /
 +        +--------------------+
 +        |    10/100 Switch ​  |
 +        +--------------------+
 +
 +The server is running vblade, which serves up an AoE partition on a network for other machines to access. ​ I am also compiling the bin/​rtl8139.dsk images from gPXE cvs source on the server.
 +
 +So, I downloaded the gPXE cvs source, and did:
 +
 +      make DEBUG=rtl8139 bin/​rtl8139.dsk
 +
 +Then put a formatted floppy in the server drive, and did:
 +
 +      cat bin/​rtl8139.dsk > /dev/fd0
 +
 +I then put the disk the client, and booted it:
 +
 +This is the debugging output from gPXE:
 +
 +  EEPROM is an AT93C46
 +  MAC address is 00c04963458d
 +  RX ring at f6dfac4
 +  Initialising AoE device e0.0
 +  virt_to_bus ( pkb->​data ) = f6e2000 align = 0
 +  pkb_len=24 pad_len=1c
 +  TX id 0 at f6e2000+40
 +  TX id 0 complete
 +  RX packet at offset 0+428
 +  RX packet at offset 42c+6e
 +  virt_to_bus ( pkb->​data ) = f6e2000 align = 0
 +  pkb_len=24 pad_len=1c
 +  TX id 1 at f6e2000+40
 +  TX id 1 complete
 +  RX packet at offset 4a0+428
 +  RX packet at offset 8cc+40
 +  virt_to_bus ( pkb->​data ) = f6e2000 align = 0
 +  pkb_len=24 pad_len=1c
 +  TX id 2 at f6e2000+40
 +  TX id 2 complete
 +  RX packet at offset 910+428
 +  RX packet at offset d3c+40
 +  virt_to_bus ( pkb->​data ) = f6e2000 align = 0
 +  pkb_len=24 pad_len=1c
 +  TX id 3 at f6e2000+40
 +  TX id 3 complete
 +  RX packet at offset d80+40
 +  RX packet at offset dc4+428
 +  RX packet at offset 11f0+40
 +  RX packet at offset 1234+40
 +  virt_to_bus ( pkb->​data ) = f6e2000 align = 0
 +  pkb_len=24 pad_len=1c
 +  TX id 0 at f6e2000+40
 +  TX id 0 complete
 +  RX packet at offset 1278+428
 +
 +  Could not reach AoE device e0.0
 +  Press any key to exit
 +
 +Now, the first question one might ask is, "​What'​s it supposed to do?"
 +
 +Well, it's supposed to talk to the AoE server (vblade in this case), download an image, and run it.
 +
 +So, what does the protocol look like?
 +
 +Here's a packet trace from tcpdump. ​ Note that we have the MAC (ethernetE hardware) address of the client, so we can tell tcpdump to give us a trace of just packets to and from the server that match the clients MAC address, which is:
 +
 +      00:​c0:​49:​63:​45:​8d
 +
 +Here's what a tcpdump session looks like:
 +
 +  $ sudo /​usr/​sbin/​tcpdump ether host 00:​c0:​49:​63:​45:​8d
 +  tcpdump: listening on eth0
 +  13:​30:​58.858667 0:​c0:​49:​63:​45:​8d Broadcast 88a2 64:
 +                           1000 0000 0000 ebeb 0001 0000 01ec 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000
 +  13:​30:​58.858887 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0001 0000 0140 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 5353 204e 4548
 +                           4552 2020 2020 2020 2020 2020 2020 0000
 +                           0000 0000 3156 0a30 2020 2020 6f43 6172
 +                           6469
 +  13:​30:​59.266816 0:​c0:​49:​63:​45:​8d Broadcast 88a2 64:
 +                           1000 0000 0000 ebeb 0002 0000 01ec 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000
 +  13:​30:​59.266915 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0002 0000 0140 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 5353 204e 4548
 +                           4552 2020 2020 2020 2020 2020 2020 0000
 +                           0000 0000 3156 0a30 2020 2020 6f43 6172
 +                           6469
 +  13:​31:​00.003411 0:​c0:​49:​63:​45:​8d Broadcast 88a2 64:
 +                           1000 0000 0000 ebeb 0003 0000 01ec 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000
 +  13:​31:​00.003513 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0003 0000 0140 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 5353 204e 4548
 +                           4552 2020 2020 2020 2020 2020 2020 0000
 +                           0000 0000 3156 0a30 2020 2020 6f43 6172
 +                           6469
 +  13:​31:​01.540733 0:​c0:​49:​63:​45:​8d Broadcast 88a2 64:
 +                           1000 0000 0000 ebeb 0004 0000 01ec 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000
 +  13:​31:​01.540832 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0004 0000 0140 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 5353 204e 4548
 +                           4552 2020 2020 2020 2020 2020 2020 0000
 +                           0000 0000 3156 0a30 2020 2020 6f43 6172
 +                           6469
 +  13:​31:​04.616461 0:​c0:​49:​63:​45:​8d Broadcast 88a2 64:
 +                           1000 0000 0000 ebeb 0005 0000 01ec 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000
 +  13:​31:​04.616565 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0005 0000 0140 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 5353 204e 4548
 +                           4552 2020 2020 2020 2020 2020 2020 0000
 +                           0000 0000 3156 0a30 2020 2020 6f43 6172
 +                           6469
 +
 +Note that the client (00:​c0:​49:​63:​45:​8d) keeps sending a broadcast to the server (00:​11:​11:​77:​aa:​f3). The server responds, but the response is apparently not accepted by the client, because it continues broadcasting.
 +
 +So, where do we start? ​ Well, one thing to ask is, "what does a good packet trace look like?"​.
 +
 +Here's one from the AoE demo test image:
 +
 +  $ sudo /​usr/​sbin/​tcpdump ether host 00:​c0:​49:​63:​45:​8d
 +  tcpdump: listening on eth0
 +  13:​42:​16.706926 0:​c0:​49:​63:​45:​8d Broadcast 88a2 60:
 +                           1000 0000 0000 ebeb 0001 0000 01ec 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000
 +  13:​42:​16.707127 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0001 0000 0140 0000
 +                           00e0 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 5353 204e 4548
 +                           4552 2020 2020 2020 2020 2020 2020 0000
 +                           0000 0000 3156 0a30 2020 2020 6f43 6172
 +                           6469
 +  13:​42:​16.792007 0:​c0:​49:​63:​45:​8d 0:​11:​11:​77:​aa:​f3 88a2 60:
 +                           1000 0000 0000 ebeb 0002 4000 0124 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000
 +  13:​42:​16.792109 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0002 4000 0040 0000
 +                           0000 0000 0000 faeb 3112 0000 4c49 4c4f
 +                           1606 1000 0100 007c 0000 0000 0000 0000
 +                           0000 5eac 08c0 7409 b40e bb07 00cd 10eb
 +                           f2b9 1300 b486 cd15 cd18 31c0 8ed0 bc00
 +                           7cfb
 +  13:​42:​16.792497 0:​c0:​49:​63:​45:​8d 0:​11:​11:​77:​aa:​f3 88a2 60:
 +                           1000 0000 0000 ebeb 0003 4000 0124 0000
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000
 +  13:​42:​16.792574 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0003 4000 0040 0000
 +                           0000 0000 0000 faeb 3112 0000 4c49 4c4f
 +                           1606 1000 0100 007c 0000 0000 0000 0000
 +                           0000 5eac 08c0 7409 b40e bb07 00cd 10eb
 +                           f2b9 1300 b486 cd15 cd18 31c0 8ed0 bc00
 +                           7cfb
 +  13:​42:​16.792913 0:​c0:​49:​63:​45:​8d 0:​11:​11:​77:​aa:​f3 88a2 60:
 +                           1000 0000 0000 ebeb 0004 4000 0124 3f00
 +                           0000 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000
 +  13:​42:​16.792977 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0004 4000 0040 3f00
 +                           0000 0000 0000 faeb 2101 b501 4c49 4c4f
 +                           1606 463e 7f44 0000 0000 1e07 7f44 9792
 +                           8e54 c100 8260 d010 0c00 b8c0 078e d0bc
 +                           0008 fb52 5306 56fc 8ed8 31ed 60b8 0012
 +                           b336
 +  13:​42:​16.794842 0:​c0:​49:​63:​45:​8d 0:​11:​11:​77:​aa:​f3 88a2 60:
 +                           1000 0000 0000 ebeb 0005 4000 0124 d010
 +                           0c00 0000 0000 0000 0000 0000 0000 0000
 +                           0000 0000 0000 0000 0000 0000 0000
 +  13:​42:​16.794912 0:​11:​11:​77:​aa:​f3 0:​c0:​49:​63:​45:​8d 88a2 1060:
 +                           1800 0000 0000 ebeb 0005 4000 0040 d010
 +                           0c00 0000 0000 d28e 0b00 ef8e 0b00 f08e
 +                           0b00 af8f 0b00 b08f 0b00 9b93 0b00 9c93
 +                           0b00 9d93 0b00 9e93 0b00 c710 0c00 c810
 +                           0c00 c910 0c00 ca10 0c00 cb10 0c00 cc10
 +                           0c00
 +
 +Note that in this case, the client (00:​c0:​49:​63:​45:​8d) only sends a single broadcast to the server (00:​11:​11:​77:​aa:​f3). ​ After that, it speaks directly to the server by address.
 +
 +So, Dear Reader, ​ I ask you to suggest a next debugging step.  I know what I would do, but let's do this together. ​ I'm going to get some sleep (It's 2:15 am Saturday morning here), and when I come back, I'll continue debugging according to suggestions from this list.
 +
 +Now isn't this more interesting that reading along? :)  You can look at the code, set up a debugging system like mine, draw from your own experience, or however you want.
 +
 +I'll look forward to any and all suggestions to understand what's going on here. Why isn't the client responding appropriately? ​ Remember, this same setup works with a different floppy, so we can pretty safely say the hardware is working.
 +
 +So, what do you think we should do next (and why)? :)
 +
 +
  

QR Code
QR Code debug (generated for current page)