June 22: I started out the evening of the 22nd working on a debugging issue…

I asked rwcr for some help and he gladly provided it. The issue of the night was debugging why open (or any other procedure) was not being called at all.

rwcr showed me how to hook up to a gdbstub using UDP debugging, and I inserted an RTL8139B into my testing machine. After a while spent tracing variables and stepping into procedures, I found (again, with much help from rwcr) that the netdevice reference I was returning to the core was in fact a reference to the net1 device rather than the actual net0 device. Hence, my open (and subsequent functions) were never called, because the core was calling the functions of another driver!

After tracing through what I had written, I found that I was initializing netdevice[0] twice: once in probe, then again (unintentionally) in skge_initialize.

I removed the second initialization, compiled, booted, and all of open's DBGP messages showed up on the screen. Yay.

So, moving on: the current obstacle is that the MAC address appears to be broken again. I will figure out why. However, I checked the rest of the variables being initialized in the debug output and they all check out! Yay. I will be moving on to poll this week.

June 23:

Unfortunately, today I came home and saw that my computer was frozen trying to enumerate its RAM. After some basic troubleshooting and a run of Memtest86, I found that my main desktop machine has some faulty RAM in it. Hopefully I'll be able to replace the RAM in store tomorrow… 4x 2GB sticks of Corsair Dominator 1066 isn't cheap…

All I'll be able to get done today (well, the rest of today) is finish this journal post. However, tonight I set up a complete build/test environment on my laptop and downloaded my git tree there. Hopefully this will let me keep working and limit my downtime to the time it took me to set up the laptop.

June 24:

Everything is fixed! My computer(s) are back online and ready to go.

Today I began by attempting to trace the source of the memory error I'm experiencing.
I started by inserting several DBGP messages to follow program flow. Unfortunately, after several hours, I've made no “actual” progress.

June 25:

I have made some headway, but still have not resolved the memory issue.

This is the debug output as the code executes:

skge_probe - start
     skge_probe - middle: addr 0xde258b10 irq 10 chip 0x0 rev 0
     ll_addr[i]: 00:21:91:91:10:6dskge_initialize - start
skge_perform_software_reset()
     initialize -> removing error bits
skge_enable_test_mode - 1
skge_enable_test_mode - 0
chip id: 177
chip id: 10
chip id: 176
chip id: 178
     initialize -> chip id: MARV: 0xb1
     initialize -> ram_size  : 65536
     initialize -> ram_offset: 0
     initialize -> wasn't genesis
     initialize -> Clearing error bits
     initialize -> Performing reset
     initialize -> Stopping card
     initialize -> Turning LED on
     initialize -> Enabling arbiter
     initialize -> Setting timeout init values
     initialize -> Setting clock values
skge_usecs2clk
hwkhz
     initialize -> Resetting each port
yukon_reset start
yukon_reset end
skge initialize - end
port: 0skge_probe - end - return 0



gPXE 0.9.7+ -- Open Source Boot Firmware -- http://etherboot.org
Features: HTTP DNS TFTP AoE iSCSI bzImage COMBOOT ELF Multiboot PXE PXEXT

skge_open
skge net0: enabling interface
skge_ring_alloc 1
skge_ring_alloc 2
skge_ring_alloc 3 - ring->start = 97264
skge_ring_alloc 3 - ring->count = 6
skge_ring_alloc 3 - vaddr = 0
                skge_ring_alloc 3 - i = 0
                skge_ring_alloc 3 - e = 97264
                skge_ring_alloc 3 - d = 0
                skge_ring_alloc 3 - i = 1
                skge_ring_alloc 3 - e = 97284
                skge_ring_alloc 3 - d = 32
                skge_ring_alloc 3 - i = 2
                skge_ring_alloc 3 - e = 97304
                skge_ring_alloc 3 - d = 64
                skge_ring_alloc 3 - i = 3
                skge_ring_alloc 3 - e = 97324
                skge_ring_alloc 3 - d = 96
                skge_ring_alloc 3 - i = 4
                skge_ring_alloc 3 - e = 97344
                skge_ring_alloc 3 - d = 128
                skge_ring_alloc 3 - i = 5
                skge_ring_alloc 3 - e = 97364
                skge_ring_alloc 3 - d = 160
skge_ring_alloc 4
skge_ring_alloc 5
Function: skge_rx_fill -
Function: skge_rx_fill - end
here 0009skge_ring_alloc 1
skge_ring_alloc 2
skge_ring_alloc 3 - ring->start = 97392
skge_ring_alloc 3 - ring->count = 6
skge_ring_alloc 3 - vaddr = 192
                skge_ring_alloc 3 - i = 0
                skge_ring_alloc 3 - e = 97392
                skge_ring_alloc 3 - d = 192
                skge_ring_alloc 3 - i = 1
                skge_ring_alloc 3 - e = 97412
                skge_ring_alloc 3 - d = 224
                skge_ring_alloc 3 - i = 2
                skge_ring_alloc 3 - e = 97432
                skge_ring_alloc 3 - d = 256
                skge_ring_alloc 3 - i = 3
                skge_ring_alloc 3 - e = 97452
                skge_ring_alloc 3 - d = 288
                skge_ring_alloc 3 - i = 4
                skge_ring_alloc 3 - e = 97472
                skge_ring_alloc 3 - d = 320
                skge_ring_alloc 3 - i = 5
                skge_ring_alloc 3 - e = 97492
                skge_ring_alloc 3 - d = 352
skge_ring_alloc 4
skge_ring_alloc 5
here 0011yukon_mac_init - Not Yukon Lite   - Not Yukon Lite   - Autoneg disabled - half duplex
Function: yukon_init - start
skge à: phy read timeout port 0 reg 0 val 0
Function: yukon_init - end
yukon_mac_init - endhere 0002
adapter : 96796
rxqaddr : 82636
port    : 0
ram_addr: -65281
chunk: 8454017

At this point (directly following the “chunk” output), execution halts and the system becomes non-responsive.

I managed to narrow execution down to a single point in the source code where it halts; however, I find it hard to believe that execution halts between two successive DBGP() statements.

Lines 637, 638, and 639 of skge.c:

 637         DBGP("chunk: %d\n",chunk);
 638         skge_ramset(adapter, rxqaddr[port], ram_addr, chunk);
 639         DBGP("here 0003\n");

and, subsequently, the first few lines in the definition of skge_ramset:

 415 static void skge_ramset(struct skge_adapter *hw, u16 q, u32 start, size_t len) {
 416         u32 end;
 417         DBGP("skge_ramset - start");

Thus, I'm very concerned by the fact that the output does *NOT* look like this…

[...]
chunk: 8454017
skge_ramset - start
[...]

These are basically consecutive lines of execution (ignoring the function call in between).
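
One quick sanity check on the last two values printed before the hang: if the on-card packet RAM is split evenly between ports and between each port's RX and TX queues (my assumption, based on how the Linux skge driver describes its RAM buffer setup; I haven't confirmed the gPXE code uses the same formula), then chunk and ram_addr should be small, predictable numbers derived from ram_size and ram_offset. The helper name split_ram() and struct ram_split below are hypothetical, and a single-port adapter is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, not taken from skge.c. */
struct ram_split {
	uint32_t chunk;    /* bytes given to each queue */
	uint32_t ram_addr; /* start of this port's RX region */
};

/* Assumed layout: (ram_size - ram_offset) is divided evenly between
 * ports and between the RX and TX queue of each port. */
static struct ram_split split_ram(uint32_t ram_size, uint32_t ram_offset,
                                  unsigned ports, unsigned port)
{
	struct ram_split s;

	assert(ports > 0 && port < ports && ram_offset <= ram_size);
	s.chunk = (ram_size - ram_offset) / (ports * 2);
	/* RX region for this port; its TX region would start at ram_addr + chunk */
	s.ram_addr = ram_offset + 2 * s.chunk * port;
	return s;
}
```

With the values from the log (ram_size 65536, ram_offset 0, port 0) and one port, this gives chunk = 32768 and ram_addr = 0. Neither is anywhere near the logged 8454017 and -65281, and no even split can produce a chunk larger than the whole 64 KiB buffer, which hints (without proving it) that those values, or the fields they are computed from, are being clobbered before skge_ramset() is even reached.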

Also, I sent an email to gsoc-mentors-2009 today; briefly, it read:

when built as "make bin/skge.pxe DEBUG=skge:7", output is http://pxe.asdlkf.net/single.txt, and execution halts
when built as "make bin/skge--rtl8139.pxe DEBUG=skge:7", execution ... works?

June 26: <day taken off, friends over>

June 27:

Looking more closely at the output of the two builds from the end of the 25th (the two different versions of building skge), the mentor mailing list pointed me toward “vaddr” being 0. This absolutely makes sense as a probable cause of a crash. I will follow this variable more directly in both versions of the executable as soon as my meeting today is over.
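
To see why a zero vaddr is so suspicious, here is a simplified sketch of the kind of ring-setup loop skge_ring_alloc performs, modelled loosely on the Linux skge driver. The struct layouts and names (hw_desc, ring_elem, ring, ring_alloc) are hypothetical stand-ins, and 32-byte descriptors are assumed because that matches the step of the d values in the log. Each software element gets a pointer to a hardware descriptor carved out of the block at vaddr, so when vaddr is 0 the descriptors land at addresses 0, 32, 64, and so on, and the driver then writes into whatever sits at the bottom of memory. The NULL check is my addition, so that a failed descriptor allocation is caught up front instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 32-byte hardware descriptor (real layout differs). */
struct hw_desc {
	uint32_t control;
	uint32_t next_offset; /* bus offset of the next descriptor */
	uint32_t reserved[6];
};
static_assert(sizeof(struct hw_desc) == 32, "descriptors assumed 32 bytes");

/* Hypothetical software-side bookkeeping element. */
struct ring_elem {
	struct hw_desc *desc;
	struct ring_elem *next;
};

struct ring {
	struct ring_elem *start;
	unsigned count;
};

/* Link ring->count elements into a circular ring whose descriptors live
 * in the block at vaddr (bus address base). Returns 0 or -1 on error. */
static int ring_alloc(struct ring *ring, struct hw_desc *vaddr, uint32_t base)
{
	struct ring_elem *e;
	struct hw_desc *d;
	unsigned i;

	/* Added check: a NULL block would put descriptors at 0, 32, 64, ... */
	if (vaddr == NULL || ring->count == 0)
		return -1;

	ring->start = calloc(ring->count, sizeof(*e));
	if (ring->start == NULL)
		return -1;

	for (i = 0, e = ring->start, d = vaddr; i < ring->count; i++, e++, d++) {
		e->desc = d;
		if (i == ring->count - 1) {
			e->next = ring->start; /* wrap back to the first element */
			d->next_offset = base;
		} else {
			e->next = e + 1;
			d->next_offset = base + (uint32_t)((i + 1) * sizeof(*d));
		}
	}
	return 0;
}
```

Calling ring_alloc(&r, descs, base) with a real six-entry descriptor array links the last element back to the first, and each descriptor's next_offset advances by 32 bytes, mirroring the e and d progression in the log.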

… Later that day…

So, vaddr was just a symptom.

I spent about 90 minutes with MCB30 and AndyTim trying to diagnose what was going on; in the end, one specific line stood out:

       netdev = alloc_etherdev (sizeof (*adapter));

Hmm…… It COULD have something to do with the fact that netdev is being allocated the size of *adapter…
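
For anyone who hasn't been bitten by this yet: as I understand it, alloc_etherdev() makes a single allocation holding the net_device followed by a private area of whatever size you pass in, and netdev->priv points at that trailing area. If the size argument describes a smaller structure than the one the driver actually keeps in priv, every write past the end of that area lands in memory the allocator may hand to someone else. The sketch below shows the pattern with hypothetical names (fake_netdev, alloc_with_priv, big_priv, setup_device); it is not the gPXE API, just an illustration of sizing the private area by the type that will live there.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network device header. */
struct fake_netdev {
	const char *name;
	void *priv; /* points just past this header */
};

/* Hypothetical private structure that is larger than the header. */
struct big_priv {
	int irq;
	unsigned char rings[4096];
};

/* One allocation: header followed by priv_size bytes of private data. */
static struct fake_netdev *alloc_with_priv(size_t priv_size)
{
	struct fake_netdev *netdev = calloc(1, sizeof(*netdev) + priv_size);

	if (netdev != NULL)
		netdev->priv = (char *)netdev + sizeof(*netdev);
	return netdev;
}

static struct big_priv *setup_device(struct fake_netdev **out)
{
	struct big_priv *port;
	struct fake_netdev *netdev;

	assert(out != NULL);
	/* Size the private area by the type that will actually be stored in
	 * priv, not by some other (smaller) structure. */
	netdev = alloc_with_priv(sizeof(*port));
	if (netdev == NULL)
		return NULL;
	port = netdev->priv;
	*out = netdev;
	return port;
}
```

With the sizes matched, writing to the last byte of rings stays inside the allocation; had the area been sized for a smaller struct, the same write would run off the end of the block.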

/GROAN

Ok, so, better. Things appear to be running smoothly.

June 28: Day taken off; helping a friend paint a wall. Then watching it dry. Then painting it again.

– Chris