June 22:
I started out the evening of the 22nd working on a debugging issue...
I asked rwcr for some help and he gladly provided it. The issue of the night was debugging why open (or any other procedure) was not being called at all.
rwcr showed me how to hook up to a gdbstub with UDP debugging, and I inserted an rtl8139b NIC into my testing machine. After a while of tracing variables and stepping into procedures, I found (again, with much help from rwcr) that the reference to netdevice that I was returning to the core was in fact a reference to the net1 device rather than the actual net0 device. Hence, my open (and subsequent functions) were not called because the core was calling the functions of another driver!
After tracing through what I had written, I found that I was initializing netdevice[0] twice: once in probe, then again (unintentionally) in skge_initialize.
I removed the second initialization, compiled, booted, and all of open's DBGP messages showed up on the screen. Yay.
Moving on, the current obstacle is that the MAC address appears to be broken again. I will figure out why; however, I checked the rest of the variables being initialized in the debug output and they all check out! Yay. I will be moving on to poll this week.
June 23:
Unfortunately, today I came home and saw that my computer had frozen while enumerating its RAM. After some basic troubleshooting and Memtest86, I found that my main desktop machine has some faulty RAM in it. Hopefully I'll be able to replace the RAM in-store tomorrow... 4x 2GB sticks of Corsair Dominator 1066 aren't cheap...
All I'll be able to get done for the rest of today is finish this journal post. However, tonight I set up a complete build/test environment on my laptop and downloaded my git tree there.
Hopefully this will let me keep working and limit my downtime to the time it took me to set up this laptop.
June 24:
Everything is fixed! My computer(s) are back online and ready to go.
Today I began by attempting to trace the source of the memory error I'm experiencing.\\
I started by inserting several DBGP messages to follow program flow. Unfortunately, after several hours, I've made no "actual" progress.
June 25:
I have made some headway, but still have not resolved the memory issue.
This is the output the code produces when executed:\\
skge_probe - start
skge_probe - middle: addr 0xde258b10 irq 10 chip 0x0 rev 0
ll_addr[i]: 00:21:91:91:10:6dskge_initialize - start
skge_perform_software_reset()
initialize -> removing error bits
skge_enable_test_mode - 1
skge_enable_test_mode - 0
chip id: 177
chip id: 10
chip id: 176
chip id: 178
initialize -> chip id: MARV: 0xb1
initialize -> ram_size : 65536
initialize -> ram_offset: 0
initialize -> wasn't genesis
initialize -> Clearing error bits
initialize -> Performing reset
initialize -> Stopping card
initialize -> Turning LED on
initialize -> Enabling arbiter
initialize -> Setting timeout init values
initialize -> Setting clock values
skge_usecs2clk
hwkhz
initialize -> Resetting each port
yukon_reset start
yukon_reset end
skge initialize - end
port: 0skge_probe - end - return 0
gPXE 0.9.7+ -- Open Source Boot Firmware -- http://etherboot.org
Features: HTTP DNS TFTP AoE iSCSI bzImage COMBOOT ELF Multiboot PXE PXEXT
skge_open
skge net0: enabling interface
skge_ring_alloc 1
skge_ring_alloc 2
skge_ring_alloc 3 - ring->start = 97264
skge_ring_alloc 3 - ring->count = 6
skge_ring_alloc 3 - vaddr = 0
skge_ring_alloc 3 - i = 0
skge_ring_alloc 3 - e = 97264
skge_ring_alloc 3 - d = 0
skge_ring_alloc 3 - i = 1
skge_ring_alloc 3 - e = 97284
skge_ring_alloc 3 - d = 32
skge_ring_alloc 3 - i = 2
skge_ring_alloc 3 - e = 97304
skge_ring_alloc 3 - d = 64
skge_ring_alloc 3 - i = 3
skge_ring_alloc 3 - e = 97324
skge_ring_alloc 3 - d = 96
skge_ring_alloc 3 - i = 4
skge_ring_alloc 3 - e = 97344
skge_ring_alloc 3 - d = 128
skge_ring_alloc 3 - i = 5
skge_ring_alloc 3 - e = 97364
skge_ring_alloc 3 - d = 160
skge_ring_alloc 4
skge_ring_alloc 5
Function: skge_rx_fill -
Function: skge_rx_fill - end
here 0009skge_ring_alloc 1
skge_ring_alloc 2
skge_ring_alloc 3 - ring->start = 97392
skge_ring_alloc 3 - ring->count = 6
skge_ring_alloc 3 - vaddr = 192
skge_ring_alloc 3 - i = 0
skge_ring_alloc 3 - e = 97392
skge_ring_alloc 3 - d = 192
skge_ring_alloc 3 - i = 1
skge_ring_alloc 3 - e = 97412
skge_ring_alloc 3 - d = 224
skge_ring_alloc 3 - i = 2
skge_ring_alloc 3 - e = 97432
skge_ring_alloc 3 - d = 256
skge_ring_alloc 3 - i = 3
skge_ring_alloc 3 - e = 97452
skge_ring_alloc 3 - d = 288
skge_ring_alloc 3 - i = 4
skge_ring_alloc 3 - e = 97472
skge_ring_alloc 3 - d = 320
skge_ring_alloc 3 - i = 5
skge_ring_alloc 3 - e = 97492
skge_ring_alloc 3 - d = 352
skge_ring_alloc 4
skge_ring_alloc 5
here 0011yukon_mac_init - Not Yukon Lite - Not Yukon Lite - Autoneg disabled - half duplex
Function: yukon_init - start
skge à: phy read timeout port 0 reg 0 val 0
Function: yukon_init - end
yukon_mac_init - endhere 0002
adapter : 96796
rxqaddr : 82636
port : 0
ram_addr: -65281
chunk: 8454017
At this point (directly following the "chunk" output), execution halts and the system becomes unresponsive.
I managed to narrow the halt down to a single point in the source code; however, I find it hard to believe that execution halts between two successive DBGP() statements.
Lines 637, 638, and 639 of skge.c:
637 DBGP("chunk: %d\n",chunk);
638 skge_ramset(adapter, rxqaddr[port], ram_addr, chunk);
639 DBGP("here 0003\n");
And the first few lines of the definition of skge_ramset:
415 static void skge_ramset(struct skge_adapter *hw, u16 q, u32 start, size_t len) {
416 u32 end;
417 DBGP("skge_ramset - start");
Thus, I'm very concerned that the output does *NOT* look like this...
[...]
chunk: 8454017
skge_ramset - start
[...]
These are essentially consecutive lines of execution (ignoring the function boundary), so one should directly follow the other.
Also, I sent an email to gsoc-mentors-2009 today; briefly, it read:
when built as "make bin/skge.pxe DEBUG=skge:7", the output is http://pxe.asdlkf.net/single.txt, and execution halts
when built as "make bin/skge--rtl8139.pxe DEBUG=skge:7", execution ... works?
June 26:
June 27:
Looking more closely at the output of the two builds from the end of the 25th (the two different versions of building skge), the mentor email list pointed me towards "vaddr" being 0. This absolutely makes sense as a probable cause of a crash.
I will look more directly into following this variable in both versions of the executable as soon as my meeting is over today.
... Later that day...
So, vaddr was just a symptom.
I spent about 90 minutes with MCB30 and AndyTim trying to diagnose what was going on; in the end, one specific line stood out:
netdev = alloc_etherdev (sizeof (*adapter));
Hmm...... It COULD have something to do with the fact that netdev is being allocated the size of *adapter...
/GROAN
Ok, so, better. Things appear to be running smoothly.
June 28: Taken off. Helping a friend paint a wall. Then watching it dry. Then painting it again.
-- Chris