June 22: I started the evening of the 22nd working on a debugging issue and asked rwcr for some help, which he gladly provided. The issue of the night was why open (or any other procedure) was not being called at all. rwcr showed me how to hook up to a gdbstub with UDP debugging, and I inserted an rtl8139b into my testing machine. After a while of tracing variables and stepping into procedures, I found (again, with much help from rwcr) that the netdevice reference I was returning to the core was in fact a reference to the net1 device rather than the actual net0 device. Hence my open (and subsequent functions) were never called: the core was calling the functions of another driver! After tracing through what I had written, I found that I was initializing netdevice[0] twice: once in probe, then again (unintentionally) in skge_initialize. I removed the second instance, compiled, booted, and all of open's DBGP messages showed up on the screen. Yay.

Moving on, the current obstacle is that the MAC address appears to be broken again. I will figure out why; however, I checked the rest of the variables being initialized in the debug output and they all appear to check out. Yay. I will be moving on to poll this week.

June 23: Unfortunately, today I came home and saw that my computer was frozen trying to enumerate its RAM. After some basic troubleshooting and memtest86, I found that my main desktop machine has some faulty RAM. Hopefully I'll be able to replace it in-store tomorrow... 4x 2GB sticks of Corsair Dominator 1066 aren't cheap. All I'll be able to get done for the rest of today is finish this journal post. However, tonight I set up a complete build/test environment on my laptop and downloaded my git tree there. Hopefully this will let me keep working and limit my downtime to the time it took to set up the laptop.

June 24: Everything is fixed! My computers are back online and ready to go.
Today I began by attempting to trace the source of the memory error I'm experiencing.\\
I started by inserting several DBGP messages to follow the program flow. Unfortunately, after several hours, I've made no real progress.

June 25: I have made some headway, but still have not resolved the memory issue. This is the output the code currently produces:\\
<code>
skge_probe - start
skge_probe - middle: addr 0xde258b10 irq 10 chip 0x0 rev 0
ll_addr[i]: 00:21:91:91:10:6dskge_initialize - start
skge_perform_software_reset()
initialize -> removing error bits
skge_enable_test_mode - 1
skge_enable_test_mode - 0
chip id: 177
chip id: 10
chip id: 176
chip id: 178
initialize -> chip id: MARV: 0xb1
initialize -> ram_size : 65536
initialize -> ram_offset: 0
initialize -> wasn't genesis
initialize -> Clearing error bits
initialize -> Performing reset
initialize -> Stopping card
initialize -> Turning LED on
initialize -> Enabling arbiter
initialize -> Setting timeout init values
initialize -> Setting clock values
skge_usecs2clk hwkhz
initialize -> Resetting each port
yukon_reset start
yukon_reset end
skge initialize - end
port: 0skge_probe - end - return 0
gPXE 0.9.7+ -- Open Source Boot Firmware -- http://etherboot.org
Features: HTTP DNS TFTP AoE iSCSI bzImage COMBOOT ELF Multiboot PXE PXEXT
skge_open
skge net0: enabling interface
skge_ring_alloc 1
skge_ring_alloc 2
skge_ring_alloc 3 - ring->start = 97264
skge_ring_alloc 3 - ring->count = 6
skge_ring_alloc 3 - vaddr = 0
skge_ring_alloc 3 - i = 0
skge_ring_alloc 3 - e = 97264
skge_ring_alloc 3 - d = 0
skge_ring_alloc 3 - i = 1
skge_ring_alloc 3 - e = 97284
skge_ring_alloc 3 - d = 32
skge_ring_alloc 3 - i = 2
skge_ring_alloc 3 - e = 97304
skge_ring_alloc 3 - d = 64
skge_ring_alloc 3 - i = 3
skge_ring_alloc 3 - e = 97324
skge_ring_alloc 3 - d = 96
skge_ring_alloc 3 - i = 4
skge_ring_alloc 3 - e = 97344
skge_ring_alloc 3 - d = 128
skge_ring_alloc 3 - i = 5
skge_ring_alloc 3 - e = 97364
skge_ring_alloc 3 - d = 160
skge_ring_alloc 4
skge_ring_alloc 5
Function: skge_rx_fill -
Function: skge_rx_fill - end
here 0009skge_ring_alloc 1
skge_ring_alloc 2
skge_ring_alloc 3 - ring->start = 97392
skge_ring_alloc 3 - ring->count = 6
skge_ring_alloc 3 - vaddr = 192
skge_ring_alloc 3 - i = 0
skge_ring_alloc 3 - e = 97392
skge_ring_alloc 3 - d = 192
skge_ring_alloc 3 - i = 1
skge_ring_alloc 3 - e = 97412
skge_ring_alloc 3 - d = 224
skge_ring_alloc 3 - i = 2
skge_ring_alloc 3 - e = 97432
skge_ring_alloc 3 - d = 256
skge_ring_alloc 3 - i = 3
skge_ring_alloc 3 - e = 97452
skge_ring_alloc 3 - d = 288
skge_ring_alloc 3 - i = 4
skge_ring_alloc 3 - e = 97472
skge_ring_alloc 3 - d = 320
skge_ring_alloc 3 - i = 5
skge_ring_alloc 3 - e = 97492
skge_ring_alloc 3 - d = 352
skge_ring_alloc 4
skge_ring_alloc 5
here 0011yukon_mac_init - Not Yukon Lite - Not Yukon Lite - Autoneg disabled - half duplex
Function: yukon_init - start
skge à: phy read timeout port 0 reg 0 val 0
Function: yukon_init - end
yukon_mac_init - endhere 0002
adapter : 96796
rxqaddr : 82636
port : 0
ram_addr: -65281
chunk: 8454017
</code>
At this point (directly after the "chunk" line is printed), execution halts and the system becomes unresponsive. I managed to narrow the halt down to a single point in the source code, although I find it hard to believe that execution halts between two successive DBGP() statements. Lines 637, 638, and 639 of skge.c:
<code>
637 DBGP("chunk: %d\n",chunk);
638 skge_ramset(adapter, rxqaddr[port], ram_addr, chunk);
639 DBGP("here 0003\n");
</code>
and the first few lines of the definition of skge_ramset:
<code>
static void skge_ramset(struct skge_adapter *hw, u16 q, u32 start, size_t len) {
	u32 end;
	DBGP("skge_ramset - start");
</code>
So I'm very concerned that the output does *NOT* look like this:
<code>
[...]
chunk: 8454017
skge_ramset - start
[...]
</code>
since these are effectively consecutive lines of execution (ignoring the function call boundary).
Also, I sent an email to gsoc-mentors-2009 today; briefly, it read:
<code>
When built as "make bin/skge.pxe DEBUG=skge:7", the output is
http://pxe.asdlkf.net/single.txt, and execution halts.
When built as "make bin/skge--rtl8139.pxe DEBUG=skge:7", execution ... works?
</code>

June 26: <day taken off, friends over>

June 27: Looking more closely at the output of the two builds from the end of the 25th (the two different ways of building skge), the mentors' mailing list pointed me towards "vaddr" being 0. This absolutely makes sense as a probable cause of the crash, since every descriptor address in the first ring (d = 0, 32, ..., 160 in the log) is then computed from a null base. I will follow this variable directly in both versions of the build as soon as my meeting today is over.

-- Chris