It's ALIIIIIVE!
After a long day of head-scratching and bug-squashing, I've gotten the wireless to work in my test system with no apparent problems.
First, the commits:
Also, the obligatory screenshot:
I've successfully loaded both the “smoke test” script in that screenshot, and the more rigorous gtest.gpxe that loads tomsrtbt. Load time for the latter was about 57 seconds from “dhcp net0” to “Uncompressing Linux…”, including the time it took for me to type the “chain” command. Compared to 47 seconds on the wired rtl8139 NIC in the same box, which didn't have to go through a 5-second network probe sequence, that's not too bad.
The DMA issue I noted on Sunday turned out to be more or less a false alarm. Upon checking the Linux driver I discovered that it simply ignores those DMA_FAIL errors (drops the packet with no record thereof), and when I modified my code to do the same (and fixed a bunch of bugs in the MAC layer), everything was able to work. It may be that we're suffering in performance due to all the errors, and that they could be fixed by some configuration setting, but as Realtek won't give up a datasheet and no other open-source OS has drivers I could compare with, this is the best I can do for now. My working hypothesis is that DMA_FAIL might have been overloaded to apply also to “the radio gave me garbage for this one” scenarios.
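As a rough illustration of the "ignore DMA_FAIL and drop the packet" approach described above, here is a minimal sketch of a receive-ring poll loop. The descriptor layout, the bit positions, and the names `rx_desc`, `rx_stats` and `poll_rx` are all assumptions made for this example; they are not taken from the actual driver or from any datasheet. Unlike the behavior described for the Linux driver, the sketch keeps a drop counter, purely so the effect is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits: placeholders, not the real register layout. */
#define RX_DESC_OWN      (1u << 31)  /* descriptor still owned by the NIC */
#define RX_DESC_DMA_FAIL (1u << 27)  /* NIC reported a DMA failure */

struct rx_desc {
	uint32_t flags;
};

struct rx_stats {
	unsigned int delivered;  /* packets passed up the stack */
	unsigned int dropped;    /* packets discarded on DMA_FAIL */
};

/* Walk the ring until we reach a descriptor the NIC still owns.
 * Packets flagged DMA_FAIL are dropped instead of being treated as
 * a fatal error. Returns the number of packets delivered. */
static unsigned int poll_rx(struct rx_desc *ring, size_t count,
			    struct rx_stats *stats)
{
	unsigned int delivered = 0;
	size_t i;

	assert(ring != NULL && stats != NULL);

	for (i = 0; i < count; i++) {
		uint32_t flags = ring[i].flags;

		if (flags & RX_DESC_OWN)
			break;  /* nothing more to process yet */

		if (flags & RX_DESC_DMA_FAIL) {
			stats->dropped++;  /* drop and carry on */
		} else {
			stats->delivered++;
			delivered++;
		}

		/* Hand the descriptor back to the NIC for reuse. */
		ring[i].flags = RX_DESC_OWN;
	}
	return delivered;
}
```

If the errors really are hurting performance, a counter like `dropped` would be one cheap way to measure how often they occur.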
Things left to do on 802.11 include:
It probably goes without saying, but I'm really excited about this. I probably won't be doing much work on gPXE tomorrow, because I have to catch up on some other things and I'd like to see some other people successfully test this before I forge boldly ahead.
No commits - I had some non-gPXE things I had to catch up on, and was doing setup for later work. I now have a kernel and hostapd configuration that will let me configure my development box with all sorts of strange wireless setups, so I can test them under gPXE. This will come in especially handy for encryption.
I think I'll try and knock off the above bunch of “easy” tasks tomorrow, pending discussion with mdc and/or mcb30.
I was planning to work on some of the “easy” tasks above, but I realized there is another one, not on that list, that is painful but significantly less so if done now than it would be at the end of the summer. In our last weekly meeting, mcb30 mentioned the need for proper Doxygen-style comments on all core networking code.
So that was most of what I did today. 851 lines' worth of documentation. My wrists are rather sore now…
Hopefully tomorrow I can actually work on the coding side of things.
Today was a bugfix and minor enhancement day.
Marty got his order of rtl8185 cards delivered, so he was able to test my code thus far. His AP handled things a little differently than mine did, revealing a couple of bugs in my code (see commits below), but after I fixed those he was able to associate, do DHCP and load smoke-test.gpxe over the wireless. He initially attempted to load gtest.gpxe immediately after the smoke test; it and its constituent images loaded fine, but did not boot cleanly due to some confusion with the previously loaded smoke-test.gpxe script. The “boot” command in gtest failed because of an ambiguity, and when “boot bz2bzImage” was specified manually, the kernel could not mount its initrd. This is not a wireless issue; testing over a wired network reveals the exact same behavior. The output of `imgstat' is revealing:
smoke-test.gpxe: 71 bytes [script] [LOADED] ""
bz2bzImage: 829673 bytes [bzImage] [LOADED] "root=100"
initrd.bz2: 880411 bytes ""
gtest.gpxe: 111 bytes [script] [LOADED] ""
It seems that scripts are retained as images, but as though they were loaded after the commands they executed. Since smoke-test.gpxe is first in the list, it looks like it was being passed to the kernel as an initrd before the real initrd, and the kernel tried to interpret the gPXE script as a filesystem. I doubt this is the desired behavior, but perhaps Michael could clarify things?
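To make the suspected failure mode concrete, here is a small sketch of an initrd-collection loop that treats every registered image other than the kernel as an initrd, in list order. This is only a model of the hypothesis above, not a copy of gPXE's image code; `struct image` and `collect_initrds` are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an entry in the image list. */
struct image {
	const char *name;
};

/* Gather every image except the kernel itself, preserving list order.
 * 'out' must have room for at least 'count' pointers.
 * Returns the number of initrd candidates found. */
static size_t collect_initrds(const struct image *list, size_t count,
			      const struct image *kernel,
			      const struct image **out)
{
	size_t found = 0;
	size_t i;

	assert(list != NULL && kernel != NULL && out != NULL);

	for (i = 0; i < count; i++) {
		if (&list[i] == kernel)
			continue;  /* the kernel is not its own initrd */
		out[found++] = &list[i];  /* scripts are not filtered out */
	}
	return found;
}
```

Fed the four images from the imgstat listing with bz2bzImage as the kernel, a loop like this would hand the kernel smoke-test.gpxe first, then initrd.bz2, then gtest.gpxe, which would match the mount failure described above.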
Commits today:
With the settings applicator in place, you can now do something like
gPXE> set net0/ssid meh_wireless
gPXE> ifopen net0
[802.11 associating... gPXE> ok, meh_wireless]
set net0/ssid NETGEAR
[802.11 associating... gPXE> ok, NETGEAR]
Re-running DHCP if necessary is the responsibility of the user, as I believe it should be. The fact that the prompt shows up halfway through the association is an ugly-looking consequence of the fact that association runs asynchronously. Probably it would be better done using the “monojob” interface and a wait; I've added that to the list above.
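The difference between the two approaches can be sketched with a toy state machine. In the asynchronous version, something else calls the step function in the background while the prompt is already back; in the synchronous version, the caller loops until association finishes or a step budget runs out. Everything here (`assoc_state`, `assoc_step`, `assoc_wait`) is hypothetical and is not the monojob interface itself, just an outline of the blocking wait it would provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum assoc_state {
	ASSOC_PROBING,
	ASSOC_AUTHENTICATING,
	ASSOC_ASSOCIATING,
	ASSOC_DONE,
	ASSOC_FAILED
};

struct assoc {
	enum assoc_state state;
};

/* One unit of background work: advance to the next stage.
 * A real implementation would exchange frames with the AP here. */
static void assoc_step(struct assoc *a)
{
	if (a->state < ASSOC_DONE)
		a->state = (enum assoc_state)(a->state + 1);
}

/* Synchronous wrapper: keep stepping until association completes,
 * fails, or the step budget is exhausted. Returns true on success. */
static bool assoc_wait(struct assoc *a, unsigned int max_steps)
{
	assert(a != NULL);

	while (a->state != ASSOC_DONE && a->state != ASSOC_FAILED) {
		if (max_steps-- == 0)
			return false;  /* timed out */
		assoc_step(a);
	}
	return a->state == ASSOC_DONE;
}
```

With a wrapper like `assoc_wait`, the command would not return to the prompt until the whole “associating... ok” message had been printed, so the output could no longer be interleaved with the next command.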
For other remaining “easy” items on my to-do list, involving intelligent retries and command-line-accessible state, I'll want to talk with Michael to determine the best way to integrate them into the existing way of things, if indeed they're worth doing at all.
I've also started thinking about how to handle encryption elegantly; I'll post a description here once I run it past the mentors in our meeting tomorrow.
Meeting minutes:
Priorities for this coming week: debugging cleanup, mainline cleanup, iSCSI testing, rate control.
I think it might be a very good idea to change association to run synchronously, using the monojob interface. I've already seen a few issues that were caused by the asynchronous process being interleaved with some activity that should strictly follow it. Need to get Michael's input on this.
I'll have a video card arriving on Monday, so I can update my kernel and use the host-AP and packet injection functionality of my card.
Also, Marty was kind enough to send me a WRT54GL so I should be able to get more testing done with that.