Joshua Oreman: 802.11 wireless development

Journal Week 3

Monday, 8 June


After a long day of head-scratching and bug-squashing, I've gotten the wireless to work in my test system with no apparent problems. :-)

First, the commits:

Also, the obligatory screenshot: gPXE loading a script over 802.11

I've successfully loaded both the “smoke test” script in that screenshot and the more rigorous gtest.gpxe, which loads tomsrtbt. Load time for the latter was about 57 seconds from “dhcp net0” to “Uncompressing Linux…”, including the time it took me to type the “chain” command. Compared to 47 seconds on the wired rtl8139 NIC in the same box, which didn't have to go through a 5-second network probe sequence, that's not too bad.

The DMA issue I noted on Sunday turned out to be more or less a false alarm. Upon checking the Linux driver, I discovered that it simply ignores those DMA_FAIL errors (dropping the packet with no record of it), and when I modified my code to do the same (and fixed a bunch of bugs in the MAC layer), everything worked. It may be that all these errors are costing us performance, and that they could be fixed by some configuration setting, but since Realtek won't release a datasheet and no other open-source OS has drivers I could compare with, this is the best I can do for now. My working hypothesis is that DMA_FAIL may be overloaded to also cover “the radio gave me garbage for this one” cases.
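To make the drop-on-DMA_FAIL policy concrete, here is a minimal sketch of an RX ring poll loop that counts and recycles failed descriptors instead of reporting them as errors. It is not the rtl818x driver or the Linux code: the descriptor layout, the `RX_OWN`/`RX_DMA_FAIL` bit values, and the `rx_poll`/`rx_recycle` names are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical descriptor flag bits; the real rtl818x layout differs. */
#define RX_OWN       0x80000000u /* descriptor still owned by the NIC */
#define RX_DMA_FAIL  0x40000000u /* NIC reported a DMA failure */
#define RX_LEN_MASK  0x00000fffu /* received frame length */

#define RX_RING_SIZE 4

struct rx_desc {
	uint32_t flags;
};

struct rx_ring {
	struct rx_desc desc[RX_RING_SIZE];
	unsigned int next;      /* next descriptor to examine */
	unsigned int delivered; /* frames passed up the stack */
	unsigned int dropped;   /* frames silently discarded */
};

/* Hand a descriptor back to the NIC so it can be refilled. */
static void rx_recycle ( struct rx_desc *desc ) {
	desc->flags = RX_OWN;
}

/* Process every completed descriptor. Frames flagged with RX_DMA_FAIL
 * are dropped and only counted, never reported as errors. */
static void rx_poll ( struct rx_ring *ring ) {
	struct rx_desc *desc;

	while ( ! ( ( desc = &ring->desc[ring->next] )->flags & RX_OWN ) ) {
		if ( desc->flags & RX_DMA_FAIL ) {
			ring->dropped++;
		} else {
			/* A real driver would copy (desc->flags & RX_LEN_MASK)
			 * bytes into an I/O buffer and pass it up here. */
			ring->delivered++;
		}
		rx_recycle ( desc );
		ring->next = ( ring->next + 1 ) % RX_RING_SIZE;
	}
}
```

Keeping a `dropped` counter, even though nothing is reported, would at least make it possible to measure how much the errors cost in performance.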

Things left to do on 802.11 include:

  • [easy] Provide CTS protection if the network asks for it; required for 802.11 g/b interoperability. Done 6/12
  • [easy] If we don't get a response to our assoc/auth packets within a few seconds during the association phase, resend them. Done 6/15
  • [easy, added 6/10] If we're running without a user-specified SSID, and we can't associate with the best-signal network due to e.g. encryption, try the others. Possibly extend this to trying others if we get a DHCP response that doesn't provide PXE options. Decided 6/15 not to do; it would cause network probe to use significantly more resources, without benefiting the typical gPXE user. The minimal automatic functionality we have now (associate with strongest-signal if SSID is blank) should be enough.
  • [easy] Provide command-line facilities to see what the card's doing, or extend `ifstat' to show it. (Channel, BSSID, signal strength, etc.; this stuff can be very useful for network debugging.) Possibly provide a facility to scan for all networks (one way of doing that is prototyped in net80211.h, but I haven't implemented it).
  • [easy] Include error message tables for the 802.11 status and reason codes. These would be rather big, but “association denied - status 12” is of no help to anyone who doesn't have the 802.11 standard on hand. Perhaps we should just teach them to gpxebot, and/or figure out a way to encode them into the 32-bit return code space. Error encoding done 6/12
  • [easy, added 6/12] Clean up status displays, so the prompt doesn't appear in the middle of the association message after an ifopen or SSID set.
  • [medium] Provide some means of intelligent rate control (decrease rate if we're dropping a lot of packets, cautiously increase it if things seem to be going swimmingly).
  • [hard] Encryption.
  • [hard] Driver for ath5k cards (very common, very powerful chipset with a cooperative manufacturer). Maybe more drivers than that if I have time, but ath5k is quite a bit more complicated than rtl8180/8185.
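For the rate-control item, a first cut might be as simple as the sketch below: step down one rate after a window with too many failed transmissions, and step up only after several clean windows in a row. This only illustrates the idea and is not implemented anywhere yet; the window size, thresholds, and `rc_*` names are arbitrary assumptions, and rates are in units of 100 kbps.

```c
#include <stdint.h>

/* 802.11b/g rates in units of 100 kbps, sorted ascending. */
static const uint16_t rates[] = { 10, 20, 55, 60, 90, 110,
				  120, 180, 240, 360, 480, 540 };
#define NUM_RATES ( sizeof ( rates ) / sizeof ( rates[0] ) )

/* Assumed tuning values, not measured ones. */
#define RC_WINDOW       16 /* transmissions per evaluation window */
#define RC_MAX_FAILS     4 /* step down if more failures than this */
#define RC_GOOD_WINDOWS  3 /* clean windows needed before stepping up */

struct rate_ctl {
	unsigned int idx;       /* current index into rates[] */
	unsigned int sent;      /* transmissions in this window */
	unsigned int failed;    /* failures in this window */
	unsigned int good_runs; /* consecutive windows with no failures */
};

/* Record the outcome of one transmission; adjust the rate at the end
 * of each window. */
static void rc_tx_result ( struct rate_ctl *rc, int success ) {
	rc->sent++;
	if ( ! success )
		rc->failed++;
	if ( rc->sent < RC_WINDOW )
		return;

	if ( rc->failed > RC_MAX_FAILS ) {
		/* Dropping a lot of packets: back off one step right away. */
		if ( rc->idx > 0 )
			rc->idx--;
		rc->good_runs = 0;
	} else if ( rc->failed == 0 ) {
		/* Clean window: only step up after several in a row. */
		if ( ++rc->good_runs >= RC_GOOD_WINDOWS &&
		     rc->idx < NUM_RATES - 1 ) {
			rc->idx++;
			rc->good_runs = 0;
		}
	} else {
		/* Some losses, but tolerable: hold the current rate. */
		rc->good_runs = 0;
	}
	rc->sent = rc->failed = 0;
}

static uint16_t rc_current_rate ( const struct rate_ctl *rc ) {
	return rates[rc->idx];
}
```

Stepping down immediately but stepping up slowly is what makes the increase “cautious”: a marginal link shouldn't oscillate between two rates every window.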

It probably goes without saying, but I'm really excited about this. :-) I probably won't be doing much work on gPXE tomorrow, because I have to catch up on some other things and I'd like to see some other people successfully test this before I forge boldly ahead.

Tuesday-Wednesday, 9-10 June

No commits: I had some non-gPXE things to catch up on, and was doing setup for later work. I now have a kernel and hostapd configuration that will let me configure my development box with all sorts of strange wireless setups, so I can test them under gPXE. This will come in especially handy for encryption.

I think I'll try to knock off the above bunch of “easy” tasks tomorrow, pending discussion with mdc and/or mcb30.

Thursday, 11 June

I was planning to work on some of the “easy” tasks above, but I realized there was another task, not on that list, that is painful now but would be far more painful at the end of the summer. In our last weekly meeting, mcb30 mentioned the need for proper Doxygen-style comments on all core networking code.

So that was most of what I did today. 851 lines' worth of documentation. My wrists are rather sore now…

Hopefully tomorrow I can actually work on the coding side of things :-)

Friday, 12 June

Today was a bugfix and minor enhancement day.

Marty's order of rtl8185 cards arrived, so he was able to test my code so far. His AP handled things a little differently than mine does, revealing a couple of bugs in my code (see commits below), but after I fixed those he was able to associate, do DHCP and load smoke-test.gpxe over the wireless. He then attempted to load gtest.gpxe immediately after the smoke test; it and its constituent images loaded fine, but did not boot cleanly due to some confusion with the earlier-loaded smoke-test.gpxe script. The “boot” command in gtest failed because of an ambiguity, and when boot bz2bzImage was specified manually, the kernel could not mount its initrd. This is not a wireless issue; testing over a wired network reveals exactly the same behavior. `imgstat' is revealing:

smoke-test.gpxe: 71 bytes [script] [LOADED] ""
bz2bzImage: 829673 bytes [bzImage] [LOADED] "root=100"
initrd.bz2: 880411 bytes ""
gtest.gpxe: 111 bytes [script] [LOADED] ""

It seems that scripts are retained as images after they run, but are listed as though they were loaded after the images their commands fetched (gtest.gpxe comes after the kernel and initrd it loaded). Since smoke-test.gpxe is first in the list, it looks like it was being passed to the kernel as an initrd ahead of the real initrd, and the kernel tried to interpret the gPXE script as a filesystem. I doubt this is the desired behavior, but perhaps Michael could clarify things?
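The suspected behavior can be modeled in a few lines: if the kernel loader hands over every loaded image except itself as an initrd, in list order, then smoke-test.gpxe goes first. This is a hypothetical model with simplified `struct image`/`collect_initrds` definitions of my own, not gPXE's actual bzImage code; whether gPXE really works this way is exactly the question for Michael. The `skip_scripts` flag shows one possible mitigation.

```c
#include <stddef.h>

/* Hypothetical, simplified image types; not gPXE's struct image. */
enum image_type { IMG_UNKNOWN, IMG_SCRIPT, IMG_BZIMAGE };

struct image {
	const char *name;
	enum image_type type;
	size_t len;
};

/* Collect the images that would be handed to the kernel as initrds,
 * in list order, into out[] (which must hold count entries).
 * With skip_scripts == 0 this models the suspected behavior:
 * everything except the kernel itself is included.
 * With skip_scripts != 0, script images are left out.
 * Returns the number of initrds collected. */
static size_t collect_initrds ( const struct image *images, size_t count,
				const struct image *kernel, int skip_scripts,
				const struct image **out ) {
	size_t n = 0;

	for ( size_t i = 0; i < count; i++ ) {
		const struct image *img = &images[i];

		if ( img == kernel )
			continue;
		if ( skip_scripts && img->type == IMG_SCRIPT )
			continue;
		out[n++] = img;
	}
	return n;
}
```

Fed the `imgstat' list above with bz2bzImage as the kernel, the unfiltered version puts smoke-test.gpxe ahead of initrd.bz2, which would match what Marty saw.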

Commits today:

With the settings applicator in place, you can now do something like:

gPXE> set net0/ssid meh_wireless
gPXE> ifopen net0
[802.11 associating... gPXE> ok, meh_wireless]
set net0/ssid NETGEAR
[802.11 associating... gPXE> ok, NETGEAR]

Re-running DHCP if necessary is the responsibility of the user, as I believe it should be. The prompt showing up halfway through the association message is an ugly consequence of association running asynchronously. It would probably be better to use the “monojob” interface and wait for completion; I've added that to the list above.
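The applicator idea boils down to something like this hypothetical sketch: when the SSID setting changes, store it and kick off a new association, which returns immediately (hence the prompt appearing mid-message). The `wlan_dev` structure and the function names are illustrative only, not net80211's real interfaces, and DHCP is deliberately left to the user.

```c
#include <string.h>

#define SSID_MAX_LEN 32 /* 802.11 limits an SSID to 32 octets */

/* Hypothetical per-device state; not net80211's real structures. */
struct wlan_dev {
	char ssid[SSID_MAX_LEN + 1]; /* SSID currently in use */
	int assoc_started;           /* times association was kicked off */
};

/* Hypothetical: start (re)association in the background. It returns
 * immediately, so the next prompt can be printed before association
 * completes. */
static void wlan_start_assoc ( struct wlan_dev *dev ) {
	dev->assoc_started++;
}

/* Settings applicator: called after a setting changes. Only a real
 * change of SSID triggers a new association; DHCP is not re-run.
 * Returns 0 on success, or -1 if the requested SSID is too long. */
static int wlan_ssid_applicator ( struct wlan_dev *dev,
				  const char *new_ssid ) {
	if ( strlen ( new_ssid ) > SSID_MAX_LEN )
		return -1;
	if ( strcmp ( dev->ssid, new_ssid ) == 0 )
		return 0;
	strcpy ( dev->ssid, new_ssid ); /* length checked above */
	wlan_start_assoc ( dev );
	return 0;
}
```

Skipping reassociation when the SSID is unchanged means re-running `set net0/ssid` with the same value is harmless.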

For other remaining “easy” items on my to-do list, involving intelligent retries and command-line-accessible state, I'll want to talk with Michael to determine the best way to integrate them into the existing way of things, if indeed they're worth doing at all.

I've also started thinking about how to handle encryption elegantly; I'll post a description here once I run it past the mentors in our meeting tomorrow.

Saturday, 13 June

Meeting minutes:

  • Michael couldn't make it to most of the meeting, but he sent an email beforehand. I need to start preparing my changes to be merged into mainline, using a separate branch with properly separated commits. Debugging messages should use the gPXE-standard DBGC, and the “[802.11 associating… ok, <network>]” message should not be shown by default in non-debug builds.
  • All the items from last week's meeting have been addressed.
  • Need to send an email to etherboot-developers about the image strangeness observed yesterday.
  • Need to allow imgfree with an argument, at least in cases (scripts) where this can be safely done.
  • Noted that the best places to start for 802.11 knowledge are the Linux code (in net/mac80211/) and the 802.11 spec from IEEE (see my notes page).
  • Discussed possible handling of required firmware files for b43 and similar cards: use an embedded image and document distribution restrictions.
  • Need to do some performance testing.
  • Marty had some iSCSI issues in testing. iSCSI wouldn't autoboot (this seems to be down to the timing and asynchronous nature of association), and when booted manually, a “scandisk” call eventually seemed to stall. I'll test this on my own setup where I can sniff the wire, and see what's happening with the scandisk problem.

Priorities for this coming week: debugging cleanup, mainline cleanup, iSCSI testing, rate control.

I think it might be a very good idea to change association to run synchronously, using the monojob interface. I've already seen a few issues that were caused by the asynchronous process being interleaved with some activity that should strictly follow it. Need to get Michael's input on this.
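A synchronous association would amount to a wait loop like the one sketched here: keep polling until the association state machine reports success or failure, or give up after a limit, so nothing (DHCP, autoboot, the prompt) runs in the middle of it. This is a self-contained simulation under assumed names (`assoc_job`, `assoc_step`, `assoc_wait`), not the monojob interface itself; in gPXE the loop would presumably drive the main poll loop and use a time-based timeout rather than a step count.

```c
/* Hypothetical association states. */
enum assoc_state { ASSOC_IN_PROGRESS, ASSOC_OK, ASSOC_FAILED };

struct assoc_job {
	enum assoc_state state;
	unsigned int steps_left; /* simulated work remaining */
	int will_fail;           /* simulated outcome */
};

/* Stand-in for one pass of the network stack's poll loop: advances
 * the simulated association state machine by one step. */
static void assoc_step ( struct assoc_job *job ) {
	if ( job->state != ASSOC_IN_PROGRESS )
		return;
	if ( job->steps_left > 0 ) {
		job->steps_left--;
		return;
	}
	job->state = job->will_fail ? ASSOC_FAILED : ASSOC_OK;
}

/* Block until association finishes or max_steps polls elapse.
 * Returns 0 on success, -1 on failure, -2 on timeout. */
static int assoc_wait ( struct assoc_job *job, unsigned int max_steps ) {
	for ( unsigned int i = 0; i < max_steps; i++ ) {
		assoc_step ( job );
		if ( job->state == ASSOC_OK )
			return 0;
		if ( job->state == ASSOC_FAILED )
			return -1;
	}
	return -2;
}
```

Because the caller only regains control after `assoc_wait` returns, an autoboot or iSCSI attempt could no longer race an unfinished association.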

I'll have a video card arriving on Monday, so I can update my kernel and use the host-AP and packet injection functionality of my card.

Also, Marty was kind enough to send me a WRT54GL :-) so I should be able to get more testing done with that.
