Joshua Oreman: 802.11 wireless development

Journal Week 12

I can hardly believe there's only a week left of SoC. It's been a wonderful experience working with such talented developers, and I hope my coursework in the fall will leave me enough time to continue contributing :-)

Also, I believe this IRC message needs to find a permanent record here:

10:45 <     mcb30> At the point you're talking about, the system is not fully initialised.  On many systems, the memory map
                   is not yet valid.  If running normal BIOS-level code is marked with "Here be dragons", running during POST
                   is marked with "Here be huge, ugly, vindictive, sociopathic dragons with a mean sense of humour"

Well put indeed!

Monday, 10 August

Not too much gPXE work today. I pushed a cleaned-up version of the large-ROM fix from my ath5k branch to staging as bigrom-oremanj (following the new staging tree protocol). A suggestion by Michael for making some of the overflow checks more intuitive revealed the rather surprising fact that shifting a value in C by as many places as the width of its type, or more, is undefined; with gcc on x86, 1ul << var when var is 32 will be not zero but one! This led to a small-scale audit of variable-amount bitshifts in the gPXE source, but I didn't find any code that would cause problems with this undefined behavior.
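
To make the pitfall concrete: the x86 shift instructions only look at the low five bits of the count for 32-bit operands, so a count of 32 behaves like a count of 0, and the C standard simply declares the whole case undefined. A minimal sketch of a guarded shift follows; shl32_safe is a hypothetical helper name, not something taken from the gPXE tree.

```c
#include <stdint.h>

/* Hypothetical helper (not from the gPXE source): shift a 32-bit value
 * left by a run-time count.  Counts of 32 or more return 0, which is
 * the "every bit shifted out" result one would intuitively expect,
 * instead of invoking undefined behaviour.
 */
static inline uint32_t shl32_safe ( uint32_t value, unsigned int count ) {
	return ( count < 32 ) ? ( value << count ) : 0;
}
```

A check such as shl32_safe ( 1, var ) is then well defined for any var, at the cost of one comparison.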

I received a new e1000 card, and was able to use it to restore the flash on the old one following a procedure that I've outlined on the ROM burning page. The issue was indeed one of option ROM overflow; gPXE loads to segment CC00, meaning it has exactly 80kB of ROM space on my test system. The ROM that had caused trouble was about 90kB.
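
The 80kB figure is plain real-mode address arithmetic, assuming (this is not stated above) that option ROM space on the test system ends at the conventional E0000 boundary. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Assumed end of the option ROM area.  0xE0000 is a common value, but
 * it is an assumption made for illustration, not a measured one.
 */
#define OPTION_ROM_END 0xE0000UL

/* Convert a real-mode segment to a physical address (segment * 16) */
static inline uint32_t segment_to_phys ( uint16_t segment ) {
	return ( ( uint32_t ) segment ) << 4;
}

/* Bytes between a ROM's load segment and the end of option ROM space */
static inline uint32_t rom_space_left ( uint16_t segment ) {
	return ( uint32_t ) ( OPTION_ROM_END - segment_to_phys ( segment ) );
}
```

Under that assumption, rom_space_left ( 0xCC00 ) is 0x14000, or 81920 bytes, which is exactly 80kB and too small for a ROM of about 90kB.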

I found a regression in the 802.11 code caused by recent changes to process_add() to ensure the same process is not added twice. The changes assume that all callers use process_init_stopped() to initialize all fields of the process structure, instead of setting just step and refcnt manually (which has worked fine in the past). The 802.11 code used the latter method, and now does not start the association process at all. I pushed a two-line fix to staging as wiprocfix, and it will probably be merged tomorrow.
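
The failure mode can be modelled in a few lines. This is a deliberately simplified sketch with hypothetical names (toy_process, toy_process_add and friends), not the gPXE code. It assumes the duplicate check works by testing whether the process's list link is still in its initialized "not on any list" state, which a caller that only fills in step and refcnt never establishes.

```c
#include <stddef.h>
#include <string.h>

/* Toy doubly-linked list link (hypothetical, for illustration only) */
struct toy_list {
	struct toy_list *next;
	struct toy_list *prev;
};

/* Toy process structure: a list link plus the two fields named above */
struct toy_process {
	struct toy_list list;
	void ( * step ) ( struct toy_process *process );
	int refcnt;
};

/* Empty run queue: the head points at itself */
static struct toy_list run_queue = { &run_queue, &run_queue };

/* A do-nothing step function for the example */
static void toy_step ( struct toy_process *process ) {
	( void ) process;
}

/* Full initializer: also marks the link as "not on any list" */
static void toy_process_init_stopped ( struct toy_process *process,
		void ( * step ) ( struct toy_process *process ) ) {
	process->list.next = process->list.prev = &process->list;
	process->step = step;
	process->refcnt = 1;
}

/* Add to the run queue unless the link says it is already queued.
 * Returns 1 if added, 0 if the process was treated as already present.
 */
static int toy_process_add ( struct toy_process *process ) {
	if ( process->list.next != &process->list )
		return 0;
	process->list.next = &run_queue;
	process->list.prev = run_queue.prev;
	run_queue.prev->next = &process->list;
	run_queue.prev = &process->list;
	return 1;
}
```

A process set up with toy_process_init_stopped() is added once and rejected the second time, as intended. A process that was zeroed and had only step and refcnt assigned has a NULL link, so the check mistakes it for an already-queued process and it never runs.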

I rebased my linker branch against recent changes and pushed it to staging.

I updated my firmware branch to use the new symbol requirement macros defined in linker, and pushed it to my personal repository as firmware-pretty. It will go to staging after linker is merged, since it depends on the macros in linker.

Priorities for the rest of the week:

  • Write a page for 802.11 users and a page for driver developers
  • Post a brain-dump of the 802.11 knowledge I've gained working on this project (about halfway done writing it)
  • Once linker is merged, rebase and push firmware and wireless branches
  • Start working on flash-stub large ROM idea

Regarding the last bullet point, I think I'm going to try using the PCI ROM BAR before device-specific flash code. Video cards almost universally have very large memory regions compared to a typical flash size, so it should be easy enough to look for a BAR larger than the flash size, disable it, and map the flash in its place for long enough to copy its contents to RAM. Disabling the BAR doesn't affect the card's internal operation, so as long as we don't output anything while the flash is mapped this method should work. (If anyone reading this knows something I don't about PCI architecture and can see that this is a stupid idea, please let me know.)
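
Finding a suitable BAR comes down to the standard PCI sizing probe: write all-ones to the BAR, read it back, and decode the size from the bits that stuck. A minimal sketch of the decoding step and the "large enough" test follows. It covers 32-bit memory BARs only, the helper names are hypothetical, and the actual configuration-space reads and writes are left out.

```c
#include <stdint.h>

/* Decode the size of a 32-bit memory BAR from the value read back
 * after writing 0xffffffff to it.  The low four bits are type and
 * prefetch flags, not address bits.  Returns 0 for an I/O BAR or an
 * unimplemented BAR.
 */
static inline uint32_t mem_bar_size ( uint32_t readback ) {
	uint32_t mask;

	if ( readback & 0x1 )
		return 0;	/* I/O space BAR: not useful here */
	mask = ( readback & ~0xfU );
	return ( mask ? ( uint32_t ) ( ~mask + 1 ) : 0 );
}

/* True if the BAR's window is at least as large as the flash chip */
static inline int bar_can_hold_flash ( uint32_t readback,
				       uint32_t flash_size ) {
	return ( ( flash_size != 0 ) &&
		 ( mem_bar_size ( readback ) >= flash_size ) );
}
```

For example, a readback of 0xF0000008 decodes to a 256MB prefetchable window, comfortably larger than a 128kB flash part.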

Tuesday, 11 August

Wrote some documentation for users of the 802.11 code and driver writers.

Updated wireless branch to cope with a quirk of Linksys routers' WPA support: they don't accept 4-Way Handshake packets that include the optional capabilities field. Since we don't advertise any capabilities, there's no reason to include the field, so we no longer do.

Updated firmware branch to clean up the makefile changes a bit.

Fixed my wiprocfix fix to properly set the reference count for the process object (no behavioral change, but the correctness is more intuitive) and saw it merged. bigrom-oremanj was also merged.

Rebased linker in staging against recent changes to mainline, and then wireless and firmware in my personal repository against linker.

Thought about future driver support; it seems that the combination of b43, ath9k, and iwlwifi should support almost all currently-unsupported cards in common use. Each of these drivers is over 20,000 lines of C in the Linux kernel, though, so this won't be an easy task.

Wednesday, 12 August

Mostly worked on the ROM-from-PCI loader, which can be found as branch xrom in my personal repository for the curious. I doubt I'll be cleaning this up for mainline, as there seems to be no really safe way of doing it, and newer systems with PCI3.0 and PMM get the benefit for free.

Thursday, 13 August

Wrote some implementation notes for the wireless code.

Updated branch linker in staging to add the line number to symbols generated by REQUIRE_SYMBOL(foo), so that they now look like e.g. __require_foo_47. This fixes an issue with using REQUIRE_OBJECT() multiple times in the same file (e.g. with both GDBUDP and GDBSERIAL defined); now that the symbols that macro introduces are initialized data rather than common, the compiler refuses to allow two with the same name.
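
The line-number trick is ordinary preprocessor token pasting with __LINE__. The sketch below is not the gPXE macro; it is a hypothetical TOY_REQUIRE that defines one initialized object per use, just to show why two uses in one file no longer collide.

```c
/* Two-level expansion, so that __LINE__ is replaced by its number
 * before the tokens are pasted together.
 */
#define TOY_PASTE_( a, b ) a ## b
#define TOY_PASTE( a, b ) TOY_PASTE_ ( a, b )

/* Hypothetical stand-in for a requirement macro: defines an
 * initialized object named __require_<name>_<line>.
 */
#define TOY_REQUIRE( name ) \
	static const int TOY_PASTE ( __require_ ## name ## _, __LINE__ ) = 1

/* Same name required twice: the differing line numbers keep the two
 * definitions from clashing.
 */
TOY_REQUIRE ( gdbstub );
TOY_REQUIRE ( gdbstub );
```

Two uses on the same source line would still collide, which is a limitation of any __LINE__-based scheme.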

Discovered that a stock e1000 gPXE ROM does not work on my development system (very recent BIOS, PCI3.0/BBS/PMM). It seems the BIOS will refuse to hand out 1MB or more at a time using PMM, and since gPXE keeps requesting larger allocations until it gets one aligned to 2MB, gPXE doesn't use PMM at all. The ROM is 70656 bytes, and it's relocated 71680 bytes below the end of option ROM space. Loading gPXE from the POST-time prompt works fine (except that e820 is not yet available there on my system); loading it as a boot device freezes immediately after “gPXE starting execution…”, and I get garbage onscreen after 10 seconds or so. I suspect some other card is trampling on our tail and hopelessly confusing the decompressor.

It turns out my test system exhibits the same underlying problem (BIOS won't give out 1MB or more via PMM); it just has slightly more option ROM space. When I tried to test a fix that would accept A20-set allocations if the BIOS had set up the A20 line properly, I managed to brick my e1000 again. Forty minutes of shuffling around PCI cards later, I fixed things, and verified that the state of the A20 line during POST is no indication of the state of A20 when our BEV or int19h is called. The BIOS disables it before booting.

A fix that would allow us to use PMM (and thus larger ROMs) on such limited BIOSes would be to accept any PMM buffer address and set the A20 gate ourselves in the BEV. The code to do this is rather messy, though, and might not be worth it.
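
The reason the buffer address matters at all: with the A20 gate disabled, physical address bit 20 is forced to zero, so anything placed in an odd megabyte silently aliases the even megabyte below it. A sketch of the check for a single address, with hypothetical helper names:

```c
#include <stdint.h>

/* Physical address bit controlled by the A20 gate */
#define A20_BIT ( 1UL << 20 )

/* True if this address can only be reached with A20 enabled */
static inline int needs_a20 ( uint32_t phys ) {
	return ( ( phys & A20_BIT ) != 0 );
}

/* The address actually accessed when A20 is disabled */
static inline uint32_t addr_with_a20_disabled ( uint32_t phys ) {
	return ( uint32_t ) ( phys & ~A20_BIT );
}
```

A buffer at 0x00100000 fails the check and would alias address 0 with A20 off, while one at 0x00200000 is reachable either way. A real check would have to cover the whole buffer, not just its first byte.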

Also discovered a small bug in src/arch/i386/firmware/pcbios/gateA20.c:

#define A20_KBC_RETRIES         (2^21)

“You keep on using that operator. I do not think it means what you think it means.” :-)
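
For the record, ^ is bitwise exclusive-or in C, so (2^21) evaluates to 23. The sketch below contrasts that with a shift, assuming the intent was two to the twenty-first power; both macro names are hypothetical and exist only for the comparison.

```c
/* As written: bitwise XOR of 2 and 21, i.e. 00010 ^ 10101 = 10111 */
#define A20_KBC_RETRIES_AS_WRITTEN	( 2 ^ 21 )

/* Presumably intended: 2 to the power 21, expressed as a shift */
#define A20_KBC_RETRIES_INTENDED	( 1 << 21 )
```

The first expands to 23 retries, the second to 2097152.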

Friday, 14 August

On my test system, if we make use of a PMM buffer with A20 set, we don't even get to the BEV entry point to have a chance to set the A20 gate up properly. Adding a

ljmp    $0xf000, $0xfff0

immediately after bev_entry:, which reboots the system at that point on a PMMless gPXE, does not prevent the freeze. There may be a subtler issue here.

Started taking a look at the Linux b43 (Broadcom wireless) driver. It's quite well-written and -commented, especially for a reverse-engineered driver, but the hardware is really a mess. Some models have the 30-bit DMA restriction Stefan dealt with during his SoC last year. The hardware uses an SSB interface, which seems to be on the level of a whole different bus bridged to PCI. And then there's this line:

        err = request_firmware(&blob, ctx->fwname, ctx->dev->dev->dev);

dev->dev->dev? Seriously? :-)

