Joshua Oreman: 802.11 wireless development

Journal Week 11

Monday, 3 August

We have a mailing list regular who's interested in using gPXE on OS X. Since cross-gcc is very strict about printf specifiers, this led to several patches to fix compilation under both i386 and x86_64. Also, the recent “startpxe” command addition broke EFI builds by unconditionally dragging in real-mode UNDI code; fixed by making PXE_CMD a default only for pcbios builds.

Also, my sky2 driver (used by many Macs) has been merged:

I updated the Building on OS X page to reflect some suggestions and the new driver availability.

Before SoC, I submitted a patch to enable debugging over FireWire, but it was very ad-hoc and somewhat ugly (the user had to enter an address displayed by gPXE into the program that would try to connect over FireWire). Since gPXE is loaded in high memory, both on pcbios and EFI architectures, it's infeasible to scan through memory (the only thing a FireWire client can do, since we use the physical-DMA interface) to find anything. We have no idea how much memory is installed, and even over FireWire, 2GB to 4GB takes a long time to scan through.

To solve this problem in a hopefully generic way, I've implemented a function umalloc_low() to allocate memory that is guaranteed to fall below 640k. On EFI, we can allocate EfiConventionalMemory through a boot services call; on pcbios, though, the only segment that's safe to use is the one we've already taken up with our 16-bit text and data. Thus, on pcbios I implemented umalloc_low() like malloc(), allocating data out of a heap in BSS; the only difference is that the heap is linked into the bss16, i.e. low memory. Also, because the expected usage pattern involves a persistent need to interface with something, there is no ufree_low(); memory allocated is kept until gPXE shuts down. This lets the allocator itself be extremely tiny.
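As a rough illustration of why dropping ufree_low() keeps the allocator tiny, here is a minimal bump-allocator sketch. It is not the gPXE code: the name umalloc_low_sketch(), the heap size and the 16-byte rounding are all assumptions, and the heap here is an ordinary static array rather than one linked into the 16-bit BSS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap size; the journal does not say how big the real one is. */
#define LOW_HEAP_SIZE 4096

/* In gPXE this storage would be linked into low memory (the 16-bit BSS).
 * Here it is a plain static array so the sketch builds anywhere. */
static uint8_t low_heap[LOW_HEAP_SIZE] __attribute__((aligned(16)));
static size_t low_heap_used;

/* Hand out 'size' bytes, rounded up to a multiple of 16 (an assumption).
 * Returns NULL once the heap is exhausted. There is deliberately no
 * matching free function: allocations are kept until shutdown. */
void *umalloc_low_sketch(size_t size)
{
    size_t rounded = (size + 15) & ~(size_t)15;
    void *ptr;

    if (rounded < size || rounded > LOW_HEAP_SIZE - low_heap_used)
        return NULL;

    ptr = &low_heap[low_heap_used];
    low_heap_used += rounded;
    return ptr;
}
```

With no free path there is no free list, no block headers and no coalescing; the whole state is a single offset.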

For the FireWire side of the equation, I decided on the concept of a “portal structure” aligned to a 16-byte boundary within low memory. It contains 8 bytes' worth of magic and three fields, “request”, “reply”, and “address”, that a debugging host can use to connect to some FireWire-accessible service and gain access to a service-specific communication structure (containing e.g. ring buffers and state fields). It's implemented in a way that avoids races if multiple debugging hosts try to connect at the same time (which is probably overkill, but it's the Right Thing). Currently I've implemented three services over the FireWire debug link, two of which are broadly useful on machines that don't have a serial port:

  • GDB over FireWire (gdbfire), with a host-side utility program firegdb (used to be firebug, but the name's already taken by a popular Firefox extension) that can either connect GDB automatically for you or listen for TCP debugger connections and proxy them over the FireWire link;
  • Console over FireWire (fwconsole), with a host-side utility program fireconsole (compiled from the same source as firegdb due to the high level of similarity between the two) that acts as a simple interactive terminal emulator, optionally printing all of gPXE's output to a local file; and
  • File transfer over FireWire (fwload), which I wrote for my own use while developing sky2; I wanted a way to load a new gPXE quickly onto a machine whose only supported booting mechanism was a CD-ROM drive. I burned a version of gPXE with fwload support onto a CD, booted off it, and chained the gPXE I wanted to test over FireWire. I don't expect this will be generally useful.

It should be easy enough to do other things (IP over FireWire, anyone?) if people feel the need for them. :-)
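To make the portal idea concrete, here is a hedged sketch of how a debugging host might look for the structure in a snapshot of low memory. The magic value, the 32-bit field widths and the name find_portal() are assumptions for illustration; only the 8-byte magic, the three field names and the 16-byte alignment come from the description above. The race-avoidance handshake is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORTAL_MAGIC "gPXEfwDB"  /* hypothetical 8-byte magic value */
#define PORTAL_ALIGN 16          /* portal sits on a 16-byte boundary */

/* Assumed layout: the real field widths are not given in the journal. */
struct fw_portal {
    uint8_t  magic[8];
    uint32_t request;
    uint32_t reply;
    uint32_t address;
};

/* Scan a copy of low memory for the portal, checking only offsets that
 * are multiples of 16. Returns the offset of the portal, or -1. */
long find_portal(const uint8_t *mem, size_t len)
{
    size_t off;

    for (off = 0; off + sizeof(struct fw_portal) <= len; off += PORTAL_ALIGN) {
        if (memcmp(mem + off, PORTAL_MAGIC, 8) == 0)
            return (long)off;
    }
    return -1;
}
```

Because only aligned offsets are tested, scanning 640k of low memory takes at most 40960 comparisons, which is what makes a low-memory portal practical where a scan of high memory is not.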

Tuesday, 4 August

Spent most of today figuring out the idiosyncrasies of the linker as it relates to gPXE.

Currently we have two macros for managing “I want to pull in this object”: PROVIDE_SYMBOL() and REQUIRE_SYMBOL(). (REQUIRE_OBJECT() just does REQUIRE_SYMBOL() on a special obj_objectname symbol provided automatically by compiler.h for each object.) I discovered recently that REQUIRE_OBJECT() doesn't actually require anything; it will pull in the object if it exists, but if the object is missing, no linker error is produced. While this can be useful behavior, it doesn't fit the semantics of the word “require”.

In effect, REQUIRE_SYMBOL(foo) generates assembler code like the following:

        .equ    __need_foo, foo

That creates an absolute (not directly associated with a piece of memory) symbol called __need_foo whose value is that of the symbol foo. Since foo is not defined in the same file, that creates an entry in the symbol table for the file it shows up in marking foo as undefined. The linker will try to resolve such references at link-time, and searches through all the gPXE object files for a symbol named foo. If it finds one, that object file gets pulled into the link and its functionality will be available to gPXE at runtime.

However, an interesting thing about the above line may have occurred to you. The special __need_foo absolute symbol is never actually used. If foo remains undefined despite the linker's searching, __need_foo will be undefined too… but so what? There's no reason for the linker to stop linking just because a symbol is undefined, if it's not going to impact the code. When it runs, gPXE doesn't even have the symbol table; why would it matter what's in it?

The replacement for REQUIRE_SYMBOL(foo) (the old behavior has been renamed to REQUEST_SYMBOL(foo)) should clarify:

  extern char foo;
  static char * __require_foo __attribute__ (( section ( ".discard" ), used )) = &foo;

This doesn't just define an absolute symbol; it defines a variable, a symbol with storage space attached, that stores the value of symbol foo. (Symbol values to the linker are like variable addresses to the compiler.) The variable (__require_foo in this case) is placed in a special output section, .discard, which we can tell the linker to throw away in the final linking stage (so that the result doesn't take up precious bytes in the final gPXE). It's marked used so the compiler doesn't throw it away thinking it's never used.

This time, when the linker goes to resolve its undefined symbols and can't find any foo, it'll notice there's a relocation on it: an instruction to the linker that says “I don't know what the address of foo is yet, because it's not in this file; when you figure it out, please put it here in the variable __require_foo.” The same sort of thing is used when you call an external function; the linker knows how to interpret the machine code and change the address being jumped to. And when there's a relocation the linker can't satisfy, it has to refuse to link the program, since its execution without part of its code or data set properly would be undefined. Thus, this formulation of REQUIRE_SYMBOL() really requires.
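Wrapped up as a macro, the two lines above might look something like the following sketch. The macro name REQUIRE_SYMBOL_SKETCH, the stand-in symbol obj_demo_driver and the helper required_address() are made up for this example, and it assumes GCC (or clang) targeting ELF. Without a linker script that discards .discard, the section is simply kept, which is harmless for a demonstration.

```c
/* Sketch of a "really requiring" macro built from the two lines above.
 * The pointer's initializer forces a relocation against 'sym', so the
 * link fails if nothing defines it. */
#define REQUIRE_SYMBOL_SKETCH(sym)                                  \
    extern char sym;                                                \
    static char *__require_##sym                                    \
        __attribute__((section(".discard"), used)) = &sym

/* Stand-in for a symbol that some other object file would provide. */
char obj_demo_driver;

REQUIRE_SYMBOL_SKETCH(obj_demo_driver);

/* Helper used only to inspect the result in this sketch. */
char *required_address(void)
{
    return __require_obj_demo_driver;
}
```

Adding a REQUIRE_SYMBOL_SKETCH() for a symbol that no object defines should make the link stop with an undefined-reference error, which is exactly the behavior the .equ version failed to provide.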

I've also added macros EXPORT_SYMBOL() and IMPORT_SYMBOL(), which can be used for REQUEST_SYMBOL()-like behavior in cases where you actually want to use the symbol being requested. This needs some cooperation from the file providing the symbol (it has to say EXPORT_SYMBOL(symname)), because there's no way to do it otherwise (the same undefined symbol would have to be both strong and weak, which is impossible). I leave it to the curious to look at the code to see how these work :-)

Finally, I spent several hours working on a desirable functionality called REQUIRE_IF()—pull in one object file only if another is already being compiled in. This could be used for “pull in WEP if 802.11 is compiled in”, “pull in undiheader if undiprefix is compiled in”, etc. Unfortunately, the limitations of linker script syntax and a particularly braindead way of handling undefined symbols (refusing to search libraries for them) combine to make the only possible solution I could find extremely ugly. If I do wind up implementing it, the gory details will wind up on this page, but I'm hoping we can agree to use a simpler method requiring slightly more human intervention. :-)

I'm not going to push these changes to staging until we have a solution to the REQUIRE_IF() fiasco, but here's the commit so far:

Wednesday, 5 August

Back to debugging ath5k…

I solved the problem I was having with ath5k: I was processing the status bits in a way that is only suitable for interrupt-driven use of the card. Making a fairly obvious fix (removing the “is interrupt pending” check) allows it to work in polling mode. With this, I'm able to scan for networks and associate with a WPA network, including both sending and receiving packets.

Unfortunately, there's a memory corruption bug somewhere, of an extremely difficult-to-track variety. When I DHCP with my neighbor's network, gPXE locks up after receiving the DHCPOFFER. It appears that it's the card's DMA that's doing the corruption, as the data structures I set up to track things (in hopes of reading them out over FireWire after the lockup) get zeroed without triggering gdb watchpoints. I added a check that every RX buffer was exactly 2400 bytes long and formatted like an io_buffer (*(bufbase + 2408) == bufbase) and it never failed.
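For reference, a sanity check of that shape can be sketched as below. The structure layout is an assumption modelled loosely on gPXE's struct io_buffer (a list head followed by head, data, tail and end pointers), with the metadata placed directly after the 2400-byte data area. On a 32-bit build the list head would occupy 8 bytes, which would put the head pointer at offset 2408; that reading of the expression above is an inference, not something the journal states.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Assumed layout, loosely modelled on gPXE's struct io_buffer. */
struct io_buffer_sketch {
    struct list_head list;
    void *head;   /* start of the data area */
    void *data;
    void *tail;
    void *end;
};

#define RX_BUF_LEN 2400   /* data area length from the journal entry */

/* Return nonzero if the metadata that should follow the data area still
 * points back at the start of the buffer. */
int rx_buffer_intact(const uint8_t *bufbase)
{
    const uint8_t *field = bufbase + RX_BUF_LEN
                         + offsetof(struct io_buffer_sketch, head);
    void *stored;

    /* memcpy avoids making any alignment assumption about 'field'. */
    memcpy(&stored, field, sizeof(stored));
    return stored == (const void *)bufbase;
}
```

A check like this only catches overruns that clobber the metadata itself, which is consistent with it never firing while DMA was landing somewhere else entirely.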

Incidentally, my debugging led to a rather nifty use for FireWire: with -finstrument-functions (which asks gcc to insert calls to special functions at the beginning and end of every function) and a very small amount of code, one can keep track of function calls and wind up with a backtrace on demand, even after a lockup (as long as it doesn't zero out memory like this one is!) [The reboot-variety of crash is already trappable with gdb.] It's not always perfect due to inline functions and optimizations, but it works very well. Take a look:

  #0  in 0x242a7 <bios_putchar>
  #1  0x2d5be <putchar+3a> at /home/oremanj/dev/gpxe/src/core/console.c:28
  #2  0x16321 <printf_putchar+18> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:390
  #3  0x15ed3 <cputchar+1a> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:154
  #4  0x15fa7 <vcprintf+45> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:179
  #5  0x16192 <vprintf+29> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:405
  #6  0x161cb <printf+22> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:420
  #7  0x09943 <ath5k_hw_noise_floor_calibration+b9> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k_phy.c:1161
  #8  0x0c7cf <ath5k_hw_reset+d23> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k_reset.c:1122
  #9  0x00958 <ath5k_reset+4f> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k.c:1550
  #10  0x00be9 <ath5k_chan_set+9d> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k.c:708
  #11  0x0f4ed <ath5k_config+e9a1> at /home/oremanj/dev/gpxe/src/net/80211/net80211.c:1428
  #12  0x0f4ed <net80211_probe_step+f7> at /home/oremanj/dev/gpxe/src/net/80211/net80211.c:1428
  #13  0x1032e <net80211_step_associate+118> at /home/oremanj/dev/gpxe/src/net/80211/net80211.c:1756
  #14  0x13b22 <step+35> at /home/oremanj/dev/gpxe/src/core/process.c:79
  #15  0x2d60e <getchar+34> at /home/oremanj/dev/gpxe/src/core/console.c:104
  #16  0x2de2a <getkey+16> at /home/oremanj/dev/gpxe/src/core/getkey.c:67
  #17  0x3b423 <readline+83> at /home/oremanj/dev/gpxe/src/hci/readline.c:101
  #18  0x38c6c <shell+25> at /home/oremanj/dev/gpxe/src/hci/shell.c:96
  #19  0x2ed99 <main+d5> at /home/oremanj/dev/gpxe/src/core/main.c:90
  #20  0x2ccff <prot_to_real+??> at comboot_call.c:0

Strangely, -finstrument-functions causes gcc to report phantom “may be used uninitialized in this function” warnings that don't show up without that option.

This is definitely a hack, and I don't think it'd be suitable for inclusion into the main tree as it is - too invasive - but it's cool to play around with. :-)
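The “very small amount of code” could look roughly like this sketch. The two hook names and signatures are the ones GCC documents for -finstrument-functions; everything else (the fixed-depth shadow stack, trace_snapshot(), the depth of 64) is an assumption for illustration and not the code behind the backtrace above.

```c
#include <stddef.h>

#define TRACE_DEPTH 64   /* hypothetical maximum recorded depth */

/* The hooks themselves must not be instrumented, or they would recurse. */
#define NO_INSTRUMENT __attribute__((no_instrument_function))

static void *trace_stack[TRACE_DEPTH];
static size_t trace_depth;

/* Called by instrumented code on entry to every function. */
NO_INSTRUMENT void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    if (trace_depth < TRACE_DEPTH)
        trace_stack[trace_depth] = this_fn;
    trace_depth++;   /* keep counting even when too deep to record */
}

/* Called by instrumented code on exit from every function. */
NO_INSTRUMENT void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    if (trace_depth > 0)
        trace_depth--;
}

/* Copy the current call chain, innermost function first, into 'out'.
 * Returns the number of entries written. */
NO_INSTRUMENT size_t trace_snapshot(void **out, size_t max)
{
    size_t depth = trace_depth < TRACE_DEPTH ? trace_depth : TRACE_DEPTH;
    size_t n = 0;

    while (depth > 0 && n < max)
        out[n++] = trace_stack[--depth];
    return n;
}
```

Since the shadow stack lives in ordinary memory, a FireWire host can read it after a lockup without any cooperation from the target; turning the addresses into names and line numbers is a separate host-side step.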

Thursday, 6 August

ath5k works! I still don't know what the problem was, as none of the small changes I made should've fixed it, but when I rebased against git master the memory corruption went away. I have cleaned up my ath5k branch and pushed it to staging, and it's ready for mainline review.

While I was still trying to figure out the issue, I figured a malloc() error analysis might be helpful, so I added specially-formatted debugging statements to core/malloc.c that printed backtraces using the -finstrument-functions backtrace code I developed yesterday, and wrote a couple of small Perl scripts, one to look for alloc/free inconsistencies and print the backtraces for them, one to resolve addresses into function names and file line numbers (using the binutils program addr2line). The result was that I could capture gPXE console output and do

% ./util/gpxegrind.pl ../ath5k-memory.log | ./util/resolveaddr.pl | less -R

and get valgrind-style output about the locations of double frees, memory leaks, and so forth. It didn't help with ath5k, but it did let me catch a small memory leak in the net80211 code, which is included in the ath5k branch in staging. Currently this is very much a hack, but I'll work on cleaning it up for conditional use in mainline.
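The actual scripts are Perl, but the core of the alloc/free consistency check is simple enough to sketch in C. This is only an illustration of the bookkeeping, with hypothetical names (track_alloc(), track_free(), track_leaks()) and a fixed-size table; it says nothing about the log format the real scripts parse.

```c
#include <stddef.h>

#define MAX_LIVE 128   /* hypothetical cap on simultaneously live blocks */

enum track_result { TRACK_OK, TRACK_BAD_FREE, TRACK_TABLE_FULL };

/* Addresses that have been allocated and not yet freed. */
static const void *live[MAX_LIVE];

/* Record an allocation event. */
enum track_result track_alloc(const void *ptr)
{
    size_t i;

    if (ptr == NULL)
        return TRACK_OK;            /* a failed allocation tracks nothing */
    for (i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            live[i] = ptr;
            return TRACK_OK;
        }
    }
    return TRACK_TABLE_FULL;
}

/* Record a free event; flags frees of addresses that are not live,
 * which covers both double frees and frees of unknown pointers. */
enum track_result track_free(const void *ptr)
{
    size_t i;

    if (ptr == NULL)
        return TRACK_OK;            /* free(NULL) is a legal no-op */
    for (i = 0; i < MAX_LIVE; i++) {
        if (live[i] == ptr) {
            live[i] = NULL;
            return TRACK_OK;
        }
    }
    return TRACK_BAD_FREE;
}

/* Whatever is still live at the end of the log counts as a leak. */
size_t track_leaks(void)
{
    size_t i, count = 0;

    for (i = 0; i < MAX_LIVE; i++)
        if (live[i] != NULL)
            count++;
    return count;
}
```

Pairing each flagged event with the backtrace captured at that point is what would turn this bookkeeping into valgrind-style output.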

When compiling an ath5k ROM, I ran into a problem: the linker tries to put the uncompressed sector count into a one-byte field, expecting the compressor to subtract from it as necessary based on the compression achieved. If the uncompressed length is over 128k, the linker will complain of a truncated relocation, even if the compressor fixup would have made everything work. I've worked around this by adding a new type of fixup (ADDx) that adds the compressed length to a field, to complement the current SUBx fixup that subtracts the compression delta from a field. In my tests it worked fine for ROMs that would have worked under the old system; the larger ROM I tried to flash caused the adventure described below, but I don't know if that's the fault of its size.
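The arithmetic behind the two fixup styles can be sketched as follows. This is a simplified model, not the compressor's real code: the function names are invented, the field is assumed to count 512-byte sectors, and the ADD variant assumes the linker leaves a payload-independent value (zero here) in the field.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR 512u   /* assumed unit of the one-byte length field */

/* Round a byte count up to whole sectors. */
static size_t sectors(size_t bytes)
{
    return (bytes + SECTOR - 1) / SECTOR;
}

/* SUB-style: the linker stores the uncompressed sector count and the
 * compressor subtracts the savings. Fails (-1) when the uncompressed
 * count cannot be stored in one byte, which models the truncated
 * relocation error. Assumes compressed <= uncompressed. */
int fixup_sub(size_t uncompressed, size_t compressed, uint8_t *field)
{
    size_t linked = sectors(uncompressed);

    if (linked > UINT8_MAX)
        return -1;
    *field = (uint8_t)(linked - (linked - sectors(compressed)));
    return 0;
}

/* ADD-style: the compressor adds the compressed sector count to whatever
 * the field already holds, so only the final value has to fit. */
int fixup_add(size_t compressed, uint8_t *field)
{
    size_t total = (size_t)*field + sectors(compressed);

    if (total > UINT8_MAX)
        return -1;
    *field = (uint8_t)total;
    return 0;
}
```

Under these assumptions a 140k image that compresses to 90k fails the SUB path (280 sectors do not fit in a byte) but passes the ADD path with a final value of 180.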

After trying and failing to find a way to do REQUIRE_IF() using the linker table system, as suggested by mcb30, I have implemented the config/config_objname.o solution where objname.c will automatically REQUEST_OBJECT(config_objname). It seems to work well. I've pushed this and my earlier linker change to branch linker in staging, and it's also ready for review.

I received an e1000 NIC, and had the chance to do some wireless ROM tests. Using memdisk, I was able to flash a dual-driver e1000/rtl8185 gPXE, and boot off the wireless using it - huzzah! (The e1000 has a 128KB EEPROM, so it's particularly good for this.) Unfortunately, my flash of an ath5k ROM produced a card that would freeze the system during option ROM scanning by the BIOS (when gPXE normally prints the “Press Ctrl+B to configure…” prompt). Three hours from installation to bricking… not so good. I'm not sure why the ath5k flash failed, but it may have to do with my use of an iSCSI DOS boot instead of the prior use of memdisk. It's possible that IBAUTIL does something strange that's not respectful of gPXE's low-memory state, and causes iSCSI reads through the int 13h interface to return corrupted data. It's also possible, since this is the first ROM I flashed that required my uncompressed-size-over-128k modifications, that there's an issue (either architecturally or with gPXE's implementation) with pushing the envelope in that way.

I'd be able to reflash the e1000 easily enough if I could get into IBAUTIL, but since the system won't boot with it installed, that doesn't work. Since I don't have another ROMable card, the only way I could see of fixing it was plugging it into the PCI slot after the system had started. Yes, I did try it, and I was incredibly lucky not to fry either the card or the slot — folks, regular PCI cards are not meant to be hot-plugged. (It locked up the system and wouldn't recognize the card until I power-cycled.) I believe the solution to this one lies in getting another card, flashing it properly with a basic gPXE, putting it in a PCI slot that gets scanned before the bricked card, and using gPXE's “Press Ctrl+B to configure…” escape hatch to get IBAUTIL loaded before the bad option ROM gets scanned. IBAUTIL has an option to disable the flash.

Friday, 7 August

I discovered that my test system's video card takes up 55k after it initializes, meaning my 90k gPXE ROM would overflow the 128k option ROM area (55k + 90k = 145k). It seems my test system's BIOS doesn't handle this gracefully. That's one mystery solved, though it doesn't make my e1000 any less bricked…

On Michael's suggestion, I implemented a generic means for placing variables in base memory, in much the manner of __data16 for pcbios builds (it actually uses __data16 if it can). On EFI it places the variables in a special section that is relocated into a freshly allocated bit of base memory at init time. This is used by the FireWire debug code to place its portal structure low, where the debugging host can scan for it. It's a much nicer mechanism than the umalloc_low() of my original implementation.

With that, the firewire branch is (again) ready to be reviewed:

I spent a while cleaning up my 802.11 crypto changes, and have created a branch for them and the iwmgmt commands using the new config_subsystem.o mechanism for object-dependent configuration:

I've separated out my EAP code (WPA Enterprise) into a different branch. It's currently useless without any EAP authentication methods implemented, but the structure is there if someone (perhaps me at a later date) wants to implement some.

Saturday, 8 August

Meeting today, the majority of which was spent discussing an idea I had for loading large ROMs in crowded option ROM environments: have a small ROM stub that loads the rest of the ROM to an area not subject to the 128k option ROM limit. There are several ways of implementing this:

  • My initial idea: program the PCI ROM BAR to map the full ROM to an area in high memory. This has some serious practical issues of the “where do you put it?” variety, because one would need to walk both the e820 memory map and the PCI bus to find an area not used by RAM or any other memory-mapped I/O device in the system. And there are some devices, such as the APICs, that don't show up in either.
    • On the other hand, it may be safe to do this by looking for a sufficiently large PCI memory BAR mapping (e.g. video card), disabling that mapping while we access the ROM, and reenabling it later - bears trying, at least.
  • Michael's idea: use the NVS subsystem to access the flash directly, scan for a gPXE image embedded within it, expose it via int13h, and boot it. The only practical issue here is the fact that most supported NICs don't have an NVS driver. The result may also be rather larger than accessing the PCI BARs, but tiny code that doesn't work is useless.

This will be an interesting project to hack on over the next week. :-)

Things left to do this summer:

  • Make sure all relevant commits described on this page get merged or fixed to be mergeable;
  • Clean up the firmware branch to use new linker macros, and push to staging;
  • Document, document, document!
  • With remaining time, work on some of the dangling threads:
    • Flash stub
    • EAP
    • Something else?
