
====== Joshua Oreman: 802.11 wireless development ======

===== Journal Week 11 =====

==== Monday, 3 August ====

We have a mailing list regular who's interested in using gPXE on OS X. Since cross-gcc is very strict about printf specifiers, this led to several patches to fix compilation under both i386 and x86_64. Also, the recent "startpxe" command addition broke EFI builds by unconditionally dragging in real-mode UNDI code; fixed by making PXE_CMD a default only for pcbios builds. Also, my sky2 driver (used by many Macs) has been merged:

  * On **gPXE mainline**:
    * [[http://git.etherboot.org/?p=gpxe.git;a=commit;h=993f8ba9bcfad8e5f557c22b5382eca401d49966|[pci] Add definitions for Advanced Error Reporting registers]]
    * [[http://git.etherboot.org/?p=gpxe.git;a=commit;h=70f47e675e09271f66a2dfd44c4119d61b171258|[sky2] Add support for Marvell Yukon-II gigabit Ethernet cards]]

I updated the [[:macbuild|Building on OS X]] page to reflect some suggestions and the new driver availability.

Before SoC, I submitted a patch to enable debugging over FireWire, but it was very ad-hoc and somewhat ugly (the user had to enter an address displayed by gPXE into the program that would try to connect over FireWire). Since gPXE is loaded in high memory, both on pcbios and EFI architectures, it's infeasible to scan through memory (the only thing a FireWire client can do, since we use the physical-DMA interface) to find anything. We have no idea how much memory is installed, and even over FireWire, 2GB to 4GB takes a long time to scan through.

To solve this problem in a hopefully generic way, I've implemented a function ''umalloc_low()'' to allocate memory that is guaranteed to fall below 640k. On EFI, we can allocate ''EfiConventionalMemory'' through a boot services call; on pcbios, though, the only segment that's safe to use is the one we've already taken up with our 16-bit text and data.
Thus, on pcbios I implemented ''umalloc_low()'' like ''malloc()'', allocating data out of a heap in BSS; the only difference is that the heap is linked into ''bss16'', i.e. low memory. Also, because the expected usage pattern involves a persistent need to interface with something, there is no ''ufree_low()''; memory allocated is kept until gPXE shuts down. This lets the allocator itself be extremely tiny.

For the FireWire side of the equation, I decided on the concept of a "portal structure" aligned to a 16-byte boundary within low memory. It contains 8 bytes' worth of magic and the fields "request", "reply", and "address", which a debugging host can use to connect to some FireWire-accessible service and gain access to a service-specific communication structure (containing e.g. ring buffers and state fields). It's implemented in a way that avoids races if multiple debugging hosts try to connect at the same time (which is probably overkill, but it's the Right Thing).

Currently I've implemented three services over the FireWire debug link, two of which are broadly useful on machines that don't have a serial port:

  * GDB over FireWire (''gdbfire''), with a host-side utility program ''firegdb'' (used to be ''firebug'', but the name's already taken by a popular Firefox extension) that can either connect GDB automatically for you or listen for TCP debugger connections and proxy them over the FireWire link;
  * Console over FireWire (''fwconsole''), with a host-side utility program ''fireconsole'' (compiled from the same source as ''firebug'' due to the high level of similarity between the two) that acts as a simple interactive terminal emulator, optionally printing all of gPXE's output to a local file; and
  * File transfer over FireWire (''fwload''), which I wrote for my own use while developing ''sky2''; I wanted a way to load a new gPXE quickly onto a machine whose only supported booting mechanism was a CD-ROM drive.
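
The allocate-only low-memory heap and the 16-byte-aligned portal structure described above can be illustrated with a minimal sketch. Everything named here is hypothetical: ''low_alloc()'', ''struct fw_portal'', the field widths, and the magic string are assumptions for illustration, not the code in the **firewire** branch. The heap is an ordinary static array standing in for ''bss16'', and the request/reply handshake that avoids races between hosts is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOW_HEAP_SIZE 4096	/* assumed size, for illustration only */
#define PORTAL_ALIGN 16		/* portal structures sit on 16-byte boundaries */

/* Stand-in for a heap that the real code links into low memory */
static uint8_t low_heap[LOW_HEAP_SIZE] __attribute__ (( aligned ( 16 ) ));
static size_t low_heap_used;

/* Allocate-only ("bump") allocator: there is deliberately no free */
void * low_alloc ( size_t len ) {
	size_t start = ( ( low_heap_used + PORTAL_ALIGN - 1 ) &
			 ~( ( size_t ) PORTAL_ALIGN - 1 ) );

	if ( ( len > LOW_HEAP_SIZE ) || ( start > LOW_HEAP_SIZE - len ) )
		return NULL;
	low_heap_used = start + len;
	return &low_heap[start];
}

/* Assumed layout: the journal names the fields but not their widths */
struct fw_portal {
	uint8_t magic[8];
	uint32_t request;
	uint32_t reply;
	uint64_t address;
};

/* Hypothetical magic value, exactly 8 bytes */
static const uint8_t portal_magic[8] =
	{ 'F', 'W', 'P', 'O', 'R', 'T', 'A', 'L' };

/* Target side: publish a portal pointing at a service structure */
struct fw_portal * portal_publish ( uint64_t service_address ) {
	struct fw_portal *portal = low_alloc ( sizeof ( *portal ) );

	if ( ! portal )
		return NULL;
	portal->request = 0;
	portal->reply = 0;
	portal->address = service_address;
	/* Write the magic last, so a scanner that finds it sees the
	 * other fields already filled in */
	memcpy ( portal->magic, portal_magic, sizeof ( portal->magic ) );
	return portal;
}

/* Host side: search an image of low memory at 16-byte steps.
 * Returns the offset of the first portal, or -1 if none is found. */
long portal_find ( const uint8_t *mem, size_t len ) {
	size_t off;

	for ( off = 0 ; off + sizeof ( struct fw_portal ) <= len ;
	      off += PORTAL_ALIGN ) {
		if ( memcmp ( mem + off, portal_magic,
			      sizeof ( portal_magic ) ) == 0 )
			return ( long ) off;
	}
	return -1;
}
```

Because the search only covers memory below 640k and only needs to look at every sixteenth byte, it stays cheap even over a slow debugging link.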
I burned a version of gPXE with ''fwload'' support onto a CD, booted off it, and chained the gPXE I wanted to test over FireWire. I don't expect this will be generally useful.

It should be easy enough to do other things (IP over FireWire, anyone?) if people feel the need for them. :-)

  * On branch **firewire** in **gpxe-staging** (ready for mainline review):
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=db7f91149692fe021c58ccb83ccd3e27a3b6142b|[umalloc] Add umalloc_low() to allocate conventional memory]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=a1ff8e39cadffb0572d736551016a17a404dbe8d|[firewire] Add FireWire debug interface]]
  * On branch **fwtrans** in my personal repository: the above, plus
    * [[http://git.etherboot.org/?p=people/oremanj/gpxe.git;a=commit;h=40dd6ba9604c58a03c971188fac252ced515cde1|[firewire] Add file transfer over FireWire debug interface]]

==== Tuesday, 4 August ====

Spent most of today figuring out the idiosyncrasies of the linker as it relates to gPXE. Currently we have two macros for managing "I want to pull in this object": ''PROVIDE_SYMBOL()'' and ''REQUIRE_SYMBOL()''. (''REQUIRE_OBJECT()'' just does ''REQUIRE_SYMBOL()'' on a special ''obj_objectname'' symbol provided automatically by ''compiler.h'' for each object.) I discovered recently that ''REQUIRE_OBJECT()'' doesn't actually require anything; it will pull in the object if it exists, but if it doesn't, no linker error is produced. While this can be useful behavior, it doesn't fit the semantics of the word "require".

In effect, ''REQUIRE_SYMBOL(foo)'' generates assembler code like the following:

  .equ __need_foo, foo

That creates an absolute (not directly associated with a piece of memory) symbol called ''_''''_need_foo'' whose value is that of the symbol ''foo''. Since ''foo'' is not defined in the same file, that creates an entry in the symbol table for the file it shows up in, marking ''foo'' as undefined.
The linker will try to resolve such references at link-time, and searches through all the gPXE object files for a symbol named ''foo''. If it finds one, that object file gets pulled into the link and its functionality will be available to gPXE at runtime.

However, an interesting thing about the above line may have occurred to you. The special ''_''''_need_foo'' absolute symbol is never actually //used//. If ''foo'' remains undefined despite the linker's searching, ''_''''_need_foo'' will be undefined too... but so what? There's no reason for the linker to stop linking just because a symbol is undefined, if it's not going to impact the code. When it runs, gPXE doesn't even have the symbol table; why would it matter what's in it?

The replacement for ''REQUIRE_SYMBOL(foo)'' (the old behavior has been renamed to ''REQUEST_SYMBOL(foo)'') should clarify:

  extern char foo;
  static char * __require_foo __attribute__ (( section ( ".discard" ), used )) = &foo;

This doesn't just define an absolute symbol; it defines a //global variable//, a symbol with storage space attached, that stores the value of symbol ''foo''. (Symbol values to the linker are like variable addresses to the compiler.) The variable (''_''''_require_foo'' in this case) is placed in a special output section, ''.discard'', which we can tell the linker to throw away in the final linking stage (so that the result doesn't take up precious bytes in the final gPXE). It's marked ''used'' so the compiler doesn't throw it away thinking it's never used.

This time, when the linker goes to resolve its undefined symbols and can't find any ''foo'', it'll notice there's a //relocation// on it: an instruction to the linker that says "I don't know what the address of ''foo'' is yet, because it's not in this file; when you figure it out, please put it here in the variable ''_''''_require_foo''."
The same sort of thing is used when you call an external function; the linker knows how to interpret the machine code and change the address being jumped to. And when there's a relocation the linker can't satisfy, it has to refuse to link the program, since its execution without part of its code or data set properly would be undefined. Thus, this formulation of ''REQUIRE_SYMBOL()'' really requires.

I've also added macros ''EXPORT_SYMBOL()'' and ''IMPORT_SYMBOL()'', which can be used for ''REQUEST_SYMBOL()''-like behavior in cases where you actually want to use the symbol being requested. This needs some cooperation from the file providing the symbol (saying ''EXPORT_SYMBOL(symname)''), because there's no way to do it otherwise (it'd be necessary for the same undefined symbol to be both strong and weak, which is impossible). I leave it to the curious to look at the code to see how these work :-)

Finally, I spent several hours working on a desirable feature called ''REQUIRE_IF()'': pull in one object file only if another is already being compiled in. This could be used for "pull in WEP if 802.11 is compiled in", "pull in undiheader if undiprefix is compiled in", etc. Unfortunately, the limitations of linker script syntax and the linker's particularly braindead way of handling undefined symbols (refusing to search libraries for them) combine to make the only possible solution I could find extremely ugly. If I do wind up implementing it, the gory details will wind up on this page, but I'm hoping we can agree to use a simpler method requiring slightly more human intervention.
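
For reference, the two behaviors contrasted above can be wrapped into macros roughly as follows. This is a sketch assuming GCC and an ELF target; the macro bodies are built from the two snippets quoted in this entry, but the token-pasting wrappers and the ''demo_feature'' symbol are illustrative assumptions, and the real definitions in ''compiler.h'' may differ.

```c
/* Request: define an absolute symbol that merely mentions sym.
 * Nothing refers to __need_<sym>, so (as described above) a missing
 * sym is not a link error. */
#define REQUEST_SYMBOL( sym ) \
	__asm__ ( ".equ __need_" #sym ", " #sym )

/* Require: a pointer variable whose initializer needs a relocation
 * against sym.  An unresolvable relocation makes the link fail.
 * A linker script can drop the ".discard" section from the output. */
#define REQUIRE_SYMBOL( sym )						\
	extern char sym;						\
	static char * __require_ ## sym					\
		__attribute__ (( section ( ".discard" ), used )) = &sym

/* Hypothetical symbol standing in for an object's marker symbol */
char demo_feature = 1;

/* Links only because demo_feature is defined somewhere */
REQUIRE_SYMBOL ( demo_feature );
```

Without a linker script that discards ''.discard'', the pointer simply stays in the output image; the requirement check itself happens either way, at link time.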
:-) I'm not going to push these changes to staging until we have a solution to the ''REQUIRE_IF()'' fiasco, but here's the commit so far:

  * On branch **linker** in my personal repository:
    * [[http://git.etherboot.org/?p=people/oremanj/gpxe.git;a=commit;h=1d79217179cd9100cf319d4d5172e26e382c60c8|[linker] Expand and correct symbol requirement macros]]

==== Wednesday, August 5 ====

Back to debugging ath5k...

I solved the problem I was having with ath5k: I was processing the status bits in a way that is only suitable for interrupt-driven use of the card. Making a fairly obvious fix (removing the "is interrupt pending" check) allows it to work in polling mode. With this, I'm able to scan for networks and associate with a WPA network, including both sending and receiving packets.

Unfortunately, there's a memory corruption bug somewhere, of an extremely difficult-to-track variety. When I DHCP with my neighbor's network, gPXE locks up after receiving the DHCPOFFER. It appears that it's the card's DMA that's doing the corruption, as the data structures I set up to track things (in hopes of reading them out over FireWire after the lockup) get zeroed without triggering gdb watchpoints. I added a check that every RX buffer was exactly 2400 bytes long and formatted like an io_buffer (*(bufbase + 2408) == bufbase) and it never failed.

Incidentally, my debugging led to a rather nifty use for FireWire: with -finstrument-functions (which asks gcc to insert calls to special functions at the beginning and end of every function) and a very small amount of code, one can keep track of function calls and wind up with a backtrace on demand, even after a lockup (as long as it doesn't zero out memory like this one is!) [The reboot-variety of crash is already trappable with gdb.] It's not always perfect due to inline functions and optimizations, but it works very well.
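
The bookkeeping behind such a backtrace can be sketched in a few lines. ''_''''_cyg_profile_func_enter()'' and ''_''''_cyg_profile_func_exit()'' are the hook names gcc calls when building with -finstrument-functions; the shadow stack, its depth limit, and ''bt_snapshot()'' are hypothetical names for illustration, not the code used for the trace in this entry.

```c
#include <stddef.h>

#define BT_MAX_DEPTH 64	/* assumed limit, for illustration only */

/* The hooks themselves must not be instrumented, or they would recurse */
#define NO_INSTRUMENT __attribute__ (( no_instrument_function ))

struct bt_frame {
	void *fn;		/* address of the function entered */
	void *call_site;	/* address it was called from */
};

/* Shadow call stack; a debugging host could read this out of memory
 * even after the target has locked up */
static struct bt_frame bt_stack[BT_MAX_DEPTH];
static size_t bt_depth;

/* Called by gcc-generated code on entry to every instrumented function */
NO_INSTRUMENT void __cyg_profile_func_enter ( void *this_fn,
					      void *call_site ) {
	if ( bt_depth < BT_MAX_DEPTH ) {
		bt_stack[bt_depth].fn = this_fn;
		bt_stack[bt_depth].call_site = call_site;
	}
	bt_depth++;	/* keep counting so enter/exit stay balanced */
}

/* Called by gcc-generated code on exit from every instrumented function */
NO_INSTRUMENT void __cyg_profile_func_exit ( void *this_fn,
					     void *call_site ) {
	( void ) this_fn;
	( void ) call_site;
	if ( bt_depth > 0 )
		bt_depth--;
}

/* Copy out the current call chain, innermost frame first.
 * Returns the number of frames written to out. */
NO_INSTRUMENT size_t bt_snapshot ( struct bt_frame *out, size_t max ) {
	size_t depth = ( ( bt_depth < BT_MAX_DEPTH ) ?
			 bt_depth : BT_MAX_DEPTH );
	size_t count = 0;

	while ( ( depth > 0 ) && ( count < max ) )
		out[count++] = bt_stack[--depth];
	return count;
}
```

Turning the recorded addresses into names and line numbers is a separate, host-side step, for example with the binutils program addr2line.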
Take a look:

  #0 in 0x242a7 <bios_putchar>
  #1 0x2d5be <putchar+3a> at /home/oremanj/dev/gpxe/src/core/console.c:28
  #2 0x16321 <printf_putchar+18> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:390
  #3 0x15ed3 <cputchar+1a> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:154
  #4 0x15fa7 <vcprintf+45> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:179
  #5 0x16192 <vprintf+29> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:405
  #6 0x161cb <printf+22> at /home/oremanj/dev/gpxe/src/core/vsprintf.c:420
  #7 0x09943 <ath5k_hw_noise_floor_calibration+b9> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k_phy.c:1161
  #8 0x0c7cf <ath5k_hw_reset+d23> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k_reset.c:1122
  #9 0x00958 <ath5k_reset+4f> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k.c:1550
  #10 0x00be9 <ath5k_chan_set+9d> at /home/oremanj/dev/gpxe/src/drivers/net/ath5k/ath5k.c:708
  #11 0x0f4ed <ath5k_config+e9a1> at /home/oremanj/dev/gpxe/src/net/80211/net80211.c:1428
  #12 0x0f4ed <net80211_probe_step+f7> at /home/oremanj/dev/gpxe/src/net/80211/net80211.c:1428
  #13 0x1032e <net80211_step_associate+118> at /home/oremanj/dev/gpxe/src/net/80211/net80211.c:1756
  #14 0x13b22 <step+35> at /home/oremanj/dev/gpxe/src/core/process.c:79
  #15 0x2d60e <getchar+34> at /home/oremanj/dev/gpxe/src/core/console.c:104
  #16 0x2de2a <getkey+16> at /home/oremanj/dev/gpxe/src/core/getkey.c:67
  #17 0x3b423 <readline+83> at /home/oremanj/dev/gpxe/src/hci/readline.c:101
  #18 0x38c6c <shell+25> at /home/oremanj/dev/gpxe/src/hci/shell.c:96
  #19 0x2ed99 <main+d5> at /home/oremanj/dev/gpxe/src/core/main.c:90
  #20 0x2ccff <prot_to_real+??> at comboot_call.c:0

Strangely, -finstrument-functions causes gcc to report phantom "may be used uninitialized in this function" warnings that don't show up without that option.

This is definitely a hack, and I don't think it'd be suitable for inclusion into the main tree as it is (too invasive), but it's cool to play around with. :-)

==== Thursday, 7 August ====

ath5k works!
I still don't know what the problem was, as none of the small changes I made should've fixed it, but when I rebased against git master the memory corruption went away. I have cleaned up my ath5k branch and pushed it to staging, and it's ready for mainline review.

While I was still trying to figure out the issue, I figured a malloc() error analysis might be helpful, so I added specially-formatted debugging statements to core/malloc.c that printed backtraces using the -finstrument-functions backtrace code I developed yesterday, and wrote a couple of small Perl scripts: one to look for alloc/free inconsistencies and print the backtraces for them, and one to resolve addresses into function names and file line numbers (using the binutils program addr2line). The result was that I could capture gPXE console output and do

  % ./util/gpxegrind.pl ../ath5k-memory.log | ./util/resolveaddr.pl | less -R

and get valgrind-style output about the locations of double frees, memory leaks, and so forth. It didn't help with ath5k, but it did let me catch a small memory leak in the net80211 code; the fix is included in the ath5k branch in staging. Currently this is very much a hack, but I'll work on cleaning it up for conditional use in mainline.

When compiling an ath5k ROM, I ran into a problem: the linker tries to put the uncompressed sector count into a one-byte field, expecting the compressor to subtract from it as necessary based on the compression achieved. If the uncompressed length is over 128k, the linker will complain of a truncated relocation, even if the compressor fixup would have made everything work. I've worked around this by adding a new type of fixup (ADDx) that adds the compressed length to a field, to complement the current SUBx fixup that subtracts the compression delta from a field.
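
The arithmetic behind the two fixup styles can be worked through with a small model. This is only a sketch: the 512-byte sector size is inferred from the 128k limit on a one-byte field, the function names are hypothetical, and the ADDx starting value of zero is an assumption. It does not reproduce the actual fixup records used by the compressor.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed: 256 sectors of 512 bytes = 128k */

/* Number of sectors needed to hold a given number of bytes */
static unsigned int sectors ( unsigned int bytes ) {
	return ( ( bytes + SECTOR_SIZE - 1 ) / SECTOR_SIZE );
}

/* SUBx-style: the linker stores the uncompressed sector count and the
 * compressor later subtracts the space saved.  The starting value has
 * to fit in the byte, or the link fails before compression is run. */
bool subx_fixup ( unsigned int uncompressed_len,
		  unsigned int compressed_len, uint8_t *field ) {
	unsigned int initial = sectors ( uncompressed_len );
	unsigned int delta = ( initial - sectors ( compressed_len ) );

	if ( initial > UINT8_MAX )
		return false;	/* models the truncated relocation */
	*field = ( uint8_t ) ( initial - delta );
	return true;
}

/* ADDx-style: the field starts from a value that does not depend on
 * the uncompressed size (zero here) and the compressor adds the
 * compressed length, so only the final value has to fit. */
bool addx_fixup ( unsigned int compressed_len, uint8_t *field ) {
	unsigned int final = ( 0 + sectors ( compressed_len ) );

	if ( final > UINT8_MAX )
		return false;
	*field = ( uint8_t ) final;
	return true;
}
```

For an image of 140 KiB that compresses to 100 KiB, the SUBx model fails because 280 sectors do not fit in one byte, while the ADDx model stores 200. For images under the limit, both models end up with the same value.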
In my tests it worked fine for ROMs that would have worked under the old system; the larger ROM I tried to flash caused the adventure described below, but I don't know if that's the fault of its size.

After trying and failing to find a way to do ''REQUIRE_IF()'' using the linker table system, as suggested by mcb30, I have implemented the ''config/config_//objname//.o'' solution where ''//objname//.c'' will automatically ''REQUEST_OBJECT(config_//objname//)''. It seems to work well. I've pushed this and my earlier linker change to branch **linker** in staging, and it's also ready for review.

I received an e1000 NIC, and had the chance to do some wireless ROM tests. Using memdisk, I was able to flash a dual-driver e1000/rtl8185 gPXE, and boot off the wireless using it. Huzzah! (The e1000 has a 128KB EEPROM, so it's particularly good for this.) Unfortunately, my flash of an ath5k ROM produced a card that would freeze the system during option ROM scanning by the BIOS (when gPXE normally prints the "Press Ctrl+B to configure..." message). Three hours from installation to bricking... not so good.

I'm not sure why the ath5k flash failed, but it may have to do with my use of an iSCSI DOS boot instead of the prior use of memdisk. It's possible that IBAUTIL does something strange that's not respectful of gPXE's low-memory state, and causes iSCSI reads through the int 13h interface to return corrupted data. It's also possible, since this is the first ROM I flashed that required my uncompressed-size-over-128k modifications, that there's an issue (either architecturally or with gPXE's implementation) with pushing the envelope in that way.

I'd be able to reflash the e1000 easily enough if I could get into IBAUTIL, but since the system won't boot with it installed, that doesn't work. Since I don't have another ROMable card, the only way I could see of fixing it was plugging it into the PCI slot after the system had started.
Yes, I did try it, and I was incredibly lucky not to fry either the card or the slot. Folks, regular PCI cards are //not// meant to be hot-plugged. (It locked up the system and wouldn't recognize the card until I power-cycled.)

I believe the solution to this one lies in getting another card, flashing it properly with a basic gPXE, putting it in a PCI slot that gets scanned before the bricked card, and using gPXE's "Press Ctrl+B to configure..." escape hatch to get IBAUTIL loaded before the bad option ROM gets scanned. IBAUTIL has an option to disable the flash.

  * On branch **linker** in staging:
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=77fdc539b9a3bf33e2541ff7858500ee5940eb7d|[linker] Expand and correct symbol requirement macros]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=c78419aab1b33c6184abe9d10cacc24a438b3790|[linker] Add mechanism for subsystem-dependent configuration options]]
  * On branch **ath5k** in staging:
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=9512ea70ce18e185d247e245201a31a4e2e798fc|[802.11] Enhance support for driver PHY differences]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=cc938ee7c676b91bbadad3f08638f10a704dd2bc|[802.11] Set channels early on to avoid tuning to an undefined channel]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=43d6bc871bbdd2258815e67b4d398922b7f73952|[802.11] Fix maximum packet length]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=da292ec3b2789a4f3c59dfaf1156d479aac222e7|[802.11] Fix memory leak on unsuccessful probes]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=7f34c1e7ab7cf498febe6ada5fd3891c52d59a4c|[legal] Add MIT licence declaration]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=0e5b32e4d3971e7b3e624f0b0f2927f8a3b816d9|[ath5k] Add support for non-802.11n Atheros wireless NICs]]
    * [[http://git.etherboot.org/?p=gpxe-staging.git;a=commit;h=e9942ca44fa11eb5f675a72db51b9bbc45fb3e1e|[rom] Allow ROM images to have uncompressed size greater than 128k]]

