I can hardly believe there's only a week left of SoC. It's been a wonderful experience working with such talented developers, and I hope my coursework in the fall will leave me enough time to continue contributing.
Also, I believe this IRC message needs to find a permanent record here:
10:45 <mcb30> At the point you're talking about, the system is not fully initialised. On many systems, the memory map is not yet valid. If running normal BIOS-level code is marked with "Here be dragons", running during POST is marked with "Here be huge, ugly, vindictive, sociopathic dragons with a mean sense of humour"
Well put indeed!
Not too much gPXE work today. I pushed a cleaned-up version of the large-ROM fix from my ath5k branch to staging as bigrom-oremanj (following the new staging tree protocol). A suggestion by Michael for making some of the condition checks for overflow more intuitive revealed the rather surprising fact that bit-shifting in C by as many places as the width of the variable's type, or more, is undefined; on gcc-x86, 1ul << var when var is 32 will be not zero but one! This led to a small-scale audit of variable-amount bitshifts in the gPXE source, but I didn't find any code that would cause problems with this undefined behavior.
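For the record, a minimal sketch of a shift that avoids the undefined case. The helper name shl_checked is made up for illustration and is not something in the gPXE tree. The result of one on x86 is consistent with the hardware masking the shift count to five bits for 32-bit operands, so a count of 32 behaves like a count of 0.

```c
#include <limits.h>

/* Hypothetical helper (not part of gPXE): shift left, yielding 0 when
 * the count is at least the width of unsigned long, instead of
 * invoking undefined behaviour.
 */
static unsigned long shl_checked ( unsigned long value, unsigned int count ) {
	if ( count >= sizeof ( value ) * CHAR_BIT )
		return 0;
	return ( value << count );
}
```

With this, shl_checked ( 1ul, 32 ) is zero on a system with a 32-bit unsigned long, which is what the overflow checks intuitively expect.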
I received a new e1000 card, and was able to use it to restore the flash on the old one following a procedure that I've outlined on the ROM burning page. The issue was indeed one of option ROM overflow; gPXE loads to segment CC00, meaning it has exactly 80kB of ROM space on my test system. The ROM that had caused trouble was about 90kB.
I found a regression in the 802.11 code caused by recent changes to process_add() to ensure the same process is not added twice. The changes assume that all callers use process_init_stopped() to initialize all fields of the process structure, instead of setting just step and refcnt manually (which has worked fine in the past). The 802.11 code used the latter method, and now does not start the association process at all. I pushed a two-line fix to staging as wiprocfix, and it probably will be merged tomorrow.
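A simplified model of how this kind of regression can arise. This is a sketch, not the gPXE code: the demo_ names and the structure layout are invented, and it assumes the "already added" check is a test on a list node that only the full initializer sets up. A structure whose list node was never initialized then looks as if it were already queued, so the add is silently skipped.

```c
#include <stddef.h>

/* Minimal circular doubly linked list */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

static int list_empty ( const struct list_head *list ) {
	return ( list->next == list );
}

static void list_add_tail ( struct list_head *entry, struct list_head *head ) {
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical, simplified process structure (not the real gPXE one) */
struct demo_process {
	struct list_head list;
	void ( * step ) ( struct demo_process *process );
};

static struct list_head run_queue = { &run_queue, &run_queue };

/* Full initialisation: the list node points at itself ("not queued") */
static void demo_process_init_stopped ( struct demo_process *process,
		void ( * step ) ( struct demo_process *process ) ) {
	process->list.next = &process->list;
	process->list.prev = &process->list;
	process->step = step;
}

/* Queue a process unless it already looks queued; returns 1 if added */
static int demo_process_add ( struct demo_process *process ) {
	if ( ! list_empty ( &process->list ) )
		return 0;
	list_add_tail ( &process->list, &run_queue );
	return 1;
}

/* Step function that does nothing, for demonstration only */
static void demo_step ( struct demo_process *process ) {
	( void ) process;
}
```

In this model, a process set up with demo_process_init_stopped() is added once and refused the second time, while one that only had step filled in (list node left zeroed) is never added at all.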
I rebased my linker branch against recent changes and pushed it to staging.
I updated my firmware branch to use the new symbol requirement macros defined in linker, and pushed it to my personal repository as firmware-pretty. It will go to staging after linker is merged, since it depends on the macros in linker.
Priorities for the rest of the week:
Regarding the last bullet point, I think I'm going to try using the PCI ROM BAR before device-specific flash code. Video cards almost universally have very large memory regions compared to a typical flash size, so it should be easy enough to look for a BAR larger than the flash size, disable it, and map the flash in its place for long enough to copy its contents to RAM. Disabling the BAR doesn't affect the card's internal operation, so as long as we don't output anything while the flash is mapped this method should work. (If anyone reading this knows something I don't about PCI architecture and can see that this is a stupid idea, please let me know.)
Wrote some documentation for users of the 802.11 code and driver writers.
Updated wireless branch to cope with a quirk of Linksys routers' WPA support; they don't accept 4-Way Handshake packets that include the optional capabilities field. Since we don't advertise any capabilities, there's no reason to include the field, so don't.
Updated firmware branch to clean up the makefile changes a bit.
Fixed my wiprocfix fix to properly set the reference count for the process object (no behavioral change, but the correctness is more intuitive) and saw it merged. bigrom-oremanj was also merged.
Rebased linker in staging against recent changes to mainline, and then wireless and firmware in my personal repository against linker.
Thought about future driver support; it seems that the combination of b43, ath9k, and iwlwifi should support almost all currently-unsupported cards in common use. Each of these drivers is over 20,000 lines of C in the Linux kernel, though, so this won't be an easy task.
Mostly worked on the ROM-from-PCI loader, which can be found as branch xrom in my personal repository for the curious. I doubt I'll be cleaning this up for mainline, as there seems to be no really safe way of doing it, and newer systems with PCI3.0 and PMM get the benefit for free.
Wrote some implementation notes for the wireless code.
Updated branch linker in staging to add the line number to symbols generated by REQUIRE_SYMBOL(foo), so that they now look like e.g. __require_foo_47. This fixes an issue with using REQUIRE_OBJECT() multiple times in the same file (e.g. with both GDBUDP and GDBSERIAL defined); now that the symbols that macro introduces are initialized data rather than common, the compiler refuses to allow two with the same name.
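The line-number trick can be sketched roughly as follows. The DEMO_ macro names and the demo_require_ prefix are placeholders for illustration, not the actual macros in the linker branch. The extra level of macro indirection is what makes the preprocessor replace __LINE__ with its number before pasting.

```c
/* Two-level paste so that __LINE__ is expanded before ## is applied */
#define DEMO_PASTE_( a, b ) a ## b
#define DEMO_PASTE( a, b ) DEMO_PASTE_ ( a, b )

/* Two-level stringify, used here only to inspect the generated name */
#define DEMO_STRINGIFY_( x ) #x
#define DEMO_STRINGIFY( x ) DEMO_STRINGIFY_ ( x )

/* Build a name such as demo_require_foo_47 from the symbol and line */
#define DEMO_REQUIRE_NAME( sym ) \
	DEMO_PASTE ( demo_require_ ## sym ## _, __LINE__ )

/* Define one initialized data object per use (hypothetical stand-in) */
#define DEMO_REQUIRE( sym ) int DEMO_REQUIRE_NAME ( sym ) = 1

/* Two uses in one file no longer collide, as they sit on different lines */
DEMO_REQUIRE ( foo );
DEMO_REQUIRE ( foo );
```

Two uses on the same source line would still collide, which seems an acceptable limitation for this sort of macro.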
Discovered that a stock e1000 gPXE ROM does not work on my development system (very recent BIOS, PCI3.0/BBS/PMM). It seems the BIOS will refuse to hand out 1MB or more at a time using PMM, and since gPXE keeps requesting larger allocations until it gets one aligned to 2MB, gPXE doesn't use PMM at all. The ROM is 70656 bytes, and it's relocated 71680 bytes below the end of option ROM space. Loading gPXE from the POST-time prompt works fine (except that e820 is not yet available there on my system); loading it as a boot device freezes immediately after “gPXE starting execution…”, and I get garbage onscreen after 10 seconds or so. I suspect some other card is trampling on our tail and hopelessly confusing the decompressor.
It turns out my test system exhibits the same underlying problem (BIOS won't give out 1MB or more via PMM); it just has slightly more option ROM space. When I tried to test a fix that would accept A20-set allocations if the BIOS had set up the A20 line properly, I managed to brick my e1000 again. Forty minutes of shuffling around PCI cards later, I fixed things, and verified that the state of the A20 line during POST is no indication of the state of A20 when our BEV or int19h is called. The BIOS disables it before booting.
Fix that would allow us to use PMM (and thus larger ROMs) on such limited BIOSes: accept any PMM buffer address, and set the A20 gate ourselves in the BEV. The code to do this is rather messy, though, and might not be worth it.
Also discovered a small bug in src/arch/i386/firmware/pcbios/gateA20.c:
#define A20_KBC_RETRIES (2^21)
“You keep on using that operator. I do not think it means what you think it means.”
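To spell it out: in C, ^ is bitwise exclusive-or, not exponentiation. A quick illustration follows, on the assumption that two to the power of 21 is what was meant; the enum names are just for demonstration.

```c
/* In C, ^ is bitwise XOR, not "to the power of" */
enum {
	/* 00010 XOR 10101 = 10111 binary, i.e. 23 */
	XOR_VALUE = ( 2 ^ 21 ),
	/* 2 to the power 21, i.e. 2097152 */
	SHIFT_VALUE = ( 1 << 21 ),
};
```

So the macro as written allows 23 retries rather than roughly two million.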
On my test system, if we make use of a PMM buffer with A20 set, we don't even get to the BEV entry point to have a chance to set the A20 gate up properly. Adding a ljmp $0xf000, $0xfff0 immediately after bev_entry: (which reboots the system at that point on a PMMless gPXE) does not prevent the freeze. There may be a subtler issue here.
Started taking a look at the Linux b43 (Broadcom wireless) driver. It's quite well-written and -commented, especially for a reverse-engineered driver, but the hardware is really a mess. Some models have the 30-bit DMA restriction Stefan dealt with during his SoC last year. The hardware uses an SSB interface, which seems to be on the level of a whole different bus bridged to PCI. And then there's this line:
err = request_firmware(&blob, ctx->fwname, ctx->dev->dev->dev);
dev->dev->dev? Seriously?
Figured out a possible solution for the problem for xrom that we can't know about the devices like APICs that don't have their mappings in PCI BARs: just read the entire space we're going to cover with our mapping before we map it. The standard on x86 is for unmapped memory to read all-ones, and designers of MMIO interfaces actively avoid all-ones being normal in a register. If all 128k or whatever read as 0xFF, plus we find no overlap in BARs or e820, it's almost certainly safe to map.
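The read-before-map check could look something like the following sketch. The function name is invented, and it assumes the candidate window is already addressable through an ordinary pointer; the BAR and e820 overlap checks are separate and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if every byte in the candidate window
 * reads back as 0xFF (i.e. looks unmapped), or 0 as soon as any other
 * value is seen.
 */
static int region_reads_all_ones ( const volatile uint8_t *base,
				   size_t len ) {
	size_t i;

	for ( i = 0 ; i < len ; i++ ) {
		if ( base[i] != 0xFF )
			return 0;
	}
	return 1;
}
```

The volatile qualifier is there so that the compiler actually performs every read, since the whole point is to see what the bus returns.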
The ROM-mapping logic could also be used for UNDI.
Split up the FireWire branch into a more logical separation of commits (first the generic interface, then the gPXE code that uses it, then the host-side utilities to make it useful). Pushed it as firewire to my personal repository and removed it from staging, as I have other code there that I think is more important (specifically linker and the various things depending on it).
Well, Summer of Code is over, and what an adventure it's been. I've immensely enjoyed working on such a mature and well-developed codebase, with a great many talented people, and in a very interesting field with lots of room for innovation. Thank you to everyone who's helped to make it possible!
Things I'd like to still get done, in rough order of priority:
ath9k or b43 or iwlwifi from Linux. The latter two require firmware loading, and all are something of a mess.
Final sanity check of local git branches related to the work I've done:
ath5k                 Merged             (ath5k wireless driver)
bigrom-oremanj        Merged             (small patch to support big ROMs)
sky2                  Merged             (sky2 wired NIC driver)
mainline-review       Merged             (initial bout of wireless code)
wiprocfix             Merged             (small patch to wireless code)
linker                In staging         (improve linker macros, object-specific config)
firewire              Waiting            (debugging interface over FireWire)
firmware-pretty       Waiting            (firmware image embedding and loading)
wireless-pretty       Waiting            (wireless crypto and improvements)
eap                   To-do              (802.1X authentication, WPA Enterprise)
xrom                  To-do              (load ROM from the PCI card)
ath5k-old             History            (superseded by ath5k)
firewire-old          History            (superseded by firewire)
firewire-really-old   History            (superseded by firewire)
wireless              History            (superseded by wireless-pretty)
fwtrans               Academic interest  (load files over firewire debug link)
And so we go, again.
Thank you to everyone who's made this summer great, and I hope to be able to continue contributing!