I (rwcr) have been working on a rather extensive modification of gPXE, to allow images and SAN devices (and eventually files on filesystems) to be treated with more unity. This resolves a great many “ugly hack” comments, makes SAN booting less architecture-dependent, and allows one to SAN-boot ISO images (to name a few possibilities). The cost is a tiny size increase in image type codesize due to an additional layer of indirection, and a more significant size increase in block device codesize for the same reason. This page is meant to summarize the changes, since a commit message can only be so long.
A new abstraction is introduced, that of a “data source” (struct source), which supports random access as well as splitting and blocking of reads. In the case of an image already in memory it reduces to constructions like copy_from_user(); to save space (about 800 bytes) in ROM images, it is possible to define MEM_SOURCE in config/general.h such that this reduction occurs at compile time. Normally, though, a layer of indirection in core/source.c is kept around to support SAN devices and eventually files on a filesystem, which may not always be resident in memory, may require access in fixed-size blocks, and may only support reading or writing a certain number of blocks at a time.
A data source is an implementation-specific structure (struct download, struct scsi_device, etc.) that contains a struct source by value. The containing structure must be reference-counted, and source.refcnt points to that reference counter. One fills in source.read and optionally source.write with appropriate functions, optionally defines source.blkshift and source.blkburst to restrict the alignment and length of the requests those functions can receive, and sets source.len to the length of the data source in bytes. After this point, the data source is passed around as a pointer to the struct source; the implementation-specific containing structure can be retrieved with container_of(), and it will automatically be freed when the last reference to its source is dropped. (References taken against the data source increment the reference counter in the containing structure.)
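The embedding pattern described above can be sketched roughly as follows. This is a simplified illustration and not the code from the patch: struct refcnt, struct mem_disk, mem_disk_create(), source_get() and source_put() are hypothetical stand-ins, the read signature is an assumption, and source.write, source.blkshift and source.blkburst are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; the real gPXE structures differ. */
struct refcnt {
	int count;
	void (*free)(struct refcnt *refcnt);
};

struct source {
	struct refcnt *refcnt;	/* points at the container's counter */
	int (*read)(struct source *source, void *dest,
		    uint64_t offset, size_t len);
	uint64_t len;		/* length of the data source in bytes */
};

#define container_of(ptr, type, field) \
	((type *)((char *)(ptr) - offsetof(type, field)))

/* Implementation-specific container holding a struct source by value */
struct mem_disk {
	struct refcnt refcnt;
	struct source source;
	uint8_t *bytes;
};

static int mem_disk_freed;	/* lets the example observe the free */

static void mem_disk_free(struct refcnt *refcnt)
{
	struct mem_disk *disk = container_of(refcnt, struct mem_disk, refcnt);

	free(disk->bytes);
	free(disk);
	mem_disk_freed++;
}

static int mem_disk_read(struct source *source, void *dest,
			 uint64_t offset, size_t len)
{
	/* Recover the container from the embedded struct source */
	struct mem_disk *disk = container_of(source, struct mem_disk, source);

	if (offset > source->len || len > source->len - offset)
		return -1;	/* request falls outside the source */
	memcpy(dest, disk->bytes + offset, len);
	return 0;
}

static struct source *mem_disk_create(const void *data, size_t len)
{
	struct mem_disk *disk = calloc(1, sizeof(*disk));

	if (!disk)
		return NULL;
	disk->bytes = malloc(len ? len : 1);
	if (!disk->bytes) {
		free(disk);
		return NULL;
	}
	memcpy(disk->bytes, data, len);
	disk->refcnt.count = 1;		/* the creator's reference */
	disk->refcnt.free = mem_disk_free;
	disk->source.refcnt = &disk->refcnt;
	disk->source.read = mem_disk_read;
	disk->source.len = len;
	return &disk->source;	/* callers only ever see the source */
}

static struct source *source_get(struct source *source)
{
	source->refcnt->count++;
	return source;
}

static void source_put(struct source *source)
{
	struct refcnt *refcnt = source->refcnt;

	assert(refcnt->count > 0);
	if (--refcnt->count == 0)
		refcnt->free(refcnt);	/* frees the whole container */
}
```

Code holding only the struct source pointer never needs to know that a struct mem_disk is behind it; dropping the last reference with source_put() frees the container through the counter that source.refcnt points at.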
Data sources support two additional features. First, they can be loaded, for the benefit of anything that needs the whole source in memory but doesn't particularly care where in memory it goes. (Loaded sources wind up on the external heap, like downloaded images.) Second, they can be attached, using platform-specific handlers to make the contents of the source available (as an emulated disk or otherwise) to a booted operating system. Both INT13 hooks and iBFT/aBFT/sBFT filling are implemented as source attachers. The code requesting that a source be attached doesn't need to know how that attachment is done, which keeps things as platform-independent as possible. Both loading and attaching can be done recursively, so one can attach a SAN disk, boot from it (which will attach, execute, detach), and if the boot fails, still have the disk attached when gPXE exits; this is a cleaner way of achieving the “keep-san” functionality. One fills in source.data with a user pointer to indicate a source already resident in memory (loading and unloading become no-ops), or sets source.loaded to a nonzero integer while keeping source.data null to indicate a source that cannot sensibly be loaded in its entirety (e.g. a SAN disk).
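One possible reading of the source.data / source.loaded convention is sketched below. It assumes that source.loaded doubles as a nesting count for recursive loads, uses a plain pointer where gPXE would use a user pointer, and allocates with malloc() instead of the external heap. The reduced struct source and the toy pattern_read() backend are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reduced struct source: only the fields the text mentions */
struct source {
	void *data;		/* non-NULL when the contents are in memory */
	int loaded;		/* assumed: nesting count of outstanding loads */
	size_t len;
	int (*read)(struct source *source, void *dest,
		    size_t offset, size_t len);
};

/* Returns 0 on success, -1 if the source cannot be loaded */
static int source_load(struct source *source)
{
	void *buf;

	if (source->data && !source->loaded)
		return 0;	/* already resident: loading is a no-op */
	if (!source->data && source->loaded)
		return -1;	/* marked as not sensibly loadable */
	if (source->loaded) {
		source->loaded++;	/* nested load: just count it */
		return 0;
	}
	assert(source->read != NULL);
	buf = malloc(source->len ? source->len : 1);
	if (!buf)
		return -1;
	if (source->read(source, buf, 0, source->len) != 0) {
		free(buf);
		return -1;
	}
	source->data = buf;
	source->loaded = 1;
	return 0;
}

static void source_unload(struct source *source)
{
	if (!source->data || !source->loaded)
		return;		/* resident or unloadable: nothing to undo */
	if (--source->loaded == 0) {
		free(source->data);
		source->data = NULL;
	}
}

/* Toy backend: byte i of the source has the value (i & 0xff) */
static int pattern_read(struct source *source, void *dest,
			size_t offset, size_t len)
{
	unsigned char *out = dest;
	size_t i;

	if (offset > source->len || len > source->len - offset)
		return -1;
	for (i = 0; i < len; i++)
		out[i] = (unsigned char)((offset + i) & 0xff);
	return 0;
}
```

With this convention a caller can always pair source_load() with source_unload() without checking which kind of source it holds; only the source that cannot be loaded reports an error.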
Size impact: source.o +792, unless the MEM_SOURCE minimalist option is enabled.
Currently, a downloader downloads into an image, and calls a custom function to “register” (or register-and-load, or register-and-execute, or …) that image if the download succeeds. The entry point for this is create_downloader(), and it is only called by the user-level function imgfetch(). Changes:
- A new structure, struct download, acts as a trivial implementation of an image source; it simply serves reads and writes by access to a block of memory.
- create_downloader() downloads into a download structure instead of an image.
- The image_register parameter is consequently dropped; registration can be done by the caller.
- The downloading parts of imgfetch() are separated into a new function, download_uri(), in usr/dlmgmt.c.
- Rather than calling download_uri() directly, imgfetch() calls vfs_fetch_uri(), which does some magic multiplexing so you can imgfetch a SAN disk or eventually a file on a filesystem as well as a downloadable URI. The reference to vfs_fetch_uri() is weak, so unless vfs.c is linked in by a common feature in the API of SAN protocols and filesystem types, it will reduce to download_uri() at compile time.

Size impact: dlmgmt.o +166, imgmgmt.o -29, downloader.o +120, net +257.
Currently, image types access the contents of an image by direct reference to the area of user memory at image->data of length image->len. To support the new data source abstraction, these fields are replaced with a pointer image->source to a data source. One can access image->source->len as a direct replacement for image->len; to get at data, one can either use source_load() and then access image->source->data (remember to source_unload() when you're done!) or use {source_read(), source_read_user()} instead of {copy_from_user(), memcpy_user()} respectively. The latter is preferred, provided one remembers that these functions can now return errors. (In my patch, to save on code expansion, small reads of header structures are not error-checked because an erroneous read will be detected by an invalid signature later on, but reads of the bulk of an image are checked for error return.)
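To illustrate why a read through a data source can fail where a plain memory copy could not, here is a hedged sketch of how a source_read()-style helper might split an arbitrary byte range into requests that respect source.blkshift and source.blkburst. It assumes blkshift is the base-2 logarithm of the block size, blkburst is the maximum number of blocks per request, and the source length is a whole number of blocks; the signature, the bounce buffer and the toy_read() backend are illustrative inventions, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced struct source */
struct source {
	int (*read)(struct source *source, void *dest,
		    uint64_t offset, size_t len);
	unsigned int blkshift;	/* assumed: log2 of the block size */
	unsigned int blkburst;	/* assumed: max blocks per request */
	uint64_t len;		/* assumed: a multiple of the block size */
};

/* Read an arbitrary byte range using only aligned, burst-limited requests */
static int source_read(struct source *source, void *dest,
		       uint64_t offset, size_t len)
{
	size_t blksize = (size_t)1 << source->blkshift;
	size_t burst = blksize * source->blkburst;
	uint8_t *out = dest;
	uint8_t *bounce;
	int rc = 0;

	assert(source->blkburst > 0);
	if (offset > source->len || len > source->len - offset)
		return -1;
	bounce = malloc(burst);
	if (!bounce)
		return -1;
	while (len > 0) {
		/* Round the start down and the length up to whole blocks */
		uint64_t start = offset & ~(uint64_t)(blksize - 1);
		size_t skip = (size_t)(offset - start);
		size_t chunk = (skip + len + blksize - 1) & ~(blksize - 1);
		size_t copy;

		if (chunk > burst)
			chunk = burst;
		rc = source->read(source, bounce, start, chunk);
		if (rc != 0)
			break;		/* propagate the backend's error */
		copy = chunk - skip;
		if (copy > len)
			copy = len;
		memcpy(out, bounce + skip, copy);
		out += copy;
		offset += copy;
		len -= copy;
	}
	free(bounce);
	return rc;
}

static unsigned int backend_calls;

/* Toy backend that rejects misaligned or oversized requests */
static int toy_read(struct source *source, void *dest,
		    uint64_t offset, size_t len)
{
	size_t blksize = (size_t)1 << source->blkshift;
	uint8_t *out = dest;
	size_t i;

	if ((offset & (blksize - 1)) || (len & (blksize - 1)) ||
	    len > blksize * source->blkburst)
		return -1;
	for (i = 0; i < len; i++)
		out[i] = (uint8_t)((offset + i) * 7);
	backend_calls++;
	return 0;
}
```

For example, with 16-byte blocks and a burst of two blocks, a 50-byte read starting at offset 10 turns into two backend requests of 32 bytes each, and either of them may return an error that the caller has to check.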
A new image API function, image_set_source(), can be used to set or change the data source associated with an image. It handles reference counting properly, and an image releases its reference to its data source when freed.
Size impact: image.o +41, image_cmd.o -13
image type | mem - old | full - mem | net |
---|---|---|---|
bootsector | +121 | +74 | +195 |
bzimage | +98 | +39 | +137 |
com32 | +25 | +9 | +34 |
comboot | +17 | -4 | +13 |
elf | +30 | +12 | +42 |
elfboot | +3 | +2 | +5 |
multiboot | +47 | +22 | +69 |
pxe_image | +7 | -4 | +3 |
script | +38 | +19 | +57 |
Totals | +386 | +169 | +555 |
Most of the mem - old impact (the MEM_SOURCE build compared with the old code) is from the 64-bitness of image->source->len and the additional level of indirection required to access the fields of image->source. The full - mem impact (the full build compared with the MEM_SOURCE build) is from the fact that source_read_user() takes two more parameters, including one 64-bit one, than the memcpy() that copy_from_user() reduced to before.
Currently, each SAN boot protocol has four components (iSCSI examples in parentheses): the block device protocol (scsi.c), the networked backend transport (iscsi.c), the firmware table creator (ibft.c), and the boot glue (iscsiboot.c). The latter two are OS-specific, and the boot glue is the entry point; it creates a block device of the appropriate type, calls the networked backend to “attach” it, calls the firmware table creator to fill in data about it, hooks the device via int13h, attempts to boot it, and undoes all of that if keep-san isn't set and the boot fails. This is all rather undesirable, as it involves a lot of code duplication and makes SAN booting inherently platform-specific because that's where its entry point lies.
In the new system, SAN booting is not a special case; any data source that looks like a hard disk or CD can be booted, thanks to a new bootsector image format (a semi-thin wrapper around the existing call_bootsector()) and a generalization of gPXE's El Torito support. One can chain or imgfetch a SAN disk in the same way as a URI, and sanboot would be identical to chain were it not for the need to keep legacy support for the keep-san setting. The boot glue is removed entirely in the unity patch. The firmware table creator is extended with a small glue function to make it work as a data source attacher, so SAN protocol code need not know about its existence directly; this allows the SAN code to remain platform-independent. The block device protocol provides a data source interface instead of a struct blockdev interface (blockdev and ramdisk are both done away with) and the network backend transport provides a VFS binding (see below) to continue the existing URI-like syntax for lookups.
Attachment of a data source now occurs in three places: before attempting a SAN boot if keep-san is set; just before executing a bootsector or El Torito image (with detachment if execution fails); and when the user explicitly requests it using a new attach command. The traditional use-case for keep-san, a Windows install, is replaced simply by
gPXE> attach -f iscsi:1.2.3.4::::iqn.2009-06.com.example.host:wininst
gPXE> exit
and can be automated by serving a gPXE script with the “attach” line in it. (The -f/--fetch option asks to create an image for a URI and attach that, instead of attaching an already-fetched image.) Also, attach supports an option -t extra to attach the source as an “extra” disk (numbered after existing hard drives) instead of the default of a “boot” disk (first hard drive, pushing others down). You can even attach a blank “boot” disk and an “extra” disk containing WinPE, boot the “extra” disk, and use it to install Windows onto the blank iSCSI target.
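The difference between “boot” and “extra” attachment comes down to BIOS drive numbering, where hard drives are numbered from 0x80 upward. The sketch below only illustrates the numbering rule stated above; the enum and function names are hypothetical, and the real INT13 attacher has far more to do than compute a number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two attachment types */
enum attach_type { ATTACH_BOOT, ATTACH_EXTRA };

#define FIRST_HARD_DRIVE 0x80	/* BIOS number of the first hard drive */

/* Drive number given to the newly attached disk */
static uint8_t attached_drive(enum attach_type type, uint8_t existing)
{
	assert(existing < 0x80);	/* keep the result within 0x80..0xff */
	if (type == ATTACH_BOOT)
		return FIRST_HARD_DRIVE;	/* becomes the first hard drive */
	return (uint8_t)(FIRST_HARD_DRIVE + existing);	/* after the others */
}

/* Number under which a pre-existing drive is visible afterwards */
static uint8_t remapped_drive(enum attach_type type, uint8_t drive)
{
	if (type == ATTACH_BOOT && drive >= FIRST_HARD_DRIVE)
		return (uint8_t)(drive + 1);	/* pushed down by one */
	return drive;	/* floppies and "extra" attachments change nothing */
}
```

On a machine with two local hard drives, a “boot” attachment would take 0x80 and shift the local drives to 0x81 and 0x82, while an “extra” attachment would appear as 0x82 and leave the local drives where they were.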
Size impact:
object | size change |
---|---|
autoboot | +33 |
int13 | +320 |
keepsan | -128 |
abft | +23 |
ibft | +29 |
aoe | +121 |
iscsi | +105 |
aoeboot | -427 |
iscsiboot | -453 |
ata | +149 |
scsi | +350 |
Total | +122 |
How does one acquire a data source in the first place? Well, if you're downloading it, you get it using download_uri(), which calls create_downloader(), which calls xfer_open(). It would be a mistake to try to fit random-access storage into the xfer_interface framework; that framework does a marvelous job of handling network sockets, but it's very stream-oriented. So URI openers will stay download-only. How do we fit in SAN protocols, and eventually filesystem access?
The unity patch introduces the concept of a binding, an object that lets one look up a URI and get a data source back. Bindings are registered with a name, and when one attempts to fetch a URI with that name as the scheme, it gets looked up in the appropriate binding instead of downloaded. SAN boot protocols are implemented as global bindings named iscsi, aoe, ib_srp, etc., so when you do
gPXE> imgfetch iscsi:1.2.3.4::::iqn.2009-06.com.example.host:mydisk
it's passing a URI to iscsi_lookup() that has scheme set to iscsi and opaque set to 1.2.3.4::::iqn.2009-06.com.example.host:mydisk. The fact that a full URI is passed allows something like HTTPDisk or NFS to work intuitively; you can (assuming proper implementation of an httpdisk SAN boot protocol) do
gPXE> chain httpdisk://my.server/myimage.hdd
and it'll work exactly like chain http://... except the whole image won't be downloaded before it's booted.
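The scheme-based dispatch described above can be sketched as a table lookup. This is a simplified illustration under stated assumptions: struct uri_parts, split_uri() and find_binding() are hypothetical, the binding table is a fixed array where the patch registers bindings at run time, and the split simply cuts at the first colon where gPXE's real URI parser does more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced view of a parsed URI */
struct uri_parts {
	char scheme[16];
	const char *rest;	/* everything after the first ':' */
};

/* Hypothetical: a real binding would also carry a lookup method */
struct binding {
	const char *name;
};

static const struct binding bindings[] = {
	{ "iscsi" }, { "aoe" }, { "ib_srp" },
};

/* Split "scheme:rest"; returns 0 on success, -1 on a malformed URI */
static int split_uri(const char *uri, struct uri_parts *parts)
{
	const char *colon;
	size_t len;

	assert(uri != NULL && parts != NULL);
	colon = strchr(uri, ':');
	if (!colon)
		return -1;
	len = (size_t)(colon - uri);
	if (len == 0 || len >= sizeof(parts->scheme))
		return -1;
	memcpy(parts->scheme, uri, len);
	parts->scheme[len] = '\0';
	parts->rest = colon + 1;
	return 0;
}

/* Returns the binding named after the URI's scheme, or NULL when the
 * URI should be handed to the ordinary downloader instead */
static const struct binding *find_binding(const char *uri,
					  struct uri_parts *parts)
{
	size_t i;

	if (split_uri(uri, parts) != 0)
		return NULL;
	for (i = 0; i < sizeof(bindings) / sizeof(bindings[0]); i++) {
		if (strcmp(bindings[i].name, parts->scheme) == 0)
			return &bindings[i];
	}
	return NULL;
}
```

An iscsi: URI resolves to the iscsi binding with the remainder of the string available to its lookup function, whereas an http: URI matches no binding and falls through to the download path.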
Looking ahead a bit, this patch implements the concept of a binding type, a way of creating bindings that are based on some data source instead of being global. For instance, if ext2 is a binding type, you can do
gPXE> imgfetch -n disk aoe:e0.0
gPXE> attach disk
gPXE> bind -t ext2 disk bootfs
disk on bootfs type ext2
gPXE> chain bootfs:/boot/vmlinuz
The explicit attach, to fill the aBFT for the kernel, will probably become unnecessary.
A binding is memory-managed much like a network device; the allocation for its structure contains some amount of private data requested by the binding type creating it, and binding->priv points at that private data. Sources looked up in the binding hold a reference to it, and a reference is taken when it is registered with a name as well. The binding holds a reference to the source it's based on. This system keeps sources and bindings around as long as anyone is using them.
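The allocation scheme described above, one block holding the structure followed by the binding type's private data, can be sketched as follows. The function names, the plain integer reference count and struct ext2_priv are hypothetical; the reference from the binding to its underlying source is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reduced struct binding */
struct binding {
	int refcnt;
	const char *name;	/* set when registered under a name */
	void *priv;		/* private data of the binding type */
};

static int bindings_freed;	/* lets the example observe the free */

/* Allocate a binding with priv_size bytes of private data behind it */
static struct binding *alloc_binding(size_t priv_size)
{
	struct binding *binding = calloc(1, sizeof(*binding) + priv_size);

	if (!binding)
		return NULL;
	binding->refcnt = 1;	/* the creator's reference */
	/* Private data starts right after the structure itself */
	binding->priv = (char *)binding + sizeof(*binding);
	return binding;
}

static struct binding *binding_get(struct binding *binding)
{
	binding->refcnt++;
	return binding;
}

static void binding_put(struct binding *binding)
{
	assert(binding->refcnt > 0);
	if (--binding->refcnt == 0) {
		free(binding);	/* releases the private data as well */
		bindings_freed++;
	}
}

/* Hypothetical private state for an ext2 binding type */
struct ext2_priv {
	unsigned int block_size;
};
```

A binding registered under a name and used by one looked-up source would hold three references in this model (creator, registration, source) and is freed only after all three have been dropped.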
A global binding is created as a struct global_binding, which serves as the template for an autoregistered struct binding at init-time. Data source attachers can specify a struct global_binding to limit themselves to, so only AoE disks will be recorded in the aBFT, etc.
There's also a special URI syntax for recursive binding in a single command:
gPXE> chain ext2(part(aoe:e0.0):1):/boot/bzimage
If both ext2 filesystems and partition tables can be autodetected, that reduces to
gPXE> chain ((aoe:e0.0):1):/boot/bzimage
This is rather cryptic, but it does allow a complicated boot path to be specified in a single DHCP filename option.
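To make the nesting in that syntax concrete, here is a hedged sketch of peeling one level off such a URI by matching parentheses. The grammar is inferred from the two examples above (an optional binding type, a parenthesised inner URI, a colon, then the path to look up); struct nested_uri and peel_binding() are hypothetical and the patch's actual parser may work quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of peeling one binding level off a URI */
struct nested_uri {
	char type[16];		/* binding type, empty if autodetected */
	char inner[128];	/* URI of the underlying data source */
	const char *path;	/* what to look up inside the binding */
};

/* Returns 1 if one level was peeled, 0 if the URI is an ordinary
 * scheme:rest URI, and -1 on a syntax error. The uri argument must
 * not point into *out. */
static int peel_binding(const char *uri, struct nested_uri *out)
{
	const char *open;
	const char *p;
	size_t type_len, inner_len;
	int depth = 0;

	assert(uri != NULL && out != NULL);
	open = strchr(uri, '(');
	if (!open)
		return 0;
	type_len = (size_t)(open - uri);
	/* A ':' before the '(' means this is an ordinary URI */
	if (memchr(uri, ':', type_len))
		return 0;
	if (type_len >= sizeof(out->type))
		return -1;
	/* Find the ')' that matches the first '(' */
	for (p = open; *p; p++) {
		if (*p == '(')
			depth++;
		else if (*p == ')' && --depth == 0)
			break;
	}
	if (*p != ')' || p[1] != ':')
		return -1;
	inner_len = (size_t)(p - open - 1);
	if (inner_len >= sizeof(out->inner))
		return -1;
	memcpy(out->type, uri, type_len);
	out->type[type_len] = '\0';
	memcpy(out->inner, open + 1, inner_len);
	out->inner[inner_len] = '\0';
	out->path = p + 2;
	return 1;
}
```

Applying the function to the first example yields type ext2, inner URI part(aoe:e0.0):1 and path /boot/bzimage; applying it again to the inner URI yields type part, inner URI aoe:e0.0 and path 1, at which point the remaining URI is an ordinary one.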
Size impact: uri.o +56, vfs.o +895, vfs_cmd.o +2037.