Alan Shieh, Linux UNDI Driver

Comparing a UNDI driver to server-side initrd selection

It is possible to achieve the same effect as an UNDI driver with other approaches, such as loading all possible drivers into an initrd, or selecting an initrd based on MAC address, thus allowing Linux to load the right module. Such a design is much more conservative, as it relies on driver code that has been used by a much larger population.

There are some disadvantages to each approach. A larger initrd may increase boot time or RAM requirements. Selecting an initrd based on MAC address requires a MAC⇒NIC database, which may need to be collected manually or built from assumptions based on MAC⇒manufacturer mappings. For instance, one could provide all Intel drivers to every NIC whose MAC address is assigned to Intel.
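To make the MAC⇒manufacturer idea concrete, here is a minimal C sketch of the selection step a boot server might run: take the client's MAC, compare its first three octets (the OUI) against a table, and fall back to an initrd containing all drivers when nothing matches. The table contents, the image names (`initrd-intel.img`, etc.), and the `select_initrd` helper are hypothetical illustrations, not part of any existing tool; a real table would be built from the IEEE OUI registry.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical MAC-prefix (OUI) => initrd table. */
struct oui_rule {
    unsigned char oui[3];   /* first three octets of the MAC */
    const char *initrd;     /* initrd image to serve for this vendor */
};

/* Illustrative entries only. */
static const struct oui_rule rules[] = {
    { {0x00, 0x1B, 0x21}, "initrd-intel.img" },  /* an Intel OUI */
    { {0x52, 0x54, 0x00}, "initrd-qemu.img"  },  /* QEMU/KVM default prefix */
};

#define FALLBACK_INITRD "initrd-all-drivers.img"

/* Parse "aa:bb:cc:dd:ee:ff" into 6 bytes; returns 0 on success. */
static int parse_mac(const char *s, unsigned char mac[6])
{
    unsigned int b[6];
    if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) != 6)
        return -1;
    for (int i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/* Pick an initrd by OUI, falling back to the "all drivers" image. */
static const char *select_initrd(const char *mac_str)
{
    unsigned char mac[6];
    if (parse_mac(mac_str, mac) != 0)
        return FALLBACK_INITRD;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (memcmp(mac, rules[i].oui, 3) == 0)
            return rules[i].initrd;
    return FALLBACK_INITRD;
}
```

The fallback keeps the "load everything" approach as a safety net for NICs whose vendor is missing from the table.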

UNDI has its own advantages and disadvantages. New deployments that install Etherboot stacks on supported NICs will automatically support the UNDI driver, without configuring any other software. A boot process that uses the UNDI driver can use arbitrary userspace/kernel code to talk to the network and figure out how to load the right drivers for a particular machine.

Though the UNDI driver is about an order of magnitude slower than the native Linux driver, it can still download most modules in under 1 s on an emulated NE2K-PCI.

Deliverables and Timeline

Note: Since I am working with Etherboot 5.4.x, I am going directly for 16:32 UNDI stack support. As of 7/30, the UNDI driver works with the NE2K-PCI, which uses PIO to send data to/from the card.

Here are the remaining deliverables:

  • Implement support for memory-mapped registers
  • Test on alternate Etherboot hardware, including real hardware:
      • a card that uses PIO to set up DMA
      • a card that uses memory-mapped registers to set up DMA
  • Test with full network boot (LTSP, NFS root)
  • Experiment with supporting other PXE stacks (inferring segment lengths via E820 holes)

These steps are done:

  • Implement memory map functionality for Linux
  • Set up UNDI Probe memory map
  • Find UNDI ROM
  • Make sure the E820 map is sane (see IRC logs for the E820 issue; in progress as of 6/15/2006, est 6/20/2006)
  • Hard-code segment descriptor & location; 16:32 downcall (est 6/27/2006)
  • Test UNDI calls, see proposal for details (est 7/4/2006)
  • Integration with TUN/TAP device; transmit data with Linux (est 7/11/2006)
  • PXE Extensions for segment descriptor & location
  • Interrupt processing cleanup (est 7/18/2006)
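The E820 sanity check can be sketched as follows. This assumes "sane" means non-empty, non-wrapping, non-overlapping entries once sorted by address; the same sorted map also exposes uncovered gaps, which is the kind of information the E820-hole inference in the deliverables list would rely on. `struct e820_entry`, `e820_sane` and `e820_first_hole` are illustrative names, not Etherboot or Linux code.

```c
#include <stdint.h>
#include <stdlib.h>

#define E820_RAM      1
#define E820_RESERVED 2

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

static int cmp_addr(const void *a, const void *b)
{
    const struct e820_entry *x = a, *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sort the map in place, then reject zero-length entries, entries that
 * wrap past 2^64, and overlapping ranges. Returns 1 if the map is sane. */
static int e820_sane(struct e820_entry *map, size_t n)
{
    qsort(map, n, sizeof map[0], cmp_addr);
    for (size_t i = 0; i < n; i++) {
        if (map[i].size == 0 || map[i].addr + map[i].size < map[i].addr)
            return 0;
        if (i > 0 && map[i - 1].addr + map[i - 1].size > map[i].addr)
            return 0;
    }
    return 1;
}

/* Find the first address range below `limit` not covered by any entry.
 * The map must already be sorted. Returns 1 and fills *start / *len. */
static int e820_first_hole(const struct e820_entry *map, size_t n,
                           uint64_t limit, uint64_t *start, uint64_t *len)
{
    uint64_t cursor = 0;
    for (size_t i = 0; i < n && cursor < limit; i++) {
        if (map[i].addr > cursor) {
            *start = cursor;
            *len = (map[i].addr < limit ? map[i].addr : limit) - cursor;
            return 1;
        }
        uint64_t end = map[i].addr + map[i].size;
        if (end > cursor)
            cursor = end;
    }
    if (cursor < limit) {
        *start = cursor;
        *len = limit - cursor;
        return 1;
    }
    return 0;
}
```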


UNDI proposal


= Goals =
* Support both 16:16 and 16:32 protected mode UNDI stacks

= Phase 1: 16:16 UNDI stack =

Most work should be reusable in 16:32 mode. The main difference
will be the page table and LDT/GDT setup, which will be driven by
PXE extensions that provide the necessary information to the kernel.

== Linux UNDI execution process  ==

The process will interact with a network driver to gain access to the
kernel send and receive queues, and to perform interrupt processing.

The 16:16 version will make assumptions about page table / memory
layout. This restriction will be removed in the 16:32 version.

High-level requirements:
1. The page table will be initialized for two "regions":
   a) PXE execution environment: the physical/virtual address range that
      Etherboot is known to reside in.
   b) Area for the Linux process.
2. Implement 16:32=>16:16 thunks between the Linux process and PXE code,
   including a 16:16 parameter-passing area for all the parameter structures.
3. Poll for interrupts using PXENV_UNDI_ISR_IN_START.
4. Implement bottom-half processing using

The processing flowchart is provided in the PXE 2.1 specification.
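The polling requirement (item 3) and the bottom-half flow can be sketched as one loop over PXENV_UNDI_ISR: IN_START asks whether the interrupt belongs to this NIC, IN_PROCESS fetches the first event, and IN_GET_NEXT is repeated until the stack reports DONE. This is a minimal sketch: the parameter block is trimmed to two fields, the FuncFlag constants are my reading of the PXE 2.1 specification (check them against Etherboot's pxe_api.h), and `undi_isr_fn`, `undi_poll_once` and the scripted `fake_isr` stand-in are hypothetical. A real implementation would also copy RECEIVE buffers into the kernel and reassemble fragmented frames.

```c
#include <stdint.h>

/* FuncFlag values for PXENV_UNDI_ISR (verify against pxe_api.h). */
#define PXENV_UNDI_ISR_IN_START     1
#define PXENV_UNDI_ISR_IN_PROCESS   2
#define PXENV_UNDI_ISR_IN_GET_NEXT  3
#define PXENV_UNDI_ISR_OUT_OURS     0
#define PXENV_UNDI_ISR_OUT_NOT_OURS 1
#define PXENV_UNDI_ISR_OUT_DONE     0
#define PXENV_UNDI_ISR_OUT_TRANSMIT 2
#define PXENV_UNDI_ISR_OUT_RECEIVE  3
#define PXENV_UNDI_ISR_OUT_BUSY     4

/* Trimmed-down parameter block: only the fields this sketch uses. */
struct isr_params {
    uint16_t func_flag;
    uint16_t frame_length;
};

/* Hypothetical thunk into the UNDI stack; returns 0 on success. */
typedef int (*undi_isr_fn)(struct isr_params *p, void *ctx);

struct poll_stats { unsigned rx_frames, tx_done; };

/* One polling pass. Returns 1 if events were processed, 0 if the
 * interrupt was not ours, -1 if the UNDI call failed. */
static int undi_poll_once(undi_isr_fn call, void *ctx, struct poll_stats *st)
{
    struct isr_params p = { PXENV_UNDI_ISR_IN_START, 0 };
    if (call(&p, ctx) != 0)
        return -1;
    if (p.func_flag != PXENV_UNDI_ISR_OUT_OURS)
        return 0;                       /* nothing pending for this NIC */

    p.func_flag = PXENV_UNDI_ISR_IN_PROCESS;
    for (;;) {
        if (call(&p, ctx) != 0)
            return -1;
        switch (p.func_flag) {
        case PXENV_UNDI_ISR_OUT_RECEIVE:
            st->rx_frames++;            /* real driver: hand frame to kernel */
            break;
        case PXENV_UNDI_ISR_OUT_TRANSMIT:
            st->tx_done++;              /* real driver: free the Tx buffer */
            break;
        case PXENV_UNDI_ISR_OUT_BUSY:
            break;                      /* nothing usable yet; ask again */
        case PXENV_UNDI_ISR_OUT_DONE:
        default:
            return 1;
        }
        p.func_flag = PXENV_UNDI_ISR_IN_GET_NEXT;
    }
}

/* Scripted stand-in for the UNDI stack, for testing the loop off-target:
 * replays `events` as successive OUT values, ignoring the IN flag. */
struct fake_undi { const uint16_t *events; int n, pos; };

static int fake_isr(struct isr_params *p, void *ctx)
{
    struct fake_undi *f = ctx;
    p->func_flag = f->pos < f->n ? f->events[f->pos++]
                                 : PXENV_UNDI_ISR_OUT_DONE;
    return 0;
}
```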

== Implement stub driver for NIC ==
The prototype will support only PCI devices.

1. Perform PCI probe using PCI_IDs specified as module parameters.
2. Initialize GDT using !PXE information.
3. Pump packets between UNDI execution process and kernel
transmit/receive queues.
4. Unloading the module (rmmod) should clean up properly so that a full driver can be loaded afterwards.
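Step 1 relies on the PCI IDs reaching the stub driver. A userspace-testable sketch of the parsing and matching step, assuming the IDs arrive as a comma-separated `vendor:device` string in hex (e.g. `10ec:8029` for the RTL8029-based NE2K-PCI); the string format and the `parse_pci_ids`/`pci_id_match` helpers are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

struct pci_id { uint16_t vendor, device; };

/* Parse a list like "10ec:8029,8086:100e". Returns the number of IDs
 * stored (at most `max`), or -1 on a malformed entry. */
static int parse_pci_ids(const char *s, struct pci_id *out, int max)
{
    int n = 0;
    while (*s && n < max) {
        char *end;
        unsigned long v = strtoul(s, &end, 16);
        if (end == s || *end != ':' || v > 0xffff)
            return -1;
        s = end + 1;
        unsigned long d = strtoul(s, &end, 16);
        if (end == s || (*end != ',' && *end != '\0') || d > 0xffff)
            return -1;
        out[n].vendor = (uint16_t)v;
        out[n].device = (uint16_t)d;
        n++;
        s = (*end == ',') ? end + 1 : end;   /* skip separator if present */
    }
    return n;
}

/* Does a probed device match one of the requested IDs? */
static int pci_id_match(const struct pci_id *ids, int n,
                        uint16_t vendor, uint16_t device)
{
    for (int i = 0; i < n; i++)
        if (ids[i].vendor == vendor && ids[i].device == device)
            return 1;
    return 0;
}
```

Rejecting malformed input outright, rather than skipping bad entries, makes a misconfigured boot loader fail loudly instead of binding to the wrong NIC.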

== Communications between UNDI execution process and driver ==
The driver will export two pipes under /proc with the following interfaces:

* TxPacket(const char *data, int len); // Kernel requests to send a packet

* RxPacket(const char *data, int len); // Process received packet; tell the
  kernel to queue it for the network stack

Maybe also mii/ethtool/ifconfig?
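Because the pipes carry a byte stream, the reader needs some framing to find packet boundaries. One possible layout, sketched here as an assumption rather than a decided format: a 1-byte message type, a 2-byte little-endian length, then the Ethernet frame. The type codes, size limit, and helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical message types for the /proc pipes. */
enum undi_msg_type { MSG_TX_PACKET = 1, MSG_RX_PACKET = 2 };

#define UNDI_MSG_HDR   3     /* 1-byte type + 2-byte little-endian length */
#define UNDI_MAX_FRAME 1514  /* Ethernet frame without FCS */

/* Serialize one message into buf; returns bytes written or -1. */
static int undi_msg_encode(uint8_t type, const uint8_t *data, uint16_t len,
                           uint8_t *buf, size_t cap)
{
    if (len > UNDI_MAX_FRAME || cap < (size_t)UNDI_MSG_HDR + len)
        return -1;
    buf[0] = type;
    buf[1] = (uint8_t)(len & 0xff);
    buf[2] = (uint8_t)(len >> 8);
    memcpy(buf + UNDI_MSG_HDR, data, len);
    return UNDI_MSG_HDR + len;
}

/* Parse one message from buf. Returns bytes consumed, 0 if more data is
 * needed, or -1 if the header is invalid. */
static int undi_msg_decode(const uint8_t *buf, size_t avail, uint8_t *type,
                           const uint8_t **data, uint16_t *len)
{
    if (avail < UNDI_MSG_HDR)
        return 0;
    uint16_t l = (uint16_t)(buf[1] | (buf[2] << 8));
    if (l > UNDI_MAX_FRAME)
        return -1;
    if (avail < (size_t)UNDI_MSG_HDR + l)
        return 0;
    *type = buf[0];
    *data = buf + UNDI_MSG_HDR;
    *len = l;
    return UNDI_MSG_HDR + l;
}
```

Returning 0 for an incomplete message lets the reader keep appending bytes from read() until a whole frame is available.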

== Etherboot & boot process modifications ==
1. Modify Etherboot to report 16:16 segment descriptors via !PXE.
2. Add a configuration flag to Etherboot to prevent it from unloading UNDI.
3. Pass PCI_IDs to the kernel.
4. Linux:
    Before switching to protected mode, reserve !PXE.SegDescCnt
    descriptors in the GDT, and set !PXE.FirstSelector to the selector
    of the first reserved descriptor.

    The descriptors will not be copied from !PXE structure until the
    stub driver is loaded. Hopefully, the PXE stack will not gain
    control in the interim and crash because the segment descriptors
    have not been initialized.

5. Use PCI_IDs to initialize the NIC stub driver.
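For step 4, an IA-32 selector is the GDT index shifted left by 3, plus the table-indicator bit (0 for the GDT) and a 2-bit requested privilege level. Assuming the PXE descriptors occupy consecutive GDT slots, the value stored in !PXE.FirstSelector and the remaining selectors follow directly. The starting index and the RPL are parameters because the proposal does not fix them; `make_selector` and `reserve_pxe_selectors` are illustrative names.

```c
#include <stdint.h>

/* IA-32 segment selector: bits 15..3 index, bit 2 table indicator
 * (0 = GDT, 1 = LDT), bits 1..0 requested privilege level. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

/* Given the first GDT slot reserved for the PXE stack, fill in selectors
 * for all SegDescCnt consecutive descriptors and return the value for
 * !PXE.FirstSelector. */
static uint16_t reserve_pxe_selectors(unsigned first_index,
                                      unsigned seg_desc_cnt, unsigned rpl,
                                      uint16_t *selectors)
{
    for (unsigned i = 0; i < seg_desc_cnt; i++)
        selectors[i] = make_selector(first_index + i, 0, rpl);
    return make_selector(first_index, 0, rpl);
}
```

For example, reserving three descriptors starting at GDT slot 16 with RPL 0 gives selectors 0x80, 0x88 and 0x90.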

== Initialization and testing ==
I. Internal testing from within the driver process.
   Implement an interactive debugging console?

    (stats are useful for debugging later functionality)

    (sanity check the driver and execution environment)

4. IPC pipes
5. Interrupt polling, Rx/Tx handling

II. Linux milestones & tests
1. Transmit & receive
3. ARP
4. TCP & NFS
5. Simulated boot process:
    a) fetch network module from remote host, e.g. via HTTP or NFS
    b) unload UNDI module, kill UNDI execution process
    c) load network module
    d) network tests (as before)

= Phase 2: 16:32 mode =

== Extend PXE interface ==

This mode will require additional information from the PXE stack, such
as the precise page table / memory layout and LDT/GDT entries for the
PXE stack.

This information will be returned via a new UNDI op-code,
PXENV_GET_UNDI_ENV32. This op-code will be specially coded so that
it can execute in 16:16 mode (e.g. with KEEP_IT_REAL compile/link
options) via !PXE.EntryPointSP.

This will be the only op-code supported via that entry point, so the
vast majority of Etherboot can remain 16:32 code.

    typedef struct s_PXENV_GET_UNDI_ENV32 {
        /* Outputs */
        ADDR32   PageDirectoryBase;
        SEGOFF16 EntryPoint16_32;

        /* Inputs */
        UINT32   DescriptorBufferSize;
        ADDR32   DescriptorBuffer;
    } t_PXENV_GET_UNDI_ENV32;

All other UNDI op-codes will be accessible only through EntryPoint16_32.

PageDirectoryBase will be used to pass the memory map to the kernel.
The format is the native IA-32 two-level page table. The AVAIL bits
(bits 9-11, reserved for software) of the PDEs and PTEs will be used
to convey additional information about each page:

000 = Normal page; can be relocated in physical memory.
001 = PCI DMA page; can only be relocated if the relocation is transparent to the PCI device.
010 = PCI MMIO page; must use these exact physical addresses.
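Decoding the proposed page classes from a PDE/PTE is a matter of masking bits 9-11. A minimal sketch, assuming the three encodings listed above and leaving the other five values unassigned; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Bits 9-11 of an IA-32 PDE/PTE are available to software. */
#define PTE_AVAIL_SHIFT 9
#define PTE_AVAIL_MASK  (7u << PTE_AVAIL_SHIFT)
#define PTE_PRESENT     0x001u
#define PTE_FRAME_MASK  0xfffff000u

enum page_class {
    PAGE_NORMAL   = 0, /* 000: may be relocated in physical memory */
    PAGE_PCI_DMA  = 1, /* 001: relocate only if transparent to the device */
    PAGE_PCI_MMIO = 2, /* 010: must keep these exact physical addresses */
};

/* Extract the page class stored in the AVAIL bits. */
static enum page_class pte_class(uint32_t pte)
{
    return (enum page_class)((pte & PTE_AVAIL_MASK) >> PTE_AVAIL_SHIFT);
}

/* Return `pte` with its AVAIL bits replaced by class `c`; the frame
 * address and hardware flag bits are left untouched. */
static uint32_t pte_set_class(uint32_t pte, enum page_class c)
{
    return (pte & ~PTE_AVAIL_MASK) |
           (((uint32_t)c << PTE_AVAIL_SHIFT) & PTE_AVAIL_MASK);
}
```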

min(!PXE.SegDescCnt, DescriptorBufferSize / sizeof(descriptor))
entries will be set in DescriptorBuffer.

Each descriptor entry is 64 bits, in the native IA-32 segment
descriptor format. This way, any arbitrary set of descriptors can be
conveyed to the kernel.
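Producing the entries for DescriptorBuffer involves two pieces: packing each descriptor into the native 64-bit IA-32 layout (limit 15:0, base 23:0, access byte, limit 19:16, flags, base 31:24) and copying at most min(!PXE.SegDescCnt, DescriptorBufferSize / 8) of them. A sketch under those assumptions, with hypothetical helper names:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Pack base/limit/access/flags into a native 64-bit IA-32 descriptor. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xffffu);               /* bits 0-15: limit 15:0 */
    d |= (uint64_t)(base & 0xffffffu) << 16;        /* bits 16-39: base 23:0 */
    d |= (uint64_t)access << 40;                    /* bits 40-47: access */
    d |= (uint64_t)((limit >> 16) & 0xfu) << 48;    /* bits 48-51: limit 19:16 */
    d |= (uint64_t)(flags & 0xfu) << 52;            /* bits 52-55: G, D/B, L, AVL */
    d |= (uint64_t)(base >> 24) << 56;              /* bits 56-63: base 31:24 */
    return d;
}

/* Copy min(SegDescCnt, DescriptorBufferSize / sizeof(descriptor))
 * descriptors into the caller's buffer; returns the number copied. */
static size_t fill_descriptor_buffer(const uint64_t *pxe_descs,
                                     size_t seg_desc_cnt,
                                     void *buffer, uint32_t buffer_size)
{
    size_t fit = buffer_size / sizeof(uint64_t);
    size_t n = seg_desc_cnt < fit ? seg_desc_cnt : fit;
    memcpy(buffer, pxe_descs, n * sizeof(uint64_t));
    return n;
}
```

For example, a flat 4 GiB 32-bit code segment (base 0, limit 0xfffff, access 0x9a, flags 0xc) packs to 0x00cf9a000000ffff.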

== Delta from 16:16 prototype ==
1. Page table, LDT/GDT initialization
2. Thunks will call into 16:32 if available, otherwise 16:16

 End of base functionality goals

= Compatibility improvements =

1. Interrupt-driven operation for quirky cards

Polling is much easier to deal with; however, according to comments in
undi.c, it does not work for all cards. Interrupt-driven operation
will therefore increase the compatibility of the driver.

This will be quite tricky, since incorrect top-half handling can
easily jam the system.

After PCI probe, the kernel module will install a small interrupt handler.
The interrupt handler will need to use
not readily accessible, as it resides in the UNDI execution process.

If the NIC IRQ line is not shared, this is trivial: the IRQ line can be
disabled while PXENV_UNDI_ISR_IN_START is invoked in the UNDI execution
process, and then re-enabled when the process returns.

If the IRQ line is shared, we'll still need to mask the IRQ line while
dispatching to the ISR. However, this will also mask the other
devices, which might be needed to execute the process (e.g. disk IRQs
for demand paging). To decrease the likelihood of problems, the entire
process should be pinned in memory, linked statically,
etc. Alternatively, the kernel could switch to a polling strategy on
all IRQs while waiting for the user application to return.

Another solution would be to pull an UNDI execution environment into
the kernel context and dispatch directly to the UNDI ISR; however,
this would require changes to the kernel memory map and would probably
end up being messy.

== New kernel<=>user pipe commands ==

* UNDI_Int(t_PXENV_UNDI_ISR isr); // UNDI interrupt received by top half

* UNDI_Int_Ack(); // Acknowledge interrupt, which will re-enable the IRQ line
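The shared and non-shared cases above reduce to the same handshake on the kernel side: the top half masks the line and queues UNDI_Int, and UNDI_Int_Ack unmasks it. The following userspace model captures only that state machine. In a real module the mask and unmask steps would presumably map to `disable_irq_nosync()` and `enable_irq()`, and the messages would travel over the /proc pipe; the struct and function names here are hypothetical.

```c
#include <stdbool.h>

/* Userspace model of the kernel-side state for the UNDI_Int /
 * UNDI_Int_Ack handshake; masking is a plain flag here. */
struct undi_irq_state {
    bool line_masked;        /* IRQ line disabled while userspace runs the ISR */
    unsigned int_msgs_sent;  /* UNDI_Int messages queued to the process */
    unsigned acks;           /* UNDI_Int_Ack messages received */
};

/* Top half: mask the line and forward the interrupt to userspace.
 * Returns false if the line was already masked (a real masked line
 * would not deliver the interrupt at all). */
static bool top_half(struct undi_irq_state *s)
{
    if (s->line_masked)
        return false;
    s->line_masked = true;
    s->int_msgs_sent++;
    return true;
}

/* Pipe handler for UNDI_Int_Ack: re-enable the line. Returns false for
 * a spurious ack with no interrupt outstanding. */
static bool handle_ack(struct undi_irq_state *s)
{
    if (!s->line_masked)
        return false;
    s->acks++;
    s->line_masked = false;
    return true;
}
```

Keeping at most one interrupt outstanding is what bounds the risk described above: while the line is masked, nothing else can arrive on it, so the only failure mode is a userspace process that never acknowledges.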

= Support for other PXE stacks =
** Experiment with other PXE stacks
*** how they use memory (DMA, code locations)
*** tricks to support generic PXE stack
    (paging, IOMMU, Linux kernel layout modifications)
*** Attempt to support unmodified PXE stacks
