Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
soc:alanshieh [2006/06/12 10:03] ashieh |
soc:alanshieh [2006/08/11 13:31] (current) ashieh |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Alan Shieh, Linux UNDI Driver ====== | ====== Alan Shieh, Linux UNDI Driver ====== | ||
- | IRC logs, e-mails, and development notes coming soon! | + | ===== Comparing an UNDI driver to server-side initrd selection ===== |
+ | |||
+ | It is possible to achieve the same effect as an UNDI driver with other approaches, such as loading all possible drivers into an initrd, or selecting an initrd based on MAC address, thus allowing Linux to load the right module. Such a design is much more conservative, as it relies on driver code that has been used by a larger population. | ||
+ | |||
+ | There are some disadvantages with each approach. Having a larger initrd may increase the booting time or RAM requirements. Selecting an initrd based on MAC addresses requires a MAC=>NIC database. This database may need to be collected manually, or built from assumptions based on MAC=>Manufacturer mappings. For instance, one could provide all Intel drivers for all NICs with a MAC that is assigned to Intel. | ||
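The MAC=>Manufacturer approach can be sketched as a lookup on the first three bytes of the MAC address (the OUI). This is only a minimal sketch: the table layout, the function name `select_initrd`, and the initrd file names are hypothetical, and the two prefixes are illustrative entries that should be checked against the IEEE registry before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: maps a 3-byte MAC prefix (OUI) to an initrd. */
struct oui_initrd {
    uint8_t oui[3];
    const char *initrd;
};

/* Illustrative entries only; a real table would be generated from the
 * IEEE OUI registry or collected from the machines being deployed. */
static const struct oui_initrd oui_table[] = {
    { { 0x00, 0xA0, 0xC9 }, "initrd-intel.img" },
    { { 0x00, 0xE0, 0x4C }, "initrd-realtek.img" },
};

/* Return the initrd for a MAC address, falling back to an initrd that
 * carries every driver when the manufacturer is unknown. */
const char *select_initrd(const uint8_t mac[6])
{
    assert(mac != NULL);
    for (size_t i = 0; i < sizeof(oui_table) / sizeof(oui_table[0]); i++) {
        if (memcmp(mac, oui_table[i].oui, 3) == 0)
            return oui_table[i].initrd;
    }
    return "initrd-all-drivers.img";
}
```

The fallback keeps unknown NICs bootable, at the cost of the larger initrd described above.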
+ | |||
+ | UNDI has its advantages and disadvantages. New deployments that install Etherboot stacks on supported NICs will automatically support the UNDI driver, without configuring any other software. A boot process that uses the UNDI driver can use arbitrary userspace / kernel code to talk to the network and figure out how to load the right drivers for a particular machine. | ||
+ | |||
+ | Though the UNDI driver is about an order of magnitude slower than the Linux driver, it can still download most modules in <1s on an emulated NE2K-PCI. | ||
+ | |||
+ | ===== Deliverables and Timeline ===== | ||
+ | |||
+ | Note: Since I am working with Etherboot 5.4.x, I am going directly for 16:32 UNDI stack support. As of 7/30, the UNDI driver works with the NE2K-PCI, which uses PIO to send data to/from the card. | ||
+ | |||
+ | Here are the remaining deliverables: | ||
+ | |||
+ | * Implement support for memory mapped registers | ||
+ | * Test on alternate Etherboot hardware, including real hardware | ||
+ | ** Test card that uses PIO to set up DMA | ||
+ | ** Test card that uses memory mapped registers to set up DMA | ||
+ | * Test with full network boot (LTSP, NFS root) | ||
+ | |||
+ | * Experiment with getting other PXE stacks to work -- inference of segment lengths via E820 holes. | ||
+ | |||
+ | These steps are done: | ||
+ | |||
+ | * Implement memory map functionality for Linux | ||
+ | * Set up UNDI Probe memory map | ||
+ | * Find UNDI ROM | ||
+ | * Make sure E820 Map is sane [[E820IRC:IRC Logs for E820 issue]]. I am Here (6/15/2006). Estimated completion time 6/20/2006 | ||
+ | * Hard code segment descriptor & location. 16:32 downcall (est 6/27/2006) | ||
+ | * Test UNDI calls, see proposal for details (est 7/4/2006) | ||
+ | * Integration with TUN/TAP device; transmit data with Linux (est 7/11/2006) | ||
+ | * PXE Extensions for segment descriptor & location | ||
+ | * Interrupt processing cleanup (est 7/18/2006) | ||
+ | |||
+ | |||
+ | ===== Resources ===== | ||
+ | [[Alan's test / development infrastructure]] | ||
+ | ===== UNDI proposal ===== | ||
+ | |||
+ | [[OldUNDIProposal]] | ||
+ | |||
+ | <file> | ||
+ | = Goals = | ||
+ | * Support both 16:16 and 16:32 protected mode UNDI stacks | ||
+ | |||
+ | = Phase 1: 16:16 UNDI stack = | ||
+ | |||
+ | Most work should be reusable in the 16:32 mode. The main difference | ||
+ | will be the page table and LDT/GDT setup, which will be driven by the | ||
+ | PXE extensions to provide the necessary information to the kernel. | ||
+ | |||
+ | == Linux UNDI execution process == | ||
+ | |||
+ | The process will interact with a network driver to gain access to the | ||
+ | kernel send and receive queues, and to perform interrupt processing. | ||
+ | |||
+ | The 16:16 version will make assumptions about page table / memory | ||
+ | layout. This restriction will be removed in the 16:32 version. | ||
+ | |||
+ | High level requirements | ||
+ | 1. Page table will be initialized for two "regions": | ||
+ | a) PXE execution environment: the physical/virtual address range that | ||
+ | Etherboot is known to reside in. | ||
+ | b) Area for Linux process | ||
+ | 2. Implement 16:32=>16:16 thunks between Linux process and PXE code. | ||
+ | ** 16:16 parameter passing area for all the parameter structures | ||
+ | 3. Poll for interrupts using PXENV_UNDI_ISR_IN_START. | ||
+ | 4. Implement bottom-half processing using | ||
+ | PXENV_UNDI_ISR.PXENV_UNDI_ISR_IN_PROCESS, | ||
+ | PXENV_UNDI_ISR.PXENV_UNDI_ISR_IN_GET_NEXT | ||
+ | |||
+ | The processing flowchart is provided in the PXE 2.1 specification. | ||
+ | |||
+ | == Implement stub driver for NIC == | ||
+ | Prototype will support only PCI devices. | ||
+ | |||
+ | 1. Perform PCI probe using PCI_IDs specified as module | ||
+ | parameters. | ||
+ | 2. Initialize GDT using !PXE information. | ||
+ | 3. Pump packets between UNDI execution process and kernel | ||
+ | transmit/receive queues. | ||
+ | 4. Rmmod should clean up properly so that a full driver can be loaded | ||
+ | later. | ||
+ | |||
+ | == Communications between UNDI execution process and driver == | ||
+ | The driver will export two pipes under /proc with the following interfaces: | ||
+ | |||
+ | kernel_to_process: | ||
+ | * TxPacket(char data[len], int len); // Kernel requests to send a packet | ||
+ | |||
+ | Maybe mii/ethtool/ifconfig? | ||
+ | |||
+ | process_to_kernel: | ||
+ | * RxPacket(char data[len], int len); // Process received packet, tell | ||
+ | kernel to queue it for network stack | ||
+ | |||
+ | == Etherboot & boot process modifications == | ||
+ | 1. Modify Etherboot to report 16:16 segment descriptors via !PXE. | ||
+ | 2. Add a configuration flag to Etherboot to prevent it from unloading UNDI | ||
+ | 3. Pass PCI_IDs to kernel | ||
+ | 4. Linux: | ||
+ | Before switching to protected mode, reserve !PXE.SegDescCnt | ||
+ | descriptors in GDT, and set !PXE.FirstSelector to the appropriate | ||
+ | location. | ||
+ | |||
+ | The descriptors will not be copied from !PXE structure until the | ||
+ | stub driver is loaded. Hopefully, the PXE stack will not gain | ||
+ | control in the interim and crash because the segment descriptors | ||
+ | have not been initialized. | ||
+ | |||
+ | 5. Use PCI_IDs to initialize NIC stub driver | ||
+ | |||
+ | == Initialization and testing == | ||
+ | I. Internal testing from within the driver process. | ||
+ | Implement an interactive debugging console? | ||
+ | |||
+ | 1. UNDI_OPEN, UNDI_CLOSE, UNDI_GET_STATE | ||
+ | |||
+ | UNDI_GET_INFORMATION, UNDI_GET_STATISTICS, | ||
+ | UNDI_CLEAR_STATISTICS | ||
+ | (stats are useful for debugging later functionality) | ||
+ | |||
+ | UNDI_INITIATE_DIAGS | ||
+ | (sanity check the driver and execution environment) | ||
+ | |||
+ | 2. UNDI_TRANSMIT | ||
+ | 3. UNDI_RECEIVE | ||
+ | |||
+ | Optional: UNDI_SET_STATION_ADDRESS, UNDI_GET_NIC_TYPE, UNDI_GET_IFACE_INFO | ||
+ | |||
+ | Ignored: UNDI_SET_PACKET_FILTER, UNDI_SET_MULTICAST_ADDRESS, UNDI_FORCE_INTERRUPT, UNDI_GET_MULTICAST_ADDRESS | ||
+ | |||
+ | 4. IPC pipes | ||
+ | 5. Interrupt polling, Rx/Tx handling | ||
+ | |||
+ | II. Linux milestones & tests | ||
+ | 1. Transmit & receive | ||
+ | 2. DHCP | ||
+ | 3. ARP | ||
+ | 4. TCP & NFS | ||
+ | 5. Simulated boot process: | ||
+ | a) fetch network module from remote host, e.g. via HTTP or NFS | ||
+ | b) unload UNDI module, kill UNDI execution process | ||
+ | c) load network module | ||
+ | d) network tests (as before) | ||
+ | |||
+ | = Phase 2: 16:32 mode = | ||
+ | |||
+ | == Extend PXE interface == | ||
+ | |||
+ | This mode will require additional information from the PXE stack, such | ||
+ | as the precise page table / memory layout and LDT/GDT entries for the | ||
+ | PXE stack. | ||
+ | |||
+ | This information will be returned via a new UNDI op-code | ||
+ | PXENV_GET_UNDI_ENV32. This entry point will be specially coded so that | ||
+ | it can execute in 16:16 mode (e.g. with KEEP_IT_REAL compile/link | ||
+ | options) via !PXE.EntryPointSP. | ||
+ | |||
+ | This will be the only op-code supported via that entry point, so the | ||
+ | vast majority of Etherboot will use 16:32. | ||
+ | |||
+ | typedef struct s_PXENV_GET_UNDI_ENV32 | ||
+ | { | ||
+ | /* Outputs */ | ||
+ | PXENV_STATUS Status; | ||
+ | ADDR32 PageDirectoryBase; | ||
+ | SEGOFF16 EntryPoint16_32; | ||
+ | |||
+ | /* Inputs */ | ||
+ | UINT32 DescriptorBufferSize; | ||
+ | ADDR32 DescriptorBuffer; | ||
+ | |||
+ | } t_PXENV_GET_UNDI_ENV32; | ||
+ | |||
+ | All UNDI op-codes will be accessible only through EntryPoint16_32. | ||
+ | |||
+ | PageDirectoryBase will be used to pass the memory map to the kernel. | ||
+ | The format is the native IA-32 2-level page table. The AVAIL fields of | ||
+ | the PTEs and PDEs will be used to convey additional information about | ||
+ | each page: | ||
+ | |||
+ | 000 = Normal page, can be relocated in physical memory | ||
+ | 001 = PCI DMA page, can only be relocated if the relocation is transparent to the PCI device. | ||
+ | 010 = PCI MMIO page. Must use these exact physical addresses. | ||
+ | |||
+ | min(!PXE.SegDescCnt, DescriptorBufferSize / sizeof(descriptor)) | ||
+ | entries will be set in DescriptorBuffer. | ||
+ | |||
+ | Each descriptor entry is 64 bits, in the native IA-32 segment | ||
+ | descriptor format. This way, any arbitrary set of descriptors can be | ||
+ | specified. | ||
+ | |||
+ | == Delta from 16:16 prototype == | ||
+ | 1. Page table, LDT/GDT initialization | ||
+ | 2. Thunks will call into 16:32 if available, otherwise 16:16 | ||
+ | |||
+ | ========================== | ||
+ | End of base functionality goals | ||
+ | ========================== | ||
+ | |||
+ | = Compatibility improvements = | ||
+ | |||
+ | 1. Interrupt-driven operation for quirky cards | ||
+ | |||
+ | Polling is much easier to deal with; however, according to comments in | ||
+ | undi.c, it doesn't work for all cards. Interrupt-driven operation | ||
+ | will therefore increase the compatibility of the driver. | ||
+ | |||
+ | This will be quite tricky, since incorrect top-half handling can | ||
+ | easily jam the system. | ||
+ | |||
+ | After PCI probe, the kernel module will install a small interrupt handler. | ||
+ | The interrupt handler will need to use | ||
+ | PXENV_UNDI_ISR.PXENV_UNDI_ISR_IN_START. However, this is | ||
+ | not readily accessible, as it resides in the UNDI execution process. | ||
+ | |||
+ | If the NIC IRQ line is not shared, then this is trivial. The IRQ line | ||
+ | can be disabled while invoking PXENV_UNDI_ISR_IN_START in the | ||
+ | UNDI execution process, and then re-enabled when the process returns. | ||
+ | |||
+ | If the IRQ line is shared, we'll still need to mask the IRQ line while | ||
+ | dispatching to the ISR. However, this will also mask the other | ||
+ | devices, which might be needed to execute the process (e.g. disk IRQs | ||
+ | for demand paging). To decrease the likelihood of problems, the entire | ||
+ | process should be pinned in memory, linked statically, | ||
+ | etc. Alternatively, the kernel could switch to a polling strategy on | ||
+ | all IRQs while waiting for the user application to return. | ||
+ | |||
+ | Another solution would be to pull an UNDI execution environment into | ||
+ | the kernel context and then dispatch directly to the UNDI ISR. However, | ||
+ | this would require changes to the kernel memory map and would probably | ||
+ | end up being messy. | ||
+ | |||
+ | == New kernel<=>user pipe commands == | ||
+ | |||
+ | kernel_to_process: | ||
+ | * UNDI_Int(t_PXENV_UNDI_ISR isr); // UNDI interrupt received by top-half | ||
+ | |||
+ | process_to_kernel: | ||
+ | * UNDI_Int_Ack(); // Acknowledge interrupt, which will reenable the IRQ line | ||
+ | |||
+ | = Support for other PXE stacks = | ||
+ | ** Experience & experiment with other PXE stacks | ||
+ | *** how they use memory (DMA, code locations) | ||
+ | *** tricks to support generic PXE stack | ||
+ | (paging, IOMMU, Linux kernel layout modifications) | ||
+ | *** Attempt to support unmodified PXE stacks | ||
+ | </file> |
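The Phase 2 data formats in the proposal can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the driver's actual code: every name below (`undi_page_kind_of`, `undi_desc_count`, `desc_base`, `desc_limit`, and the enum) is hypothetical. It relies on IA-32 page table and page directory entries leaving bits 9..11 available to software, and on the standard 64-bit IA-32 segment descriptor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page kinds carried in the AVAIL bits, as listed in the proposal. */
enum undi_page_kind {
    UNDI_PAGE_NORMAL   = 0x0, /* 000: may be relocated in physical memory */
    UNDI_PAGE_PCI_DMA  = 0x1, /* 001: relocate only if transparent to the device */
    UNDI_PAGE_PCI_MMIO = 0x2  /* 010: must keep these exact physical addresses */
};

#define PTE_AVAIL_SHIFT 9           /* software-available bits are 9..11 */
#define PTE_AVAIL_MASK  0x7u
#define PTE_FRAME_MASK  0xFFFFF000u /* physical frame address, bits 12..31 */

/* Extract the page kind from a PTE or PDE. */
unsigned undi_page_kind_of(uint32_t entry)
{
    return (entry >> PTE_AVAIL_SHIFT) & PTE_AVAIL_MASK;
}

/* Physical address of the 4 KiB frame that a PTE points to. */
uint32_t undi_page_frame(uint32_t entry)
{
    return entry & PTE_FRAME_MASK;
}

/* Number of descriptors written to DescriptorBuffer:
 * min(!PXE.SegDescCnt, DescriptorBufferSize / sizeof(descriptor)). */
size_t undi_desc_count(size_t seg_desc_cnt, size_t buffer_size)
{
    size_t fit = buffer_size / sizeof(uint64_t); /* each descriptor is 64 bits */
    return seg_desc_cnt < fit ? seg_desc_cnt : fit;
}

/* Segment base: descriptor bits 16..39 hold base[23:0], bits 56..63 base[31:24]. */
uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | (((d >> 56) & 0xFFu) << 24));
}

/* Segment limit: descriptor bits 0..15 hold limit[15:0], bits 48..51 limit[19:16]. */
uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16));
}
```

A kernel-side consumer would walk the page directory at PageDirectoryBase, call `undi_page_kind_of` on each present entry to decide whether the page may be moved, and then install the first `undi_desc_count(...)` descriptors starting at !PXE.FirstSelector.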