Etherboot Project GSoC Ideas

This list is not exhaustive, and we welcome new suggestions. Some of these ideas are not in themselves complete projects; feel free to ask us how much work is likely to be involved, and how many ideas you might sensibly attempt as a Summer of Code project. The approximate difficulty level of each idea has been marked with one or more symbols; the more symbols, the more difficult the project is expected to be.

Device drivers

gPXE is always in need of more device drivers. In the case of network card drivers, existing drivers and Linux kernel drivers are available as starting points. Data sheets are also generally available for most NIC variations, though such documentation is sometimes unreliable. You could:

Update some of the old Etherboot drivers to work with the new gPXE driver API. All drivers are written in C, and you will need to have hardware (network cards, server and client computers) to test driver changes. (We may be able to provide some of the required cards for development and testing.)

Add support for newer network card variants to an existing gPXE driver.

Add a device driver for a new, currently-unsupported network card to gPXE.

If you are feeling more adventurous, and have access to appropriate hardware, you could:

Add a driver for a wireless network card. (gPXE has an 802.11 stack similar to a simplified version of Linux's, but wireless networking cards tend to be trickier than wired ones.)

Add a non-Ethernet driver, e.g. a driver for an Infiniband card. (gPXE does have a working Infiniband subsystem.)

Fix up the support for legacy bus types such as ISAPnP. These devices are allegedly supported, though most have not been tested for many years, and it is unlikely that the current code is in a working state.

Add support for a new bus type, e.g. PCMCIA or USB.

Automated regression testing

gPXE has a relatively large feature set given the code size. Many features are rarely used, and there has been a tendency for parts of the code to suffer from bit-rot. The measures we have taken so far will ensure that we never end up with unbuildable code; we now avoid the use of #ifdef wherever possible, and have automated tests in place to identify missing or redundant symbols. However, we do not have any systematic method for functional testing.

We would ideally like to be able to run a series of tests to verify different functional units (e.g. http download, Linux kernel booting, PXE booting, serial console support, etc.). Most of these tests can be carried out inside a virtual machine such as bochs or qemu. Some tests (e.g. specific device driver tests) will need to be carried out on real hardware. The tests should be fully automated, and should produce a clear pass/fail status indicator. It should be possible for a developer to simply run “make test” and, some time later, receive an overall pass/fail status, together with a list of any failed tests.

You would design and create an infrastructure for automated testing of gPXE. Your test harness would have to set up the environment required for the particular test (e.g. building the gPXE image to be tested, configuring the DHCP server), initiate the test (which may involve starting up an emulator such as qemu, or powering-on a test machine), identify and record the test result, then move on to the next test. Test results should be collated and reported to the developer.
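
As a rough illustration of the "run each test, record the result, report an overall status" loop described above, here is a minimal sketch in C. The names `struct test_case`, `run_tests`, and the two placeholder tests are hypothetical and are not part of gPXE; a real harness would replace the placeholders with steps such as building an image, starting an emulator, and checking its output.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical test descriptor: a name plus a function returning 0 on pass */
struct test_case {
	const char *name;
	int ( * run ) ( void );
};

/* Run every test, print one PASS/FAIL line per test, then an overall
 * status line.  Returns the number of failed tests.
 */
size_t run_tests ( const struct test_case *tests, size_t count ) {
	size_t failures = 0;
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		int rc = tests[i].run();

		printf ( "%s: %s\n", tests[i].name,
			 ( ( rc == 0 ) ? "PASS" : "FAIL" ) );
		if ( rc != 0 )
			failures++;
	}
	printf ( "Overall: %s (%zu of %zu failed)\n",
		 ( failures ? "FAIL" : "PASS" ), failures, count );
	return failures;
}

/* Placeholder tests standing in for real build/boot/verify steps */
int test_always_passes ( void ) {
	return 0;
}

int test_always_fails ( void ) {
	return 1;
}
```

A "make test" target could call something like `run_tests()` from its entry point and use a non-zero failure count as the process exit status, so that the overall pass/fail result is visible to make.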

Having such an automated test suite would enable us to offer quality control guarantees; we could then be confident that upgrades would not break existing functionality.

Improved TCP performance

gPXE includes support for the Transmission Control Protocol underlying most of the Internet's traffic, enabling network boot files to be loaded over reliable protocols like HTTP and iSCSI. To keep the code small, though, gPXE's TCP stack is very simple, and does not support many TCP features such as out-of-order packet recovery, selective ACK, window scaling, or congestion control. Implementing some of these features would allow much better performance in downloads of large network boot images.

You would analyze the performance benefits and code size costs of several TCP features, and choose a few to implement in gPXE's TCP stack to best improve performance without compromising gPXE's ability to fit into ROM.
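
To give a feel for the kind of analysis involved, the sketch below estimates the benefit of window scaling. Without it, the 16-bit window field caps the amount of unacknowledged data at 65535 bytes, so throughput cannot exceed one window per round trip. The function names are illustrative assumptions and do not correspond to gPXE's TCP code; the maximum shift of 14 is the limit set by the TCP window scale option (RFC 7323).

```c
#include <stdint.h>

#define TCP_MAX_WSCALE 14 /* Largest shift allowed by the window scale option */

/* Effective receive window in bytes for a given 16-bit window field and
 * negotiated shift count.  Shift counts above 14 are clamped to 14.
 */
uint32_t tcp_scaled_window ( uint16_t window, unsigned int shift ) {
	if ( shift > TCP_MAX_WSCALE )
		shift = TCP_MAX_WSCALE;
	return ( ( ( uint32_t ) window ) << shift );
}

/* Upper bound on throughput in bytes per second, assuming at most one
 * full window can be in flight per round trip.  Returns 0 if the round
 * trip time is given as zero, since no bound can be computed.
 */
uint64_t tcp_max_throughput ( uint32_t window_bytes, uint32_t rtt_ms ) {
	if ( rtt_ms == 0 )
		return 0;
	return ( ( ( uint64_t ) window_bytes * 1000 ) / rtt_ms );
}
```

For example, an unscaled 65535-byte window over a 100 ms round trip is bounded at 655350 bytes per second, while a shift of 7 raises the window to 8388480 bytes. Whether that gain justifies the extra code size is exactly the trade-off the project would need to evaluate.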

Security improvements

gPXE currently supports loading boot files over a TLS-secured HTTP connection (https:// URI), but the implementation is sufficiently skeletal that its security is much less than that of a typical Web browser:

gPXE has no boot-time source of entropy, so its random numbers are not really random and could be guessed fairly easily by an attacker. You would implement a cryptographically strong random number generator (algorithms for several are publicly available), using entropy from timing jitter in the system clock or the timing of packet arrivals on the network.

We do not verify the server's certificate, so there is no way to be sure traffic to the secure server is not being hijacked by a third party. You would implement support for compiling gPXE with a root Certificate Authority, such that it would only allow secured connections to servers bearing certificates signed by that authority. This would require parsing the X.509 certificate's ASN.1 representation to extract the cryptographic data necessary for verification, and performing the appropriate signature verification to ensure the certificate really was signed by the CA.
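
X.509 certificates are encoded using the ASN.1 Distinguished Encoding Rules (DER), in which every element is a tag byte, a length, and a value. The sketch below shows one way the first step of certificate parsing, reading a single tag-length-value header, might look. The names `struct der_tlv` and `der_parse_tlv` are hypothetical and are not taken from gPXE, and the sketch deliberately rejects multi-byte tags and indefinite lengths rather than handling them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one DER element within a larger buffer */
struct der_tlv {
	uint8_t tag;		/* Tag byte, e.g. 0x30 for SEQUENCE */
	size_t len;		/* Length of the value in bytes */
	const uint8_t *value;	/* Start of the value */
};

/* Parse one DER tag-length-value header from data[0..avail).
 * Returns 0 on success, -1 if the input is malformed, truncated,
 * or uses a feature this sketch does not handle.
 */
int der_parse_tlv ( const uint8_t *data, size_t avail, struct der_tlv *tlv ) {
	size_t len;
	size_t len_bytes;
	size_t offset = 2;
	size_t i;

	if ( avail < 2 )
		return -1;
	if ( ( data[0] & 0x1f ) == 0x1f )
		return -1; /* Multi-byte tag numbers are not handled */
	if ( data[1] < 0x80 ) {
		/* Short form: the length fits in the low seven bits */
		len = data[1];
	} else {
		/* Long form: low seven bits give the number of length bytes */
		len_bytes = ( data[1] & 0x7f );
		if ( ( len_bytes == 0 ) || ( len_bytes > sizeof ( len ) ) )
			return -1; /* Indefinite or oversized length */
		if ( len_bytes > ( avail - 2 ) )
			return -1;
		len = 0;
		for ( i = 0 ; i < len_bytes ; i++ )
			len = ( ( len << 8 ) | data[ 2 + i ] );
		offset += len_bytes;
	}
	if ( len > ( avail - offset ) )
		return -1; /* Value would run past the end of the buffer */
	tlv->tag = data[0];
	tlv->len = len;
	tlv->value = ( data + offset );
	return 0;
}
```

Calling the function again on `value` and `len` of a SEQUENCE descends into its contents, which is how a parser would walk down to the public key and signature fields. Strict bounds checking matters here because the certificate comes from an untrusted network peer.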

Either of these projects would require either a preexisting familiarity with cryptography or a week or two of research into the necessary methods and data formats. Familiarity with C is required, and a moderate mathematical background is probably helpful. The results would enable sites with stringent data security requirements to begin using gPXE to boot their systems over the network.

Linux Distribution network installation

Most Linux distributions include some support for installation over the network. In most cases, this is designed to work with a standard PXE stack supporting only TFTP, but requires only a few small tweaks (and several days of testing) in order to work directly over HTTP. Installation via HTTP would provide a much smoother and simpler experience for the user.

Some distributions also provide support for installation directly to an iSCSI target. This support tends to be fragile and difficult to use, and the instructions necessary to get it to work tend to be complex. It would be nice if installation to an iSCSI target worked at least as well as it currently does in Windows Server 2008.

You would work to improve the network installation and iSCSI target installation capabilities of several of the major Linux distributions (Fedora/CentOS/RHEL, Ubuntu, etc). The installers tend to vary substantially between distributions, so work done on one distribution will not usually be directly usable on another. You would liaise with the relevant distribution maintainers to get your changes merged upstream into the next releases of each distribution.

Having this support would make life easier for users attempting to install Linux over the network, and would provide an incentive for NIC and motherboard vendors to ship gPXE in place of a legacy PXE ROM.

IPv6 Support

gPXE currently contains an aborted attempt at an IPv6 implementation. Several other attempts have been made over the past few years; none have been of sufficient quality to be merged into the main tree.

gPXE is structured to allow easy addition of IPv6: the IPv4 layer is cleanly separated from both the transport layers (TCP and UDP) and the link layers (Ethernet and others). Adding IPv6 support would require implementation of the basic IPv6 network layer protocol plus any ancillary protocols required for IPv6 operation such as NDP. The existing DNS protocol support should be extended to cover IPv6 AAAA records, and it would also potentially be useful to support DHCPv6.
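
As a small taste of what NDP involves: neighbour solicitations are sent to the target's solicited-node multicast address, which is formed by appending the low 24 bits of the target address to the fixed prefix ff02::1:ff00:0/104. A minimal sketch of that calculation follows; the function name `ipv6_solicited_node` is an illustrative assumption and does not refer to existing gPXE code.

```c
#include <stdint.h>
#include <string.h>

/* Compute the solicited-node multicast address for a unicast IPv6
 * address.  Both addresses are 16 bytes in network byte order.
 */
void ipv6_solicited_node ( const uint8_t addr[16], uint8_t mcast[16] ) {
	/* First 104 bits (13 bytes) of ff02:0:0:0:0:1:ff00::/104 */
	static const uint8_t prefix[13] = {
		0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x01, 0xff
	};

	memcpy ( mcast, prefix, sizeof ( prefix ) );
	/* Low 24 bits are copied from the unicast address */
	memcpy ( ( mcast + 13 ), ( addr + 13 ), 3 );
}
```

For example, the link-local address fe80::211:22ff:fe33:4455 maps to ff02::1:ff33:4455.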

ProxyDHCP server for Linux

ProxyDHCP provides a mechanism for supplying DHCP options to clients independently of IP addresses. It is a PXE extension to DHCP, and is already supported by gPXE. It is potentially useful in situations such as adding a network booting infrastructure to a network that already has a DHCP server that cannot be reconfigured. (This is a fairly typical problem in corporate networks.)

dnsmasq is currently the only open-source DHCP server that supports ProxyDHCP, but it is not designed to be scalable to very large networks such as those found at corporate installations. It would be desirable to extend ISC dhcpd to provide this functionality as well. ISC dhcpd already has a rich configuration file syntax, including the ability to behave conditionally depending on the contents of packets it receives. It currently lacks the ability to offer DHCP options without simultaneously offering an IP address.
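
To make "options without an IP address" concrete, the sketch below shows how a client might recognise such an offer: the offered address (yiaddr) is zero, and the vendor class identifier option (option 60) begins with "PXEClient". These criteria are our reading of the PXE specification and should be treated as an assumption; the function names are hypothetical and are taken neither from gPXE nor from ISC dhcpd. The option walker follows the standard DHCP layout of a code byte, a length byte, and data, with pad (0) and end (255) as single-byte options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DHCP_PAD 0
#define DHCP_VENDOR_CLASS_ID 60
#define DHCP_END 255

/* Find an option in a DHCP options field.  Returns a pointer to the
 * option data and stores its length in *opt_len, or returns NULL if
 * the option is absent or the field is malformed.
 */
const uint8_t * dhcp_find_option ( const uint8_t *opts, size_t len,
				   uint8_t code, uint8_t *opt_len ) {
	size_t i = 0;

	while ( i < len ) {
		uint8_t this_code = opts[i];
		uint8_t this_len;

		if ( this_code == DHCP_END )
			break;
		if ( this_code == DHCP_PAD ) {
			i++;
			continue;
		}
		if ( ( len - i ) < 2 )
			break; /* Truncated option header */
		this_len = opts[ i + 1 ];
		if ( this_len > ( len - i - 2 ) )
			break; /* Option data runs past the end */
		if ( this_code == code ) {
			*opt_len = this_len;
			return &opts[ i + 2 ];
		}
		i += ( 2 + ( size_t ) this_len );
	}
	return NULL;
}

/* Assumed ProxyDHCP criteria: no address offered, and a vendor class
 * identifier starting with "PXEClient".  Returns 1 if both hold.
 */
int dhcp_is_proxy_offer ( uint32_t yiaddr, const uint8_t *opts, size_t len ) {
	static const char pxeclient[] = "PXEClient";
	uint8_t vci_len = 0;
	const uint8_t *vci;

	if ( yiaddr != 0 )
		return 0;
	vci = dhcp_find_option ( opts, len, DHCP_VENDOR_CLASS_ID, &vci_len );
	if ( ( vci == NULL ) || ( vci_len < strlen ( pxeclient ) ) )
		return 0;
	return ( memcmp ( vci, pxeclient, strlen ( pxeclient ) ) == 0 );
}
```

A ProxyDHCP-capable server would need to generate replies of this shape, which is the behaviour the dhcpd.conf extensions would have to express.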

You would extend ISC dhcpd to be able to support operation as a ProxyDHCP server. This would include designing appropriate extensions to the dhcpd.conf syntax, implementing and testing the changes, and working with the dhcpd maintainers to get your changes integrated upstream.

Having this ability would make it easier for users to deploy gPXE in large corporate networks with restrictive policies on changes to the DHCP infrastructure.

Enhanced scripting language

gPXE has a command language that allows users to boot interactively and is also used for scripts. Scripts allow users to customize gPXE behavior for site-specific network and boot configurations.

For last year's Summer of Code, Lynus Vaz added support for looping, conditional branches, and arithmetic and string manipulation operators to allow more powerful scripts to be written; a description of the added features is available. The idea was to make it possible for users to implement advanced boot policies without modifying gPXE's source code. Unfortunately, these features came with a large code size cost, and a single-driver build of gPXE generally has to fit in 64 kB to be useful to ROM users.

You would modify the implementation of this advanced scripting language with an eye towards achieving the minimum code size possible. You would be able to modify the design of the language as necessary to be amenable to these constraints, with input from the gPXE community. Large sections of the code would probably need to be rewritten. This task will require good C programming skills, ideally some experience in the design and implementation of simple languages, and a strong appreciation for the real-world challenge of fitting a powerful language into an environment with a tiny code size budget.

Having a powerful scripting language would make it possible to customize network boot behavior without being an expert in low-level C programming and gPXE internals.

Improved debugging support

gPXE, running as it does in an environment with nothing by way of memory protection or operating system services, can be very difficult to debug. We have a GDB stub to allow remote debugging over a serial cable or UDP, but for architectural reasons it's impossible to detect invalid memory accesses or interrupt infinite loops with it. There are a couple of ways of making this work:

Allow gPXE to run with paging enabled, identity-mapping only those parts of the address space that are valid. This would require significant expertise in low-level x86 internals; gPXE interacts with them even more than a typical operating system, because it has to regularly switch between real and protected mode to perform BIOS calls.

Allow gPXE to run as a user-mode application under Linux. Obviously network card drivers would be difficult to test in this environment, but most other parts of gPXE could be run in an environment with built-in memory protection and useful tools like valgrind easily available. The cleanest way to do this would be to create another x86 “platform”, alongside the existing PC-BIOS and EFI, that performs low-level Linux system calls to perform platform-specific operations. A Linux kernel module could be used to enable DMA for testing network card drivers. The prospective implementor would need a fair amount of low-level Linux kernel experience.
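
For the paging approach, the core data structure is a page directory in which only valid regions are marked present, so that any access outside them faults. The sketch below fills in such a directory using 4 MiB pages (the PS bit, which requires CR4.PSE to be enabled). It only builds the table in ordinary memory; loading CR3 and enabling paging are omitted, and the name `identity_map_range` is hypothetical rather than part of gPXE.

```c
#include <stdint.h>

#define PDE_PRESENT	0x001	/* Entry is valid */
#define PDE_WRITABLE	0x002	/* Writes are permitted */
#define PDE_LARGE	0x080	/* PS bit: entry maps a 4 MiB page */
#define LARGE_PAGE_SHIFT 22	/* log2 of 4 MiB */
#define PD_ENTRIES	1024	/* Entries in a 32-bit page directory */

/* Identity-map every 4 MiB page overlapping [start, end) in the page
 * directory pd.  Entries outside the range are left untouched, so a
 * zeroed directory keeps all other addresses unmapped.
 */
void identity_map_range ( uint32_t pd[PD_ENTRIES], uint32_t start,
			  uint32_t end ) {
	uint32_t first;
	uint32_t last;
	uint32_t i;

	if ( end <= start )
		return; /* Empty range: nothing to map */
	first = ( start >> LARGE_PAGE_SHIFT );
	last = ( ( end - 1 ) >> LARGE_PAGE_SHIFT );
	for ( i = first ; i <= last ; i++ ) {
		/* Physical address equals virtual address */
		pd[i] = ( ( i << LARGE_PAGE_SHIFT ) |
			  PDE_PRESENT | PDE_WRITABLE | PDE_LARGE );
	}
}
```

With 4 MiB granularity the mapping is coarse; catching accesses just outside a small buffer would need 4 KiB pages and second-level page tables, at a cost in both code size and memory.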

What you will need

For almost all gPXE development work, you will need to have:

A development machine, running Linux, that you have root access to. You will edit and compile gPXE on this machine, and you may also need to set up software such as a DHCP server. (We will talk you through getting your machine set up for development; the important thing is that you must have a machine available.)

A testing machine, which you can reboot very frequently. You will test gPXE on this machine. This machine cannot be the same as your development machine; it must be a separate computer (or a virtual machine).

A working network between the two machines.

Access to IRC (Internet Relay Chat), so that you can talk to us.

We use git for source code management, so you will need to learn to create and manipulate your own git repository.
- Using git for gPXE development

Depending on the project idea that you choose, you may also need:

One or more specific network cards, if you are writing a driver for that card. We may be able to provide these.

Support for specific bus types (e.g. PCI Express, PCMCIA) on the testing machine, if your project idea involves working with these buses.

It is useful, though not always essential, to have:

Serial ports on both the development and testing machines, and a null-modem cable to connect them.

A digital camera (for taking screenshots; we have found situations in which this is the only sensible way to report diagnostic information!)

How to apply

First, take a look at the above list of project ideas and see if anything takes your fancy. If you have an idea of your own that isn't listed above, that's great. Take some time to think about your idea; how might you approach the problem, how long do you think it would take, how interesting do you find it, how useful will it be to other people, etc.?

Second, introduce yourself to us, preferably via the IRC channel, though e-mail is perfectly acceptable. We will want to spend some time talking to you about your proposal. We will base our decision mostly on the interactions we have with you, rather than on the project proposal you submit via the GSoC web interface.

Third, once you've talked to us on IRC or via e-mail, submit a project proposal via the official Summer of Code web interface. Your proposal should clearly state the project idea you'd like to work on, and should give some background information on your experience (including code samples if possible), and a rough overview of how you might approach the problem.

We will interview all applicants via IRC in a private channel. The interview will start with a discussion about your past coding experience, including a brief review of any code samples that you submitted along with your proposal. We will then move on to two or three coding exercises for you to complete during the interview; you can view a sample of one exercise used in last year's interviews. We will also talk with you briefly about your proposed project.

Our primary interest in the interview is to establish whether or not you are capable of writing clean, efficient code in C (or in another language, if appropriate for your project). We will also want to see how you react to criticisms, hints, and suggestions. Our decision on whom to accept will be based primarily on the interview, rather than your written proposal.

Hints and tips

Do get in touch with us. Unlike the larger projects, we will take the time to talk to you personally. If we haven't spoken to you, we're very unlikely to accept you as a student.

Do ask for help when you need it. You're not in class, and you won't be penalised for not knowing the answer immediately.

Potentially useful links

Etherboot IRC Channel

#etherboot on the FreeNode network (irc.freenode.net)

Mentor Email

You can reach us at soc-mentors@etherboot.org, though IRC is preferred for most interactions.

We hope you have enjoyed reading about the Etherboot Project, and we look forward to meeting you and discussing your project ideas.
