Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
dev:devmanual [2006/07/16 09:44]
stockholm
dev:devmanual [2006/07/16 10:01] (current)
stockholm
Line 276: Line 276:
  
 ==== Booting the code from a floppy ==== ==== Booting the code from a floppy ====
 +Use the rule for ''​bin/​driver.fd0''​ to write another instance of the driver to the floppy for testing. Use lots of printf statements to track where execution has reached and to display the status of various variables and registers in the code. You should expect to do this dance with the development machine, floppy disk and target machine many many times. ​
 +
 ==== Booting the test code with another EtherBoot ROM ==== ==== Booting the test code with another EtherBoot ROM ====
 +There is another method of testing ROM images that does not involve walking a floppy disk between the machines and is much nicer. Set up a supported NIC with a boot ROM. Put the target NIC on the same machine but at a non-conflicting I/O location. That is to say, your test machine has two NICs and two connections to the LAN. Then when you are ready to test a boot image, use the utility ''​mknbi-rom''​ to create a network bootable image from the ROM image, and set up bootpd/​DHCPD and tftpd to send this over the when the machine netboots. ​
 +
 +Using Etherboot to boot another version of itself is rather mind-boggling I know. 
 +
 +
 +
 +
 ==== Writing the code ==== ==== Writing the code ====
 +First set up the various required services, i.e. BOOTP/DHCP, tftp, etc. on the development machine. You should go through the setup process with a supported adapter card on a test machine so that you know that the network services are working and what to expect to see on the test machine.
 +
 +If you are starting from a Linux driver, usually the hardest part is filtering out all the things you do not need from the Linux driver. Here is a non-exhaustive list:
 +  * You do not use interrupts.
 +  * You do not need more than one transmit buffer.
 +  * You do not need to use the most efficient method of data transfer.
 +  * You do not need to implement statistics counting.
 +In general it is a good idea to use the latest Linux driver as base, because it usually supports newer cards and has more bugs fixed than older versions. If the driver is written by Donald Becker, it is probably best to start from the version available on [[http://​www.scyld.com/​page/​support/​network/​|this page]], since the driver in the official kernel may be behind. See also the Section called *FIXME* Keeping your driver in sync with the Linux version.
 +
 +Generally speaking, the probe routine is relatively easy to translate from the Linux driver. The exception is when you need to handle media and speed switching. The transmit is usually straightforward,​ and the receive a bit more difficult. The main problem is that in the Linux driver, the work is split between routines called from the kernel and routines triggered by hardware interrupts. As mentioned before, Etherboot does not use interrupts so you have to bring the work of transmitting and receiving back into the main routines. The disable routine is straightforward if you have the hardware commands. ​
 +
 +When coding, first get the probe routine working. You will need to refer to the programmer'​s guide to the adapter when you do this. You can also get some information by reading a Linux or FreeBSD driver. You may also need to get the disable routine working at this time. 
 +
 +Next, get the transmit routine working. To check that packets are going out on the wire, you can use tcpdump or ethereal on the development machine to snoop on the Ethernet. The first packet to be sent out by Etherboot will be a broadcast query packet, on UDP port 67.
 +
 +Note that you do not need interrupts at all. You should ensure the packet is fully transmitted before returning from this routine. You may also wish to implement a timeout to make sure the driver doesn'​t get stuck inside transmit if it fails to complete. A couple of timer routines are available for implementing the timeout, see ''​timer.h''​. You use them like this (in pseudo-code):​
 +<​file>​
 +        for (load_timer2(TIMEOUT_VALUE);​
 +                transmitter_busy && (status = timer2_running());​ )
 +                ;
 +        if (status == 0)
 +                transmitter_timed_out;​
 +</​file>​
 +The timeout value should be 1193 per millisecond of wait. The maximum value is 65535, which is about 54 milliseconds of timeout. If you just need to delay a short time without needing to do other checks during the timeout, you can call ''​waiton_timer2(TIMEOUT_VALUE)''​ which will load, then poll the timer, and return control on timeout.
 +
 +Please do not use counting loops to implement time-sensitive delays. Such loops are CPU speed dependent and can fail to give the right delay period when run on a faster machine. ​
 +
 +Next, get the receive routine working. If you already have the transmit routine working correctly you should be getting a reply from the BOOTP/DHCP server. Again, you do not need interrupts, unlike drivers from Linux and other operating systems. This means you just have to read the right register on the adapter to see if a packet has arrived. Note that you should NOT loop in the receive routine until a packet is received. Etherboot needs to do other things so if you loop in the poll routine, it will not get a chance to do those tasks. Just return 0 if there is no packet awaiting and the main line will call the poll routine again later. ​
 +
 +Finally, get the disable routine working. This may simply be a matter of turning off something in the adapter.
 +
 +
 +
 ==== Things to watch out for ==== ==== Things to watch out for ====
 +Things that may complicate your coding are constraints imposed by the hardware. Some adapters require buffers to be on word or doubleword boundaries. See ''​rtl8139.c''​ for an example of this. Some adapters need a wait after certain operations.
 +
 ==== Tidying up ==== ==== Tidying up ====
 +When you get something more or less working, release early. People on the mailing lists can help you find problems or improve the code. Besides you don't want to get run over by a bus and then the world never gets to see your masterpiece,​ do you? :-)
 +
 +Your opus should be released under GPL, BSD or a similar Open Source license, otherwise people will have problems using your code, as most of the rest of Etherboot is GPLed.
 +
 ==== Keeping your driver in sync with the Linux version ==== ==== Keeping your driver in sync with the Linux version ====
 +Most Etherboot drivers are derived from their Linux counterparts. Sooner or later the Linux driver where you started from will get updated. This may include bugfixes and support for new NICs. Unfortunately there is no way to keep the Etherboot driver automatically up to date, but if you keep the following rules in mind while porting it can be made easier: ​
 +  * State the version of the Linux driver in a comment in your Etherboot driver. This makes it easy to check to which Linux driver the Etherboot driver corresponds ​
 +  * Etherboot drivers should be small. Often a Linux driver contains way too much code, but in many cases you can reuse some utility functions from the Linux driver without blowing up the code size too much. In this case, keep the order in which they are defined where possible, and avoid code reformatting in general. (re-indenting is ok, because diff can ignore it with the -b switch). This makes it easy to compare later versions with utilities like mgdiff. ​
 +  * You can get the pci ids of all NIC drivers of a particular Linux kernel with the utility ''​get-pci-ids''​ in util and compare them to the entries in ''​bin/​NIC''​ in order to find out wether the Linux kernel supports some new NICs.
 +
 +
 +
 ===== A potted history of Etherboot ===== ===== A potted history of Etherboot =====
 +In Linux circles Netboot appeared first. According to the docs Jamie Honan was the person who coded up the first version and specified the format of the tagged image files. This version used assembler code taken from [[http://​www.crynwr.com/​|packet drivers]] to interface to the hardware, only Western Digital (now SMC) NICs in the first instance. It also required a DOS environment to compile. Later on Gero Kuhlmann took over the development of Netboot and made tremendous improvements to it. Among other things he created a harness that would simulate just enough of a DOS environment so that unmodified packet driver binaries could be used in a boot ROM. This allows any NIC on the market that has a packet driver to be used immediately. He also migrated the development to a Linux (Unix) platform.
 +
 +Etherboot was ported from FreeBSD by Markus Gutschke. He made it compile under Linux and added code to support tagged images in addition to NFS boot. Since tagged images are a more general mechanism and requires less boot rom code, this has become the preferred loading method. Markus has also coded most of the additional features between 2.0 and 3.0, such as additional bootp tags, ANSI screen escapes, etc. Many of the features common to Etherboot and Netboot, such as the tagged image format, the support programs such as mknbi, and support in the Linux kernel for diskless booting, are by Gero or Markus. ​
 +
 +[[http://​www.geocities.com/​ken_yap_aus/​|Ken Yap]] came to Etherboot a bit later. His original objective was to produce a 16 bit version that could be used to netboot ELKS and other OSes on older CPUs. As these things happen, he got enticed into improving the code, doing structural rearrangement,​ and merging contributions from others, and is now the primary maintainer of Etherboot. ​
 +
 +In the early days, the Etherboot web page was hosted by the [[http://​www.slug.org.au/​|Sydney Linux Users Group]] web site. Quite coincidentally and unrelatedly,​ Jamie Honan is one of the founding members of SLUG, so the story has come a full circle here. Since April 2000, Etherboot has been hosted at [[http://​sourceforge.net/​|Sourceforge]]. Sourceforge has provided superb facilities for hosting community Open Source development.
 +
 +
 +
 +
 ===== Appendix A. Draft Net Boot Image Proposal 0.3, June 15, 1997 ===== ===== Appendix A. Draft Net Boot Image Proposal 0.3, June 15, 1997 =====
 +Jamie Honan and Gero Kuhlmann, gero AT minix PERIOD han PERIOD de 
 +
 +In order to provide more functionality to the boot rom code Jamie'​s draft has been changed a little bit. All of Gero Kuhmnann'​s changes are preceded and followed by (gk). All of Ken Yap's changes are preceded and followed by (ky).
 +
 ==== Preamble - the why ==== ==== Preamble - the why ====
 +Whilst researching what other boot proms do (at least those implementing TCP/IP protocols) it is clear that each 'does their own thing' in terms of what they expect in a boot image. ​
 +
 +If we could all agree on working toward an open standard, O/S suppliers and boot rom suppliers can build their products to this norm, and be confident that they will work with each other. ​
 +
 +This is a description of how I will implement the boot rom for Linux. I believe it to be flexible enough for any OS that will be loaded when a PC boots from a network in the TCP/IP environment. ​
 +
 +It would be good if this could be turned into some form of standard. ​
 +
 +This is very much a first draft. I am inviting comment. ​
 +
 +The ideas presented here should be independant of any implementation. In the end, where there is a conflict between the final of this draft, and an implementation,​ this description should prevail. ​
 +
 +The terms I use are defined at the end. 
 +
 +(gk)IMPORTANT NOTE: The scope of this document starts at the point where the net boot process gains control from the BIOS, to where the booted image reaches a state from which there is no return to the net boot program possible.(gk)
 +
 ==== The target ==== ==== The target ====
 +The target is to have a PC retrieve a boot image from a network in the TCP/IP environment. ​
 +
 +(gk)The boot may take place from a network adaptor rom, from a boot floppy.(gk)
 +
 ==== Net Boot Process Description ==== ==== Net Boot Process Description ====
 +(gk)The net boot process is started as a result of the PC boot process. The net boot program can reside on a rom, e.g. on an adaptor card, or in ram as a result of reading off disk.(gk) ​
 +
 + The boot process may execute in any mode (e.g. 8086, 80386) it desires. When it jumps to the start location in the boot image, it must be in 8086 mode and be capable of going into any mode supported by the underlying processor. ​
 +
 +The image cannot be loaded into address spaces below 10000h, or between A0000h through FFFFFh, or between 98000h through 9FFFFh. (gk)Only when the image is not going to return to the boot process, all the memory is available to it once it has been started, so it can relocate parts of itself to these areas.(gk) ​
 +
 +The boot process must be capable of loading the image into all other memory locations. Specifically,​ where the machine supports this, this means memory over 100000h. ​
 +
 +The net boot process must execute the bootp protocol, followed by the tftp protocol, as defined in the relevant rfc'​s. ​
 +
 +The file name used in the tftp protocol must be that given by the bootp record. ​
 +
 +If less than 512 bytes are loaded, the net boot process attempts to display on the screen any ascii data at the start of the image. The net boot process then exits in the normal manner. For a boot prom, this will allow normal disk booting. (gk)Reference to DOS deleted.(gk) ​
 +
 +When the first 512 bytes have been loaded, the boot process checks for an initial magic number, which is defined later. If this number is present, the net process continues loading under the control of the image format. The image, which is described later, tells the net boot process where to put this record and all subsequent data. 
 +
 +If no initial magic number is present the net boot process checks for a second magic number at offset 510. If the magic number 510 = 55h, 511 = AAh, then the net process continues. If this second magic number is not present, then the net boot process terminates the tftp protocol, displays an error message and exits in the normal manner. ​
 +
 +If no initial magic number is present and the second one is, the net boot process relocates the 512 bytes to location 7c00h. The net boot process continues to load any further image data to 10000h up. This data can overwrite the first 512 boot bytes. If the image reaches 98000h, then any further data is continued to be loaded above 100000h. When all the data has been loaded, the net boot process jumps to location 0:​7c00. ​
 +(gk)When the net boot program calls the image, it places 2 far pointers onto the stack, in standard intel order (e.g. segment:​offset representation). The first far pointer which immediately follows the return address on the stack, points to the loaded boot image header. The second far pointer which is placed above the first one, shows to the memory area where the net boot process saved the bootp reply. ​
 +
 +If the boot image is flagged as being returnable to the boot process, the boot program has to provide the boot image with interrupt vector 78h. It's an interface to services provided by the net boot program (see below for further description). ​
 +
 +If the boot image is not flagged as being returnable to the boot process, before the boot image is called, the boot program has to set the system into a state in which it was before the net boot process has started.(gk)
 +
 +
 ==== Image Format with Initial Magic Number ==== ==== Image Format with Initial Magic Number ====
 +The first 512 bytes of the image file contain the image header, and image loading information records. This contains all the information needed by the net boot process as to where data is to be loaded. ​
 +
 +The magic number (in time-honoured tradition (well why not?)) is: 
 +<​file>​
 +        0 = 36h
 +        1 = 13h
 +        2 = 03h
 +        3 = 1Bh
 +</​file>​
 +Apart from the two magic numbers, all words and double words are in PC native endian. ​
 +
 +Including the initial magic number the header record is: 
 +<​file>​
 +        +---------------------+
 +        |                     |
 +        | Initial Magic No.   ​| ​ 4 bytes
 +        +---------------------+
 +        |                     |
 +        | Flags and length ​   |  double word
 +        +---------------------+
 +        |                     |
 +        | Location Address ​   |  double word in ds:bx format
 +        +---------------------+
 +        |                     |
 +        | Execute Address ​    ​| ​ double word in cs:ip format
 +        +---------------------+
 +</​file>​
 +The Location address is where to place the 512 bytes. The net boot process does this before loading the rest of the image. The location address cannot be one of the reserved locations mentioned above, but must be an address lower than 100000h. ​
 +
 +The rest of the image must not overwrite these initial 512 bytes, placed at the required location. The writing of data by the net boot process into these 512 bytes is deprecated. These 512 bytes must be available for the image to interogate once it is loaded and running. ​
 +
 +The execute address is the location in cs:ip of the initial instruction once the full image has been loaded. This must be lower than 100000h, since the initial instructions will be executed in 8086 mode. When the jump (actually a far call) is made to the boot image, the stack contains a far return address, with a far pointer parameter above that, pointing to the location of this header. ​
 +(ky) If bit 31 in the flags is set, then the execute address is interpreted as a linear 32-bit address, and a call is made to this address. There is no restriction on the range of the execute address. The arguments to the routine are: a pointer to an Etherboot specific header, a pointer to the tagged image header, and a pointer to the bootp reply. The called routine may return to the boot loader.(ky) ​
 +
 +The flags and length field is broken up in the following way: 
 +  * Bits 0 to 3 (lowest 4 bits) define the length of the non-vendor header in double words. Currently the value is 4. 
 +  * Bits 4 to 7 define the length required by the vendor extra information in double words. A value of zero indicates no extra vendor information. ​
 +  * (gk)Bit 8 is set if the boot image can return to the net boot process after execution. If this bit is not set the boot image does never return to the net boot process, and the net boot program has to set the system into a clean state before calling the boot image (gk) 
 +  * (ky)Bit 31 is set if the execute address of the boot image is a linear 32-bit address to be called. The boot image may return to the boot loader.(ky) ​
 +(gk+ky)Bits 9 to 30 are reserved for future use and must be set to zero.(gk+ky) ​
 +
 +After this header, and any vendor header, come the image loading information records. These specify where data is to be loaded, how long it is, and communicates to the loaded image what sort of data it is. 
 +
 +The format of each image loading information record is :
 +<​file>​
 +          +---------------------+
 +          | Flags, tags and     ​| ​ double word
 +          | lengths ​            |
 +          +---------------------+
 +          |                     |
 +          | Load Address ​       |  double word
 +          +---------------------+
 +          |                     |
 +          | Image Length ​       |  double word
 +          +---------------------+
 +          |                     |
 +          | Memory Length ​      ​| ​ double word
 +          +---------------------+
 +</​file>​
 +Each image loading information record follows the previous, or the header. ​
 +
 +The memory length, image length and load address fields are unsigned 32 numbers. They do not have the segment:​offset format used by the 8086. 
 +
 +The flags, tags and lengths field is broken up as follows: ​
 +  * Bits 0 to 3 (lowest 4 bits) are the length of the non vendor part of this header in double words. Currently this value is 4. 
 +  * Bits 4 to 7 indicate the length of any vendor information,​ in double words. ​
 +  * Bits 8 to 15 are for vendor'​s tags. The vendor tag is a private number that the loaded image can use to determine what sort of image is at this particular location. ​
 +  * Bits 16 to 23 are for future expansion and should be set to zero. 
 +  * Bits 24 to 31 are for flags, which are defined later. ​
 +
 +Vendors may place further information after this information record, and before the next. Each information record may have a different vendor length. ​
 +
 +There are two restrictions on vendor information. ​
 +  * One is that the header and all information records that the net boot process is to use fall within the first 512 bytes. ​
 +  * The second restriction is that the net boot process must ignore all vendor additions. The net boot process may not overwrite vendor supplied information,​ or other undefined data in the initial 512 bytes. ​
 +
 +The flags are used to modify the load address field, and to indicate that this is the last information record that the net boot process should use. 
 +
 +Bit 24 works in conjunction with bit 25 to specify the meaning of the load address.
 +<​file>​
 +   ​B24 ​   B25
 +
 +   ​0 ​    ​0 ​   load address is an absolute 32 number
 +
 +   ​1 ​    ​0 ​   add the load address to the location one past the last byte
 +              of the memory area required by the last image loaded.
 +              If the first image, then add to 512 plus the location
 +              where the 512 bytes were placed
 +
 +   ​0 ​    ​1 ​   subtract the load address from the one past the
 +              last writeable location in memory. Thus 1 would
 +              be the last location one could write in memory.
 +
 +   ​1 ​    ​1 ​   load address is subtracted from the start of
 +              the last image loaded. If the first image, then
 +              subtract from the start of where the 512 bytes were
 +              placed
 +</​file>​
 +(For convenience bit 24 is byte 0 of the flag field) ​
 +
 +Bit 26 is the end marker for the net boot process. It is set when this is the last information record the net boot process should look at. More records may be present, but the net boot process will not look at them. (Vendors can continue information records out past the 512 boundary for private use in this manner). ​
 +
 +The image length tells the net boot process how many bytes are to be loaded. Zero is a valid value. This can be used to mark memory areas such as shared memory for interprocessor communication,​ flash eproms, data in eproms. ​
 +
 +The image length can also be different from the memory length. This allows decompression programs to fluff up the kernel image. It also allows a file system to be larger then the loaded file system image. ​
 +
 +Bits 27 through 31 are not defined as yet and must be set to zero until they are.
 +
 ==== Boot prom entry points ==== ==== Boot prom entry points ====
 +(gk)As mentioned above the net boot process has to provide interrupt 78h as an entry point in case, the returnable flag (bit 9 of the flags field in the image header) of the boot image has been set. When calling this interface interrupt, the caller has to load the AH register with a value indicating the type of operation requested: ​
 +<​file>​
 +00h  -  Installation check
 +        Input: ​ none
 +        Output: AX  -  returns the value 474Bh
 +                BX  -  flags indicating what further services are
 +                       ​provided by the net boot program:
 +                        Bit 0 - packet driver interface (see below)
 +                        Bits 1 to 15 are unused and have to be zero
 +
 +01h  -  Cleanup and terminate the boot process services. This will
 +        also remove the services provided by interrupt 87h.
 +        Input: ​ none
 +        Output: none
 +</​file>​
 +   ​Further functions are not yet defined. These functions are only available to boot images which have the first magic number at the beginning of the image header, and have the returnable flag set in the flags field. ​
 +
 +In order to provide compatibility with net boot programs written to match an earlier version of this document, the loaded image should check for the existence of interrupt 78h by looking at it's vector. If that's 0:0, or if it does not return a proper magic ID after calling the installation check function, the boot image has to assume that the net boot program does not support this services interrupt. ​
 +
 +If the bit 0 of register BX of function 00h is set, the boot program has to provide a packet driver interface at interrupt 79h as described in the packet driver interface standard, version 1.09, published by FTP Software, Inc., which is not repeated here. It serves as an interface to the system'​s network card. It is important to note that the net boot process has to provide a clean packet driver interface without any handles being defined when the boot image gets started. It is expected that the boot image sets up it's own TCP/IP or other network'​s stack on top of this packet driver interface. When the boot image returns to the net boot process, it has to return a clean packet driver interface as well, without any handles being defined.(gk)
 +
 ==== Example of a boot image ==== ==== Example of a boot image ====
 +Here is an example of how the boot image would look for Linux: ​
 +<​file>​
 +  0x1B031336, ​ /* magic number */
 +  0x4,         /* length of header is 16 bytes, no vendor info */
 +  0x90000000, ​ /* location in ds:bx format */
 +  0x90000200, ​ /* execute address in cs:ip format */
 +
 +               /* 2048 setup.S bytes */
 +  0x4,         /* flags, not end, absolute address, 16 bytes this
 +                  record, no vendor info */
 +  0x90200, ​    /* load address - note format */
 +  0x800, ​      /* 4 8 512 byte blocks for linux */
 +  0x800,
 +
 +               /* kernel image */
 +  0x4,         /* flags, not end, absolute address, 16 bytes this
 +                  record, no vendor info */
 +  0x10000, ​    /* load address - note format */
 +  0x80000, ​    /* 512K (this could be shorter */
 +  0x80000,
 +
 +               /* ramdisk for root file system */
 +  0x04000004, ​ /* flags = last, absolute address, 16 bytes this
 +                  record, no vendor info *//
 +  0x100000, ​   /* load address - in extended memory */
 +  0x80000, ​    /* 512K for instance */
 +  0x80000,
 +               /* Then follows linux specific information */
 +</​file>​
 +
 ==== Terms ==== ==== Terms ====
 +When I say 'the net boot process',​ I mean the act of loading the image into memory, setting up any tables, up until the jump to the required location in the image. ​
 +
 +The net booting program executes the net boot process. The net boot program may be a rom, but not neccassarily. It is a set of instructions and data residing on the booting machine. ​
 +
 +The image, or boot image, consists of the data loaded by the net boot process. ​
 +
 +When I say 'the PC boot process',​ I mean the general PC rom bios boot process, the setting up of hardware, the scanning for adaptor roms, the execution of adaptor roms, the loading in of the initial boot track. The PC boot process will include the net boot process, if one is present. ​
 +
 +When I say client, I mean the PC booting up. 
 +
 +When I say 'image host', I mean the host where the boot image is comming from. This may not have the same architecture as the client. ​
 +
 +The bootp protocol is defined in RFC951 and RFC1084. The tftp protocol is defined in RFC783. These are available on many sites. ​
 +
 +A bootp server is the machine that answers the bootp request. It is not neccessarily the image host. 
 +
 +"​Can"​ and "​may"​ means doesn'​t have to, but is allowed to and might. "​Must"​ means just that. "​Cannot"​ means must not.
 ==== References ==== ==== References ====
 + ​Stevens,​ W.R 1990, Unix Network Programming,​ Prentice Hall, Englewood Cliffs, N.J., 1990
 ===== Appendix B. Compression algorithm, 1 July 1997 ===== ===== Appendix B. Compression algorithm, 1 July 1997 =====
 +Markus Gutschke, gutschk AT math PERIOD uni-muenster PERIOD de 
 +
 +The compressor achieves an average compression rate of 60% of the original size which is on par with "​gzip"​. It seems that you cannot do much better for compressing compiled binaries. This means that the break even point for using compressed images is reached, once the uncompressed size approaches 1.5kB. We can stuff more than 12kB into an 8kB EPROM and more than 25kB into an 16kB EPROM. As there is only 32kB of RAM for both the uncompressed image and its BSS area, this means that 32kB EPROMs will hardly ever be required. ​
 +
 +The compression algorithm uses a 4kB ring buffer for buffering the uncompressed data. Before compression starts, the ring buffer is filled with spaces (ASCII character 0x20). The algorithm tries to find repeated input sequences of a maximum length of 60 bytes. All 256 different input bytes plus the 58 (60 minus a threshold of 2) possible repeat lengths form a set of 314 symbols. These symbols are adaptively Huffman encoded. The algorithm starts out with a Huffmann tree that assigns equal code lengths to each of the 314 symbols (slightly favoring the repeat symbols over symbols for regular input characters),​ but it will be changed whenever the frequency of any of the symbols changes. Frequency counts are kept in 16bit words until the total number of compressed codes totals 2^15. Then, all frequency counts will be halfed (rounding to the bigger number). For unrepeated characters (symbols 0..255) the Huffman code is written to the output stream. For repeated characters the Huffmann code, which denotes the length of the repeated character sequence, is written out and then the index in the ring buffer is computed. From this index, the algorithm computes the offset relative to the current index into the ring buffer. Thus, for typical input data, one would expect that short to medium range offsets are more frequent than extremely short or medium range to long range offsets. Thus the 12bit (for a 4kB buffer) offset value is statically Huffman encoded using a precomputed Huffman tree that favors those offset values that are deemed to be more frequent. The Huffman encoded offset is written to the output data stream, directly following the code that determines the length of repeated characters. ​
 +
 +This algorithm, as implemented in the C example code, looks very good and its operating parameters are already well optimized. This also explains why it achieves compression ratios comparable with "​gzip"​. Depending on the input data, it sometimes excells considerably beyond what "gzip -9" does, but this phenomenon does not appear to be typical. There are some flaws with the algorithm, such as the limited buffer sizes, the adaptive Huffman tree which takes very long to change, if the input characters experience a sudden change in distribution,​ and the static Huffman tree for encoding offsets into the buffer. The slow changes of the adaptive Huffman tree are partially counteracted by artifically keeping a 16bit precision for the frequency counts, but this does not come into play until 32kB of compressed data is output, so it does not have any impact on our use for "​etherboot",​ because the BOOT Prom does not support uncompressed data of more then 32kB (c.f. doc/​spec.doc). ​
 +
 +Nonetheless,​ these problems do not seem to affect compression of compiled programs very much. Mixing object code with English text, would not work too well though, and the algorithm should be reset in between. Actually, we might gain a little improvement,​ if text and data segments were compressed individually,​ but I have not experimented with this option, yet.
 ===== Appendix C. Copying conditions of the compression code, 1 July 1997 ===== ===== Appendix C. Copying conditions of the compression code, 1 July 1997 =====
 +Markus Gutschke, gutschk AT math PERIOD uni-muenster PERIOD de 
 +
 +The compression code as implemented in "​lzhuf.c"​ was taken from a BBS program written by Joachim Schurig <​jschurig AT zedat PERIOD fu-berlin PERIOD de>. He states that the code can be used freely for programs that are covered by a "​freeware"​ license. This probably includes both BSD style licenses and the GPL. 
 +
 +The code in "​zloader.asm"​ is a reimplementation of the uncompressor. It has been written from scratch and is hereby placed under the conditions of the GNU General Public License (GPL). The algorithm is outlined in Appendix B. 
 +
 +Thus, there are no copyright problems with using this code, but there still might be difficulties with software patents. These patents are not legal in most parts of the world, but if you live in a country that honors software patents then you should verify that using these algorithms is legally permitted. Unless you are absolutely sure, that there are no legal obstacles, you should use the code for educational purposes only (this assumes that your educational institution is exempted from patent laws). The author cannot be held responsible for using the program code in violation of applicable local laws. 
 +
 +If you are aware of patents that might affect the legality of using the code in some parts of the world, please let me know. 

Navigation

* [[:start|Home]] * [[:about|About our Project]] * [[:download|Download]] * [[:screenshots|Screenshots]] * Documentation * [[:howtos|HowTo Guides]] * [[:appnotes|Application Notes]] * [[:faq:|FAQs]] * [[:doc|General Doc]] * [[:talks|Videos, Talks, and Papers]] * [[:hardwareissues|Hardware Issues]] * [[:mailinglists|Mailing lists]] * [[http://support.etherboot.org/|Bugtracker]] * [[:contributing|Contributing]] * [[:editing_permission|Wiki Edit Permission]] * [[:wiki:syntax|Wiki Syntax]] * [[:contact|Contact]] * [[:relatedlinks|Related Links]] * [[:commerciallinks|Commercial Links]] * [[:acknowledgements|Acknowledgements]] * [[:logos|Logo Art]]

QR Code
QR Code dev:devmanual (generated for current page)