Stefan Hajnoczi: GDB Remote Debugging

Week 7

Milestones:

  • [b44] Tested and clean for mainline review.
  • [gpxelinux.0] Merge Award BIOS return-to-PXE workaround.

Tue Jul 8

Git commits:

Progress on e820 memory map mangler. I finally made the push for an e820 memory map mangler that can clip regions into fragments. The existing e820 memory map mangler works well when gPXE hides the beginning and/or end of a memory region. The new mangler supports hidden memory regions anywhere, and any number of them. In the worst case, this means splitting a memory region into two or more fragments.

The existing e820 mangler has the nice property that it works on-the-fly. It does not need to take a snapshot of the entire e820 memory map. Instead, it does the necessary clipping at each point during a sequence of e820 calls.

MEMDISK has a different e820 mangler. It takes a snapshot of the entire e820 memory map and performs clipping once. The real benefit I see is that the actual e820 handler code is very simple; it just reads the next memory region from the map. If we did something similar in gPXE, it would mean that all the clipping and hiding code would be written in C, with only a small e820 handler in 16-bit assembly.
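To make the snapshot idea concrete, here is a minimal Python sketch of that style of mangler. It is not MEMDISK's actual code: the `Region` type, `clip_snapshot`, and `e820_handler` are hypothetical names, memory types are ignored, and regions are treated as half-open `[start, end)` ranges. All clipping happens once up front, and the per-call handler only reads the next entry.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive


def clip_snapshot(memory_map: List[Region], hidden: List[Region]) -> List[Region]:
    """Clip every region against every hidden range once, up front."""
    result: List[Region] = []
    for region in memory_map:
        pieces = [region]
        for h in hidden:
            next_pieces: List[Region] = []
            for p in pieces:
                if h.end <= p.start or h.start >= p.end:
                    next_pieces.append(p)  # no overlap, keep as-is
                    continue
                if p.start < h.start:
                    next_pieces.append(Region(p.start, h.start))  # part below
                if h.end < p.end:
                    next_pieces.append(Region(h.end, p.end))      # part above
            pieces = next_pieces
        result.extend(pieces)
    return result


def e820_handler(snapshot: List[Region], index: int) -> Tuple[Region, int]:
    """Return the entry at `index` and the next index (0 marks the last entry).

    Assumes a non-empty snapshot.
    """
    region = snapshot[index]
    next_index = index + 1 if index + 1 < len(snapshot) else 0
    return region, next_index
```

The cost is visible in `clip_snapshot`: the whole clipped map has to live somewhere between calls.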

In the end, I didn't opt for the MEMDISK approach since you need to worry about storage for the e820 memory map snapshot. It would also involve rewriting more of our memory map code than simply extending what is already there.

The new algorithm works as follows:

def int_e820():
       for real_region in e820_memory_map:
               nfrags = 0
               for i in range(len(hidden_regions) - 1):
                       region = Region(real_region.start, real_region.end)
                       clipped = False
                       if hidden_regions[i].end_overlaps(region):
                               region.start = hidden_regions[i].end
                               clipped = True
                       if hidden_regions[i + 1].start_overlaps(region):
                               region.end = hidden_regions[i + 1].start
                               clipped = True
                       if hidden_regions[i].completely_overlaps(region):
                               region.start = region.end
                               clipped = True
                       if clipped:
                               nfrags += 1
                               if not region.is_empty():
                                       yield region
               # If no fragments were clipped, return the original region
               if nfrags == 0:
                       yield real_region

For every e820 region, the algorithm steps through each hidden region. More precisely, each step clips against the “current” hidden region and the “next” hidden region. Having an ordered current and next region requires that hidden regions are sorted by start address.

The e820 region is clipped to start at the end of the current hidden region and to end at the start of the next hidden region. If there was an overlap and a non-empty fragment remains, then that fragment is returned.

If all hidden regions have been checked but no fragments were clipped, then the original region is unchanged and must be returned.

The hidden regions list has a [0x0, 0x0) region at the beginning and a [0xffffffff, 0xffffffff) region at the end. These dummy values make clipping against the first and last hidden regions easy; without them we would need special cases.
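The effect of the dummy regions can be seen in a small runnable Python sketch. This is a simplified variant, not the pseudocode above line for line: it computes the intersection of the real region with each gap between neighbouring hidden regions. The names `Region` and `visible_fragments` are hypothetical, ranges are half-open, hidden regions are assumed sorted and non-overlapping, and all addresses are assumed to lie below 0xffffffff.

```python
from typing import Iterator, List, NamedTuple


class Region(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive


# Dummy regions as described in the text; they hide nothing themselves.
SENTINEL_LOW = Region(0x0, 0x0)
SENTINEL_HIGH = Region(0xFFFFFFFF, 0xFFFFFFFF)


def visible_fragments(real: Region, hidden: List[Region]) -> Iterator[Region]:
    """Yield the parts of `real` that fall between neighbouring hidden regions."""
    bounded = [SENTINEL_LOW] + hidden + [SENTINEL_HIGH]
    for current, following in zip(bounded, bounded[1:]):
        # The visible gap runs from the end of `current` to the start of `following`.
        start = max(real.start, current.end)
        end = min(real.end, following.start)
        if start < end:
            yield Region(start, end)
```

Because of the sentinels, the first and last hidden regions are handled by the same loop body as every other pair. A region that touches no hidden region comes back unchanged, and a fully hidden region yields nothing.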

The pseudocode above is written with a continuous thread of control. However, the actual int 15h, e820 handler is called once for each fragment, so the full assembly code needs to manually manage iteration state and resume where the previous call left off.
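A rough Python sketch of what managing the iteration state by hand looks like is given below. It is only an illustration, not the assembly handler: `next_fragment` and the `(region_index, pair_index)` state tuple are hypothetical, and the clipping is simplified to intersecting the real region with the gap between each pair of neighbouring hidden regions (with the dummy regions already in the list). Each call returns at most one fragment plus the state to resume from.

```python
from typing import List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive


# (index of the real region, index of the hidden-region pair to try next)
State = Tuple[int, int]


def next_fragment(
    memory_map: List[Region],
    bounded_hidden: List[Region],
    state: State,
) -> Tuple[Optional[Region], Optional[State]]:
    """Resume at `state`; return one fragment and the state for the next call.

    Returns (None, None) once the whole map has been walked.
    """
    region_index, pair_index = state
    while region_index < len(memory_map):
        real = memory_map[region_index]
        while pair_index < len(bounded_hidden) - 1:
            current = bounded_hidden[pair_index]
            following = bounded_hidden[pair_index + 1]
            pair_index += 1  # remember progress before possibly returning
            start = max(real.start, current.end)
            end = min(real.end, following.start)
            if start < end:
                return Region(start, end), (region_index, pair_index)
        # This real region is exhausted; move on to the next one.
        region_index, pair_index = region_index + 1, 0
    return None, None
```

A caller starts with the state `(0, 0)` and keeps calling until the returned state is `None`, much like an e820 caller passes the continuation value from one call into the next.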

Wed Jul 9

Git commit:

Ported gpxelinux.0 changes to my new gpxelinux branch. SYSLINUX uses undionly.kpxe (keep UNDI loaded, no PCI support) with PXELINUX as an embedded image. HPA has added a workaround for buggy BIOSes that do not support int 18h from PXE NBPs.

We want to merge the workaround into mainline. On Monday, mcb30 and I discussed fingerprinting the buggy BIOS so gPXE can decide whether to exit via int 18h or by returning to PXE. A flag gets set when a buggy BIOS is detected. On exit, we check this flag and return via PXE if necessary.

Overall, the steps to get gpxelinux.0 cleanly merged are:

  1. Merge return-to-PXE code from gpxelinux.0 and add buggy BIOS fingerprinting.
  2. Detect an overwritten stack. We cannot return to PXE if the stack has been corrupted (say, in the attempt to boot an image).
  3. Add more control over shutdown() to distinguish between passing control to a successfully loaded image and asking for the next device to boot on failure. When passing control to an image that does not need gPXE services, we will unload everything. When asking for the next device to boot, we may keep the underlying PXE and UNDI.

Detecting an overwritten stack is a bit weird. We want to return to PXE because int 18h is broken. In order to return, we need to make sure the stack has not been overwritten. If we determine the stack is unusable, then we are stuck: int 18h is broken and return to PXE is impossible! The policy I have coded for now is to reboot the machine.
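The exit policy described above boils down to a small decision table. The Python sketch below only illustrates that logic; `ExitAction` and `choose_exit` are hypothetical names, and the two booleans stand in for the buggy-BIOS flag and the stack check, whose real implementations are not shown here.

```python
from enum import Enum


class ExitAction(Enum):
    INT18 = "int 18h"
    RETURN_TO_PXE = "return to PXE"
    REBOOT = "reboot"


def choose_exit(bios_is_buggy: bool, stack_intact: bool) -> ExitAction:
    """Pick how to hand control back when asking for the next boot device."""
    if not bios_is_buggy:
        return ExitAction.INT18          # int 18h works, use it
    if stack_intact:
        return ExitAction.RETURN_TO_PXE  # workaround for the buggy BIOS
    return ExitAction.REBOOT             # neither exit path is usable
```

Only the last case is a dead end, which is why rebooting is the fallback.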

Testing images available here:

  • gpxelinux.0 with buggy BIOS detection and return-to-PXE.
  • gpxelinuxf.0 with forced return-to-PXE. (The above image should only return-to-PXE on a buggy Award BIOS. Use this image to try return-to-PXE or if the detection isn't working.)

Please give them a shot. You can tell that return-to-PXE is being used when there is no message about freeing UNDI memory. Let me know whether buggy BIOS detection works and whether return-to-PXE correctly tries to boot from the next device.

For some reason my test machine hangs instead of booting the next image. I tried a dummy NBP with an instruction to set the return type and an lret. Perhaps my BIOS/PXE (Insyde + Intel BC + Broadcom UNDI) is buggy.

Fri Jul 11

Git commit:

Just had a productive meeting with mdc and mcb30. I am still learning new things about gPXE and the world it lives in every week.

The b44 driver itself has been sitting still for a few weeks. The reason for this is that it requires changes to gPXE's memory management. Getting the driver working has turned into a journey through DMA mapping, gPXE's memory allocators (malloc and umalloc), hidden memory regions, and into the E820 memory map mangler.

Today's meeting has brought me one step closer to what our memory management needs to look like in order to support devices with addressing limitations, like the BCM4401 NIC.

Next week

On to Week 8.

