Stefan Hajnoczi: GDB Remote Debugging
Week 4
Milestones:
- Get latest GDB stub work into mainline.
- Modern bzImage prefix for gPXE.
Mon Jun 16
The gdbstub2 branch is now ready for mainline review. Diffs against gPXE master are here. Once it is merged I will update the documentation and encourage others to use GDB.
gPXE needs modern bzImage support so that GRUB, lilo, and SYSLINUX can load it. This is my next piece of work after the GDB stub. There is already code in Etherboot to make a bzImage, but by default it doesn't work with today's popular bootloaders since the Linux bzImage header it supplies is outdated. I am investigating what needs to be done for GRUB, lilo, SYSLINUX, Etherboot, and gPXE to load a gPXE bzImage.
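For this investigation it helps to see which boot protocol version an image's header actually advertises. The sketch below assumes the setup header layout documented for the Linux x86 boot protocol (signature "HdrS" at offset 0x202, 16-bit little-endian version word at 0x206); boot_protocol_version is a hypothetical helper name, and it relies only on dd and od:

```sh
#!/bin/sh
# Sketch: print the Linux boot protocol version a kernel image advertises.
# Assumed layout (Linux x86 boot protocol): "HdrS" at 0x202, version at 0x206.

boot_protocol_version() {
    img=$1
    magic=$(dd if="$img" bs=1 skip=$((0x202)) count=4 2>/dev/null)
    if [ "$magic" != "HdrS" ]; then
        # No signature: the image uses the old (pre-2.00) header.
        echo old
        return 1
    fi
    # od prints the two bytes as unsigned decimals: low byte, then high byte.
    # The high byte is the major version, the low byte the minor version.
    set -- $(od -A n -t u1 -j $((0x206)) -N 2 "$img")
    printf '%d.%02d\n' "$2" "$1"
}
```

Running boot_protocol_version bin/gpxe.lkrn against the existing prefix and a distribution kernel would show how far apart the two headers are.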
Tue Jun 17
Git commit:
I am trying out bootloaders on gpxe.lkrn images. We were afraid that the outdated Linux zImage prefix no longer works with modern bootloaders. Here are the results for unmodified gPXE (I have not yet attempted to implement bzImage):
- GRUB boots gpxe.lkrn successfully. Here is a script to create a GRUB/gPXE boot floppy:
#!/bin/sh
set -e
dd if=/dev/zero of=grub.img bs=1024 count=1440
losetup /dev/loop0 grub.img
mkfs /dev/loop0
mount /dev/loop0 /mnt
mkdir -p /mnt/boot/grub
cp /boot/grub/stage1 /boot/grub/stage2 /mnt/boot/grub/
cat >/mnt/boot/grub/menu.lst <<EOF
title=gPXE
root (fd0)
kernel /boot/gpxe.lkrn
EOF
cp bin/gpxe.lkrn /mnt/boot/
umount /mnt
grub --device-map=/dev/null <<EOF
device (fd0) /dev/loop0
root (fd0)
setup (fd0)
quit
EOF
losetup -d /dev/loop0
- SYSLINUX boots gpxe.lkrn successfully. Here is a script to create a boot floppy:
#!/bin/sh
set -e
dd if=/dev/zero of=syslinux.img bs=1024 count=1440
mkfs.msdos syslinux.img
mount -o loop syslinux.img /mnt
cp bin/gpxe.lkrn /mnt/gpxe.zi
cat >/mnt/SYSLINUX.CFG <<EOF
default gpxe.zi
EOF
umount /mnt
syslinux syslinux.img
- lilo fails to boot gpxe.lkrn: QEMU stops with a triple fault. I still need to look into this. Here is a script to create a boot floppy:
#!/bin/sh
set -e
dd if=/dev/zero of=lilo.img bs=1024 count=1440
losetup /dev/loop0 lilo.img
mkfs /dev/loop0
mount /dev/loop0 /mnt
mkdir /mnt/etc /mnt/boot
cp bin/gpxe.lkrn /mnt/gpxe.zi
cat >/mnt/etc/lilo.conf <<EOF
boot =/dev/loop0
disk =/dev/loop0
bios =0x00
# 1.44MB disk geometry
sectors =18
heads =2
cylinders =80
install =/mnt/boot/boot.b
map =/mnt/boot/map
backup =/dev/null
image =/mnt/gpxe.zi
EOF
/tmp/lilo/sbin/lilo -C /mnt/etc/lilo.conf
umount /mnt
losetup -d /dev/loop0
- gPXE fails to boot gpxe.lkrn since gPXE supports only the newer bzImage format, not the old zImage format. Testing was easy:
qemu -bootp gpxe.lkrn -tftp bin bin/gpxe.usb
Updated lkrnprefix.S to version 2.07 of the zImage header. The image is still only a zImage since its protected-mode (non-real-mode) code loads at 0x10000. A bzImage loads its protected-mode code at 0x100000, i.e. right after the first 1 MB of low memory. Perhaps gpxe.lkrn can be a full bzImage, but I think that the A20 line will prevent us from accessing 0x100000.
- GRUB boots successfully.
- Lilo still fails. I need to investigate this; probably I'm not using it properly.
- SYSLINUX boots successfully.
- gPXE boots successfully with a small patch to bzimage.c. I need to discuss this with mcb30.
- Etherboot boots successfully.
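Whether an image counts as a zImage or a bzImage for load-address purposes comes down to one header bit. The sketch below assumes the Linux boot protocol definition of loadflags (offset 0x211), where bit 0, LOADED_HIGH, selects between 0x10000 and 0x100000 for the protected-mode code; pm_load_address is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: report where a boot loader should place the protected-mode code.
# Assumption (Linux boot protocol): loadflags at 0x211, bit 0 = LOADED_HIGH.

pm_load_address() {
    flags=$(od -A n -t u1 -j $((0x211)) -N 1 "$1" | tr -d ' ')
    [ -n "$flags" ] || return 1
    if [ $(( $flags & 1 )) -eq 1 ]; then
        echo 0x100000   # bzImage: loaded high, above the first 1 MB
    else
        echo 0x10000    # zImage: loaded low, at 64 KB
    fi
}
```

Only the 0x100000 case depends on the A20 line being enabled, which is why the bzImage question and the A20 worry go together.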
Wed Jun 18
Lilo still triple-faults when loading gpxe.lkrn. I set up a virtual machine with Damn Small Linux to ensure a clean environment. The DSL kernel boots successfully while gpxe.lkrn fails. Here is the triple fault information from QEMU:
qemu: fatal: triple fault
EAX=60000000 EBX=0000fee8 ECX=00002900 EDX=00001d8a
ESI=0001ffff EDI=0000ff51 EBP=0000f9c4 ESP=0000f96e
EIP=0000074c EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 00000000 ffffffff 00cf9300
CS =0008 0000f600 0000ffff 00009b00
SS =0010 00090000 0000ffff 00009309
DS =0018 00000000 ffffffff 00cf9300
FS =0018 00000000 ffffffff 00cf9300
GS =0018 00000000 ffffffff 00cf9300
LDT=0000 00000000 0000ffff 00008000
TR =0000 00000000 00000000 00000000
GDT= 0009f99c 0000001f
IDT= 00000000 000003ff
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
CCS=00000000 CCD=0000f97e CCO=ADDB
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted
I don't see an obvious clue in the crash dump, so I'll wait until I have spoken with mcb30 about bzImage. If we decide to go in a different direction, time spent debugging this would be wasted.
In the meantime I'll investigate real-mode GDB debugging. I already tried set architecture i8086 for 16-bit disassembly. GDB still treats memory as a flat 32-bit address space, so the GDB stub will probably need to translate addresses.
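The translation in question is the real-mode rule linear = segment × 16 + offset. A minimal sketch of what the stub would have to compute before serving a memory read for GDB; real_to_linear is a hypothetical name, and the optional a20off argument models the 1 MB wraparound that occurs when the A20 gate is disabled:

```sh
#!/bin/sh
# Sketch: real-mode segment:offset -> linear address translation.
# Usage: real_to_linear SEGMENT OFFSET [a20off]

real_to_linear() {
    addr=$(( ($1 << 4) + $2 ))
    # With the A20 gate disabled, bit 20 is masked, so addresses wrap at 1 MB.
    if [ "${3:-}" = a20off ]; then
        addr=$(( $addr & 0xfffff ))
    fi
    printf '0x%05x\n' "$addr"
}
```

For example, real_to_linear 0xf000 0xfff0 gives 0xffff0, while 0xffff:0x0010 gives 0x100000 with A20 enabled and 0x00000 with a20off.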
Another thought I'm holding is that loading gpxe.lkrn recursively fails. That potentially means you cannot load another zImage after gPXE has been loaded from gpxe.lkrn. My theory is that gPXE has been loaded to the default zImage load address, i.e. 0x10000. If gPXE then tries to load another image there, it overwrites itself and crashes. However, this seems unlikely, because gPXE relocates itself as high in memory as possible.
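The self-overwrite theory reduces to a range-overlap test: a second image loaded at 0x10000 clobbers gPXE only if gPXE still occupies part of that range. A sketch treating each region as a half-open range [start, start+len); ranges_overlap is a hypothetical helper, and the sizes and the relocated address in the usage below are made up for illustration:

```sh
#!/bin/sh
# Sketch: succeed (exit status 0) if [s1, s1+l1) and [s2, s2+l2) overlap.
# Usage: ranges_overlap START1 LEN1 START2 LEN2

ranges_overlap() {
    [ $(( $1 < $3 + $4 && $3 < $1 + $2 )) -eq 1 ]
}
```

With hypothetical 0x20000-byte regions, ranges_overlap 0x10000 0x20000 0x10000 0x20000 succeeds, while a copy relocated high (say to 0x7f00000) does not overlap the 0x10000 load range, which is why relocation makes the theory unlikely.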
Thu Jun 19
Git commit: [b44] Create skeleton driver for Broadcom 4401 NIC
Brought up ROM-o-matic for Etherboot top-of-git-tree. I have been occasionally assisting mdc with his ROM-o-matic.net online boot ROM generator. He recently enabled ROM-o-matic for gPXE top-of-git-tree. That way users can get ROMs for the latest development version of gPXE without having to set up a development environment and build from source. This is now also possible for Etherboot.
Beginning work to port the Linux b44 (Broadcom 4401) driver. My laptop has a BCM4401-B0 NIC, which gPXE does not currently support. The idea is to port the Linux driver to gPXE. I am looking forward to learning more about network drivers and device driver development in general.
Fri Jun 20
Git commit: [b44] Minimal TX path
The b44 driver is transmitting Ethernet frames. Thanks to Michael Decker's excellent gPXE Driver API Documentation I got the skeleton for the driver working very quickly last night. This morning I started porting the Linux b44 driver code.
After getting the initialization working (mainly by copy-paste) and reading the MAC address from the card, I decided to pursue the TX path. Getting transmit working early is useful since gPXE will attempt to do DHCP automatically and therefore needs to send packets.
Copy-pasting the Linux driver was not a good tactic since the Linux code is much more complex. Eventually I just focused on understanding how the hardware supports transmitting frames (there is no public documentation available!), and then implemented a simple TX path resembling the gPXE natsemi driver.
Sat Jun 21
Git commit: [b44] Working RX path
The b44 driver is receiving Ethernet frames. I just booted PXELINUX and HTTP-booted Linux 2.6.25 on this card for the first time! Getting the RX path working has been painful.
I think some of the Linux driver code is misleading or incorrect. Luckily there are drivers for OpenBSD, FreeBSD, and Solaris. Those drivers might even be based on the Linux driver, but they do some things differently, and it helps to compare them with each other. My main issue with the RX path was a comment in the Linux driver claiming that the hardware writes a header structure 30 bytes before the DMA address of the I/O buffer.
This is false. The Linux driver does offset the DMA address by 30 bytes, but it also offsets the I/O buffer by 30 bytes. The two offsets cancel out, so the only effect is that the first 30 bytes of the I/O buffer are wasted. The header structure gets written to the DMA address, not before it.
The next steps for the b44 driver are cleaning it up, making it robust, and testing. Most of the initialization code is straight from the Linux driver. I want to get to grips with it and then simplify it for gPXE.
I have omitted the Linux driver's performance optimizations. The Linux driver has a “copy threshold” which decides whether to copy a received packet into a fresh IO buf to hand off to the network stack, or to remove the current IO buf from the RX ring, pass it straight to the network stack, and allocate a fresh IO buf for the RX ring. I'll talk to Balaji about performance measurement since he's been optimizing his USB driver.
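The copy-threshold decision itself is just a length comparison. This sketch assumes the common arrangement (short frames are copied, long frames are handed off and their ring slot refilled); both the rx_strategy name and the 256-byte threshold are illustrative assumptions, not values taken from the gPXE b44 port, which omits this optimization:

```sh
#!/bin/sh
# Sketch of a "copy threshold" RX policy. Threshold and direction are assumed.
RX_COPY_THRESHOLD=256   # hypothetical threshold in bytes

rx_strategy() {
    len=$1
    if [ "$len" -gt "$RX_COPY_THRESHOLD" ]; then
        # Long frame: detach the ring's IO buf, pass it up, refill the slot.
        echo handoff
    else
        # Short frame: copy into a small fresh IO buf; the ring buf stays put.
        echo copy
    fi
}
```

The reasoning behind that direction: copying a short frame costs little and leaves the full-size ring buffer in place, while for a long frame swapping buffers is cheaper than copying the payload.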
Lilo bzImage debugging is still underway. I made a little progress tonight by determining that the triple fault happens in the call to install. I think that EIP goes astray somewhere inside install, hence the triple fault. I'm sure the issue triggers inside install since I've placed infinite loops before and after the call, and the loop after the call is never reached.
My current debugging cycle is to boot Damn Small Linux in QEMU, copy over my latest gpxe.lkrn, run lilo, and reboot into gPXE. This is slow and frustrating. I need to script it, but my DSL install seems to be read-only.