====== Stefan Hajnoczi: GDB Remote Debugging ======

===== Week 4 =====

**Milestones:**

  * Get latest GDB stub work into mainline.
  * Modern bzImage prefix for gPXE.

==== Mon Jun 16 ====

**The ''gdbstub2'' branch is now ready for mainline review**. Diffs against gPXE ''master'' are [[http://etherboot.org/share/stefanha/gdbstub2.diff|here]]. Once it is merged I will update the documentation and encourage others to use GDB.

**gPXE needs modern bzImage support so that GRUB, lilo, and SYSLINUX can load it**. This is my next piece of work after the GDB stub. There is already code in etherboot to make a bzImage. The old code doesn't work by default on today's popular bootloaders since the Linux bzImage header it supplies is outdated. I am investigating what needs to be done for GRUB, lilo, SYSLINUX, etherboot, and gPXE to load a gPXE bzImage.

==== Tue Jun 17 ====

Git commit:

  * [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=bfd885802fd6af9938f2b703f6c48a9259cd7657|[bzImage] Make gpxe.lkrn a zImage 2.07]]

**I am trying out bootloaders on ''gpxe.lkrn'' images**. We were afraid that the outdated Linux zImage prefix no longer works with modern bootloaders. Here are results for unmodified gPXE (I have not yet attempted to implement bzImage):

  * **GRUB** boots ''gpxe.lkrn'' successfully. Here is a script to create a GRUB/gPXE boot floppy:

<code>
#!/bin/sh
set -e
dd if=/dev/zero of=grub.img bs=1024 count=1440
losetup /dev/loop0 grub.img
mkfs /dev/loop0
mount /dev/loop0 /mnt
mkdir -p /mnt/boot/grub
cp /boot/grub/stage1 /boot/grub/stage2 /mnt/boot/grub/
cat >/mnt/boot/grub/menu.lst <<EOF
title=gPXE
root (fd0)
kernel /boot/gpxe.lkrn
EOF
cp bin/gpxe.lkrn /mnt/boot/
umount /mnt
grub --device-map=/dev/null <<EOF
device (fd0) /dev/loop0
root (fd0)
setup (fd0)
quit
EOF
losetup -d /dev/loop0
</code>

  * **SYSLINUX** boots ''gpxe.lkrn'' successfully.
Here is a script to create a boot floppy:

<code>
#!/bin/sh
set -e
dd if=/dev/zero of=syslinux.img bs=1024 count=1440
mkfs.msdos syslinux.img
mount -o loop syslinux.img /mnt
cp bin/gpxe.lkrn /mnt/gpxe.zi
cat >/mnt/SYSLINUX.CFG <<EOF
default gpxe.zi
EOF
umount /mnt
syslinux syslinux.img
</code>

  * **lilo** fails to boot ''gpxe.lkrn''. QEMU stops with a triple fault. I still need to look into this. Here is a script to create a boot floppy:

<code>
#!/bin/sh
set -e
dd if=/dev/zero of=lilo.img bs=1024 count=1440
losetup /dev/loop0 lilo.img
mkfs /dev/loop0
mount /dev/loop0 /mnt
mkdir /mnt/etc /mnt/boot
cp bin/gpxe.lkrn /mnt/gpxe.zi
cat >/mnt/etc/lilo.conf <<EOF
boot =/dev/loop0
disk =/dev/loop0
  bios =0x00
  # 1.44MB disk geometry
  sectors =18
  heads =2
  cylinders =80
install =/mnt/boot/boot.b
map =/mnt/boot/map
backup =/dev/null
image =/mnt/gpxe.zi
EOF
/tmp/lilo/sbin/lilo -C /mnt/etc/lilo.conf
umount /mnt
losetup -d /dev/loop0
</code>

  * **gPXE** fails to boot ''gpxe.lkrn'' because it supports only the newer bzImage format, not the old zImage format. Testing was easy:

<code>
qemu -bootp gpxe.lkrn -tftp bin bin/gpxe.usb
</code>

  * **Etherboot 5.4.3** boots ''gpxe.lkrn'' successfully. I used [[http://freshmeat.net/projects/wraplinux/|wraplinux]] to make an NBI file from ''gpxe.lkrn''.

**Updated ''lkrnprefix.S'' to zImage 2.07**. The image is still only a zImage since the non-real-mode code loads at 0x10000. A bzImage loads its non-real-mode code at 0x100000, i.e. at the 1 MB boundary, directly above low memory. Perhaps ''gpxe.lkrn'' can be a full bzImage, but I think that the A20 line will prevent us from accessing 0x100000. Results with the updated prefix:

  * **GRUB** boots successfully.
  * **lilo** still fails. I need to investigate this; I am probably not using it properly.
  * **SYSLINUX** boots successfully.
  * **gPXE** boots successfully with a small patch to ''bzimage.c''. I need to discuss this with mcb30.
  * **Etherboot** boots successfully.

==== Wed Jun 18 ====

**Lilo still triple-faults when loading ''gpxe.lkrn''**.
I set up a virtual machine with [[http://damnsmalllinux.org/|Damn Small Linux]] to ensure a clean environment. The DSL kernel boots successfully, while ''gpxe.lkrn'' fails. Here is the triple fault information from QEMU:

<code>
qemu: fatal: triple fault
EAX=60000000 EBX=0000fee8 ECX=00002900 EDX=00001d8a
ESI=0001ffff EDI=0000ff51 EBP=0000f9c4 ESP=0000f96e
EIP=0000074c EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 00000000 ffffffff 00cf9300
CS =0008 0000f600 0000ffff 00009b00
SS =0010 00090000 0000ffff 00009309
DS =0018 00000000 ffffffff 00cf9300
FS =0018 00000000 ffffffff 00cf9300
GS =0018 00000000 ffffffff 00cf9300
LDT=0000 00000000 0000ffff 00008000
TR =0000 00000000 00000000 00000000
GDT=     0009f99c 0000001f
IDT=     00000000 000003ff
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
CCS=00000000 CCD=0000f97e CCO=ADDB
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted
</code>

I don't see an obvious clue in the crash dump, so I'll wait until after speaking with mcb30 about bzImage. If we decide to go in a different direction, any time spent debugging this would be wasted.

In the meantime I'll investigate real-mode GDB debugging. I already tried ''set architecture i8086'' for 16-bit disassembly. GDB still treats memory as a flat 32-bit address space, so real-mode debugging will probably require some address translation inside the GDB stub.

Another thought I am keeping in mind is that loading ''gpxe.lkrn'' recursively fails.
That potentially means you cannot load another zImage after gPXE has been loaded from ''gpxe.lkrn''. <del>My theory is that gPXE has been loaded to the default zImage load address, i.e. 0x10000. If gPXE then tries to load another image there, it overwrites itself and crashes</del>. It looks unlikely that gPXE is overwriting itself because it relocates as high up as possible.

Next steps:

  * Update [[:dev:gdbstub|GDB stub page]] and screencast when UDP code is merged into mainline. See [[http://grub.enbug.org/DebuggingWithGDB|GRUB GDB wiki page]] for inspiration.
  * gPXE bzImage support.
  * Real-mode GDB stub.
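
For reference, the address translation a real-mode GDB stub would probably need can be sketched as follows. This is only an illustration in Python, not stub code (the stub itself is written in C). The function name ''real_mode_linear'' and the ''a20_enabled'' parameter are hypothetical and are not part of gPXE or GDB. The sketch also shows why the A20 line matters for the two load addresses discussed in this journal: 0x10000 (zImage) is reachable either way, while 0x100000 (bzImage) wraps to 0 when A20 is disabled.

```python
def real_mode_linear(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Translate a real-mode segment:offset pair to a linear address.

    Hypothetical helper for illustration; not taken from gPXE or GDB.
    """
    # Real mode: linear = segment * 16 + offset, both values are 16 bits wide.
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # With A20 disabled, address bit 20 is forced to zero,
        # so addresses at or above 1 MB wrap around to low memory.
        linear &= ~(1 << 20)
    return linear


# Load addresses mentioned in this journal.
ZIMAGE_LOAD_ADDR = 0x10000
BZIMAGE_LOAD_ADDR = 0x100000

print(hex(real_mode_linear(0x1000, 0x0000)))                     # 0x10000
print(hex(real_mode_linear(0xFFFF, 0x0010)))                     # 0x100000
print(hex(real_mode_linear(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
```

A stub could apply the same arithmetic to the CS:IP and SS:SP values it reports, so that GDB's flat view of memory lines up with what the CPU actually executes in real mode.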