====== Stefan Hajnoczi: GDB Remote Debugging ======

===== Week 3 =====

**Milestones:**
  * GDB remote debugging over UDP.
  * Hardware watchpoint support for trapping read/write/execute on memory.

==== Mon Jun 9 ====

I had a refreshing weekend with only a little bit of Etherboot hacking on Saturday. The weather was excellent and I was able to recharge my batteries a bit :-).

The past two weeks of Summer of Code have been an excellent experience. I've had the chance to look behind the scenes of Etherboot and to pitch in and help out.

My current focus is on GDB remote debugging over UDP. I think this feature will be very useful since it lets you debug gPXE without using a serial cable. It opens the door to letting developers analyze errors over the internet without having to reproduce them locally from a bug report. I'd really like to see this used in cases where developers do not have the hardware on which the error occurs.

Today I implemented UDP send and receive without using the network stack. This means hand-crafting Ethernet, IP, and UDP packets and parsing them. One detail I hadn't thought of is that ARP reply code is also necessary to advertise gPXE's MAC/IP addresses.

GDB can already talk to gPXE via UDP. The conversation does not go well though. It deadlocks after a few exchanges, before handshaking is complete, when neither side wants to say any more.

I think this is either because of bugs in my UDP send/receive code or due to flow control issues when using UDP. The GDB protocol is sensitive to packet ordering. I have tried to implement the GDB protocol in a robust way, so I suspect the error lies in the UDP code.

Tomorrow I'll look into this more. Meanwhile, here is a screenshot from [[http://www.wireshark.org/|Wireshark]]:

{{:soc:2008:stefanha:journal:udp.png|GDB and gPXE trying to talk to each other over UDP.}}

==== Tue Jun 10 ====

Yesterday's issues were caused by bugs in the UDP send/receive code. UDP debugging now works and passes the test suite.
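As a rough illustration of what hand-crafting packets without the network stack involves (see Monday's entry), the sketch below fills in an IPv4 and a UDP header by hand. It is not the gPXE code: ''build_udp_ipv4'', ''inet_checksum'', and the ''put_be16''/''put_be32'' helpers are hypothetical names, the Ethernet header and the ARP reply are left out, and the UDP checksum is set to zero, which IPv4 permits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HDR_LEN 20   /* IPv4 header without options */
#define UDP_HDR_LEN   8

/* Store 16-bit and 32-bit values in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* RFC 1071 Internet checksum: one's complement of the one's complement sum. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);   /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);      /* fold carries back in */
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: write IPv4 + UDP headers and the payload into buf.
 * buf must have room for 28 + payload_len bytes. Returns the bytes written.
 */
static size_t build_udp_ipv4(uint8_t *buf, uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port,
                             const void *payload, uint16_t payload_len)
{
    uint8_t *ip = buf;
    uint8_t *udp = buf + IPV4_HDR_LEN;
    uint16_t udp_len = (uint16_t)(UDP_HDR_LEN + payload_len);

    ip[0] = 0x45;                                 /* version 4, 5 words */
    ip[1] = 0;                                    /* type of service */
    put_be16(ip + 2, (uint16_t)(IPV4_HDR_LEN + udp_len));
    put_be16(ip + 4, 0);                          /* identification */
    put_be16(ip + 6, 0x4000);                     /* Don't Fragment */
    ip[8] = 64;                                   /* TTL */
    ip[9] = 17;                                   /* protocol: UDP */
    put_be16(ip + 10, 0);                         /* zero before summing */
    put_be32(ip + 12, src_ip);
    put_be32(ip + 16, dst_ip);
    put_be16(ip + 10, inet_checksum(ip, IPV4_HDR_LEN));

    put_be16(udp, src_port);
    put_be16(udp + 2, dst_port);
    put_be16(udp + 4, udp_len);
    put_be16(udp + 6, 0);                         /* 0 = checksum not used */
    memcpy(udp + UDP_HDR_LEN, payload, payload_len);

    return (size_t)IPV4_HDR_LEN + udp_len;
}
```

Parsing on the receive side is the mirror image: check the protocol and port fields at the same offsets, and verify that ''inet_checksum'' over the received IPv4 header yields zero.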
The test suite is fun to watch over UDP because it completes so much faster than over serial.

My code for UDP still needs a lot of clean-up and some policy decisions. Here are the open questions:
  * How to set the network device and IP/port on which to listen? Compile-time option or gPXE shell command?
  * How to split the serial and UDP transports? Allow both to be compiled in? Choose at runtime or try to use both?

I have an idea for avoiding recursion when breakpoints are placed in code used by the GDB stub. Implement hardware/memory breakpoint support in the GDB stub so that we know where breakpoints are set. Upon entering the interrupt handler, temporarily clear all breakpoints. Now all code can be run without fear of recursing into the breakpoint interrupt handler. Upon leaving the interrupt handler, restore all breakpoints.

==== Wed Jun 11 ====

Git commit: [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=6f668fd80ada1d796c482106139139ada25de663|6f668fd80ada1d796c482106139139ada25de663]]

**Committed remote debugging over UDP**. The GDB stub can now be built with serial and/or UDP support. The ''config.h'' options are ''GDBSERIAL'' and ''GDBUDP''.

To enter the debugger, execute the gPXE shell command:

  gdbstub [...]

The first argument names the transport, either ''serial'' or ''udp''. For ''udp'', the name of a configured network device is required, e.g.:

  ifopen net0
  set net0/ip 192.168.0.2
  gdbstub udp net0

The GDB stub listens on UDP port 43770 by default.

It was fun to run the test suite against a real machine with a VIA Rhine II NIC booting gPXE from USB.

I am interested in feedback about the code. I'm eager to see what reactions it raises and how I can improve it.

**In the meantime, hardware watchpoint support will keep me busy**. I got off to a slow start because remote watchpoints are broken in GDB 6.7.1 through 6.8 (at least). Figuring out the bug and how to fix it took a while. I also asked on ''#gdb'' and later found out that the bug is fixed in CVS head.
Anyway, this means watchpoints will only work if you run a GDB newer than 6.8.

The watchpoint commands are now implemented. The x86 architecture has support for 4 hardware breakpoints. These can either be normal breakpoints or watchpoints. To keep things simple, the GDB stub implementation does not support hardware breakpoints, only watchpoints. I am holding off on a full-blown breakpoint implementation since that may involve implementing memory breakpoints too. Currently GDB manages memory breakpoints and the GDB stub is unaware of them.

The original GDB stub code with serial-only support was very minimal. I like concise code. When I added UDP, the code needed to be refactored into more parts. Watchpoints, again, add new functionality. Hopefully we are close to feature-complete, because the GDB stub shouldn't turn into a subsystem of its own.

I am going to write tests and polish the code tonight with a commit coming tomorrow.

==== Thu Jun 12 ====

Git commits:
  * [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=5a1db5bd3c8310f9797add6e75383b00dc9d2d15|[Drivers-via-rhine] read/write instead of in/out]]
  * [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=bbcd45d10048c95e009c5b5c69c5d93cd74f3cc6|[GDB] Add watch and rwatch hardware watchpoints]]

Michael Decker formats commits nicely in [[:soc:2008:mdeck:journal:week3|his journal]] so I'm copying him :-).

**Watchpoints now work in GDB**. The ''watch'' command sets a write watchpoint, which fires when the given memory location is written. Similarly, the ''awatch'' command sets a read/write watchpoint, which fires when the memory location is read or written.

Watchpoints are orthogonal to breakpoints: they give you another dimension of trapping events during debugging. They are an excellent tool for tracking down variable accesses or memory corruption. Without watchpoints, you'd have to add ''printf()'' calls or periodically break into the debugger to check if memory has changed. And that's no fun.
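On the wire, GDB's remote protocol requests watchpoints with ''Z'' (set) and ''z'' (clear) packets of the form ''Ztype,addr,kind'', where type 2 is a write watchpoint (''watch''), 3 a read watchpoint (''rwatch''), and 4 an access watchpoint (''awatch''), and the numbers are hexadecimal. The following is a minimal sketch of parsing such a packet body, not the stub's actual code: ''struct wp_request'', ''parse_wp_packet'', and ''wp_supported'' are hypothetical names, and packet framing and checksums are assumed to have been handled already.

```c
#include <stdlib.h>

/* Watchpoint types as numbered in GDB's Z/z packets. */
enum wp_type { WP_WRITE = 2, WP_READ = 3, WP_ACCESS = 4 };

/* Hypothetical decoded form of a "Z2,addr,len" style request. */
struct wp_request {
    int insert;           /* 1 for 'Z' (set), 0 for 'z' (clear) */
    int type;             /* 0-4 as sent by GDB */
    unsigned long addr;   /* address to watch */
    unsigned long len;    /* number of bytes to watch */
};

/* Parse the packet body. Returns 0 on success, -1 if malformed. */
static int parse_wp_packet(const char *pkt, struct wp_request *req)
{
    const char *field;
    char *end;

    if (pkt[0] != 'Z' && pkt[0] != 'z')
        return -1;
    req->insert = (pkt[0] == 'Z');

    /* A single type digit followed by a comma. */
    if (pkt[1] < '0' || pkt[1] > '4' || pkt[2] != ',')
        return -1;
    req->type = pkt[1] - '0';

    /* Hexadecimal address, terminated by a comma. */
    field = pkt + 3;
    req->addr = strtoul(field, &end, 16);
    if (end == field || *end != ',')
        return -1;

    /* Hexadecimal length, terminated by end of packet. */
    field = end + 1;
    req->len = strtoul(field, &end, 16);
    if (end == field || *end != '\0')
        return -1;

    return 0;
}

/* Mirror the journal: write and access watchpoints only, no pure read. */
static int wp_supported(int type)
{
    return type == WP_WRITE || type == WP_ACCESS;
}
```

Note that ''strtoul'' is more lenient than the protocol requires (it accepts leading whitespace and a ''0x'' prefix); a stricter stub would parse the hex digits itself. For an unsupported type, the protocol's convention is to send an empty reply so that GDB knows the request is not implemented.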
Why are ''watch'' and ''awatch'' supported but not ''rwatch''? The answer has to do with how watchpoints are implemented, so let's take a look...

The x86 has hardware debugging support that you can access via the debug registers, ''dr0'' through ''dr7''. The control register, ''dr7'', lets you enable up to four hardware breakpoints. Hardware breakpoints can be normal execution breakpoints, write watchpoints, read/write watchpoints, or port I/O watchpoints. The x86 does not support pure read watchpoints, which is why ''rwatch'' is missing.

The GDB stub watchpoint code is almost capable of doing normal execution breakpoints. However, some extra behavior is necessary: the Resume Flag (RF) must be set to avoid repeatedly breaking on the same instruction without advancing EIP. I don't see a need for normal execution breakpoints when GDB's software breakpoints are available.

Implementing the debug register code required me to rewire the way the GDB stub receives control on interrupt. Previously the interrupt handler would call the portable GDB stub directly. Now it calls an architecture-specific handler, which calls the portable GDB stub code after it has configured the debug registers.

**Used watchpoints to find and fix a bug in the Via Rhine driver**. Last week I tracked down a memory corruption where 0x00000000 and 0x00000004 were being written to. At the time I wished I had watchpoints so I added them to the TODO list.

The first thing I did once watchpoints were implemented was ''awatch _entry'', where ''_entry'' is the symbol at 0x00000000. This will detect NULL pointer reads/writes.

To my surprise the watchpoint immediately fired! The backtrace showed that ''rhine_poll()'' was trying to read from 0x00000000. It turned out that the driver was using ''readb()''/''writeb()'' in a few places whereas it should be using ''inb()''/''outb()''. The ''readb()''/''writeb()'' functions do memory I/O while the ''inb()''/''outb()'' functions do port I/O.

Implementing watchpoints has paid off ;-)!
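To make the debug register discussion in this entry more concrete, here is a minimal sketch of how the ''dr7'' control value for one of the four slots could be computed. It follows the documented x86 layout (local enable bit ''Ln'' at bit 2n, a 2-bit R/W field at bit 16+4n, a 2-bit LEN field at bit 18+4n), but it is not the stub's code: ''dr7_set'' and ''dr7_clear'' are hypothetical names, using local rather than global enable bits is an assumption, and actually loading ''dr0'' through ''dr3'' and ''dr7'' requires privileged instructions that are omitted here.

```c
#include <stdint.h>

/* R/W field encodings in dr7. There is no encoding for "read only". */
enum dr7_rw {
    DR7_RW_EXEC  = 0,   /* break on instruction execution */
    DR7_RW_WRITE = 1,   /* break on data writes (watch) */
    DR7_RW_IO    = 2,   /* break on port I/O (needs CR4.DE set) */
    DR7_RW_RDWR  = 3    /* break on data reads or writes (awatch) */
};

/* LEN field encodings in dr7 for 32-bit mode. */
enum dr7_len {
    DR7_LEN_1 = 0,      /* 1 byte */
    DR7_LEN_2 = 1,      /* 2 bytes */
    DR7_LEN_4 = 3       /* 4 bytes */
};

/* Return dr7 with the given slot (0-3) configured and locally enabled. */
static uint32_t dr7_set(uint32_t dr7, unsigned slot,
                        enum dr7_rw rw, enum dr7_len len)
{
    unsigned ctl_shift = 16 + 4 * slot;

    dr7 &= ~((uint32_t)0xf << ctl_shift);                 /* clear R/W, LEN */
    dr7 |= ((uint32_t)rw | ((uint32_t)len << 2)) << ctl_shift;
    dr7 |= (uint32_t)1 << (2 * slot);                     /* set Ln */
    return dr7;
}

/* Return dr7 with the given slot (0-3) disabled and its fields cleared. */
static uint32_t dr7_clear(uint32_t dr7, unsigned slot)
{
    dr7 &= ~((uint32_t)0x3 << (2 * slot));                /* clear Ln, Gn */
    dr7 &= ~((uint32_t)0xf << (16 + 4 * slot));           /* clear R/W, LEN */
    return dr7;
}
```

With this layout, a 4-byte write watchpoint in slot 0 corresponds to ''dr7_set(0, 0, DR7_RW_WRITE, DR7_LEN_4)'', with the watched address going into ''dr0''. The absence of a read-only value in the R/W field is the reason ''rwatch'' cannot be mapped directly onto the hardware.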
==== Fri Jun 13 ====

Git commits:
  * [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=8ec13694a44779156d679af99a104aeb3bbfdb53|[GDB] Zero-extend 16-bit segment registers]]
  * [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=03d22bf5e31348e3f3ede48500c981761f367651|[GDB] UDP clean up and add netdev refcnt]]

**Segment registers sometimes contained junk values**. The test suite reported that the ''DS'' segment register had the wrong value when running on real hardware. Most of my past development and testing has been in QEMU.

Upon closer inspection the lower 16 bits of ''DS'' were correct. On older processors, the upper 16 bits are undefined whereas they are guaranteed to be zero on newer processors. gPXE runs correctly since only the lower 16 bits of segment selectors are used by the CPU.

Although it is technically okay for the upper 16 bits to be undefined, I think it is nicer if we zero-extend segment registers when reporting their values to GDB. This makes it easier to write test cases and is less confusing for users.

**Weekly meeting with mdc and mcb30**. Things are looking good for cleaning up and merging the second round of GDB stub work:
  * Remote debugging over UDP
  * Watchpoints
  * Atomic read/write for device memory
  * Continue on detach/kill from GDB

Making the merge happen is my immediate goal.

An interesting opportunity for another iteration of development is 16-bit real mode debugging. If GDB can hold up to the pressures of real mode, then I will implement stub support.

===== Next week =====

On to [[.:week4|week 4]].