====== Piotr Jaroszyński: Usermode debugging under Linux ======

===== Week 6 [ Jun 28 - Jul 4 2010 ] =====

In the spirit of not building up the journal slackpile even more, I will try to keep it up to date from now on and work on the past in the meantime.

==== valgrind ====

For the last 2 weeks (including a break for exams at uni) I have been working on making the usermode gPXE valgrindable. You may ask what there is to do, as valgrind usually works out of the box for most applications. That's a very good question. The thing is that usermode gPXE is a bit different: it doesn't use the allocator provided by ''stdlib'', and hence valgrind has no way to know which memory is allocated and which is free. So the task was to decorate the two means of memory allocation in gPXE (''malloc'' and ''umalloc'') with valgrind API calls so that it can track them. I will write a detailed note about that soonish, but for now here is what I came up with:

  * [[http://git.etherboot.org/?p=people/peper/gpxe.git;a=shortlog;h=refs/heads/valgrind|valgrind branch]]

And a quick demo to show why it's useful.

  $ valgrind --leak-check=full --show-reachable=yes --track-origins=yes ./bin-x86_64-linux/tap.linux --net tap,if=tap3,mac=52:54:00:12:34:56
  ==6730== Memcheck, a memory error detector
  ==6730== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
  ==6730== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
  ==6730== Command: ./bin-x86_64-linux/tap.linux --net tap,if=tap3,mac=52:54:00:12:34:56
  ==6730==
  gPXE initialising devices...
  gPXE 1.0.1+ -- Open Source Boot Firmware -- http://etherboot.org
  Features: HTTP DNS TFTP
  DHCP (net0 52:54:00:12:34:56).... ok
  http://root.piotrj.org/files/gpxe/1mb.
  ok
  ==6730==
  ==6730== HEAP SUMMARY:
  ==6730==     in use at exit: 1,185 bytes in 7 blocks
  ==6730==   total heap usage: 2,224 allocs, 2,217 frees, 3,458,584 bytes allocated
  ==6730==
  ==6730== 1 bytes in 1 blocks are definitely lost in loss record 1 of 7
  ==6730==    at 0x400D90: realloc (malloc.c:313)
  ==6730==    by 0x401FD3: strndup (string.c:345)
  ==6730==    by 0x4090F5: image_set_cmdline (image.c:106)
  ==6730==    by 0x4060A6: imgfill_cmdline (image_cmd.c:73)
  ==6730==    by 0x4062DE: T.66 (image_cmd.c:163)
  ==6730==    by 0x408BBD: system (exec.c:223)
  ==6730==    by 0x40554F: script_exec (script.c:69)
  ==6730==    by 0x408FCF: image_exec (image.c:266)
  ==6730==    by 0x40074C: main (main.c:83)
  ...

I have skipped the rest of the output, as the first suspicious block is already interesting, especially since valgrind reports it as ''definitely lost''. That means that when the program exits there isn't a single pointer to the block anywhere in memory, and yet it was never freed. The stacktrace shows us where it was allocated. We see that ''image_set_cmdline()'' sets ''image->cmdline'' to a newly allocated string, freeing the old one. That's all good until we get rid of the image with ''free_image()'', which doesn't clean up ''image->cmdline''. A simple fix follows:

  --- a/src/core/image.c
  +++ b/src/core/image.c
  @@ -47,6 +47,7 @@ struct list_head images = LIST_HEAD_INIT ( images );
   static void free_image ( struct refcnt *refcnt ) {
          struct image *image = container_of ( refcnt, struct image, refcnt );
  +       free ( image->cmdline );
          uri_put ( image->uri );
          ufree ( image->data );
          image_put ( image->replacement );

Also included in the [[http://git.etherboot.org/?p=people/peper/gpxe.git;a=shortlog;h=refs/heads/leaks|leaks branch]]. (I might move the demo to a separate note at some point.)

==== drivers in userspace ====

At last it's time for drivers in userspace. The idea is to allow developing gPXE drivers in userspace by mapping the gPXE driver API to the kernel **somehow**.
The '**somehow**' indicates that I need to do a lot of research to tackle this idea, the results of which I will try to document on the wiki. For now I am reading [[http://www.amazon.com/Essential-Device-Drivers-Sreekrishnan-Venkateswaran/dp/0132396556|Essential Linux Device Drivers]] and [[http://lwn.net/Kernel/LDD3/|Linux Device Drivers]] to get the basics.

In the meantime I am setting up the development environment, which is [[http://exherbo.org/|exherbo]] (the distro doesn't matter too much though, I just need to get it to boot and run sshd) running under qemu-kvm so that the upcoming kernel oopses don't crash my box :) qemu-kvm is also a good choice because it can emulate multiple NICs at the same time, so I can use one for networking with the native Linux driver while playing with the other in any way I please.
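
==== sketch: decorating a custom allocator ====

Coming back to the valgrind work for a moment: below is a minimal sketch of what decorating a custom allocator with valgrind client requests can look like. It is not the code from the valgrind branch. The fixed-size pool and the names ''pool_alloc'', ''pool_free'', ''BLOCK_SIZE'' and ''BLOCK_COUNT'' are made up for illustration. ''VALGRIND_MALLOCLIKE_BLOCK'' and ''VALGRIND_FREELIKE_BLOCK'' are the client request macros from valgrind's ''valgrind.h''; the no-op fallback definitions are an assumption added here so that the sketch also builds on a box without the valgrind headers.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real client request macros when valgrind's header is available. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif

/* Assumption for this sketch: fall back to no-ops without the header. */
#ifndef HAVE_VALGRIND_H
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) ((void)0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rz) ((void)0)
#endif

/* Hypothetical toy allocator: a static pool of fixed-size blocks. */
#define BLOCK_SIZE  64
#define BLOCK_COUNT 16

static unsigned char pool[BLOCK_COUNT][BLOCK_SIZE];
static int in_use[BLOCK_COUNT];

/* Hand out the first free block, or NULL if the request cannot be met. */
void *pool_alloc(size_t size)
{
    if (size == 0 || size > BLOCK_SIZE)
        return NULL;

    for (int i = 0; i < BLOCK_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            /* Tell valgrind that 'size' bytes at this address now behave
             * like a malloc()ed block: no red zone, contents not zeroed. */
            VALGRIND_MALLOCLIKE_BLOCK(pool[i], size, 0, 0);
            return pool[i];
        }
    }
    return NULL;
}

/* Return a block to the pool. NULL is accepted and ignored, like free(). */
void pool_free(void *ptr)
{
    if (ptr == NULL)
        return;

    size_t i = (size_t)((unsigned char *)ptr - &pool[0][0]) / BLOCK_SIZE;
    assert(i < BLOCK_COUNT && in_use[i]);

    in_use[i] = 0;
    /* Tell valgrind the block is gone, as if free() had been called. */
    VALGRIND_FREELIKE_BLOCK(ptr, 0);
}
```

With the real header in place and the program run under valgrind, a block taken with ''pool_alloc()'' and never handed back to ''pool_free()'' should show up in the leak report much like a regular ''malloc()'' leak, which is how the ''definitely lost'' record in the demo above becomes visible at all.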