Piotr Jaroszyński: Usermode debugging under Linux

Week 10 [ Jul 26 - Aug 1 2010 ]

drivers in userspace

The big issue

There is one thing that keeps bugging me in the drivers branch: UIO-DMA and malloc. UIO-DMA does the DMA mappings on a per-device basis (as does the kernel). I am not sure whether the mappings are really per device or global, but the API suggests the former, or at least that it might change in the future (if it hasn't already with IOMMUs). Anyway, because of that UIO-DMA needs a device to be initialized, and that's why I added a very simple uio-dma-gpxe module which does just that. This isn't really an issue on its own (it might even be a plus, as it forces you to unbind any other drivers first), but it complicates the initialization logic in gPXE. Currently all the userspace setup of PCI devices is done by the linux lpci driver, and that's way too late to set up malloc. To work around that I have added an option to use the last device set up with UIO-DMA for the mappings and introduced a UIO-DMA malloc_backend - here and here. The reason I use the tables API (and not the single API) for the backends is that it needs to be able to fall back, so that using the tap driver is still possible without the UIO-DMA modules being loaded and initialized.

I don't really like the solution, and I am starting to think that doing a completely separate malloc with support for switching the memory pool used for DMA allocations (again, so that tap works) is the way to go. The reason I was reluctant to do so is that making the current allocator valgrindable took me a fair amount of time. Or I could look into making the current malloc support that, but that will surely grow the code size.

Or I could make the malloc backend a single API (no need for fallback any more) and implement it in userspace like this:

  • in a __init_fn provide the malloc pool by mmap()ing a chunk of memory (like linux_umalloc does) - let's call it A
  • if an lpci device is going to be used:
    • allocate the same amount of memory as for A from UIO-DMA - let's call the new chunk B
    • copy A over to B
    • munmap() A (this isn't really needed as mremap() can take care of that)
    • mremap() B to the now free location of A

I like the last option most because it uses the current malloc() implementation, works for tap and doesn't feel so hackish (just sophisticated ;). I am going to implement that soonish unless other ideas arise.

Update

“Sophisticated” didn't really work for Josh, but I have come up with something different. I have introduced separate memory pools for normal and DMA memory allocation, with an API for switching the latter. See [malloc] Introduce memory pools and hide internal API
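To give a feel for the idea, here is a minimal sketch of two pools with a switchable DMA pool. It is not the code from the commit: struct mem_pool, pool_init(), pool_alloc(), set_dma_pool(), normal_alloc() and dma_alloc() are all hypothetical names, and the allocator is a trivial bump allocator with no free(), chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: a bump allocator over a caller-supplied buffer. */
struct mem_pool {
    uint8_t *base;
    size_t len;
    size_t used;
};

static void pool_init(struct mem_pool *pool, void *base, size_t len)
{
    pool->base = base;
    pool->len = len;
    pool->used = 0;
}

/* Allocate size bytes aligned to align (a power of two), or NULL. */
static void *pool_alloc(struct mem_pool *pool, size_t size, size_t align)
{
    assert(pool != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t start = (uintptr_t)pool->base + pool->used;
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)pool->base);

    if (offset > pool->len || size > pool->len - offset)
        return NULL;
    pool->used = offset + size;
    return (void *)aligned;
}

/* Normal pool, always available. */
static uint8_t default_heap[4096];
static struct mem_pool normal_pool = { default_heap, sizeof(default_heap), 0 };

/* DMA pool: falls back to the normal pool until something switches it,
 * which is what keeps a tap-only setup working. */
static struct mem_pool *dma_pool = &normal_pool;

static void set_dma_pool(struct mem_pool *pool)
{
    dma_pool = pool ? pool : &normal_pool;
}

static void *normal_alloc(size_t size)
{
    return pool_alloc(&normal_pool, size, sizeof(void *));
}

static void *dma_alloc(size_t size, size_t align)
{
    return pool_alloc(dma_pool, size, align);
}
```

In this sketch a driver that needs real DMA memory would call set_dma_pool() with a pool backed by DMA-capable memory once its device is set up; everything else keeps allocating from the normal pool.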

Smaller issues

  • currently gpxe.linux binary doesn't contain the tap driver - easily fixable
  • out/in* segfault in userspace if iopl() wasn't called - will probably have to check whether the ioports were initialized on each call

Update

These have been fixed. Moreover, slightly related to the first one, I have come up with a buildall branch, which allows building alldrivers builds on all supported arch/platform combinations and also adds an everything target that takes advantage of that and builds everything that I could think of. It should come in handy for testing patches. The changes are mostly trivial, but if you want to refresh your make-foo have a look at [build] Properly handle multiple goals per BIN directory.

Update

I have updated pretty much every commit in the drivers branch, adding comments and doing cleanup.

