GSoC 2007 - Codesize Reduction Project

Things done so far

This is a summary of the things I did so far (including those that didn't work), in roughly chronological order.

  • Get LXR running and use it on the Etherboot source.

Status: Didn't seem as useful as I had hoped. Will revisit and investigate other options once I get to the full attribute makeover.

  • Go through other programs and check their Makefiles to discover potentially interesting compiler options.

Status: (Re)discovered the extremely useful stack alignment option in the Linux Kernel. Has been the biggest win so far, already merged.

  • check effects of some other options, write script for mass compiles with different sets, another one to compare the effects.

Status: useful options checked in to gccopt branch. separate branch for packed enums. (not merged yet) Not so useful options included -fno-inline (due to lots of always inline functions used - may still be a win for certain modules → keep in mind for the attributes) and -fpack-struct (potentially incompatible data structures, supposedly already used where possible). did some preliminary tests on inline limits and other inline options, too. will revisit later before actually using always-inline or noinline attributes.
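
A minimal sketch of what packing an enum can save, assuming "packed enums" refers to GCC's packed attribute on enum types (the global -fshort-enums option has a similar effect). The type names are hypothetical and not taken from the Etherboot source.

```c
#include <assert.h>

/* Hypothetical example types, not taken from the Etherboot source. */
enum link_state_plain { LINK_DOWN, LINK_UP, LINK_ERROR };

/* With the packed attribute, GCC picks the smallest integer type that fits. */
enum link_state_packed { PLINK_DOWN, PLINK_UP, PLINK_ERROR } __attribute__((packed));

struct port_plain {
    enum link_state_plain state;
    unsigned char id;
};

struct port_packed {
    enum link_state_packed state;
    unsigned char id;
};

/* Three small values fit in a single byte. */
static_assert(sizeof(enum link_state_packed) == 1, "packed enum should be one byte");

/* Bytes saved per structure on the current target. */
unsigned int bytes_saved_per_port(void)
{
    return (unsigned int)(sizeof(struct port_plain) - sizeof(struct port_packed));
}
```

The saving only materialises where the enum is stored in data structures or passed around; whether it is a net win for a given module still has to be measured.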

  • prepare code for conversion to regparm option (and maybe rtd as well)

Status: went through the code, located the functions that need standard calling conventions. committed to gccopt branch, not merged (or reviewed) yet. hope i got them all. alternative may be attributing individual functions, but as i can't imagine a case where it's not a win, just attributing those where it must not be used should be better.
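
The approach described above (register calling conventions everywhere, with exceptions marked individually) could look roughly like this. It is only a sketch: the macro name ASM_CALLABLE and both functions are hypothetical, and it assumes a 32-bit x86 build with something like -mregparm=3 (and optionally -mrtd) on the command line.

```c
#include <assert.h>

/* Hypothetical macro: force the standard stack-based convention for
 * functions that are entered from assembly or from external callers.
 * regparm and cdecl only mean something on 32-bit x86. */
#if defined(__i386__)
#define ASM_CALLABLE __attribute__((cdecl, regparm(0)))
#else
#define ASM_CALLABLE
#endif

/* Ordinary C function: uses whatever -mregparm / -mrtd select globally. */
int checksum_add(int sum, int value, int carry)
{
    return sum + value + carry;
}

/* Imagined entry point reached from assembly: arguments stay on the stack
 * and the caller cleans up, regardless of the global options. */
ASM_CALLABLE int asm_entry(int a, int b)
{
    return checksum_add(a, b, 0);
}

/* Small usage check. */
void regparm_demo(void)
{
    assert(asm_entry(2, 3) == 5);
}
```

Marking only the exceptions keeps the annotation count low, at the price that a missed function fails at run time rather than at compile time.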

  • prepare zalloc changes, go through all occurrences of malloc and see if that memory is cleared soon after.

Status: branch merged (think i got them all, had some false positives at first though → reverted)
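
The pattern being replaced can be sketched as follows. This assumes zalloc simply means "allocate and zero"; the version here is a stand-in built on libc, and struct session is a hypothetical example type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a zeroing allocator, built on libc for this sketch. */
void *zalloc(size_t size)
{
    void *ptr = malloc(size);

    if (ptr)
        memset(ptr, 0, size);
    return ptr;
}

/* Hypothetical example structure. */
struct session {
    int id;
    int retries;
    char peer[16];
};

/* Before: every call site carries its own memset call. */
struct session *session_new_before(void)
{
    struct session *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    memset(s, 0, sizeof(*s));
    return s;
}

/* After: the clearing happens once, inside zalloc. */
struct session *session_new_after(void)
{
    return zalloc(sizeof(struct session));
}

/* Both variants should hand back identical, fully zeroed memory. */
void zalloc_demo(void)
{
    struct session *a = session_new_before();
    struct session *b = session_new_after();

    assert(a != NULL && b != NULL);
    assert(memcmp(a, b, sizeof(*a)) == 0);
    free(a);
    free(b);
}
```

A call site only qualifies if the whole allocation is cleared before anything else touches it; a partial memset, or one that runs after some fields have been set, is the kind of case that would count as a false positive.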

  • go through make symcheck output, make appropriate changes to code

Status: went through a lot of output, picked the ones that seem not too controversial, checked in to symcheck branch. (not merged yet). Further plan is to address the problem at the root and reduce the symcheck output by eliminating false positives, then address the remaining unclear ones (i do not claim to understand all the code i have read ;) No real solution discovered (yet), hackish ideas (eliminate by suffix, eliminate by exporting another symbol that says false positive) set aside for now. researched whether better tools than nm and objdump exist, haven't found any yet.

  • test malloc attribute in preparation for the complete go-through of all attributes.

Status: unfortunately no difference at all. local branch exists, may be included in the attr changes for completeness.

Things done in the 2nd half of SoC

  • Use LXR on current master, look into other tools if necessary (OpenGROK still looks nice, but I haven't set it up so far)

This took a bit longer than expected. OpenGROK required some serious Tomcat investigations to find out which variables come from where and still get overridden from where else. LXR is easier - once you figure out that the preliminary GIT support doesn't really work and set it up on a plain old source directory instead. Neither seems to offer a way to easily see which functions are called most, but otherwise they're both pretty decent source code browsers. let me know if we want anything like that on rom.

  • repeat the symcheck work on current master. should be easier now. the new make symcheck has reduced the output substantially, and i still have the old docs. status: current state of affairs merged, some open questions.
  • redo the regparm work. shouldn't be too hard now that i know what annotations are really needed to make regparm work. status: merged.
  • Revisit Compiler Options. Status: Found two or three more options that reduce codesize. Investigating the effects of -finline-limit (8 seems like the optimal setting right now), -fgcse-after-reload seems to help a bit too. Got gcc 4.2.1 to compile and testing the new options. status: partly merged. (gcc 4.2.1 compile fixes, makefile cleanups) mcb30 felt that module specific optimizations were unmaintainable. inline-limit might go in later.
  • Break up string.c into used and unused functions. also set pure attributes. status: done, not yet merged.
  • optimizing memset. status: should be optimal in size now (also all the other functions in string.h) checked in as “memset” branch. this is an overall win in codesize (most notably for undinet, multiboot and monojob). warnings mentioned here earlier have been successfully silenced, also the code saw a lot more optimizations (memmove optimization, memcpy break-even checked and adjusted, cx load trick for all possible functions and values).
  • included attributes on hci subdir (branch for that is slightly incorrectly named “curses”). also split up curses a bit further, but only marginally (most of it isn't used anyway) - branch that includes this in addition to the former is “cursessplit”
  • biggest win with attributes was not to inline certain functions (as earlier results on -fno-inline and -finline-limit suggested). investigated all the functions that turned out to decrease in size with a more aggressive inline-limit and found several cases where the effect could be pinpointed to a function or two. set these to noinline (because otherwise, gcc's default inline limit with -Os seems mostly fine - i think it just doesn't take the lower call overhead with regparm properly into account). most interesting result was that for minimal codesize with the new regparm calling conventions, we actually don't want to have the list functions static inline anymore (even the really simple __list_del is only marginally a win inlined, the others aren't). branch named “inline” committed (which is based on the memset branch because before that memset/memmove would also have been candidates for noinline)
  • other than noinline in certain cases, attributes unfortunately turned out to be mostly insignificant for code size. nonnull helps only in special cases (by optimizing away any explicit pointer checks that might be inside), but as it gives us additional compile time checks, it's probably still a good idea (included them in the parts that were committed for other reasons, see above). pure mostly makes no difference either (haven't seen effects of more than +/- 2-3 bytes, seems to even out, and we don't have a lot of them - same). don't think i spotted a const function yet.
  • committed the malloc annotations (to strings branch, as that wasn't merged yet anyway)
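
For reference, the attributes discussed in the items above look like this in GCC syntax. This is a sketch only: the list type and all function names are hypothetical stand-ins, not the actual list.h or string.c code, and whether each attribute saves bytes still has to be measured per function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical doubly linked list node. */
struct node {
    struct node *next;
    struct node *prev;
};

/* malloc: the returned pointer does not alias any existing object. */
__attribute__((malloc))
struct node *node_alloc(void)
{
    return calloc(1, sizeof(struct node));
}

/* nonnull: passing a literal NULL is diagnosed at compile time. */
__attribute__((nonnull))
void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* noinline: keep a single copy instead of expanding it at every call site. */
__attribute__((noinline, nonnull))
void node_del(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
}

/* pure: the result depends only on the arguments and on memory that is read. */
__attribute__((pure, nonnull))
size_t node_count(const struct node *head)
{
    size_t count = 0;
    const struct node *pos;

    for (pos = head->next; pos != head; pos = pos->next)
        count++;
    return count;
}

/* Small usage check. */
void attr_demo(void)
{
    struct node head = { &head, &head };
    struct node *n = node_alloc();

    assert(n != NULL);
    node_add(n, &head);
    assert(node_count(&head) == 1);
    node_del(n);
    assert(node_count(&head) == 0);
    free(n);
}
```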

Things to do now that we have "pencils down" for code

  • write up some docs for the conversions that helped, give some background, maybe illustrated by examples of the generated code before and after.
  • write up docs on when functions (or other symbols) should be attributed by what.
