Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
|
soc:2010:peper:notes:usermode_explained [2010/06/05 15:10] peper created |
soc:2010:peper:notes:usermode_explained [2010/06/14 17:00] (current) peper |
||
|---|---|---|---|
| Line 39: | Line 39: | ||
| ==== Linking to stdlib (glibc) ==== | ==== Linking to stdlib (glibc) ==== | ||
| + | |||
| + | **UPDATE**: That approach has been moved to a separate ''linuxlibc'' ''PLATFORM'' and is available on the [[http://git.etherboot.org/?p=people/peper/gpxe.git;a=shortlog;h=refs/heads/linuxlibc|linuxlibc branch]]. | ||
| Despite being non-trivial, forcing some compile flags to be disabled (namely ''-mrtd'' and ''-mregparm'' mentioned earlier) and having [[#the_other_problem_with_stdlib|some other problems]], linking to stdlib was still the quickest approach for prototyping. | Despite being non-trivial, forcing some compile flags to be disabled (namely ''-mrtd'' and ''-mregparm'' mentioned earlier) and having [[#the_other_problem_with_stdlib|some other problems]], linking to stdlib was still the quickest approach for prototyping. | ||
| Line 85: | Line 87: | ||
| } | } | ||
| </code> | </code> | ||
| + | |||
| + | === Prefix === | ||
| + | |||
| + | stdlib's ''_start'' takes care of everything so the prefix code is empty. | ||
| + | |||
| ==== Being self-contained ==== | ==== Being self-contained ==== | ||
| - | Work in progress. | + | To overcome the problems with linking to stdlib we need to implement some of its elementary features ourselves. |
| + | |||
| + | === Linker script === | ||
| + | |||
| + | A good read for starters is [[http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gnu-linker/index.html|Using ld, the Gnu Linker]]. | ||
| + | With that background the currently used linker scripts (''arch/*/scripts/*.lds'') should make more sense. | ||
| + | |||
| + | As we are not going to be linking against stdlib, the linker script should be really simple. | ||
| + | In fact it turned out that there is already a simple enough linker script used for efi (''arch/x86/scripts/efi.lds'') that can be used more or less out of the box. | ||
| + | The only necessary modification is setting the start of the Text segment properly, because not every value works (you can try ''0x0'' and see :) | ||
| + | We can see what the convention is by looking at how the default linker script does it, | ||
| + | which ''ld'' prints when passed ''--verbose'' while compiling a simple program in 32-bit and 64-bit mode. | ||
| + | |||
| + | <code> | ||
| + | $ gcc -m32 foo.c -o foo -Wl,--verbose | ||
| + | $ gcc -m64 foo.c -o foo -Wl,--verbose | ||
| + | </code> | ||
| + | |||
| + | From that we can gather that ''i386'' uses ''0x08048000'' and ''x86_64'' uses ''0x400000'' as the start address. | ||
| + | I haven't been able to find a good explanation of why these particular values are used. Moreover, many other values also seem to work. | ||
| + | Another way of figuring out the specific values is reading [[http://www.sco.com/developers/devspecs/abi386-4.pdf|i386 ABI]] (page 48) | ||
| + | and [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (page 26). | ||
| + | |||
| + | === Prefix (_start) === | ||
| + | |||
| + | ''_start'', being the default ''ENTRY'' point, is the very first thing that's executed when a new process receives control. | ||
| + | What we want to do in ''_start'' is the minimal work necessary to actually call our ''main()'' function. | ||
| + | |||
| + | To accomplish that we need to know 3 things: | ||
| + | * What's the state of things when ''_start'' is executed | ||
| + | * How to actually call ''main()'' | ||
| + | * What to do when ''main()'' returns | ||
| + | |||
| + | The state of the stack and registers at the time of ''_start'' execution is described in | ||
| + | [[http://www.sco.com/developers/devspecs/abi386-4.pdf|i386 ABI]] (page 54) | ||
| + | and [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (page 28). | ||
| + | |||
| + | The function calling convention is also described in the ABI docs: [[http://www.sco.com/developers/devspecs/abi386-4.pdf|i386 ABI]] (pages 36-38) | ||
| + | and [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (pages 15-23). A nice overview is [[http://www.agner.org/optimize/calling_conventions.pdf|calling conventions]]. | ||
| + | |||
| + | What we need to do after ''main()'' returns is to call the ''exit'' syscall. Details on that are in the next section. | ||
| + | |||
| + | To actually make use of all that information we need to learn the GNU Assembler first, though. | ||
| + | I haven't been able to find any particularly good docs on it, and certainly nothing resembling a tutorial. | ||
| + | Look at [[http://sig9.com/articles/att-syntax|quick syntax]], [[ftp://ftp.estec.esa.nl/pub/ws/wsd/erc32/doc/as.pdf|manual]] and [[http://tigcc.ticalc.org/doc/gnuasm.html|manual2]]. | ||
| + | |||
| + | The following simplified ''_start''s should make sense now: | ||
| + | |||
| + | ''arch/i386/prefix/linuxprefix.S'': | ||
| + | <code asm> | ||
| + | _start: | ||
| + | xorl %ebp, %ebp // ABI wants us to zero the base frame | ||
| + | |||
| + | popl %esi // save argc | ||
| + | movl %esp, %edi // save argv | ||
| + | |||
| + | pushl %edi // argv -> C arg2 | ||
| + | pushl %esi // argc -> C arg1 | ||
| + | |||
| + | call main | ||
| + | |||
| + | movl %eax, %ebx // rc -> syscall arg1 | ||
| + | movl $__NR_exit, %eax | ||
| + | int $0x80 | ||
| + | </code> | ||
| + | ''arch/x86_64/prefix/linuxprefix.S'': | ||
| + | <code asm> | ||
| + | _start: | ||
| + | xorq %rbp, %rbp // ABI wants us to zero the base frame | ||
| + | |||
| + | popq %rdi // argc -> C arg1 | ||
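| + | movq %rsp, %rsi // argv -> C arg2 | ||
| + | andq $-16, %rsp // popq left the stack misaligned; AMD64 ABI wants 16-byte alignment at the call | ||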
| + | |||
| + | call main | ||
| + | |||
| + | movq %rax, %rdi // rc -> syscall arg1 | ||
| + | movq $__NR_exit, %rax | ||
| + | syscall | ||
| + | </code> | ||
| + | |||
| + | === Syscalls === | ||
| + | |||
| + | To provide the necessary kernel API (functions declared in ''include/linux_api.h'') we need a way to perform syscalls. | ||
| + | |||
| + | A simple way of doing that is implementing our own ''int syscall(int number, ...);'' | ||
| + | as ''long linux_syscall(int number, ...);'' and using that as the building block. | ||
| + | |||
| + | The syscall calling convention is a bit different from the normal function calling convention on both ''i386'' and ''x86_64''. | ||
| + | The [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (pages 123-124) has an informative section covering that for ''x86_64''. | ||
| + | For ''i386'' we can look at [[http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html#syscalls|i386 syscalls]]. | ||
| + | |||
| + | With that information we can implement our own ''syscall()''. | ||
| + | |||
| + | ''arch/i386/core/linux/linux_syscall.S'': | ||
| + | <code asm> | ||
| + | linux_syscall: | ||
| + | /* Save registers */ | ||
| + | pushl %ebx | ||
| + | pushl %esi | ||
| + | pushl %edi | ||
| + | pushl %ebp | ||
| + | |||
| + | movl 20(%esp), %eax // C arg1 -> syscall number | ||
| + | movl 24(%esp), %ebx // C arg2 -> syscall arg1 | ||
| + | movl 28(%esp), %ecx // C arg3 -> syscall arg2 | ||
| + | movl 32(%esp), %edx // C arg4 -> syscall arg3 | ||
| + | movl 36(%esp), %esi // C arg5 -> syscall arg4 | ||
| + | movl 40(%esp), %edi // C arg6 -> syscall arg5 | ||
| + | movl 44(%esp), %ebp // C arg7 -> syscall arg6 | ||
| + | |||
| + | int $0x80 | ||
| + | |||
| + | /* Restore registers */ | ||
| + | popl %ebp | ||
| + | popl %edi | ||
| + | popl %esi | ||
| + | popl %ebx | ||
| + | |||
| + | cmpl $-4095, %eax | ||
| + | jae 1f | ||
| + | ret | ||
| + | |||
| + | 1: | ||
| + | negl %eax | ||
| + | movl %eax, linux_errno | ||
| + | movl $-1, %eax | ||
| + | ret | ||
| + | </code> | ||
| + | |||
| + | ''arch/x86_64/core/linux/linux_syscall.S'': | ||
| + | <code asm> | ||
| + | linux_syscall: | ||
| + | movq %rdi, %rax // C arg1 -> syscall number | ||
| + | movq %rsi, %rdi // C arg2 -> syscall arg1 | ||
| + | movq %rdx, %rsi // C arg3 -> syscall arg2 | ||
| + | movq %rcx, %rdx // C arg4 -> syscall arg3 | ||
| + | movq %r8, %r10 // C arg5 -> syscall arg4 | ||
| + | movq %r9, %r8 // C arg6 -> syscall arg5 | ||
| + | movq 8(%rsp), %r9 // C arg7 -> syscall arg6 | ||
| + | |||
| + | syscall | ||
| + | |||
| + | cmpq $-4095, %rax | ||
| + | jae 1f | ||
| + | ret | ||
| + | |||
| + | 1: | ||
| + | negq %rax | ||
| + | movl %eax, linux_errno | ||
| + | movq $-1, %rax | ||
| + | ret | ||
| + | </code> | ||
| + | |||
| + | With that in place we can implement most of the functions as simple wrappers: | ||
| + | <code c> | ||
| + | void * linux_mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) | ||
| + | { | ||
| + | return (void*)linux_syscall(__SYSCALL_mmap, addr, length, prot, flags, fd, offset); | ||
| + | } | ||
| + | |||
| + | void * linux_mremap(void * old_address, size_t old_size, size_t new_size, int flags) | ||
| + | { | ||
| + | return (void*)linux_syscall(__NR_mremap, old_address, old_size, new_size, flags); | ||
| + | } | ||
| + | </code> | ||
| + | Now you can see why our ''syscall()'' returns a ''long'' instead of an ''int''. Otherwise we wouldn't be able to return a pointer on ''x86_64''. | ||
| ===== Subsystems ===== | ===== Subsystems ===== | ||
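
Referring back to the ''linux_syscall'' listings in the Syscalls section of this diff: the tail of both assembly versions (the ''cmp $-4095'' / ''jae'' / ''neg'' sequence) can be hard to read at first, so here is a rough C sketch of the same logic. It is only an illustration, not code from the tree: ''demo_errno'' and ''demo_syscall_result'' are hypothetical names, with ''demo_errno'' standing in for ''linux_errno''.

```c
/* Hypothetical stand-in for linux_errno (not the real variable). */
int demo_errno;

/*
 * Sketch of what the tail of linux_syscall does with the raw value the
 * kernel leaves in eax/rax: values in the range [-4095, -1] are negated
 * error codes, anything else is a valid result.
 */
long demo_syscall_result(long raw)
{
    /* Same unsigned comparison as "cmp $-4095" followed by "jae". */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        demo_errno = (int)-raw;   /* negl/negq, then store the errno */
        return -1;                /* movl/movq $-1 */
    }
    return raw;                   /* success: pass the value through */
}
```

The comparison is deliberately unsigned: a successful ''mmap'' may return an address whose top bit is set, which would look negative as a signed value, but it is still below the last 4095 values of the address space and so is passed through unchanged.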