POSIX APIs and System Calls
The difference between an API and a system call
API: a function definition that specifies how to obtain a given service
System call: an explicit request to the kernel made via a software interrupt
Wrapper routine: a routine whose only purpose is to issue a system call
The POSIX standard refers to a set of APIs, not to system calls.
System Call Handler and Service Routines
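The conventions for the return values of system calls differ from those of wrapper routines.
In the kernel, 0 or a positive integer indicates a successful termination of the system call, while a negative integer indicates a failure; in that case the value is the negation of the error code. The kernel never sets errno: it is the wrapper routine's responsibility to store the error code in errno and, usually, to return -1 to the caller.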
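System call number: specifies which system call to invoke; it is passed to the kernel in the eax register
System call handler: its structure is similar to that of the other exception handlers
Saves registers on the Kernel Mode stack -> invokes the system call service routine -> reloads the saved registers and switches the CPU back to User Mode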
Naming rules:
System call: xxxx()
System call service routine: sys_xxxx()
System call dispatch table
System call number -> service routine (entry n of the table holds the address of the service routine for system call number n)
Entering and Exiting a system call
Two ways to enter and exit a system call
Enter: int $0x80 Exit: iret
Enter: sysenter Exit: sysexit
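During kernel initialization, trap_init() sets the Interrupt Descriptor Table entry for vector 0x80 (128) as a system gate that points to the system_call() handler:
set_system_gate(0x80, &system_call);
(See "Interrupt, Trap, and System Gates" in Chapter 4 of Understanding the Linux Kernel.)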
Parameter passing
Example system call numbers:
#define __NR_restart_syscall (__NR_SYSCALL_BASE+ 0)
#define __NR_exit (__NR_SYSCALL_BASE+ 1)
#define __NR_fork (__NR_SYSCALL_BASE+ 2)
#define __NR_read (__NR_SYSCALL_BASE+ 3)
#define __NR_write (__NR_SYSCALL_BASE+ 4)
#define __NR_open (__NR_SYSCALL_BASE+ 5)
#define __NR_close (__NR_SYSCALL_BASE+ 6)
/* 7 was sys_waitpid */
#define __NR_creat (__NR_SYSCALL_BASE+ 8)
#define __NR_link (__NR_SYSCALL_BASE+ 9)
#define __NR_unlink (__NR_SYSCALL_BASE+ 10)
#define __NR_execve (__NR_SYSCALL_BASE+ 11)
#define __NR_chdir (__NR_SYSCALL_BASE+ 12)
#define __NR_time (__NR_SYSCALL_BASE+ 13)
#define __NR_mknod (__NR_SYSCALL_BASE+ 14)
#define __NR_chmod (__NR_SYSCALL_BASE+ 15)
The system call number is set by wrapper routines, so the programmer usually does not need to care about it.
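1. Set the eax register to the system call number.
2. Write the parameters into CPU registers. (Parameters are not pushed on the stack as in an ordinary function call, because a system call spans two stacks, the User Mode stack and the Kernel Mode stack, and working with both at once would be complex.)
3. Execute int $0x80 or sysenter.
4. The kernel copies the parameters from the registers onto the Kernel Mode stack before invoking the service routine.
(Ordinary C functions receive their parameters on a stack, either the User Mode stack or the Kernel Mode stack. The service routine is an ordinary C function, which is why the kernel copies the parameters onto the Kernel Mode stack.)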
Because we use registers to pass parameters, two conditions must be satisfied:
1. The length of each parameter cannot exceed the length of a register (32 bits on 80x86).
2. The number of parameters cannot exceed six, not counting the system call number passed in eax (on 80x86 the parameters go in ebx, ecx, edx, esi, edi, and ebp).
These two conditions have two consequences. First, large parameters must be passed by reference. Second, if more than six parameters are needed, a single register points to a memory area in the process address space that holds the parameter values. The programmer usually need not care about this workaround: the wrapper routine finds the appropriate way to pass the parameters to the kernel.
Kernel Wrapper Routines
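Although system calls are mainly used by User Mode processes, they can also be invoked by kernel threads, which cannot use library functions. The kernel therefore defines its own wrapper routines for the system calls that kernel threads need.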
References:
Understanding the Linux Kernel, 3rd Edition, Daniel P. Bovet and Marco Cesati (O'Reilly)