什么是 vdso 和 vsyscall?

我做了 sudo cat /proc/1/maps -vv

我正试图弄清楚输出的意义。我可以看到许多共享库正如预期的那样被映射到内存映射段。

7f3c00137000-7f3c00179000 r-xp 00000000 08:01 21233923                   /lib/x86_64-linux-gnu/libdbus-1.so.3.5.8
7f3c00179000-7f3c00379000 ---p 00042000 08:01 21233923                   /lib/x86_64-linux-gnu/libdbus-1.so.3.5.8
7f3c00379000-7f3c0037a000 r--p 00042000 08:01 21233923                   /lib/x86_64-linux-gnu/libdbus-1.so.3.5.8
7f3c0037a000-7f3c0037b000 rw-p 00043000 08:01 21233923                   /lib/x86_64-linux-gnu/libdbus-1.so.3.5.8
7f3c0037b000-7f3c00383000 r-xp 00000000 08:01 21237216                   /lib/x86_64-linux-gnu/libnih-dbus.so.1.0.0
7f3c00383000-7f3c00583000 ---p 00008000 08:01 21237216                   /lib/x86_64-linux-gnu/libnih-dbus.so.1.0.0
7f3c00583000-7f3c00584000 r--p 00008000 08:01 21237216                   /lib/x86_64-linux-gnu/libnih-dbus.so.1.0.0
7f3c00584000-7f3c00585000 rw-p 00009000 08:01 21237216                   /lib/x86_64-linux-gnu/libnih-dbus.so.1.0.0
7f3c00585000-7f3c0059b000 r-xp 00000000 08:01 21237220                   /lib/x86_64-linux-gnu/libnih.so.1.0.0
7f3c0059b000-7f3c0079b000 ---p 00016000 08:01 21237220                   /lib/x86_64-linux-gnu/libnih.so.1.0.0
7f3c0079b000-7f3c0079c000 r--p 00016000 08:01 21237220                   /lib/x86_64-linux-gnu/libnih.so.1.0.0

到最后

7f3c0165b000-7f3c0177e000 rw-p 00000000 00:00 0                          [heap]
7fff97863000-7fff97884000 rw-p 00000000 00:00 0                          [stack]
7fff97945000-7fff97946000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

vdsovsyscall是什么意思?Vsyscall 是内存的核心部分吗?如果有人能解释一下这个问题,那就太好了。

44052 次浏览

The vsyscall and vDSO segments are two mechanisms used to accelerate certain system calls in Linux. For instance, gettimeofday is usually invoked through this mechanism. The first mechanism introduced was vsyscall, which was added as a way to execute specific system calls which do not need any real level of privilege to run in order to reduce the system call overhead. Following the previous example, all gettimeofday needs to do is to read the kernel's current time. There are applications that call gettimeofday frequently (e.g to generate timestamps), to the point that they care about even a little bit of overhead. To address this concern, the kernel maps into user space a page containing the current time and a fast gettimeofday implementation (i.e. just a function which reads the time saved into vsyscall). Using this virtual system call, the C library can provide a fast gettimeofday which does not have the overhead introduced by the context switch between kernel space and user space usually introduced by the classic system call model INT 0x80 or SYSCALL.

However, this vsyscall mechanism has some limitations: the memory allocated is small and allows only 4 system calls, and, more important and serious, the vsyscall page is statically allocated to the same address in each process, since the location of the vsyscall page is nailed down in the kernel ABI. This static allocation of the vsyscall compromises the benefit introduced by the memory space randomisation commonly used by Linux. An attacker, after compromising an application by exploiting a stack overflow, can invoke a system call from the vsyscall page with arbitrary parameters. All he needs is the address of the system call, which is easily predictable as it is statically allocated (if you try to run again your command even with different applications, you'll notice that the address of the vsyscall does not change). It would be nice to remove or at least randomize the location of the vsyscall page to thwart this type of attack. Unfortunately, applications depend on the existence and exact address of that page, so nothing can be done.

This security issue has been addressed by replacing all system call instructions at fixed addresses by a special trap instruction. An application trying to call into the vsyscall page will trap into the kernel, which will then emulate the desired virtual system call in kernel space. The result is a kernel system call emulating a virtual system call which was put there to avoid the kernel system call in the first place. The result is a vsyscall which takes longer to execute but, crucially, does not break the existing ABI. In any case, the slowdown will only be seen if the application is trying to use the vsyscall page instead of the vDSO.

The vDSO offers the same functionality as the vsyscall, while overcoming its limitations. The vDSO (Virtual Dynamically linked Shared Objects) is a memory area allocated in user space which exposes some kernel functionalities at user space in a safe manner. This has been introduced to solve the security threats caused by the vsyscall. The vDSO is dynamically allocated which solves security concerns and can have more than 4 system calls. The vDSO links are provided via the glibc library. The linker will link in the glibc vDSO functionality, provided that such a routine has an accompanying vDSO version, such as gettimeofday. When your program executes, if your kernel does not have vDSO support, a traditional syscall will be made.

Credits and useful links :

I just want to add that now in new kernels, vDSO is not used only for "safe" syscalls but it is used to decide which syscall mechanism is preferred method to invoke a syscall on the system.