
Why does strace ignore some syscalls (randomly) depending on environment/kernel?


If I compile the following program:

$ cat main.cpp && g++ main.cpp
#include <time.h>
int main() {
    struct timespec ts;
    return clock_gettime(CLOCK_MONOTONIC, &ts);
}

and then run it under strace in "standard" Kubuntu, I get this:

$ strace -tt --trace=clock_gettime ./a.out
17:58:40.395200 +++ exited with 0 +++

As you can see, no clock_gettime call appears in the trace (full strace output is here).

On the other hand, if I run the same program on my custom-built Linux kernel under QEMU, I get the following output:

$ strace -tt --trace=clock_gettime ./a.out
18:00:53.082115 clock_gettime(CLOCK_MONOTONIC, {tv_sec=101481, tv_nsec=107976517}) = 0
18:00:53.082331 +++ exited with 0 +++

This is closer to what I expected: the clock_gettime call shows up in the trace.

So, my questions are:


Solution

  • Answer to the first question:

    From the vdso(7) man page:

    strace(1), seccomp(2), and the vDSO

    When tracing systems calls with strace(1), symbols (system calls) that are exported by the vDSO will not appear in the trace output. Those system calls will likewise not be visible to seccomp(2) filters.

    Answer to the second question:

    In the vDSO, clock_gettime, gettimeofday, and related functions rely on specific clock modes; see __arch_get_hw_counter.

    If the clock mode is VCLOCK_TSC, the time is read without a syscall, using RDTSC; if it’s VCLOCK_PVCLOCK or VCLOCK_HVCLOCK, it’s read from a specific page to retrieve the information from the hypervisor. HPET doesn’t declare a clock mode, so it ends up with the default VCLOCK_NONE, and the vDSO issues a system call to retrieve the time.

    And indeed:

    In the default kernel (from Kubuntu):

    $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc hpet acpi_pm
    $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc
    

    Custom built kernel:

    $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    hpet
    

    More information is available about the various clock sources. In particular:

    The documentation of Red Hat MRG version 2 states that TSC is the preferred clock source due to its much lower overhead, but it uses HPET as a fallback. A benchmark in that environment for 10 million event counts found that TSC took about 0.6 seconds, HPET took slightly over 12 seconds, and ACPI Power Management Timer took around 24 seconds.