It's well-documented how to use ftrace
to find the function graph starting from a certain function, e.g.
# echo nop > current_tracer
# echo 100 > max_graph_depth
# echo ksys_dup3 > set_graph_function
# echo function_graph > current_tracer
# cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
7) | ksys_dup3() {
7) 0.533 us | expand_files();
7) | do_dup2() {
7) | filp_close() {
7) 0.405 us | dnotify_flush();
7) 0.459 us | locks_remove_posix();
7) | fput() {
7) | fput_many() {
7) | task_work_add() {
7) 0.533 us | kick_process();
7) 1.558 us | }
7) 2.475 us | }
7) 3.382 us | }
7) 6.122 us | }
7) 7.104 us | }
7) + 10.763 us | }
But this only returns the function graph starting at ksys_dup3
. It omits the full function graph that leads to ksys_dup3
:
7) | el0_svc_handler() {
7) | el0_svc_common() {
7) | __arm64_sys_dup3() {
7) | ksys_dup3() {
7) 0.416 us | expand_files();
7) | do_dup2() {
7) | filp_close() {
7) 0.405 us | dnotify_flush();
7) 0.406 us | locks_remove_posix();
7) | fput() {
7) 0.416 us | fput_many();
7) 1.269 us | }
7) 3.819 us | }
7) 4.746 us | }
7) 6.475 us | }
7) 7.381 us | }
7) 8.362 us | }
7) 9.205 us | }
Is there a way to use ftrace
to filter a full function graph?
I'd say all of ftrace is well documented (here). There is no way to do what you want though, because the filtering ability of ftrace is implemented through triggers that can start/stop tracing exactly when an event happens. Setting set_graph_function
will trigger a trace start when the function is entered and a trace stop when it's exited. Since when you enter el0_svc_handler
you cannot know beforehand if ksys_dup3
is going to be called, there is no way to write a trigger to start the trace on such a condition (that would require being able to "predict the future").
You can however do fine grained filtering with the set_ftrace_pid
parameter writing a program that only does what you need, running a full trace on that program and filtering on the parent el0_svc_handler
.
In case you don't actually know which parent functions are being called before the one you want, you can do a single targeted run with the func_stack_trace
option enabled to get an idea about what the entire call chain is, and then use that output to set the appropriate filter for a "normal" run.
For example, let's say I want to trace do_dup2
, but as you say I want to start from one of the parent functions. I'll first write a dummy test program like the following one:
int main(void) {
printf("My PID is %d, press ENTER to go...\n", getpid());
getchar();
dup2(0, 1);
return 0;
}
I'll compile and start the above program in one shell:
$ ./test
My PID is 1234, press ENTER to go...
Then in another root shell, configure tracing as follows:
cd /sys/kernel/tracing
echo function > current_tracer
echo 1 > options/func_stack_trace
echo do_dup2 > set_ftrace_filter
echo PID_OF_THE_PROGRAM_HERE > set_ftrace_pid
Note: echo do_dup2 > set_ftrace_filter
is very important otherwise you will trace and dump the stack for every single kernel function, which would be a huge performance hit and can make the system unresponsive. For the same reason doing echo PID_OF_THE_PROGRAM_HERE > set_ftrace_pid
is also important if you don't want to trace every single dup2
syscall done by the system.
Now I can do one trace:
echo 1 > tracing_on
# ... press ENTER in the other shell ...
echo 0 > tracing_on
cat trace
And the result will be something like this (my machine is x86 so the syscall entry functions have different names):
# tracer: function
#
# entries-in-buffer/entries-written: 2/2 #P:20
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
a.out-6396 [000] .... 1455.370160: do_dup2 <-__x64_sys_dup2
a.out-6396 [000] .... 1455.370174: <stack trace>
=> 0xffffffffc322006a
=> do_dup2
=> __x64_sys_dup2
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
I can now see the entire call chain that led to do_dup2
(printed in reverse order above) and then do another normal trace from one of the parent functions (remember to disable options/func_stack_trace
first).