Currently, I'm getting into the topic of kernel tracing with LTTng and Perf. I'm especially interested in tracing the different states a process is in.
I stumbled over the events sched_process_free and sched_process_exit. I'm wondering if my current understanding is correct:
When a process exits, sched_process_exit is written to the trace. However, the process descriptor might still be in memory, which leads to a zombie. When all the memory connected to the process is freed, sched_process_free is emitted. This would mean that if I really want to be sure the process is fully "terminated" and removed from memory, I have to listen for sched_process_free instead of sched_process_exit in the trace. Is this correct?
I found some time to edit my answer to make it clearer. If there are still problems, please tell me and we can discuss them. Let's dive into the end of a task:
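There are two system calls, exit_group() and exit(), and both end up in do_exit(), which does the following:
1. Set PF_EXITING, which means the task is being deleted.
2. Call del_timer_sync() to remove the task from the dynamic timer queue.
3. Call exit_mm(), exit_sem(), __exit_fs() and others to release the task's structures.
4. Set exit_code to the value passed to _exit()/exit_group(), or to an error code.
5. Call exit_notify():
   - check exit_signal and, if needed, send SIGCHLD to the parent;
   - if nobody will wait for the task (exit_signal is -1 and the task is not being traced), set exit_state to EXIT_DEAD and call release_task() to recycle the remaining memory and decrease the reference count;
   - otherwise, set exit_state to EXIT_ZOMBIE.
6. Set PF_DEAD.
7. Call schedule(); the dying task never runs again.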
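To make the branch in exit_notify() concrete, here is a toy Python model of the steps above. TaskModel and its methods are hypothetical names, not kernel code: the model only records which tracepoint would fire and in what order, and it follows the simplified rule described here. Real kernels have further cases, such as a parent that ignores SIGCHLD, which also causes the child to be reaped immediately.

```python
import signal
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskModel:
    """Toy model of a task's exit path; it mimics the ordering, not kernel code."""
    exit_signal: int = int(signal.SIGCHLD)   # -1 means "do not notify the parent"
    traced: bool = False
    exit_state: Optional[str] = None         # None while the task is still alive
    events: List[str] = field(default_factory=list)

    def do_exit(self) -> None:
        # do_exit() emits sched_process_exit before exit_notify() runs.
        self.events.append("sched_process_exit")
        # Simplified exit_notify(): will anybody wait() for this task?
        if self.exit_signal == -1 and not self.traced:
            self.exit_state = "EXIT_DEAD"
            self._release_task()              # reaped immediately, no zombie
        else:
            self.exit_state = "EXIT_ZOMBIE"   # descriptor kept for the parent's wait()

    def parent_wait(self) -> None:
        # wait() reaps a zombie: the parent collects the status, then release_task() runs.
        if self.exit_state != "EXIT_ZOMBIE":
            raise RuntimeError("task is not a zombie; nothing to reap")
        self.exit_state = "EXIT_DEAD"
        self._release_task()

    def _release_task(self) -> None:
        # release_task() is what eventually leads to sched_process_free.
        self.events.append("sched_process_free")


if __name__ == "__main__":
    child = TaskModel()
    child.do_exit()
    print(child.exit_state, child.events)   # EXIT_ZOMBIE ['sched_process_exit']
    child.parent_wait()
    print(child.exit_state, child.events)   # EXIT_DEAD ['sched_process_exit', 'sched_process_free']
```

The point of the model is the gap: for an ordinary child, sched_process_exit is recorded at exit time, but sched_process_free only appears once parent_wait() runs.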
We need the zombie state because the parent may still need information from the process descriptor (for example the exit code), so we cannot delete everything right away. The parent task uses something like wait() to check whether the child is dead. After wait(), it is time for the zombie to be released completely by release_task(), which does the following:
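The zombie window can be observed from user space. This sketch assumes Linux with /proc mounted and SIGCHLD left at its default disposition; observe_zombie() and proc_state() are hypothetical helper names. It forks a child that exits immediately, waits for the exit with WNOWAIT so the child is not reaped, reads the child's state from /proc/<pid>/stat, and only then reaps it with waitpid().

```python
import os


def proc_state(pid: int) -> str:
    """Return the one-letter state field of /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm is wrapped in parentheses and may contain spaces, so split after the last ')'.
    return data.rpartition(")")[2].split()[0]


def observe_zombie(exit_code: int = 7):
    """Fork a child, inspect it while it is a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit right away (do_exit() runs in the kernel)
    # Block until the child has exited, but leave it waitable (not reaped).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state = proc_state(pid)            # "Z": exited, descriptor still kept for the parent
    _, status = os.waitpid(pid, 0)     # reap: the kernel now calls release_task()
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, os.WEXITSTATUS(status), still_listed


if __name__ == "__main__":
    print(observe_zombie())
```

In trace terms, sched_process_exit has already fired while the state reads Z, and sched_process_free can only follow after the waitpid() call has made the kernel run release_task().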
1. If the task is being traced, remove it from the debugger's ptrace_children list.
2. Call __exit_signal() to delete all pending signals and release the signal_struct descriptor, and exit_itimers() to delete all the timers.
3. Call __exit_sighand() to delete the signal handlers.
4. Call __unhash_process(), which decrements nr_threads (nr_threads--), calls detach_pid() to delete the task descriptor from the PIDTYPE_PID and PIDTYPE_TGID hash tables, and uses REMOVE_LINKS to delete the task from the process list.
5. Call sched_exit() to adjust the parent's time slice.
6. Call put_task_struct() to decrease the usage counter and, once it drops to zero, release the memory and the task descriptor.

So sched_process_exit is emitted in do_exit(), but at that point we cannot be sure whether the process has been released or not: release_task() may or may not have been called yet, and that call is what triggers sched_process_free. That is why we need both tracepoints.
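If you record both tracepoints (for example with perf record -e sched:sched_process_exit -e sched:sched_process_free, followed by perf script), a small post-processing step can separate exited-but-not-yet-freed tasks from fully released ones. This is a sketch under assumptions: track_exits() is a hypothetical helper, it assumes each perf script line contains the event name followed by a pid=<n> field, and the sample lines are made up to show the shape, not real output. It keys on the pid= field rather than the leading pid column, because sched_process_free is usually logged from another task's context. It also ignores PID reuse.

```python
import re
from typing import Dict, Iterable, Set

# Event name plus the pid= field of the sched_process_* tracepoint payload.
_EVENT_RE = re.compile(r"sched:(sched_process_exit|sched_process_free):.*?\bpid=(\d+)")


def track_exits(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Classify pids as exited-but-not-freed or freed, based on trace lines."""
    exited_not_freed: Set[int] = set()
    freed: Set[int] = set()
    for line in lines:
        m = _EVENT_RE.search(line)
        if m is None:
            continue  # some other event or an unrelated line
        event, pid = m.group(1), int(m.group(2))
        if event == "sched_process_exit":
            exited_not_freed.add(pid)   # may still be a zombie
        else:
            exited_not_freed.discard(pid)
            freed.add(pid)              # descriptor released
    return {"exited_not_freed": exited_not_freed, "freed": freed}


if __name__ == "__main__":
    # Made-up lines that only imitate the shape of perf script output.
    sample = [
        "sleep 4242 [001] 100.000001: sched:sched_process_exit: comm=sleep pid=4242 prio=120",
        "bash 4000 [000] 100.500000: sched:sched_process_free: comm=sleep pid=4242 prio=120",
        "cat 4300 [002] 101.000000: sched:sched_process_exit: comm=cat pid=4300 prio=120",
    ]
    print(track_exits(sample))
```

A pid that stays in exited_not_freed at the end of the trace has exited but was not observed being freed within the recording window.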