When I write a small program that calls fork, the syscall returns twice (once per process):
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        // child: fork() returned 0
    } else if (pid > 0) {
        // parent: fork() returned the child's PID
    }
    return 0;
}
If I instrument that with SystemTap, I only see one return:
// fork() in libc calls clone on Linux
probe syscall.clone.return {
    printf("Return from clone\n")
}
(SystemTap installs probes on _do_fork instead of clone, but that shouldn't change anything.)
This confuses me. A couple of related questions:
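1. Why does the syscall only return once?
2. If I understand the _do_fork code correctly, the process is cloned in the middle of the function (copy_process and wake_up_new_task). Shouldn't the subsequent code run in both processes?
3. Does the kernel code after a syscall run in the same thread / process as the user code before the syscall?

If the code after copy_process and wake_up_new_task ran in both processes, it would have to differentiate between executing as a parent and as a child. But there are no checks of the sort, which is already a strong hint that the child does not execute this code in the first place. Thus one should look for a dedicated place new children return to.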
Since the code is quite big and hairy, one can try to cheat and just look for 'fork' in arch-specific code, which quickly reveals ret_from_fork.
It is set as the child's starting point on the path do_fork -> copy_process -> copy_thread_tls (http://lxr.free-electrons.com/source/arch/x86/kernel/process_64.c#L158).
Thus, to take the questions in turn:
Why does the syscall only return once?
It does not return only once. There are two returning threads, but the child returns through a different code path (ret_from_fork). Since the probe is installed only on the parent's path, you don't see the child's return. Also see below.
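From user space, both returns are easy to observe even though the probe sees only one. The following is a minimal sketch, not anything taken from the kernel or from SystemTap: the helper name fork_returns_in_both and the exit code 42 are made up for illustration. The child reports that it saw a return value of 0 by exiting with that code, and the parent checks it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once and checks what each side saw.
 * Returns 1 if the parent saw a positive PID and the child saw 0,
 * 0 if the child did not report back as expected,
 * -1 if fork() or waitpid() failed. */
int fork_returns_in_both(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;              /* fork failed, no child exists */
    }
    if (pid == 0) {
        /* Child: the syscall returned 0 here. Report that through
         * the exit status (42 is an arbitrary marker value). */
        _exit(42);
    }
    /* Parent: the syscall returned the child's PID here. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

Calling fork_returns_in_both() from main and printing the result should give 1 on a system where fork succeeds, which shows that the child did come back from the syscall even though it never ran the tail of _do_fork.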
If I understand the _do_fork code correctly, the process is cloned in the middle of the function. (copy_process and wake_up_new_task). Shouldn't the subsequent code run in both processes?
I noted earlier that this is false. The real question is what the benefit would be of making the child return in the same place as the parent. I don't see any, and it would be troublesome (extra special casing, as noted above). To restate: making the child return elsewhere means callers do not have to handle the returning child. They only need to check for errors.
Does the kernel code after a syscall run in the same thread / process as the user code before the syscall?
What is 'kernel code after a syscall'? If you are thread X and enter the kernel, you are still thread X.
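The same point can be checked from user space. This is a rough sketch under my own assumptions (the helper name identity_across_fork is invented, and a pipe is just one convenient way to pass the value back): the parent keeps its PID across the fork call, and the child's own getpid() matches the value fork returned to the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the identities line up,
 * 0 if they do not, -1 on any syscall failure. */
int identity_across_fork(void) {
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    pid_t before = getpid();    /* parent's PID before entering the kernel */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send our own PID to the parent, then exit. */
        close(fds[0]);
        pid_t self = getpid();
        ssize_t n = write(fds[1], &self, sizeof self);
        _exit(n == (ssize_t)sizeof self ? 0 : 1);
    }
    /* Parent: read what the child believes its PID is. */
    close(fds[1]);
    pid_t child_self = 0;
    ssize_t n = read(fds[0], &child_self, sizeof child_self);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    if (n != (ssize_t)sizeof child_self) {
        return -1;
    }
    /* Parent is still the same process; child is the PID fork reported. */
    return getpid() == before && child_self == pid && child_self != before;
}
```

The parent that comes out of fork is the same process that went in, and the child is a separate process that starts its life on the return path.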