Is it safe to retrieve userspace PID using current->pid in Linux Driver?

I would like to retrieve the calling process's PID (as seen from userspace) in a Linux driver.

At first, I easily found the pid field of struct task_struct in the Linux source code (include/linux/sched.h), so I used current->pid to retrieve the userspace PID.

However, I read some articles saying that the PID means different things in kernel and userspace contexts. In other words, the so-called userspace PID is in fact the tgid, which is also a field of struct task_struct.

Moreover, when I looked into the source code of the getpid() system call, expecting it to return task_struct.pid or task_struct.tgid, it turned out to call a function I did not expect, task_tgid_vnr(), which ultimately uses the signal field inside task_struct.

As my goal is simple (to obtain, inside the kernel, exactly the same PID that the getpid() system call returns), I am very confused.

In conclusion, among all the options above, which is the recommended way to retrieve the calling process's userspace PID?

Solution

Most of the time, you shouldn't access the task structure directly to get this kind of information. There are several problems in doing so:

PIDs in kernel space are not just simple integers, but are managed using structures (struct pid) that can be referenced by multiple tasks. Note that even though the name of the struct is pid, it is used to refer to all kinds of IDs (PID, TGID, PGID, SID).
A lot of things in the task structure are guarded by mutexes, spinlocks or RCU. You need to look for the appropriate helper functions to access most fields. It is rarely safe to just directly access a field doing e.g. task->field. PID structures for example are RCU protected.
Namespaces also come into play: a task may have some PID in the global namespace, but a different PID in its current PID namespace. Whether you need the global PID or the virtual PID is for you to decide. Be careful with this, because you can end up mixing global (outside) and virtual (inside) PIDs.

The kernel vs user space terminology is also confusing, so make sure to understand the difference. In general, when coding kernel modules, only kernel terminology is used. What the kernel calls "PID" is a unique number associated with every task in the namespace. A task is a single thread, therefore a PID in kernel space is what you would call TID (thread ID) in user space. A TGID (thread group ID) in kernel space is what you would call a PID (process ID) in user space. All tasks with the same TGID belong to the same process. Only one task will have a PID equal to its TGID, and that is the main thread.
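The PID/TID distinction can be observed from user space. The following is only a minimal sketch, assuming Linux with glibc and pthreads; struct ids, get_tid() and ids_in_new_thread() are names made up for this illustration. It records what getpid() (the kernel's TGID) and the gettid syscall (the kernel's PID) return inside a second thread:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: the two IDs as seen by one thread. */
struct ids {
    pid_t pid; /* getpid(): what the kernel calls the TGID */
    pid_t tid; /* gettid:   what the kernel calls the PID  */
};

/* Use the raw syscall so this also builds with glibc versions
 * that lack the gettid() wrapper. */
static pid_t get_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread body: store the IDs observed from inside the new thread. */
static void *worker(void *arg)
{
    struct ids *out = arg;

    out->pid = getpid();
    out->tid = get_tid();
    return NULL;
}

/* Spawn one thread, wait for it, and return the IDs it saw.
 * Returns {0, 0} if the thread could not be created. */
static struct ids ids_in_new_thread(void)
{
    struct ids out = { 0, 0 };
    pthread_t thread;

    if (pthread_create(&thread, NULL, worker, &out) == 0)
        pthread_join(thread, NULL);
    return out;
}
```

Called from the main thread, ids_in_new_thread() should report the same pid as getpid() in the main thread but a different tid, whereas in the main thread itself getpid() and get_tid() return the same number.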

Looking at the getpid syscall code is a good start. In this case, getpid wants to return the virtual TGID to user space, so it uses task_tgid_vnr(). You can see a comment explaining the meaning of the various pid helpers in linux/pid.h:

/*
 * the helpers to get the task's different pids as they are seen
 * from various namespaces
 *
 * task_xid_nr()     : global id, i.e. the id seen from the init namespace;
 * task_xid_vnr()    : virtual id, i.e. the id seen from the pid namespace of
 *                     current.
 * task_xid_nr_ns()  : id seen from the ns specified;
 *
 * see also pid_nr() etc in include/linux/pid.h
 */

You should use one of the above functions depending on your needs, passing current as the task when you want the ID of the caller:

if you want the global userspace PID then use task_tgid_nr();
if you want the virtual userspace PID (as seen from the PID namespace of current, which is what getpid() returns to the caller) then use task_tgid_vnr();
if you want the virtual userspace PID in another namespace than the one of the task, then first get ahold of the namespace and then use task_tgid_nr_ns().
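As a rough illustration of the first two options, here is a kernel-side sketch. It assumes a character device driver whose open handler runs in the context of the calling process; mydrv_open and the "mydrv" log prefix are hypothetical names, and the fragment only builds as part of a kernel module, not as a standalone program:

```c
#include <linux/fs.h>
#include <linux/pid.h>
#include <linux/printk.h>
#include <linux/sched.h>

/* Hypothetical open() handler of a character device. It runs in the
 * context of the process that called open(), so "current" is the caller. */
static int mydrv_open(struct inode *inode, struct file *file)
{
	/* Same value getpid() would return to the caller. */
	pid_t tgid_virtual = task_tgid_vnr(current);

	/* Process ID as seen from the initial PID namespace. */
	pid_t tgid_global = task_tgid_nr(current);

	/* Thread ID of the calling thread (gettid() in user space). */
	pid_t tid_virtual = task_pid_vnr(current);

	pr_info("mydrv: opened by pid %d (global %d), tid %d\n",
		tgid_virtual, tgid_global, tid_virtual);
	return 0;
}
```

Outside of any container the virtual and global values are normally identical; they only diverge when the caller lives in a non-initial PID namespace.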