On a multiprocessor system, each core can have its own variables. I assume these are different variables at different addresses, although they are in the same process and have the same name.
But I am wondering how the kernel implements this. Does it allocate a block of memory to hold all the per-CPU pointers, and redirect each access to the right address with an offset or something similar?
Normal global variables are not per-CPU. Automatic variables are on the stack, and different CPUs use different stacks, so naturally they get separate variables.
I guess you're referring to Linux's per-CPU variable infrastructure.
Most of the magic is here (asm-generic/percpu.h):
extern unsigned long __per_cpu_offset[NR_CPUS];
#define per_cpu_offset(x) (__per_cpu_offset[x])
/* Separate out the type, so (int[3], foo) works. */
#define DEFINE_PER_CPU(type, name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
/* var is in discarded region: offset to particular copy we want */
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
The macro RELOC_HIDE(ptr, offset) simply advances ptr by the given offset in bytes (regardless of the pointer type).
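To make "advances by bytes, regardless of the pointer type" concrete, here is a minimal userspace sketch. The helper name advance_bytes is hypothetical and is not the kernel's definition; as far as I recall, the real RELOC_HIDE also hides the arithmetic from the compiler so that it cannot assume the result still points inside the original object, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the arithmetic RELOC_HIDE performs:
 * move a pointer forward by `offset` bytes, ignoring the pointee type.
 * Casting to char * makes the addition count in bytes, not in elements.
 */
void *advance_bytes(void *ptr, unsigned long offset)
{
    return (char *)ptr + offset;
}

/* Example: step from element 0 to element 2 of an int array. */
int *third_element(int *first)
{
    return (int *)advance_bytes(first, 2 * sizeof(int));
}
```

Note that plain pointer arithmetic on an int * would have scaled the offset by sizeof(int); going through char * is what makes the offset a raw byte count.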
What does it do?
When you write DEFINE_PER_CPU(int, x), an integer per_cpu__x is created in the special .data.percpu section.
At boot, the kernel makes one copy of this section for each CPU (this part of the magic is not in the code above).
The __per_cpu_offset array is filled with the distances between the copies. Supposing 1000 bytes of per-CPU data are used, __per_cpu_offset[n] would contain 1000*n.
The symbol per_cpu__x will be relocated, during load, to CPU 0's per_cpu__x.
__get_cpu_var(x), when running on CPU 3, will translate to *RELOC_HIDE(&per_cpu__x, __per_cpu_offset[3]). This starts with CPU 0's x, adds the offset between CPU 0's data and CPU 3's, and finally dereferences the resulting pointer.
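The whole lookup can be imitated in userspace. This is only a sketch under stated assumptions: every name starting with sim_ is hypothetical, the per-CPU section is modelled as a struct, and the copies are laid out as a plain array rather than wherever the kernel actually places them.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4   /* hypothetical CPU count for the simulation */

/* Stand-in for everything placed in the .data.percpu section. */
struct sim_percpu_area {
    int x;
    long counter;
};

/* One copy of the area per simulated CPU; copy 0 plays "CPU 0's data". */
struct sim_percpu_area sim_areas[SIM_NR_CPUS];

/* Distance in bytes from CPU 0's copy to CPU n's copy. */
unsigned long sim_per_cpu_offset[SIM_NR_CPUS];

/* Fill the offset table, as the kernel does once during boot. */
void sim_setup_per_cpu_areas(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        sim_per_cpu_offset[cpu] = (unsigned long)
            ((char *)&sim_areas[cpu] - (char *)&sim_areas[0]);
}

/*
 * Analogue of &per_cpu(x, cpu): start at CPU 0's x, add the byte
 * offset to the requested CPU's copy, and return the resulting pointer.
 */
int *sim_per_cpu_x(int cpu)
{
    return (int *)((char *)&sim_areas[0].x + sim_per_cpu_offset[cpu]);
}
```

After calling sim_setup_per_cpu_areas(), writing through sim_per_cpu_x(3) changes only the fourth copy of x; the other copies are untouched, which is the property the real infrastructure provides per CPU.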