In the XNU sources, specifically <libsyscall/os/tsd.h>, there is a function for fast access to thread-local data:
__attribute__((always_inline))
static __inline__ void*
_os_tsd_get_direct(unsigned long slot)
{
    void *ret;
    __asm__("mov %%gs:%1, %0" : "=r" (ret) : "m" (*(void **)(slot * sizeof(void *))));
    return ret;
}
I'm confused about the way the inline assembly is interpreted by the compiler.
Suppose that slot == 1. On x86_64 sizeof(void *) == 8, so the input operand expression becomes *(void **)(8). Why doesn't the following dereference result in a memory access error?
In fact, if I try to move the expression out of the asm statement, I do get an error.
void * my_os_tsd_get_direct(unsigned long slot) {
    void *ret;
    void *ptr = *(void **)(slot * sizeof(void *));
    __asm__("mov %%gs:%1, %0" : "=r" (ret) : "m" (ptr));
    return ret;
}
I looked at the compiler's assembly output and noticed that the second version dereferences the pointer but the first does not.
So I thought, okay, let's try removing the explicit dereference in the asm statement, because the compiler seems to ignore it.
void * my_os_tsd_get_direct_v2(unsigned long slot) {
    void *ret;
    __asm__("mov %%gs:%1, %0" : "=r" (ret) : "m" ((void *)(slot * sizeof(void *))));
    return ret;
}
But that produces error: invalid lvalue in asm input for constraint 'm'.
Can anyone shed some light on what's happening?
Why doesn't the following dereference result in a memory access error?
Because you're using it as a memory operand to an asm block which doesn't deref it directly, only relative to the GS segment base. The GS base is set to whatever virtual address we want this thread's thread-local-storage block to be in.
See How does the gcc `__thread` work? and/or Addresses of Thread Local Storage Variables for how gcc on Linux implements thread-local storage (TLS) using the FS or GS segment register. XNU is clearly doing basically the same thing, but using inline asm instead of taking advantage of GNU C builtins for thread stuff.
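For comparison, here is a minimal sketch of the compiler-managed alternative mentioned above, using the GNU C __thread storage class. The names tls_value, tls_set, and tls_get are hypothetical and chosen only for illustration. With GCC or Clang on x86-64, these accesses are expected to compile to segment-relative loads and stores without any inline asm.

```c
/* Hypothetical per-thread variable: every thread gets its own copy. */
static __thread long tls_value;

/* Store into the calling thread's copy. */
void tls_set(long v)
{
    tls_value = v;
}

/* Load from the calling thread's copy. */
long tls_get(void)
{
    return tls_value;
}
```

Calling tls_set(7) in one thread only changes what tls_get() returns in that same thread; the compiler and runtime pick the segment register and offset, which is the bookkeeping the XNU function does by hand.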
An "m" constraint is somewhat similar to C's & operator: instead of loading the object into a register, the compiler merely substitutes an addressing mode that references the object into the asm template.
Since this asm template doesn't use the addressing mode directly, but instead with %%gs: in front of it, it's not actually doing the dereference of *(void **)(slot * sizeof(void *)) that would happen if you assigned that to a variable in pure C.
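To see the other side of this, here is a hedged sketch of the same kind of "m" operand used without a segment prefix. The helper name load_flat is hypothetical. On x86-64 with GCC or Clang (AT&T syntax assumed), the substituted addressing mode is used as-is, so the mov behaves like an ordinary *p; other targets take a plain C fallback so the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: loads *p through an "m" operand with NO segment prefix. */
void *load_flat(void **p)
{
    void *ret;
    assert(p != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* %1 expands to an addressing mode such as (%rdi). With no %%gs: in
       front of it, the mov reads the flat address p, just like plain *p. */
    __asm__("mov %1, %0" : "=r" (ret) : "m" (*p));
#else
    /* Portable fallback for targets without x86-64 GNU-style inline asm. */
    ret = *p;
#endif
    return ret;
}
```

Here p must point at real, mapped memory, because the load goes through the default segment. The only thing that makes the small offset in the XNU function safe is the %%gs: written in the template.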
Asm-template substitutions are purely textual. You can do stuff like 16 + %0 to access a memory location 16 bytes past the start of a memory operand.
As usual, it helps to look at the compiler's asm output. I put your code on the Godbolt compiler explorer (with gcc and clang), and removed the static inline stuff so we can see the asm for a stand-alone definition of the function.
void*
_os_tsd_get_direct(unsigned long slot)
{
    void *ret;
    __asm__("mov %%gs:%1, %0\n\t"
            "nop # operand 1 was %1" : "=r" (ret) : "m" (*(void **)(slot * sizeof(void *))));
    return ret;
}
compiles to
#gcc -O3
mov %gs:0(,%rdi,8), %rax
nop # operand 1 was 0(,%rdi,8)
ret
I used a NOP instead of just a comment so it's still visible even after Godbolt removes comment-only lines. It's often handy to add dummy comments showing what the template operands were (especially if you're ever using any instructions with implicit operands, and want to see what the compiler picked for operands that aren't otherwise mentioned in the template.)
Here I added it just to make the point that the 0(,%rdi,8) substituted by the compiler is just text that can go anywhere you ask for it. The trick is that we're asking for it right after a %%gs: prefix.
void *ptr = *(void **)(slot * sizeof(void *));
That's doing something completely different. You're actually dereferencing the TLS offset as a pointer into the flat virtual address space (using the default DS segment base = 0).
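The difference can be modeled in plain C as address arithmetic: the CPU adds the segment base to the offset named by the addressing mode. This is only a sketch of that arithmetic, not how the hardware is programmed; linear_address and seg_base are hypothetical names.

```c
#include <stdint.h>

/* Model of x86 address formation: linear address = segment base + offset.
   seg_base is a hypothetical stand-in for the DS or GS base. */
uintptr_t linear_address(uintptr_t seg_base, unsigned long slot)
{
    uintptr_t offset = (uintptr_t)slot * sizeof(void *);
    return seg_base + offset;
}
```

With seg_base == 0 (the DS case) and slot == 1 on x86_64, the result is address 8, which is normally unmapped, hence the fault. With seg_base set to the thread's TLS block (the GS case), the same offset lands on the second pointer-sized slot of that block.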
If you wanted to break it up, you'd do
void * separated_os_tsd_get_direct(unsigned long slot) {
    void *ret;
    unsigned long slot_offset = slot * sizeof(void*);
    void **gs_ptr = (void **)slot_offset;
    __asm__("mov %%gs:%1, %0" : "=r" (ret) : "m" (*gs_ptr));
    return ret;
}
compiles to:
separated_os_tsd_get_direct(unsigned long):
mov %gs:0(,%rdi,8), %rax
ret
It's essential that the operand to the asm template be a pointer-dereference, not a local. With optimization enabled a local could be optimized away and turned back into a pointer deref of the original location (if written with semantics that make that possible, unlike your version), but it's better to make sure it's safe by avoiding an actual deref other than in the expression inside the "m"(*ptr) constraint.