I'm working on a runtime library that uses user-level context switching (via Boost.Context), and I'm having trouble using thread_local variables. Consider the following (reduced) code:
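#include <iostream>

void UserLevelContextSwitch();  // defined elsewhere in the runtime library

thread_local int* volatile tli;

int main()
{
    tli = new int(1);           // part 1, done by thread 1
    UserLevelContextSwitch();
    int li = *tli;              // part 2, done by thread 2
    std::cout << li;
}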
Since there are two accesses to the thread_local variable, the compiler transforms the main function into something along these lines (reconstructed from the assembly):
    register int* volatile* ptli = &tli;  // cache address of thread_local variable
    *ptli = new int(1);
    UserLevelContextSwitch();
    int li = **ptli;                      // reuses the cached address
    std::cout << li;
This seems to be a legal optimization, since the value of the volatile tli is not being cached in a register. But the address of the volatile tli is in fact being cached, and is not recomputed in part 2.
And that's the problem: after the user-level context switch, the thread that did part 1 goes somewhere else. Part 2 is then picked up by some other thread, which inherits the previous stack and register state. But now the thread that's executing part 2 reads the value of the tli that belongs to thread 1.
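To make the failure mode concrete, here is a minimal sketch that uses plain std::thread instead of a real context switch. The names tl_value, tls_addresses_differ and write_through_stale_pointer are made up for this illustration; the point is only that each thread has its own instance of a thread_local variable, so an address captured on one thread keeps referring to that thread's instance.

```cpp
#include <cassert>
#include <thread>

thread_local int tl_value = 0;  // hypothetical variable: one instance per thread

// Returns true when a second thread sees a different address for tl_value
// than the calling thread does.
bool tls_addresses_differ() {
    int* here = &tl_value;  // address of the calling thread's instance
    bool differ = false;
    std::thread other([here, &differ] { differ = (&tl_value != here); });
    other.join();
    return differ;
}

// Mimics part 2 running on another thread with a stale, cached address.
// Returns the value that the calling thread's instance ends up holding.
int write_through_stale_pointer() {
    tl_value = 1;
    int* cached = &tl_value;  // "cached" on the first thread
    std::thread other([cached] {
        assert(&tl_value != cached);  // this thread has its own instance
        tl_value = 2;                 // updates the second thread's instance
        *cached = 42;                 // lands in the first thread's instance
    });
    other.join();
    return tl_value;  // the first thread observes the write made via the stale pointer
}
```

Calling tls_addresses_differ() from main should return true, and write_through_stale_pointer() should return 42, which is the same kind of cross-thread access that the cached ptli produces after the context switch.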
I'm trying to figure out a way to prevent the compiler from caching the thread-local variable's address, and volatile doesn't go deep enough. Is there any trick (preferably standard, possibly GCC-specific) to prevent the caching of thread-local variables' addresses?
There is no way to pair user-level context switches with TLS. Even with atomics and a full memory fence, caching the address seems to be a legitimate optimization, since the thread_local variable is a file-scope static variable, and the compiler assumes that its address cannot change. (That said, some compilers may still be sensitive to compiler memory barriers such as std::atomic_thread_fence and asm volatile ("" : : : "memory");.)
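For anyone who wants to experiment with the barrier idea anyway, a commonly tried GCC/Clang-specific sketch is to route every access through an out-of-line accessor. This is an assumption-laden workaround, not something the standard guarantees: tli_address, store_then_load and accessor_tracks_current_thread are hypothetical names, and link-time optimization or a smarter compiler may still defeat it.

```cpp
#include <cassert>
#include <thread>

thread_local int* volatile tli = nullptr;

// GCC/Clang-specific: noinline keeps the address computation out of the caller,
// and the empty asm statement is meant to discourage the compiler from treating
// the function as pure and merging repeated calls.
__attribute__((noinline)) int* volatile* tli_address() {
    asm volatile("" ::: "memory");
    return &tli;
}

// Mirrors the two parts of the example, looking the address up again each time.
int store_then_load() {
    int value = 1;
    *tli_address() = &value;   // part 1
    // A user-level context switch would happen here.
    int li = **tli_address();  // part 2: the address is requested again
    *tli_address() = nullptr;  // do not leave a dangling pointer behind
    return li;
}

// Returns true when the accessor yields a different address on another thread.
bool accessor_tracks_current_thread() {
    int* volatile* mine = tli_address();
    assert(mine != nullptr);
    bool distinct = false;
    std::thread other([mine, &distinct] { distinct = (tli_address() != mine); });
    other.join();
    return distinct;
}
```

Whether the address is really recomputed after the switch has to be checked in the generated assembly for each compiler and optimization level.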
Cilk Plus uses the same technique you describe to implement "continuation stealing", where a different thread can continue execution after the sync point. Its developers explicitly discourage the use of TLS in a Cilk program. Instead, they recommend "hyperobjects", a special Cilk feature that substitutes for TLS (and also provides serial/deterministic join semantics). See also the Cilk developers' presentation about thread_local and parallelism.
Also, Windows provides FLS (Fiber Local Storage) as a TLS replacement when fibers (the same kind of lightweight context switching) are in use.
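The idea behind fiber-local storage can be sketched portably: keep the data in a record that belongs to the context rather than to the thread. This is only an illustration of the concept, not the Windows FLS API or Cilk hyperobjects; ContextLocals, part1, part2 and run_migrating_context are hypothetical names, and two std::threads stand in for a context that migrates between workers.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-context record. A real runtime would keep it next to the
// saved stack and registers of each user-level context.
struct ContextLocals {
    int* tli = nullptr;
};

// Part 1 stores into the context's record instead of a thread_local variable.
void part1(ContextLocals& ctx) { ctx.tli = new int(1); }

// Part 2 reads from the same record, regardless of which thread runs it.
int part2(ContextLocals& ctx) {
    assert(ctx.tli != nullptr);
    int li = *ctx.tli;
    delete ctx.tli;
    ctx.tli = nullptr;
    return li;
}

// Runs part 1 on one thread and part 2 on another.
int run_migrating_context() {
    ContextLocals ctx;
    std::thread t1([&ctx] { part1(ctx); });
    t1.join();  // the join orders part 1 before part 2
    int li = 0;
    std::thread t2([&ctx, &li] { li = part2(ctx); });
    t2.join();
    return li;
}
```

Here run_migrating_context() should return 1, because the value travels with the context instead of staying behind with the first thread.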