Tags: c++, tbb, thread-local-storage

thread_local storage, constructors, destructors, and tbb


It's my understanding that tbb may maintain pool threads for reuse. I have data declared static with the modern C++ thread_local specifier; its type has a non-trivial (default) constructor and destructor, and the data is initialized the first time it is used on a new thread. Is there a way to ensure that this data is destroyed when tbb puts a thread into its pool, and constructed again when the thread is pulled back out of the pool?

EDIT:

Apologies for not spelling this out initially, but an early response made it clear that some invalid assumptions might be made about the code I am hoping to update.

  1. Refactoring away from tbb isn't practical: the code already makes heavy use of it, and reworking all of those call sites would be non-trivial. Whatever the solution, the threads tbb creates will still need access to the thread-local data.

  2. I have hidden all access to the thread_local data behind a small number of functions, and those functions are what I was ideally hoping to change. Is there some way, perhaps with an additional thread_local value, to tell that I'm on a thread that has been reused since the last time the data was accessed? That is actually the ideal solution I am looking for (see the sketch after this list).

  3. One major disadvantage of refactoring all tbb calls in the application is not so much that there are many of them (although that is certainly a significant factor), but that I would then be adding references to the thread_local data in every single tbb thread, even threads that never actually need to access it. On systems which delay construction of thread_local data until it is first accessed, this overhead is undesirable. This is why, ideally, I would like to put the logic inside the functions that access the thread_local data.
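
One way to detect reuse along the lines of point 2, sketched below under assumptions of my own: keep a global generation counter that the program bumps at the start of each logical context, plus a thread_local record of the generation each thread last saw. The names (g_epoch, Data, local_data) are hypothetical, and re-initialization is done by assignment, so this is a minimal sketch of the idea rather than a drop-in implementation:

    #include <atomic>

    // Hypothetical global generation counter: the program bumps it
    // (g_epoch.fetch_add(1)) whenever a new logical context begins.
    std::atomic<unsigned> g_epoch{0};

    struct Data {                  // stand-in for the real thread-local type
        int value = 0;             // imagine a non-trivial ctor/dtor here
    };

    // Accessor function hiding the thread_local data, as in point 2.
    Data& local_data() {
        thread_local Data d;       // constructed lazily, once per thread
        thread_local unsigned seen = g_epoch.load(std::memory_order_acquire);
        unsigned now = g_epoch.load(std::memory_order_acquire);
        if (seen != now) {         // this thread was reused since last access
            d = Data{};            // re-initialize in place
            seen = now;
        }
        return d;
    }

Because both d and seen are constructed lazily, only threads that actually call local_data() pay for construction, which also addresses point 3.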


Solution

  • Program logic contexts and threads should be treated as separate concepts. Threads are resources used by your program context. A program context might need a single thread that is created and destroyed together with the context, might run on a thread provided from a thread pool, or might run on multiple threads that are created on demand or taken from a thread pool.

    Your program context simply uses whatever thread-local instance exists on the thread it happens to run on.

    Your program context therefore needs to re-initialize/re-create the thread-local instance, because the instance may already have been used by the same thread under another context, and that leftover state is irrelevant to your current program context.

    If your program context is composed of multiple threads (e.g. func a() runs on thread t1, func b() runs on thread t2), you should not use thread-local storage at all, because the program context cannot ensure data integrity across those threads.

    In the code below, TBB assigns a pooled thread to each individual parallel_for context, so unless you re-initialize the thread-local instance at the start of your context, the counting data of that context gets corrupted.

    #include <cstdio>
    #include <ranges>
    #include <tbb/tbb.h>
    using namespace std;
    using namespace std::ranges::views;

    int main() {
        class A {
        public:
            A(int ctx) : m_ctx(ctx) {}
            int m_ctx;
            int m_num = 0;
        };
        thread_local A a(-1);  // instance may already exist on a reused thread

        tbb::parallel_for(0, 7, 1, [](int i) {
            //a = A(i);     // re-initializes the thread-local instance
            a.m_ctx = i;    // does not re-initialize; only assigns the context id
            for (auto n : iota(0, 5)) {  // n, not i, to avoid shadowing
                a.m_num++;
                printf("ctx=%d count=%d th=%d\n", a.m_ctx, a.m_num,
                       tbb::this_task_arena::current_thread_index());
            }
        });
    }
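
    For reference, this example needs a C++20 compiler (for std::ranges) and a TBB link; assuming g++ and an installed oneTBB, something like the following should build it (example.cpp is a placeholder file name):

        g++ -std=c++20 example.cpp -ltbb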
    

    Output without re-initializing the thread-local instance. The count values are corrupted because TBB assigns threads whose thread-local instance has already been used by another program context (note ctx=1 and ctx=2 continuing the counts on th=0):

    ctx=0 count=1 th=0
    ctx=0 count=2 th=0
    ctx=0 count=3 th=0
    ctx=0 count=4 th=0
    ctx=0 count=5 th=0
    ctx=1 count=6 th=0
    ctx=1 count=7 th=0
    ctx=1 count=8 th=0
    ctx=1 count=9 th=0
    ctx=1 count=10 th=0
    ctx=2 count=11 th=0
    ctx=2 count=12 th=0
    ctx=2 count=13 th=0
    ctx=3 count=1 th=1
    ctx=3 count=2 th=1
    ctx=3 count=3 th=1
    ctx=3 count=4 th=1
    ctx=3 count=5 th=1
    ctx=5 count=1 th=2
    ctx=5 count=2 th=2
    ctx=5 count=3 th=2
    ctx=5 count=4 th=2
    ctx=5 count=5 th=2
    ctx=6 count=1 th=4
    ctx=6 count=2 th=4
    ctx=6 count=3 th=4
    ctx=6 count=4 th=4
    ctx=6 count=5 th=4
    ctx=2 count=14 th=0
    ctx=2 count=15 th=0
    ctx=4 count=1 th=3
    ctx=4 count=2 th=3
    ctx=4 count=3 th=3
    ctx=4 count=4 th=3
    ctx=4 count=5 th=3
    

    Output with re-initialization of the thread-local instance (the a = A(i) line enabled). Each context now counts correctly from 1:

    ctx=0 count=1 th=0
    ctx=0 count=2 th=0
    ctx=0 count=3 th=0
    ctx=0 count=4 th=0
    ctx=0 count=5 th=0
    ctx=1 count=1 th=0
    ctx=1 count=2 th=0
    ctx=1 count=3 th=0
    ctx=1 count=4 th=0
    ctx=1 count=5 th=0
    ctx=2 count=1 th=0
    ctx=2 count=2 th=0
    ctx=2 count=3 th=0
    ctx=2 count=4 th=0
    ctx=2 count=5 th=0
    ctx=5 count=1 th=2
    ctx=5 count=2 th=2
    ctx=5 count=3 th=2
    ctx=5 count=4 th=2
    ctx=5 count=5 th=2
    ctx=6 count=1 th=0
    ctx=6 count=2 th=0
    ctx=6 count=3 th=0
    ctx=6 count=4 th=0
    ctx=6 count=5 th=0
    ctx=4 count=1 th=3
    ctx=4 count=2 th=3
    ctx=4 count=3 th=3
    ctx=4 count=4 th=3
    ctx=4 count=5 th=3
    ctx=3 count=1 th=1
    ctx=3 count=2 th=1
    ctx=3 count=3 th=1
    ctx=3 count=4 th=1
    ctx=3 count=5 th=1
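
    As a side note, TBB also provides tbb::enumerable_thread_specific, which ties the lifetime of per-thread copies to an object you control instead of to the pooled worker threads: copies are created lazily on first access and destroyed when that object is destroyed. It does not re-initialize per context by itself, but it does give deterministic destruction. A minimal sketch (the counters name and output format are illustrative):

    #include <cstdio>
    #include <tbb/enumerable_thread_specific.h>
    #include <tbb/parallel_for.h>

    int main() {
        // Each worker thread that touches `counters` gets its own lazily
        // constructed copy; all copies are destroyed with `counters`,
        // not when the pooled threads end.
        tbb::enumerable_thread_specific<int> counters(0);

        tbb::parallel_for(0, 7, 1, [&](int) {
            for (int n = 0; n < 5; ++n)
                ++counters.local();   // this thread's own copy
        });

        // Inspect every per-thread copy after the parallel work.
        for (int c : counters)
            printf("per-thread total=%d\n", c);
    }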