python, multithreading, thread-local-storage

Thread local storage in Python


How do I use thread local storage in Python?



Solution

  • Thread-local storage is useful, for instance, when you have a pool of worker threads and each thread needs its own resource, such as a network or database connection. Note that the threading module uses ordinary threads, which share the process's global data. In CPython the global interpreter lock keeps them from speeding up CPU-bound work, although they still help with I/O-bound work. The multiprocessing module instead starts separate worker processes, and each process has its own copy of every global, so globals there behave much like thread-local data.

    threading module

    Here is a simple example:

    import threading
    from threading import current_thread
    
    threadLocal = threading.local()
    
    def hi():
        initialized = getattr(threadLocal, 'initialized', None)
        if initialized is None:
            print("Nice to meet you", current_thread().name)
            threadLocal.initialized = True
        else:
            print("Welcome back", current_thread().name)
    
    hi(); hi()
    

    This will print out:

    Nice to meet you MainThread
    Welcome back MainThread
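
    The example above runs only in the main thread, so it does not yet show that the state is per thread. As a minimal sketch (the messages list and the worker-1 thread name are illustrative additions, not part of the original example), the same pattern can be called from a second thread:

```python
import threading

thread_local = threading.local()  # created once, at module level
messages = []                     # collected instead of printed, for easy inspection

def hi():
    name = threading.current_thread().name
    if getattr(thread_local, "initialized", False):
        messages.append(f"Welcome back {name}")
    else:
        # First call in this particular thread: the attribute does not exist yet.
        thread_local.initialized = True
        messages.append(f"Nice to meet you {name}")

def worker():
    hi()
    hi()

hi(); hi()  # two calls in the calling thread

t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()

for line in messages:
    print(line)
```

    The second thread is greeted with "Nice to meet you" again, because the initialized attribute set by the first thread is not visible to it.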
    

    One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread and not once per function call. Module level (a global) or class level are ideal locations.

    Here is why: threading.local() creates a new instance each time it is called (just like any factory or class call would), so calling it repeatedly keeps replacing the previous object along with any data stored on it, which in all likelihood is not what one wants. When any thread accesses an existing threadLocal variable (or whatever it is called), it gets its own private view of that variable's attributes.
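
    A sketch of the class-level placement, assuming a hypothetical RequestContext helper (the class, its methods and the user names are made up for illustration). A threading.Barrier makes every thread write before any thread reads, so the private views are actually exercised:

```python
import threading

class RequestContext:
    """Hypothetical helper; the name and methods are illustrative only."""
    _local = threading.local()  # one object, created once for the whole class

    @classmethod
    def set_user(cls, user):
        cls._local.user = user

    @classmethod
    def get_user(cls):
        return getattr(cls._local, "user", None)

barrier = threading.Barrier(3)
seen = {}  # thread name -> value that thread read back

def worker(user):
    RequestContext.set_user(user)
    barrier.wait()  # all three threads have written before any of them reads
    seen[threading.current_thread().name] = RequestContext.get_user()

threads = [
    threading.Thread(target=worker, args=(u,), name=f"worker-{u}")
    for u in ("ann", "bob", "cy")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The thread running this code never called set_user, so it sees nothing.
main_user = RequestContext.get_user()
print(seen, main_user)
```

    Each thread reads back only the value it stored itself, even though all of them go through the same class attribute.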

    This won't work as intended:

    import threading
    from threading import current_thread
    
    def wont_work():
        threadLocal = threading.local()  # oops, this creates a new local object on every call!
        initialized = getattr(threadLocal, 'initialized', None)
        if initialized is None:
            print("First time for", current_thread().name)
            threadLocal.initialized = True
        else:
            print("Welcome back", current_thread().name)
    
    wont_work(); wont_work()
    

    This results in the following output:

    First time for MainThread
    First time for MainThread
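
    The worker-pool use case mentioned at the top could look roughly like this. It is only a sketch: FakeConnection is a hypothetical stand-in for a real network or database connection, and get_connection, task and created are names invented for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # created once, at module level
created = []                # names of the threads that opened a connection

class FakeConnection:
    """Hypothetical stand-in for a real network or database connection."""
    def __init__(self):
        self.owner = threading.current_thread().name
        created.append(self.owner)

    def query(self, x):
        return x * x

def get_connection():
    # Lazily create one connection per thread and reuse it afterwards.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = FakeConnection()
    return conn

def task(x):
    conn = get_connection()
    same_thread = conn.owner == threading.current_thread().name
    return conn.query(x), conn.owner, same_thread

with ThreadPoolExecutor(max_workers=4) as pool:
    outcomes = list(pool.map(task, range(10)))

squares = [result for result, _, _ in outcomes]
owners = {owner for _, owner, _ in outcomes}
print(squares)
print(len(created), "connection(s) for", len(owners), "thread(s)")
```

    However many of the four pool threads end up running tasks, each one opens exactly one connection and never sees a connection that belongs to another thread.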
    

    multiprocessing module

    With multiprocessing, every global variable is local to its process: the workers run in separate processes, and each process has its own copy of the module's globals.

    Consider this example, where the processed counter behaves like thread-local storage, with one independent copy per worker process:

    from multiprocessing import Pool
    from random import random
    from time import sleep
    import os
    
    processed=0
    
    def f(x):
        sleep(random())
        global processed
        processed += 1
        print("Processed by %s: %s" % (os.getpid(), processed))
        return x*x
    
    if __name__ == '__main__':
        pool = Pool(processes=4)
        print(pool.map(f, range(10)))
    

    It will output something like this:

    Processed by 7636: 1
    Processed by 9144: 1
    Processed by 5252: 1
    Processed by 7636: 2
    Processed by 6248: 1
    Processed by 5252: 2
    Processed by 6248: 2
    Processed by 9144: 2
    Processed by 7636: 3
    Processed by 5252: 3
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    

    ... of course, the process IDs, the per-process counts, and the order will vary from run to run.