python thread-synchronization

Proving the Necessity of Synchronization Primitives in Python


I am preparing a report on synchronization primitives for threads, and I am looking for a good example in which one result is obtained when a Lock() is used and a completely different one when it is not.

In the example below, I increment a number by 1 in a loop on multiple threads at once. I have already raised the number of iterations to 1000000 and the number of threads to 1000, but no race condition (or any similar effect) shows up. The result is still exactly equal to the number of iterations times the number of threads (running on Ubuntu 20.04).

from threading import Thread

COUNT = 1000000
NUM_THREADS = 1000
counter = 0


def increment():
    global counter
    for _ in range(COUNT):
        counter += 1


threads = [Thread(target=increment) for _ in range(NUM_THREADS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

diff = counter - COUNT * NUM_THREADS
print(f"Diff for counter without synchronization: {diff}")

Can anyone suggest an example (preferably a simple one) in which the result of a multithreaded computation without synchronization primitives differs from that of its synchronized counterpart?


Solution

  • This is what happens in the increment function:

    >>> counter = 0
    >>>
    >>> def increment():
    ...     global counter
    ...     for _ in range(1000):
    ...         counter += 1
    ...
    >>> import dis
    >>> dis.dis(increment)
      3           0 LOAD_GLOBAL              0 (range)
                  2 LOAD_CONST               1 (1000)
                  4 CALL_FUNCTION            1
                  6 GET_ITER
            >>    8 FOR_ITER                12 (to 22)
                 10 STORE_FAST               0 (_)
    
      4          12 LOAD_GLOBAL              1 (counter)
                 14 LOAD_CONST               2 (1)
                 16 INPLACE_ADD
                 18 STORE_GLOBAL             1 (counter)
                 20 JUMP_ABSOLUTE            8
            >>   22 LOAD_CONST               0 (None)
                 24 RETURN_VALUE
    

    To see the effect you are after, Python has to switch threads between instruction 12 (LOAD_GLOBAL) and instruction 18 (STORE_GLOBAL), and the other thread then has to modify counter while it holds the GIL. When the first thread resumes, it stores a value computed from its stale copy and overwrites the other thread's update.

    You can get the interval between Python's thread switches from sys.getswitchinterval(); on my system it is 5 milliseconds (the function returns the value in seconds, i.e. 0.005). The chance of a switch landing exactly between the right instructions is not zero, so given enough time it will happen. Decreasing the number of threads and increasing the number of iterations might improve your chances, as would increasing the number of instructions between the load and the store (i.e. doing more work).

    What you are seeing are two of the problems with unsynchronized access: it works correctly most of the time, and when it does fail, the failure is difficult to reproduce.
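
The interleaving described above can also be forced instead of waited for. The following is a minimal sketch under stated assumptions, not a definitive demonstration: the helper names lost_update_forced and run_counter, the values of ITERATIONS and NUM_THREADS, and the 1e-6 switch interval are all chosen here for illustration. A threading.Barrier stands in for the badly timed thread switch, so both threads load counter before either stores it. The second helper splits the increment into an explicit load, add, and store (more work between the load and the store) and optionally shortens the switch interval with sys.setswitchinterval(), then repeats the same loop under a Lock for comparison.

```python
import sys
import threading

# Hypothetical values, chosen only to keep the run short.
ITERATIONS = 100_000
NUM_THREADS = 4

counter = 0


def lost_update_forced():
    """Force the bad interleaving: both threads load before either stores."""
    global counter
    counter = 0
    barrier = threading.Barrier(2)

    def worker():
        global counter
        value = counter      # the load (LOAD_GLOBAL in the disassembly)
        barrier.wait()       # stands in for a thread switch at the worst moment
        counter = value + 1  # the store (STORE_GLOBAL): both threads write 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


def run_counter(lock=None, switch_interval=None):
    """Increment the shared counter from NUM_THREADS threads, with or without a lock."""
    global counter
    counter = 0

    def worker():
        global counter
        for _ in range(ITERATIONS):
            if lock is None:
                value = counter  # load
                value += 1       # extra work between the load and the store
                counter = value  # store; may overwrite another thread's update
            else:
                with lock:       # only one thread at a time runs load-add-store
                    counter += 1

    old_interval = sys.getswitchinterval()
    if switch_interval is not None:
        sys.setswitchinterval(switch_interval)  # request more frequent switches
    try:
        threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        sys.setswitchinterval(old_interval)     # restore the previous setting
    return counter


if __name__ == "__main__":
    expected = ITERATIONS * NUM_THREADS
    # Two increments were attempted, but one update is lost: always 1, not 2.
    print("forced interleaving:", lost_update_forced())
    # Zero or negative; the exact value varies from run to run.
    print("diff without lock:", run_counter(switch_interval=1e-6) - expected)
    # The lock makes load-add-store indivisible, so the diff is always 0.
    print("diff with lock:", run_counter(lock=threading.Lock()) - expected)
```

Only the forced and the locked results are fixed. The unsynchronized diff depends on timing and on the Python version, so it may well be 0 on a given run, which is exactly the reproducibility problem described above.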