I'm familiar with Python's GIL, so I know that threads in Python don't really run in parallel.
When I ran the code below, I expected the result to be 0, because I assumed the GIL would not allow a race condition to exist. In Python 3, the result was 0. But in Python 2 it was not 0; the result was something unexpected like -3492 or 21283.
How do I solve the problem?
import threading

x = 0  # A shared value

def foo():
    global x
    for i in range(100000000):
        x += 1

def bar():
    global x
    for i in range(100000000):
        x -= 1

t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)
t1.start()
t2.start()
t1.join()
t2.join()  # Wait for completion
print(x)
The statement x += 1 is not thread-safe in any version of Python. The fact that you saw the results of a race condition in Python 2 but not in Python 3 is mostly just a coincidence (it probably has to do with optimizations to when the GIL switches between threads, but I don't know the details). You could get the wrong result in Python 3 too.
The reason is that the += operator is not atomic. It takes several bytecodes to run, and the GIL is only guaranteed to prevent switching between threads while any one bytecode is running. Let's look at a disassembly of your foo function to see how it works (this is from Python 3.7; in Python 2.7 the addresses within the bytecode are different, but the operations are the same):
>>> import dis
>>> dis.dis(foo)
  3           0 SETUP_LOOP              24 (to 26)
              2 LOAD_GLOBAL              0 (range)
              4 LOAD_CONST               1 (100000000)
              6 CALL_FUNCTION            1
              8 GET_ITER
        >>   10 FOR_ITER                12 (to 24)
             12 STORE_FAST               0 (i)

  4          14 LOAD_GLOBAL              1 (x)
             16 LOAD_CONST               2 (1)
             18 INPLACE_ADD
             20 STORE_GLOBAL             1 (x)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALUE
The lines we care about are the four at bytecode positions 14-20. The first two load the operands of the addition. The third performs the INPLACE_ADD operation. The result of the addition is put back on the stack, because not all types of objects can be updated in place (integers cannot, so it's necessary here). The last bytecode stores the sum back to the original name.
If the interpreter switches which thread holds the GIL between loading x at bytecode 14 and storing the new value at bytecode 20, we'll probably end up with an incorrect result, because the value we loaded earlier may no longer be current by the time we get hold of the GIL again.
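To make that interleaving concrete, here is a small sketch that replays one unlucky schedule by hand. It uses no real threads; the names a_loaded and b_loaded are made up for illustration and stand in for the value each thread holds on its own stack after LOAD_GLOBAL.

```python
# Deterministic re-enactment of a lost update (no real threads involved).
x = 0

# Thread A (foo) executes LOAD_GLOBAL and keeps the value on its own stack.
a_loaded = x          # A sees 0

# The interpreter switches to thread B (bar), which completes a full x -= 1.
b_loaded = x          # B sees 0
x = b_loaded - 1      # B stores -1

# Thread A resumes with its stale value and finishes INPLACE_ADD + STORE_GLOBAL.
x = a_loaded + 1      # A stores 1, overwriting B's decrement

print(x)  # 1, although one increment and one decrement should cancel out to 0
```

Each lost update like this shifts the final total by one, which is why the real program can end on an arbitrary positive or negative number.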
As I mentioned above, the fact that you get 0 in Python 3 is simply the result of an implementation detail: the interpreter happened not to switch during that critical section of bytecode while you were testing. There's no guarantee it won't choose differently if you run the program again in another situation (e.g. under heavy CPU load), or in a different interpreter version (e.g. 3.7 instead of 3.6, or whatever).
If you want real thread safety, then you should use actual locks, not rely solely on the GIL. The GIL only ensures that the interpreter's internal state remains sane. It doesn't guarantee that every line of your code is atomic.
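As a minimal sketch of what that could look like for your program, the version below wraps each update in a threading.Lock. The names x_lock and N are my own additions, and the iteration count is reduced from the one in the question so the example finishes quickly.

```python
import threading

x = 0                      # A shared value
x_lock = threading.Lock()  # Hypothetical name; guards every update of x
N = 100000                 # Smaller than in the question, to keep the run short

def foo():
    global x
    for _ in range(N):
        with x_lock:       # Load, add and store now happen as one unit
            x += 1

def bar():
    global x
    for _ in range(N):
        with x_lock:
            x -= 1

t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)
t1.start()
t2.start()
t1.join()
t2.join()  # Wait for completion
print(x)   # 0
```

Because a thread can only be inside the with block while it holds the lock, the other thread can never observe or overwrite a half-finished update, so the result is 0 in both Python 2 and Python 3. Taking the lock on every iteration is slow, so in real code you would probably do more work per acquisition.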