Consider the following Python code:
    from multiprocessing import Process, Manager


    class MyClass():
        def __init__(self, dic1, dic2):
            self.dic1 = Manager().dict(dic1)  # Create a managed dictionary
            self.dic2 = Manager().dict(dic2)  # Create a managed dictionary
            process1 = Process(target=self.dictSumOverloaded, args=())
            process2 = Process(target=self.dictSumElementWise, args=())
            process1.start()
            process1.join()
            process2.start()
            process2.join()

        def dictSumOverloaded(self):
            self.dic1['1'][0] += 1  # dic1 is not updated

        def dictSumElementWise(self):
            a = self.dic2['1']
            self.dic2['1'] = [a[0]+1, a[1], a[2]]  # dic2 is updated


    def main():
        dic1 = {'1': [1, 0, 0]}
        dic2 = {'1': [1, 0, 0]}
        result = MyClass(dic1, dic2)
        print(result.dic1)  # Failed
        print(result.dic2)  # Success

        # Bypass multiprocessing environment
        dic3 = {'1': [1, 0, 0]}
        dic3['1'][0] += 1
        print(dic3)  # Success


    if __name__ == '__main__':
        main()
In this example, I create two managed dicts, each containing a list, as attributes of MyClass. The goal is to increment some of the elements of this list in a multiprocessing environment, but some methods do not actually modify the list.
Method 1: dictSumOverloaded
The overloaded operator += is used to increment an element of the list by 1, but the result does not persist: the dict is not updated.
Method 2: dictSumElementWise
This function creates a new list element-wise, based on the old list and the values to add. Then the new list is assigned to the dict key. The dict is successfully modified.
Sanity check: outside the multiprocessing environment
dic3 is modified as expected when += is used outside the multiprocessing environment.
Questions:
1) Why is += not modifying the list element in the multiprocessing environment?
2) Using the element-wise method to update the list works but is cumbersome. Any suggestions on making it cleaner/faster?
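I believe the problem you are encountering is related to how changes to dic1 are detected by the anonymous Manager object that you create it with. self.dic1 is not an ordinary dict but a proxy: the real dictionary lives in a separate server process started by that Manager, and only operations performed through the proxy itself (such as assigning to a key) reach it.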
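Changing the list with the += operator never assigns anything to the dictionary. The expression self.dic1['1'] asks the manager for the value stored under the key '1', and what comes back is a copy of that list, unpickled in the calling process. += then increments the 0-th element of this local copy, which is thrown away afterwards. Because no new value is ever assigned to the key '1', the list held by the manager is left unchanged.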
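A minimal way to see this, assuming a standard CPython multiprocessing setup: the sketch below never starts a worker Process at all, only a Manager, and the nested += is still lost. That suggests the cause is the proxy handing out copies rather than anything about child processes. The function name show_copy_semantics is made up for illustration.

```python
from multiprocessing import Manager


def show_copy_semantics():
    """Hypothetical demo: what survives in-place edits of a managed dict value."""
    with Manager() as manager:
        d = manager.dict({'1': [1, 0, 0]})
        before = d['1']

        local = d['1']      # DictProxy.__getitem__ returns an unpickled copy
        local[0] += 1       # only the local copy changes
        d['1'][0] += 1      # same thing: a temporary copy is changed, then dropped

        after = d['1']
        same_object = d['1'] is d['1']  # every lookup builds a fresh list
        return before, local, after, same_object


if __name__ == '__main__':
    before, local, after, same_object = show_copy_semantics()
    print(before)       # [1, 0, 0]
    print(local)        # [2, 0, 0]
    print(after)        # [1, 0, 0]
    print(same_object)  # False
```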
With dic2 the situation is different. Consider the following line:

    self.dic2['1'] = [a[0]+1, a[1], a[2]]

Here you assign a new value to the key '1'. The assigned value is a completely new list: it is built from the elements of the list previously stored under the same key, but it is nevertheless a different list.
An assignment to a key goes through the proxy's __setitem__ method, so the new list is sent to the Manager's server process and stored there. When you later read dic2 in the main process, the proxy fetches the current value from that server, so you see the updated list.
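For the second question, one tidier way to write the element-wise update is to copy the list, change the copy with ordinary list operations, and assign it back. This is the same read, modify, reassign pattern the Python multiprocessing documentation recommends for mutable values held by a proxy. The helper increment and the functions worker and run_demo are hypothetical names for this sketch.

```python
from multiprocessing import Manager, Process


def increment(shared, key, index, amount=1):
    """Hypothetical helper: copy the stored list, change the copy, store it back."""
    values = shared[key]       # DictProxy.__getitem__ returns a local copy
    values[index] += amount    # mutate the copy in this process
    shared[key] = values       # DictProxy.__setitem__ sends the new list to the manager


def worker(shared):
    increment(shared, '1', 0)


def run_demo():
    with Manager() as manager:
        shared = manager.dict({'1': [1, 0, 0]})
        p = Process(target=worker, args=(shared,))
        p.start()
        p.join()
        return shared['1']     # plain list fetched from the manager


if __name__ == '__main__':
    print(run_demo())  # [2, 0, 0]
```

Whatever you change in the copy, this costs one read and one write to the manager. Note that the read, modify, write sequence is not atomic: if several processes update the same key at the same time (unlike the original code, which joins each process before starting the next), guard it with a lock such as manager.Lock().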
The main point here is the following: the managed dict only propagates changes that are made through the proxy itself, such as assigning to or deleting a key. Mutating a plain list stored inside it does not propagate to other processes (or threads), because reading the value hands each caller its own copy, and the proxy has no way of knowing that the copy was modified afterwards.
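If you would rather keep the original self.dic1['1'][0] += 1 syntax, another option (assuming a recent Python 3; the documentation says shared objects can be nested since 3.6) is to store a managed list inside the managed dict. Then d['1'] returns a ListProxy instead of a copy, so item assignment is forwarded to the manager. The names worker and run_demo are hypothetical. A lock is used because += on a proxy element is still a separate read and write.

```python
from multiprocessing import Manager, Process


def worker(shared, lock):
    # shared['1'] returns a ListProxy here, so the item assignment reaches the manager
    with lock:                   # += is read-then-write; serialize concurrent updates
        shared['1'][0] += 1


def run_demo(n_workers=4):
    with Manager() as manager:
        lock = manager.Lock()
        shared = manager.dict({'1': manager.list([1, 0, 0])})  # nested proxy
        procs = [Process(target=worker, args=(shared, lock)) for _ in range(n_workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared['1'][:]    # slicing the ListProxy returns a plain list


if __name__ == '__main__':
    print(run_demo())  # [5, 0, 0]
```

This reads more cleanly, but each element access is its own round trip to the manager process, so it is not necessarily faster than the copy-and-reassign approach when several elements change at once.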