python-3.x, list, dictionary, python-multiprocessing, multiprocessing-manager

Managed dict of list not updated in multiprocessing when using += operator


Consider the following Python code:

from multiprocessing import Process, Manager

class MyClass():
    def __init__(self, dic1, dic2):
        self.dic1 = Manager().dict(dic1) # Create a managed dictionary
        self.dic2 = Manager().dict(dic2) # Create a managed dictionary
        process1 = Process(target=self.dictSumOverloaded, args=())
        process2 = Process(target=self.dictSumElementWise, args=())

        process1.start()
        process1.join()

        process2.start()
        process2.join()

    def dictSumOverloaded(self):
        self.dic1['1'][0] += 1 # dic1 is not updated

    def dictSumElementWise(self):
        a = self.dic2['1']
        self.dic2['1'] = [a[0]+1, a[1], a[2]] # dic2 is updated

def main():
    dic1 = {'1': [1, 0, 0]}
    dic2 = {'1': [1, 0, 0]}

    result = MyClass(dic1, dic2)
    print(result.dic1) # Failed
    print(result.dic2) # Success

    # Bypass multiprocessing environment
    dic3 = {'1': [1, 0, 0]}
    dic3['1'][0]+=1
    print(dic3) # Success

if __name__ == '__main__':
    main()

In this example, MyClass holds two managed dicts as attributes, each containing a list. The goal is to increment some elements of these lists from separate processes, but one of the two methods below does not actually modify the list.

Method 1: dictSumOverloaded
The += operator is used to increment an element of the list by 1, but the result does not persist. The dict is not updated.
Method 2: dictSumElementWise
This function builds a new list element-wise from the old list and the values to add, then assigns the new list to the dict key. The dict is successfully modified.
Sanity check: outside the multiprocessing environment
dic3 is modified as expected when using += outside the multiprocessing environment.

Questions:
1) Why does += not modify the list element in the multiprocessing environment?
2) Updating the list element-wise works but is cumbersome. Is there a cleaner or faster way to do it?


Solution

  • I believe the problem comes from how a change reaches the Manager process that holds dic1. The proxy does not watch the objects stored in it; only operations performed on the proxy itself are forwarded to the manager.

    self.dic1['1'] asks the manager for the value stored under the key '1' and returns a copy of that list to the calling process. The += operator then changes the 0-th element of this local copy. Nothing is written back through the proxy, so the list held by the manager stays as it was.

    With dic2 the situation is different. With the following line:

    self.dic2['1'] = [a[0]+1, a[1], a[2]]
    

    You assign a new value to the key '1'. The assigned value is a completely new list. It is built from the elements of the list previously stored under the same key, but it is nevertheless a different list.

    This assignment calls __setitem__ on the proxy, which sends the new list to the manager process. Any process that reads dic2['1'] afterwards, including the one in which you check the value, therefore gets the updated list.
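The difference between the two cases can be reproduced in a single process. The sketch below is only an illustration; the names demo, shared and local_copy are made up for it and do not come from the question.

```python
from multiprocessing import Manager


def demo():
    with Manager() as manager:
        shared = manager.dict({'1': [1, 0, 0]})

        # Indexing the proxy returns a copy of the list held by the manager.
        local_copy = shared['1']
        local_copy[0] += 1              # changes only the local copy
        after_inplace = shared['1']     # the manager still holds the old list

        # Assigning through the proxy calls __setitem__, which is forwarded.
        shared['1'] = local_copy
        after_assign = shared['1']

        return after_inplace, after_assign


if __name__ == '__main__':
    print(demo())  # ([1, 0, 0], [2, 0, 0])
```

The in-place increment is lost, while the reassignment is visible on the next read.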

    The main point here is the following:

    a managed dict only forwards operations performed on the proxy itself, such as setting or deleting a key. Mutating a plain list obtained from it changes a local copy, so that change never reaches the manager or any other process.
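For the second question, two less cumbersome options come to mind. The sketch below assumes Python 3.6 or later (needed for nested proxies) and uses hypothetical helper names (increment_copy, increment_nested, run). The first helper reads the list, changes it and assigns it back. The second stores a manager.list inside the dict, so that indexing returns a proxy instead of a copy. Neither update is atomic, so both are guarded by a lock here.

```python
from multiprocessing import Manager, Process


def increment_copy(shared, key, index, lock):
    """Read-modify-write: fetch a copy, change it, assign it back."""
    with lock:                       # keep the three steps together
        values = shared[key]
        values[index] += 1
        shared[key] = values         # __setitem__ on the proxy is forwarded


def increment_nested(shared, key, index, lock):
    """With a nested manager.list, indexing returns a proxy, not a copy."""
    with lock:
        shared[key][index] += 1


def run():
    with Manager() as manager:
        lock = manager.Lock()
        plain = manager.dict({'1': [1, 0, 0]})
        nested = manager.dict({'1': manager.list([1, 0, 0])})

        workers = [
            Process(target=increment_copy, args=(plain, '1', 0, lock)),
            Process(target=increment_nested, args=(nested, '1', 0, lock)),
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()

        # Convert the nested proxy to a plain list before the manager shuts down.
        return plain['1'], list(nested['1'])


if __name__ == '__main__':
    print(run())  # ([2, 0, 0], [2, 0, 0])
```

Each access to a nested proxy is a round trip to the manager process, so the read-modify-write helper may be the cheaper choice when several elements change at once.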