
Isolation of a custom multiprocessing manager and how to update internal state


I am trying to use a custom multiprocessing manager, mostly following the example from the docs. The main difference is that my class updates its internal state. It looks like this:

import multiprocessing
from multiprocessing.managers import BaseManager


class IdIndex:
    def __init__(self):
        self.data = set()
        self.call_count = 0
        self.lock = multiprocessing.Lock()

    def get_new_ones(self, ids):
        with self.lock:
            self.call_count += 1
            new_ones = ids - self.data
            self.data.update(new_ones)
            return new_ones


class IndexManager(BaseManager):
    pass


IndexManager.register("ids", IdIndex)

Later I use it like this:

with IndexManager() as index:
    # pass index.ids() proxies to subprocesses

My understanding is that IndexManager starts a new process which hosts a single instance of IdIndex. If I call get_new_ones on one of the proxy objects, the call is forwarded to that single instance in the dedicated process and is processed there. So there should be only one "shared" instance of IdIndex, and even self.lock should not be necessary.

Detailed logging shows that this understanding is wrong. self.call_count is incremented, but not sequentially. It looks as if there are either multiple instances of IdIndex or something is cached in the proxy objects, but I have a hard time putting my finger on what is really going on. If I log self.call_count I get something like 1,2,3,4,4,5,6,4,5,5,7,8,8,...

Can somebody explain what's wrong with my understanding and how to set this up so that I have just a single instance of IdIndex?


Solution

  • What, no minimal, reproducible example? So I can only speak in generalizations here:

    Proxies created with calls to index.ids() do not by default reference the same, single IdIndex instance.

    If you are creating a single instance of IdIndex and passing its proxy reference to whoever needs to access it, whether in the same or a different thread or process, then yes, there will be a single instance of IdIndex running in the process created by the manager. But if multiple proxies are created with code such as my_id_index = index.ids(), then there will be multiple IdIndex instances living in the manager's process. BTW, why are you naming an IndexManager instance index rather than the more meaningful index_manager or even just manager?

    Consider:

    from multiprocessing.managers import BaseManager
    import threading
    
    class IdIndex:
        def __init__(self):
            self.data = set()
            self.call_count = 0
            self.lock = threading.Lock()
    
        def get_new_ones(self, ids):
            with self.lock:
                self.call_count += 1
                new_ones = ids - self.data
                self.data.update(new_ones)
                return new_ones
    
        def get_id(self):
            return id(self)
    
    
    class IndexManager(BaseManager):
        pass
    
    
    IndexManager.register("ids", IdIndex)
    
    def main():
        with IndexManager() as manager:
            idx1 = manager.ids()
            idx2 = manager.ids()
            print(idx1.get_id(), idx2.get_id())
    
    if __name__ == '__main__':
        main()
    

    Prints:

    2971746618016 2971747107856
    

    This clearly shows that two distinct instances live in the manager's process.
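    The practical effect of having two instances can be illustrated with a small sketch. It assumes a hypothetical get_call_count method is added to IdIndex so the counter can be read through a proxy (proxies only expose methods by default), and it leaves out the lock because the calls here are strictly sequential:

```python
from multiprocessing.managers import BaseManager


class IdIndex:
    def __init__(self):
        self.data = set()
        self.call_count = 0

    def get_new_ones(self, ids):
        # Lock omitted in this sketch: all calls below are sequential.
        self.call_count += 1
        new_ones = ids - self.data
        self.data.update(new_ones)
        return new_ones

    def get_call_count(self):
        # Hypothetical helper (not in the original class) to read the counter.
        return self.call_count


class IndexManager(BaseManager):
    pass


# Registering the class means every manager.ids() call builds a new IdIndex.
IndexManager.register("ids", IdIndex)


def main():
    with IndexManager() as manager:
        idx1 = manager.ids()
        idx2 = manager.ids()
        first = idx1.get_new_ones({1, 2})   # idx1 has seen nothing yet
        second = idx2.get_new_ones({1, 2})  # idx2 has its own empty set
        third = idx1.get_new_ones({2, 3})   # idx1 already knows 2
        return first, second, third, idx1.get_call_count(), idx2.get_call_count()


if __name__ == "__main__":
    print(main())
```

    Here the second call reports IDs 1 and 2 as new again, and the two counters end at 2 and 1 respectively. Interleaved per-instance counters like these would look much like the non-sequential call_count values in the question.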

    Now consider:

    from multiprocessing.managers import BaseManager
    import threading
    
    class IdIndex:
        def __init__(self):
            self.data = set()
            self.call_count = 0
            self.lock = threading.Lock()
    
        def get_new_ones(self, ids):
            with self.lock:
                self.call_count += 1
                new_ones = ids - self.data
                self.data.update(new_ones)
                return new_ones
    
        def get_id(self):
            return id(self)
    
    
    class IndexManager(BaseManager):
        pass
    
    
    singleton = None
    
    def get_singleton():
        global singleton
        
        if singleton is None:
            singleton = IdIndex()
        return singleton
    
    
    IndexManager.register("ids", get_singleton)
    
    def main():
        with IndexManager() as manager:
            idx1 = manager.ids()
            idx2 = manager.ids()
            print(idx1.get_id(), idx2.get_id())
    
    if __name__ == '__main__':
        main()
    

    Prints:

    1751065705744 1751065705744
    

    Of course, this only works if the same process is creating all the proxies.
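
    The other option described at the top, creating one proxy and handing that same proxy to every worker, might look roughly like the following. This is only a sketch: worker, shared_index and get_call_count are hypothetical names that do not appear in the question, and the worker payloads are made up for illustration.

```python
from multiprocessing import Process
from multiprocessing.managers import BaseManager
import threading


class IdIndex:
    def __init__(self):
        self.data = set()
        self.call_count = 0
        # The manager's server serves each client connection in its own
        # thread, so a thread lock guards the shared state.
        self.lock = threading.Lock()

    def get_new_ones(self, ids):
        with self.lock:
            self.call_count += 1
            new_ones = ids - self.data
            self.data.update(new_ones)
            return new_ones

    def get_call_count(self):
        # Hypothetical helper to read the counter through the proxy.
        with self.lock:
            return self.call_count


class IndexManager(BaseManager):
    pass


IndexManager.register("ids", IdIndex)


def worker(shared_index, ids):
    # Every worker talks to the same IdIndex via the proxy it was given.
    shared_index.get_new_ones(ids)


def main():
    with IndexManager() as manager:
        shared_index = manager.ids()  # the only place a proxy is created
        procs = [
            Process(target=worker, args=(shared_index, {i, i + 1}))
            for i in range(4)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared_index.get_call_count()


if __name__ == "__main__":
    print(main())
```

    Because manager.ids() is called exactly once, all four workers update the same counter, so the value read back at the end is 4.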