
Multiprocessing proxy: let getters return proxies themselves


I have a complex, unpicklable object whose properties (defined via getters and setters) are of complex, unpicklable types as well. I want to create a multiprocessing proxy for the object so that I can execute some tasks in parallel.

The problem: while I have succeeded in making the getter methods available through the proxy object, I cannot make the getters return proxies for the unpicklable return values.

My setup resembles the following:

from multiprocessing.managers import BaseManager, NamespaceProxy

class A():
    @property
    def a(self):
        return B()
    @property
    def b(self):
        return 2

# unpicklable class
class B():
    def __init__(self, *args):
        self.f = lambda: 1
    

class ProxyBase(NamespaceProxy):
    _exposed_ = ('__getattribute__', '__setattr__', '__delattr__')

class AProxy(ProxyBase): pass
class BProxy(ProxyBase): pass
class MyManager(BaseManager):pass

MyManager.register('A', A, AProxy)

if __name__ == '__main__':
    with MyManager() as manager:
        myA = manager.A()
        print(myA.b) # works great
        print(myA.a) # raises an error because B instances are not picklable

I know that I can specify the result type of a method when registering it with the manager. That is, I can do

MyManager.register('A', A, AProxy, method_to_typeid={'__getattribute__':'B'})
MyManager.register('B', B, BProxy)


if __name__ == '__main__':
    with MyManager() as manager:
        myA = manager.A()
        print(myA.a) # works great!
        print(myA.b) # returns the same as myA.a ?!

It is clear to me that my solution does not work, since __getattribute__ handles every attribute access, whereas I only want it to return a proxy for B when property a is accessed. How could I achieve this?

As a side question: if I remove the *args argument from the __init__ method of B, I get an error that it is called with the wrong number of arguments. Why? How could I resolve this?


Solution

  • I don't think this is possible without some hacks, since the choice between returning a value and returning a proxy is made based on the method name alone, not on the type of the return value (from Server.serve_client):

    try:
        res = function(*args, **kwds)
    except Exception as e:
        msg = ('#ERROR', e)
    else:
        typeid = gettypeid and gettypeid.get(methodname, None)
        if typeid:
            rident, rexposed = self.create(conn, typeid, res)
            token = Token(typeid, self.address, rident)
            msg = ('#PROXY', (rexposed, token))
        else:
            msg = ('#RETURN', res)
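
The excerpt shows that the typeid comes from gettypeid.get(methodname), and that self.create(conn, typeid, res) builds the object behind the proxy. In CPython, when a typeid was registered with a callable, Server.create calls that callable with res as its only positional argument. The single-process toy model below (no real server; toy_dispatch, registry and the args attribute are hypothetical) mimics just that logic. Under that assumption it accounts for both symptoms in the question: myA.b is routed to B exactly like myA.a, and B.__init__ needs *args because it receives the returned value.

```python
# Toy, single-process model of the dispatch decision in Server.serve_client.
# `toy_dispatch` and `registry` are hypothetical; they are not multiprocessing APIs.


class A:
    @property
    def a(self):
        return B()

    @property
    def b(self):
        return 2


class B:
    def __init__(self, *args):
        self.args = args      # record what the "server" passed to the constructor
        self.f = lambda: 1    # keeps B unpicklable, as in the question


registry = {'B': B}                           # typeid -> registered callable
method_to_typeid = {'__getattribute__': 'B'}  # as in the question's register() call


def toy_dispatch(obj, methodname, *args):
    res = getattr(obj, methodname)(*args)
    # The typeid depends only on methodname, never on type(res).
    typeid = method_to_typeid.get(methodname)
    if typeid:
        # Mirrors Server.create(): a registered callable is called with res,
        # so B.__init__ must accept one extra positional argument.
        return '#PROXY', registry[typeid](res)
    return '#RETURN', res


kind_a, obj_a = toy_dispatch(A(), '__getattribute__', 'a')
kind_b, obj_b = toy_dispatch(A(), '__getattribute__', 'b')
print(kind_a, kind_b)                # #PROXY #PROXY
print(obj_b.args)                    # (2,): the int was passed to B()
print(type(obj_a.args[0]).__name__)  # B: a new B wraps the original B
```

Note that in this model myA.a does not even proxy the B the getter returned; it proxies a new B constructed from it.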
    

    Also keep in mind that exposing __getattribute__ in the proxy of an unpicklable class essentially breaks method calls through that proxy.
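
Why method calls break: NamespaceProxy.__getattr__ forwards every attribute lookup whose name does not start with an underscore as a __getattribute__ call, so myB.method makes the server evaluate b.method and pickle the resulting bound method back to the client. A bound method is pickled together with the instance it is bound to, so for an unpicklable B the round trip fails. The snippet below reproduces this locally with plain pickle and no manager; is_picklable and method are hypothetical names used only for illustration.

```python
import pickle


class B:
    def __init__(self):
        self.f = lambda: 1  # a lambda cannot be pickled by reference

    def method(self):
        return 42


def is_picklable(value):
    """Return True if pickle.dumps succeeds for value."""
    try:
        pickle.dumps(value)
    except Exception:  # PicklingError, TypeError or AttributeError, depending on the object
        return False
    return True


b = B()
print(is_picklable(2))         # True: plain values travel fine (like myA.b)
print(is_picklable(b.method))  # False: a bound method pickles its __self__, i.e. b
print(is_picklable(b.f))       # False: the lambda attribute itself
```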

    But if you're willing to hack it and only need attribute access, here is a working solution. Note that calling myA.a.f() still won't work: f is a lambda stored as an instance attribute rather than a method, and only methods are proxied, but that's a different problem.

    import os
    from multiprocessing.managers import BaseManager, NamespaceProxy, Server
    
    class A():
        @property
        def a(self):
            return B()
        @property
        def b(self):
            return 2
    
    # unpicklable class
    class B():
        def __init__(self, *args):  # *args: for typeid 'B' the server calls B(result)
            self.f = lambda: 1
            self.pid = os.getpid()
    
    class HackedObj:
        def __init__(self, obj, gettypeid):
            self.obj = obj
            self.gettypeid = gettypeid
    
        def __getattribute__(self, attr):
            if attr == '__getattribute__':
                return object.__getattribute__(self, attr)
                
            obj = object.__getattribute__(self, 'obj')
            result = object.__getattribute__(obj, attr)
            if isinstance(result, B):
                gettypeid = object.__getattribute__(self, 'gettypeid')
                # This tells the server that the return value of this method is
                # B, for which we've registered a proxy.
                gettypeid['__getattribute__'] = 'B'

            return result
    
    class HackedDict:
        def __init__(self, data):
            self.data = data

        def __setitem__(self, key, value):
            self.data[key] = value

        def __delitem__(self, key):
            # Server.decref() deletes entries once the last proxy is released
            del self.data[key]

        def __getitem__(self, key):
            obj, exposed, gettypeid = self.data[key]
            if isinstance(obj, A):
                gettypeid = gettypeid.copy() if gettypeid else {}
                # Now we need getattr to update gettypeid based on the result.
                # Luckily, Server.serve_client reads the typeid info only after
                # the method has been invoked.
                obj = HackedObj(obj, gettypeid)

            return (obj, exposed, gettypeid)
    
    class HackedServer(Server):
        def __init__(self, registry, address, authkey, serializer):
            super().__init__(registry, address, authkey, serializer)
            self.id_to_obj = HackedDict(self.id_to_obj)
    
    class MyManager(BaseManager):
        _Server = HackedServer
    
    class ProxyBase(NamespaceProxy):
        _exposed_ = ('__getattribute__', '__setattr__', '__delattr__')
    class AProxy(ProxyBase): pass
    class BProxy(ProxyBase): pass
    
    MyManager.register('A', callable=A, proxytype=AProxy)
    MyManager.register('B', callable=B, proxytype=BProxy)
    
    if __name__ == '__main__':
        print("This process: ", os.getpid())
        with MyManager() as manager:
            myB = manager.B()
            print("Proxy process, using B directly: ", myB.pid)
            
            myA = manager.A()
            print('myA.b', myA.b)
            
            print("Proxy process, via A: ", myA.a.pid)
    
    

    The key to the solution is to replace the _Server in our manager and then wrap the id_to_obj dict with one that performs the hack for the specific method we need.

    The hack consists of populating the gettypeid dict for the method, but only after the method has been evaluated and we know that its return type is one we need a proxy for. We are lucky with the order of evaluation: serve_client reads gettypeid only after the method has been called.

    Conveniently, gettypeid is also a local variable in serve_client, so we can return a fresh copy of it for each lookup and modify that copy without introducing any concurrency issues.

    While this was a fun exercise, I strongly advise against this solution. If you're dealing with external code that you cannot modify, simply create your own wrapper class with explicit methods instead of @property accessors, proxy that class instead, and use method_to_typeid.
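
The recommended approach can be sketched as follows, assuming you are free to put a wrapper around A (AWrapper, get_a, get_b, get_pid and run_demo are hypothetical names). Registering 'B' with no callable and create_method=False is the same pattern multiprocessing's own SyncManager uses for its 'AsyncResult' and 'Iterator' typeids: the server wraps the object the method actually returned instead of calling B(result), so B.__init__ no longer needs *args, which also addresses the side question. The sketch pins the 'fork' start method where available so it also runs when pasted into an interactive session; that choice is an assumption, not a requirement.

```python
import multiprocessing
import os
from multiprocessing.managers import BaseManager


class A:
    @property
    def a(self):
        return B()

    @property
    def b(self):
        return 2


class B:  # unpicklable because of the lambda; no *args needed here
    def __init__(self):
        self.f = lambda: 1
        self.pid = os.getpid()

    def get_pid(self):
        return self.pid


class AWrapper:
    """Hypothetical wrapper exposing A's properties as explicit methods."""

    def __init__(self):
        self._a = A()

    def get_a(self):
        return self._a.a

    def get_b(self):
        return self._a.b


class MyManager(BaseManager):
    pass


# Only get_a returns a proxy; get_b returns its (picklable) value as usual.
MyManager.register('A', AWrapper, method_to_typeid={'get_a': 'B'})
# No callable: the server wraps the returned B itself instead of calling B(result).
MyManager.register('B', create_method=False)


def run_demo():
    # 'fork' (where available) avoids re-importing this module in the server process.
    method = 'fork' if 'fork' in multiprocessing.get_all_start_methods() else None
    with MyManager(ctx=multiprocessing.get_context(method)) as manager:
        my_a = manager.A()
        my_b = my_a.get_a()  # AutoProxy for the B created in the server process
        return {
            'b': my_a.get_b(),
            'f': my_b.f(),  # f is callable, so AutoProxy exposes it as well
            'remote': my_b.get_pid() != os.getpid(),
        }


if __name__ == '__main__':
    print(run_demo())
```

Because the proxy type for 'B' defaults to AutoProxy, the exposed names are computed on the server from the callable attributes of the returned B, which here includes the lambda f.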