Tags: python, pickle, python-multiprocessing, cprofile

cProfile causes pickling error when running multiprocessing Python code


I have a Python script that runs well when I run it normally:

$ python script.py <options>

I am attempting to profile the code using the cProfile module:

$ python -m cProfile -o script.prof script.py <options>

When I launch the above command I get an error regarding being unable to pickle a function:

Traceback (most recent call last):
  File "scripts/process_grid.py", line 1500, in <module>
    _compute_write_index(kwrgs)
  File "scripts/process_grid.py", line 626, in _compute_write_index
    args,
  File "scripts/process_grid.py", line 1034, in _parallel_process
    pool.map(_apply_along_axis_palmers, chunk_params)
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function _apply_along_axis_palmers at 0x7fe05a540b70>: attribute lookup _apply_along_axis_palmers on __main__ failed

The code uses multiprocessing, and I assume that this is where the pickling is taking place.

The code in play is here on GitHub.

Essentially I'm mapping a function and a corresponding argument dictionary in a process pool:

pool.map(_apply_along_axis_palmers, chunk_params)

The function _apply_along_axis_palmers is "picklable" as far as I know, in that it's defined at the top level of the module. Again this error doesn't occur when running outside of the cProfile context, so maybe that's adding additional constraints for pickling?

Can anyone comment as to why this may be happening, and/or how I can rectify the issue?


Solution

  • The problem is that when you run with -m cProfile, the __main__ module is cProfile (the actual entry point), not your script. cProfile compensates by making sure your script sees __name__ as "__main__" when it runs, so the script knows it is being run as a script rather than imported as a module, but sys.modules['__main__'] still refers to the cProfile module.

    pickle, in turn, serializes a function by storing only its module and qualified name (plus some boilerplate marking it as a function). To make sure the function will survive the round trip, it always checks that the name can be looked up through sys.modules. So when you call pickle.dumps(_apply_along_axis_palmers) (explicitly, or implicitly, as here, by passing it as the mapper function to pool.map), and _apply_along_axis_palmers is defined in your main script, pickle checks that sys.modules['__main__']._apply_along_axis_palmers exists. It doesn't, because that lookup resolves to cProfile._apply_along_axis_palmers, which doesn't exist.

    I don't know of a good solution for this. The best I can come up with is to manually fix up sys.modules so that it exposes your module and its contents correctly. I haven't tested this thoroughly, so there may be some quirks, but a workaround is to change a module named mymodule.py of the form:

    # imports...
    # function/class/global defs...
    
    if __name__ == '__main__':
        main()  # Or series of statements
    

    to:

    # imports...
    import sys
    # function/class/global defs...
    
    if __name__ == '__main__':
        import cProfile
        # This check skips the hack when not profiling.
        # It is optional: the hack *seems* to work fine even when not profiling, it's just wasteful.
        if sys.modules['__main__'].__file__ == cProfile.__file__:
            import mymodule  # Imports you again (does *not* use cache or execute as __main__)
            globals().update(vars(mymodule))  # Replaces current contents with newly imported stuff
            sys.modules['__main__'] = mymodule  # Ensures pickle lookups on __main__ find matching version
        main()  # Or series of statements
    

    From then on, sys.modules['__main__'] refers to your own module, not cProfile, and pickling finds your functions as expected. cProfile still seems to work despite the swap. The only real cost is importing your module a second time, but if you're doing enough real work, that cost should be fairly small.
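The lookup behaviour described in this answer can be reproduced without multiprocessing or cProfile. The following is a minimal sketch, not the code from the question: worker and the module name fake_main are hypothetical stand-ins for _apply_along_axis_palmers and __main__, and an empty module object plays the role that cProfile plays under -m cProfile.

```python
import pickle
import sys
import types


def worker(x):
    """Hypothetical stand-in for a top-level function in the main script."""
    return x * 2


def try_pickle(func):
    """Return 'ok' if func can be pickled, otherwise the error message."""
    try:
        pickle.dumps(func)
        return "ok"
    except pickle.PicklingError as exc:
        return str(exc)


# Pretend the function was defined in a module named "fake_main" ...
worker.__module__ = "fake_main"

# ... while the module registered under that name is a different one that
# does not contain the function (the situation under -m cProfile).
sys.modules["fake_main"] = types.ModuleType("fake_main")
result_missing = try_pickle(worker)

# Registering a module that does expose the function makes the lookup succeed,
# which is what replacing sys.modules['__main__'] achieves in the workaround.
module_with_worker = types.ModuleType("fake_main")
module_with_worker.worker = worker
sys.modules["fake_main"] = module_with_worker
result_present = try_pickle(worker)

# The pickle holds only a reference by name, not the function's code.
payload = pickle.dumps(worker)

print(result_missing)
print(result_present)
print(b"worker" in payload)
```

The first call fails with a PicklingError even though worker is an ordinary top-level function, because the check goes through sys.modules rather than through the function object itself. The exact wording of the error message varies between Python versions.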