I have a Python script that runs well when I run it normally:
$ python script.py <options>
I am attempting to profile the code using the cProfile module:
$ python -m cProfile -o script.prof script.py <options>
When I launch the above command I get an error regarding being unable to pickle a function:
Traceback (most recent call last):
  File "scripts/process_grid.py", line 1500, in <module>
    _compute_write_index(kwrgs)
  File "scripts/process_grid.py", line 626, in _compute_write_index
    args,
  File "scripts/process_grid.py", line 1034, in _parallel_process
    pool.map(_apply_along_axis_palmers, chunk_params)
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function _apply_along_axis_palmers at 0x7fe05a540b70>: attribute lookup _apply_along_axis_palmers on __main__ failed
The code uses multiprocessing, and I assume that this is where the pickling is taking place.
The code in play is here on GitHub.
Essentially I'm mapping a function and a corresponding argument dictionary in a process pool:
pool.map(_apply_along_axis_palmers, chunk_params)
The function _apply_along_axis_palmers is "picklable" as far as I know, in that it's defined at the top level of the module. Again, this error doesn't occur when running outside of the cProfile context, so maybe that's adding additional constraints for pickling?
Can anyone comment as to why this may be happening, and/or how I can rectify the issue?
The problem you've got here is that, by using -m cProfile, the module __main__ is cProfile (the actual entry point of the code), not your script. cProfile tries to fix this by ensuring that when your script runs, it sees __name__ as "__main__", so it knows it's being run as a script, not imported as a module, but sys.modules['__main__'] remains the cProfile module.
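To see the mismatch for yourself, a small diagnostic along these lines can help. This is only a sketch: describe_main_module is a hypothetical helper name, not something cProfile provides.

```python
import sys


def describe_main_module():
    """Return (this module's __name__, file backing sys.modules['__main__'])."""
    main_module = sys.modules["__main__"]
    # Interactive sessions and some embedded interpreters have no __file__.
    main_file = getattr(main_module, "__file__", None)
    return __name__, main_file


if __name__ == "__main__":
    name, main_file = describe_main_module()
    print("__name__ seen by this code:", name)
    print("file behind sys.modules['__main__']:", main_file)
```

Run directly, the second line should name your script. Run under python -m cProfile on the Python 3.6 setup from the traceback, I'd expect it to point at cProfile's own file instead; I haven't checked how other Python versions behave.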
The problem is that pickle handles functions by pickling only their module name and qualified name (plus some boilerplate to say it's a function in the first place). To make sure the function will survive the round trip, it always double-checks that the qualified name can be looked up on that module in sys.modules. So when you do pickle.dumps(_apply_along_axis_palmers) (explicitly, or implicitly in this case by passing it as the mapper function), where _apply_along_axis_palmers is defined in your main script, it double-checks that sys.modules['__main__']._apply_along_axis_palmers exists. But it doesn't, because cProfile._apply_along_axis_palmers doesn't exist.
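The lookup can be reproduced without cProfile or multiprocessing. In this sketch the module name fake_main and the function work are hypothetical, and the function's __module__ is overwritten by hand purely to imitate a function whose claimed module doesn't actually contain it.

```python
import pickle
import sys
import types


def work(x):
    return x * 2


# Hypothetical stand-in for the "wrong" __main__: a registered module
# that does not (yet) contain the function.
fake_main = types.ModuleType("fake_main")
sys.modules["fake_main"] = fake_main

# Assumption for the demo: claim that work() lives in fake_main.
work.__module__ = "fake_main"

try:
    pickle.dumps(work)
    first_attempt = "pickled"
except pickle.PicklingError:
    # The name cannot be found on sys.modules["fake_main"], so pickle refuses.
    first_attempt = "failed"

# Once the name can be looked up on the module, pickling succeeds,
# and unpickling resolves the same name back to the function.
fake_main.work = work
payload = pickle.dumps(work)
restored = pickle.loads(payload)
```

The pickled payload holds only the names, which is why the attribute has to exist on the module at both ends of the round trip.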
I don't know of a good solution for this. The best I can come up with is to manually fix up sys.modules to make it expose your module and its contents correctly. I haven't tested this completely, so it's possible there will be some quirks, but a solution I've found is to change a module named mymodule.py of the form:
# imports...
# function/class/global defs...
if __name__ == '__main__':
    main()  # Or series of statements
to:
# imports...
import sys
# function/class/global defs...
if __name__ == '__main__':
    import cProfile
    # if check avoids hackery when not profiling
    # Optional; hackery *seems* to work fine even when not profiling, it's just wasteful
    if sys.modules['__main__'].__file__ == cProfile.__file__:
        import mymodule  # Imports you again (does *not* use cache or execute as __main__)
        globals().update(vars(mymodule))  # Replaces current contents with newly imported stuff
        sys.modules['__main__'] = mymodule  # Ensures pickle lookups on __main__ find matching version
    main()  # Or series of statements
From there on out, sys.modules['__main__'] refers to your own module, not cProfile, so things seem to work. cProfile still seems to work despite this, and pickling finds your functions as expected. Only real cost is reimporting your module, but if you're doing enough real work, the cost of reimporting should be fairly small.
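A different route, separate from the sys.modules fix, is to start the profiler from inside the script and run it as plain python script.py, so that __main__ stays your own module. The sketch below assumes that setup; run_profiled, the stand-in main and the output filename are hypothetical, and this only profiles the calling process, not the pool workers.

```python
import cProfile


def main():
    # Stand-in for the script's real entry point.
    return sum(i * i for i in range(1000))


def run_profiled(func, outfile=None):
    """Call func under cProfile and optionally dump the stats to outfile."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    if outfile is not None:
        # Same binary stats format that -o writes; readable with pstats.
        profiler.dump_stats(outfile)
    return result, profiler


if __name__ == "__main__":
    run_profiled(main, outfile="script.prof")
```

Because cProfile is never the entry point here, pickle's lookup on __main__ should find your functions without any module swapping.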