I wish to run an embarrassingly parallel function that creates plots (and eventually will save them to a file) using Jupyter Notebook with Python (edit - I found a much simpler way to do exactly this here). I'm trying the simplest version possible and I'm getting an import error.
Where and why should I import the relevant modules? I think I'm importing them everywhere just to be sure, but I still get an error!
The import positions in the files are numbered 1-4:
[1] Is this line really necessary? Why?
[2] Is this line really necessary? Why?
[3] Is this line really necessary? Why?
[4] Is this line really necessary? Why?
Below are my files. The Jupyter notebook file:
import ipyparallel
clients = ipyparallel.Client()
print(clients.ids)
dview = clients[:]
with dview.sync_imports():
    import module      #[1]
    import matplotlib  #[2]
    import module      #[3]
dview.map_sync(module.pll, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
and a Python file named module.py:
import matplotlib #[4]
def pll(x):
    matplotlib.pyplot.plot(x, '.')
When I run the notebook I get the following output:
[0, 1, 2, 3, 4, 5]
importing module on engine(s)
importing matplotlib on engine(s)
[Engine Exception]
NameErrorTraceback (most recent call last)
<string> in <module>()
(...)
NameError: name 'matplotlib' is not defined
sync_imports is unnecessary when you use module functions. This should be sufficient:
# notebook:
import ipyparallel as ipp
client = ipp.Client()
dview = client[:]
import module
dview.map_sync(module.pll, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
and
# module.py
from matplotlib import pyplot
def pll(x):
    pyplot.plot(x, '.')
One caveat: you will almost certainly want to set up matplotlib to use a non-default backend on the engines, and you must do this before importing pyplot. The two logical choices with IPython Parallel are agg if you are just saving to files, or %matplotlib inline if you want to see plots interactively in the notebook. To use agg:
import matplotlib
dview.apply_sync(matplotlib.use, 'agg')
or setup inline plotting:
%px %matplotlib inline
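Putting the agg advice together with the module-function approach, and since you mention you eventually want to save the plots to files, module.py might look something like this sketch (the filename scheme and output directory are invented for illustration):

```python
# module.py -- sketch of a plotting function for the engines.
# The filename scheme and output directory here are made up.
import os
import tempfile

import matplotlib
matplotlib.use('agg')  # select the non-GUI backend *before* importing pyplot
from matplotlib import pyplot

def pll(x):
    fig = pyplot.figure()
    pyplot.plot([x], '.')
    path = os.path.join(tempfile.gettempdir(), 'plot_%s.png' % x)
    fig.savefig(path)
    pyplot.close(fig)  # engines are long-lived processes; free figures explicitly
    return path
```

As before, dview.map_sync(module.pll, range(11)) would then produce one PNG per input on whichever machines the engines are running.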
To answer your bulleted questions:
[1] unnecessary: it duplicates [3], which already gets module defined in the globals everywhere
[2] unnecessary: you don't need matplotlib defined in the globals everywhere
[3] necessary: you need module defined to pass module.pll to map
[4] necessary: module is a different namespace from __main__, which is where all of your notebook code is running
There are two contexts you need to think about when dealing with what needs to be imported and where:
When a function is defined interactively (that is, the def foo()
is in your notebook), name lookup is performed in the interactive namespace, and the interactive namespace may differ between the engines and the client. For instance, you can see this with:
import numpy
%px numpy = 'whywouldyoudothis'
def return_numpy():
    return numpy # resolved locally *on engines*
dview.apply_sync(return_numpy)
where the apply
will return a list of ['why..'] strings, not your local numpy
import. Python doesn't know that names refer to modules or anything else; it's all a matter of what namespace(s) are used for looking up the names. This is why you will often see interactively defined functions that look like one of these:
import module
%px import module
def foo():
    return module.x
or this:
def foo():
    import module
    return module.x
Both are ways to ensure that module
in foo
maps to the imported module on the engines: one performs an interactive-namespace import everywhere and relies on global-namespace lookup; the other imports inside the function itself, so it can't be wrong.
sync_imports()
is a pure-Python way to do the same thing as:
import module
%px import module
It imports the module both here and there. If you use sync_imports
, it is unnecessary to repeat the import locally as well, as the local import has already been performed.
If the function is defined in a module, as yours is, it will find globals in its module, not in the interactive namespace. So import matplotlib
in your notebook has no effect on whether the matplotlib
name is defined when module.pll
is called. Similarly, importing matplotlib in the module does not make it available in the interactive namespace.
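You can see this namespace separation without a cluster at all. The sketch below builds a throwaway in-memory module (a stand-in for your module.py; the names are invented) and shows that a module function resolves names in the module's own globals, ignoring an identically-named variable in the interactive namespace:

```python
import types

# Build a throwaway module as a stand-in for module.py
mod = types.ModuleType('demo_module')
exec(
    "greeting = 'from the module'\n"
    "def hello():\n"
    "    return greeting\n",
    mod.__dict__,
)

greeting = 'from __main__'  # same name, interactive namespace

# hello() looks up `greeting` in demo_module's globals, not in __main__
print(mod.hello())  # -> from the module
```

This is exactly why importing matplotlib in the notebook does nothing for module.pll, and vice versa.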
Something important to consider: when you send a module function to the engines, only a reference to the function is sent, not the contents of the function or module. So if from module import pll
resolves to something different on the engines than on the client, you will get different behavior. This can trip people up when working with local modules in IPython Parallel while actively changing that module: reloading the module in the notebook does not reload it on the engines, since the same module.pll
reference is sent. So if you are actively working on module.py
, you are going to need to call reload(module)
(importlib.reload(module) in Python 3) everywhere when that module changes.
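As a concrete (Python 3) illustration of the reload step, here is a local sketch using a throwaway module written to a temporary directory; the module name and contents are invented. On a real cluster you would additionally run the reload on the engines, e.g. via dview.execute (shown only as a comment, since it needs a running cluster):

```python
import importlib
import pathlib
import sys
import tempfile

# Write a throwaway module to stand in for your module.py
moddir = tempfile.mkdtemp()
modfile = pathlib.Path(moddir, 'demo_reload.py')
modfile.write_text('VERSION = 1\n')
sys.path.insert(0, moddir)

import demo_reload
print(demo_reload.VERSION)  # 1

# Simulate editing module.py, then reload locally
modfile.write_text('VERSION = 2  # edited\n')
importlib.reload(demo_reload)
print(demo_reload.VERSION)  # 2

# On the engines you would need the same treatment (hypothetical dview):
# dview.execute('import importlib, demo_reload; importlib.reload(demo_reload)')
```

Without the engine-side reload, the engines keep executing the old code even though your notebook sees the new version.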