Customise the environment of a Jupyter notebook


I wish to create a custom environment for my Jupyter notebook so that I don't have to install the various packages from within each session.

Following the instructions at https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/customize-envs.html, I customised the environment as follows:

# Modify the following content to add a software customization to an environment.
# To remove an existing customization, delete the entire content and click Apply.

# Add conda channels below defaults, indented by two spaces and a hyphen.
channels:
  - defaults

# To add packages through conda or pip, remove the comment on the following line.
# dependencies:

# Add conda packages here, indented by two spaces and a hyphen.
# Remove the comment on the following line and replace sample package name with your package name:
  - ffmpeg=4.2.2

# Add pip packages here, indented by four spaces and a hyphen.
# Remove the comments on the following lines  and replace sample package name with your package name.
  - pip:
    - numpy==1.18.0
    - pandas==1.0.3
    - matplotlib==3.1.3

I did this because I need an MPEG codec (ffmpeg) in my notebook, as well as recent versions of pandas, numpy and matplotlib.

The environment configuration is:

Environment             Custom env
Creator                 Andrea Chiappo
Language                Python 3.6
Hardware configuration  4 vCPU and 16 GB RAM
Software configuration  Default Python 3.6 + DO

However, when I start a session and run

import pandas as pd
print(pd.__version__)

I get the default version of the package, which is 0.24.1.
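A quick way to check all of the customized packages at once, rather than only pandas, is a cell like the one below. It is a sketch: the pinned versions are copied from the customization above, installed_version and version_report are hypothetical helper names, and the ffmpeg check assumes the environment's bin directory is on the kernel's PATH. It avoids importlib.metadata and uses only features available in Python 3.6.

```python
import importlib
import shutil
from typing import Dict, Optional

# Exact pins copied from the customization above; adjust as needed.
WANTED = {"numpy": "1.18.0", "pandas": "1.0.3", "matplotlib": "3.1.3"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_report(wanted: Dict[str, str]) -> Dict[str, Dict[str, object]]:
    """Compare each installed version against an exact pin."""
    report = {}
    for name, pin in wanted.items():
        found = installed_version(name)
        report[name] = {"wanted": pin, "installed": found, "ok": found == pin}
    return report


if __name__ == "__main__":
    for name, row in version_report(WANTED).items():
        status = "OK" if row["ok"] else "MISMATCH"
        print("{:<12} wanted {:<8} installed {!s:<8} {}".format(
            name, row["wanted"], row["installed"], status))
    # Assumption: the ffmpeg conda package puts an `ffmpeg` executable
    # into the environment's bin directory, and that directory is on PATH.
    print("ffmpeg binary:", shutil.which("ffmpeg") or "not found")
```

With the default runtime, pandas should show up as a mismatch (0.24.1 instead of 1.0.3).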

Does anybody know how to get the more recent versions of these Python packages into my Jupyter session? Many thanks.


Solution

  • You need to uncomment the line "# dependencies:". You can also remove all the comments, which makes indentation problems in the YAML easier to spot. Try this:

    dependencies:
      - ffmpeg=4.2.2
      - pip
      - pip:
        - numpy==1.18.0
        - pandas==1.0.3
        - matplotlib==3.1.3
    

    However, installing newer numpy and pandas from PyPI instead of Anaconda may cause problems. Some Anaconda packages are built against a specific version of numpy and could misbehave if you replace numpy. I recommend getting packages from Anaconda rather than PyPI wherever that is feasible. In a Python notebook, you can check which changes conda would apply with a dry run:

    !conda install --dry-run numpy=1.18
    

    Do the same for pandas and matplotlib, or specify all three in one command. In that case, though, it won't be obvious which of the requested package changes pulls in which dependency updates.

    !conda install --dry-run numpy=1.18 pandas=1.0 matplotlib=3.1
    

    I didn't check whether the exact patch levels you want to install are available from Anaconda. If they aren't, or if certain combinations don't work (for example, when Anaconda no longer builds new package versions for Python 3.6), it's your choice whether to go with what Anaconda provides or to get the packages from PyPI and hope that nothing breaks.

    PS: In the customization above, I've added the pip dependency as a matter of style. Newer versions of conda warn about a pip: section when pip itself is not listed as a dependency. When we roll out new runtimes in Watson Studio Cloud, they'll use a newer version of conda.
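The reason the original customization had no visible effect is most likely the YAML structure itself. With "dependencies:" commented out, the "- ffmpeg=4.2.2" and "- pip:" items sit at the same indentation as "- defaults", so a YAML parser reads them as further entries of the channels list, and the spec has no dependencies section at all. The sketch below checks a parsed spec for that mistake and for the missing pip entry mentioned in the PS. It is a heuristic under stated assumptions: check_env_spec is a hypothetical helper, not part of conda or Watson Studio, and it takes the already-parsed dict (for example from PyYAML's yaml.safe_load) so that the example itself needs no third-party packages. The two example specs are written out by hand to match how the YAML above parses.

```python
import re
from typing import Any, Dict, List


def check_env_spec(spec: Dict[str, Any]) -> List[str]:
    """Return likely problems in a parsed conda environment spec (heuristic)."""
    problems = []
    channels = spec.get("channels") or []
    deps = spec.get("dependencies")

    # Package pins or a pip block under "channels" usually mean that the
    # "dependencies:" line is missing or still commented out.
    for entry in channels:
        if isinstance(entry, dict) or "=" in str(entry):
            problems.append("channels entry looks like a package: {!r}".format(entry))

    if deps is None:
        problems.append("no 'dependencies' section, so nothing will be installed")
        return problems

    has_pip_section = any(isinstance(d, dict) and "pip" in d for d in deps)
    # Accept a bare "pip" as well as a pinned spec such as "pip=20.0".
    lists_pip = any(
        isinstance(d, str) and re.split(r"[=<>!\s]", d, maxsplit=1)[0] == "pip"
        for d in deps
    )
    if has_pip_section and not lists_pip:
        problems.append("'pip:' section present but pip is not listed as a dependency")
    return problems


PIP_PKGS = ["numpy==1.18.0", "pandas==1.0.3", "matplotlib==3.1.3"]

# How the question's YAML parses with "# dependencies:" commented out.
original = {"channels": ["defaults", "ffmpeg=4.2.2", {"pip": PIP_PKGS}]}

# How the corrected YAML from the answer parses.
corrected = {"dependencies": ["ffmpeg=4.2.2", "pip", {"pip": PIP_PKGS}]}

if __name__ == "__main__":
    for name, spec in [("original", original), ("corrected", corrected)]:
        print(name, check_env_spec(spec) or "no problems found")
```

Running it reports three problems for the original spec (two misplaced channel entries and the missing dependencies section) and none for the corrected one.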