python jupyter-notebook

What is the best Python data structure for the contents of Jupyter notebook files?


I want to convert Jupyter notebooks to Python files, but before doing so I want to filter their contents, e.g. remove all markdown cells. The GUI export functionality and calling nbconvert from the command line therefore don't quite satisfy my needs.

What I want to do instead is load the contents of the notebook into a Python container, e.g. a list of dictionaries, filter that container, and save the remaining contents as a Jupyter notebook that is then exported to a Python file.

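Question: Which Python data structure is best suited for holding the contents of a notebook so that it can be filtered this way?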

Remark:
Now that I'm no longer allowed to ask questions on Stack Overflow, without being given any specific reason apart from a standard text, I'll take the opportunity to say goodbye to Stack Overflow and contribute to less chauvinistic forums, where asking questions isn't seen as a privilege that can only be regained by scrubbing floors or polishing door knobs.


Solution

  • A Jupyter (IPython) notebook file is just JSON, so you can parse it with any JSON parser, e.g. Python's json module.

    The format is described in the nbformat documentation.

    Briefly:

    At the highest level, a Jupyter notebook is a dictionary with a few keys:

    • metadata (dict)
    • nbformat (int)
    • nbformat_minor (int)
    • cells (list)
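
A minimal sketch of loading a notebook into plain dicts and lists with the standard json module. The `sample` notebook and the `load_notebook` helper are made up for illustration, and the file is written to a temporary directory only so that the example runs on its own:

```python
import json
import tempfile
from pathlib import Path

# A tiny stand-in notebook (hypothetical content) so the example is self-contained.
sample = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["some *markdown*"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "source": ["print('hello')"],
            "outputs": [],
        },
    ],
}


def load_notebook(path):
    """Parse an .ipynb file into nested dicts and lists."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mynotebook.ipynb"
    path.write_text(json.dumps(sample), encoding="utf-8")
    nb = load_notebook(path)

print(sorted(nb))                                   # the top-level keys
print([cell["cell_type"] for cell in nb["cells"]])  # one entry per cell
```

So the natural container is simply what `json.load` returns: a dict whose "cells" entry is a list of dicts.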

    There are markdown cells:

    {
      "cell_type" : "markdown",
      "metadata" : {},
      "source" : ["some *markdown*"],
    }
    

    And code cells:

    {
      "cell_type" : "code",
      "execution_count": 1, # integer or null
      "metadata" : {
          "collapsed" : True, # whether the output of the cell is collapsed
          "autoscroll": False, # any of true, false or "auto"
      },
      "source" : ["some code"],
      "outputs": [{
          # list of output dicts (described below)
          "output_type": "stream",
          ...
      }],
    }
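
Given that structure, dropping the markdown cells is a list comprehension over "cells". The following is only a sketch: `source_text`, `keep_code_cells` and `to_script` are hypothetical helper names, the sample notebook is invented, and lines such as IPython magics (`%...`, `!...`) are copied verbatim rather than translated.

```python
def source_text(cell):
    """Return a cell's source as one string; it may be stored as a list of lines."""
    src = cell.get("source", "")
    return src if isinstance(src, str) else "".join(src)


def keep_code_cells(nb):
    """Return a shallow copy of the notebook dict that contains only code cells."""
    filtered = dict(nb)
    filtered["cells"] = [c for c in nb["cells"] if c.get("cell_type") == "code"]
    return filtered


def to_script(nb):
    """Join the source of all code cells into the text of a Python script."""
    parts = [source_text(c) for c in nb["cells"] if c.get("cell_type") == "code"]
    return "\n\n".join(parts) + "\n"


# Invented sample notebook: one markdown cell and two code cells.
nb = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": ["x = 1\n", "y = x + 1"], "outputs": []},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": "print(y)", "outputs": []},
    ],
}

code_only = keep_code_cells(nb)   # the original nb is left unchanged
script = to_script(code_only)
print(script)
```

To keep the intermediate step from the question, the filtered dict can be written back with `json.dump(code_only, f)` to a new .ipynb file and then exported with nbconvert; alternatively, `script` can be written straight to a .py file.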
    

    That said, I'm not sure why the following doesn't work for you:

    jupyter nbconvert --to script mynotebook.ipynb