What is the best Python data structure for the contents of Jupyter notebook files?

I want to convert Jupyter notebooks to Python files, but first I want to filter their contents, e.g. remove all markdown cells. The GUI export and calling nbconvert from the command line therefore don't quite meet my needs.

So what I want to do is load the contents of the notebook into a Python container, e.g. a list of dictionaries, filter that container, and save what remains as a Jupyter notebook that is then exported to a Python file.

Questions:

What data structure is most appropriate for holding the contents of a Jupyter notebook?
Are there any libraries that already enable manipulating Jupyter notebooks with Python?

Remark:
Now that I'm not allowed to ask questions anymore on stackoverflow without being given any specific reason apart from a standard text I'll take the opportunity to say goodbye to stackoverflow and contribute to less chauvinistic forums for which the right of asking questions isn't seen as a privilege and can only be regained by scrubbing floors or polishing door knobs.

Solution

An IPython notebook is just JSON, so you can simply parse the JSON.

The format is described in the nbformat documentation.

Briefly:

At the highest level, a Jupyter notebook is a dictionary with a few keys:

metadata (dict)

nbformat (int)

nbformat_minor (int)

cells (list)
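As a rough sketch of what this looks like from Python, the standard json module turns a notebook file into nested dicts and lists with exactly these keys. The helper name load_notebook and the tiny sample notebook are made up here so the snippet runs on its own; with a real file you would only need the json.load call.

```python
import json
import os
import tempfile


def load_notebook(path):
    """Parse an .ipynb file into plain Python dicts and lists."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# A tiny hand-written notebook (illustrative only), so the sketch is self-contained.
sample = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["some *markdown*"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": ["print('hi')"], "outputs": []},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.ipynb")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    nb = load_notebook(path)

print(sorted(nb))                             # the top-level keys
print([c["cell_type"] for c in nb["cells"]])  # one entry per cell
```

So the container asked about in the question is simply a dict whose "cells" entry is a list of dicts.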

There are markdown cells:

{
  "cell_type" : "markdown",
  "metadata" : {},
  "source" : ["some *markdown*"],
}

And code cells:

{
  "cell_type" : "code",
  "execution_count": 1, # integer or null
  "metadata" : {
      "collapsed" : True, # whether the output of the cell is collapsed
      "autoscroll": False, # any of true, false or "auto"
  },
  "source" : ["some code"],
  "outputs": [{
      # list of output dicts (described below)
      "output_type": "stream",
      ...
  }],
}
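Given that structure, removing markdown cells amounts to filtering the cells list. The following is a minimal sketch, assuming the version 4 layout shown above; the names drop_markdown, cell_source and to_script are hypothetical helpers, not part of any library, and the sample notebook is invented for the demonstration. The "source" field is handled both as a list of strings and as a single string.

```python
def drop_markdown(nb):
    """Return a copy of the notebook dict without its markdown cells."""
    kept = [c for c in nb["cells"] if c["cell_type"] != "markdown"]
    return {**nb, "cells": kept}


def cell_source(cell):
    """Join a cell's source, which may be a string or a list of strings."""
    src = cell["source"]
    return src if isinstance(src, str) else "".join(src)


def to_script(nb):
    """Concatenate the code cells into one Python source string."""
    parts = [cell_source(c) for c in nb["cells"] if c["cell_type"] == "code"]
    return "\n\n".join(parts) + "\n"


# Invented sample notebook, already parsed into a dict.
nb = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": ["import math\n", "x = math.pi"], "outputs": []},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": "print(x)", "outputs": []},
    ],
}

filtered = drop_markdown(nb)
script = to_script(filtered)
print(script)
```

Writing filtered back out with json.dump gives a notebook file again, which can then be passed to nbconvert; alternatively the string from to_script can be written directly to a .py file.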

That said, I'm not sure why this doesn't work for you:

jupyter nbconvert --to script mynotebook.ipynb