pythonipythonjupyter-notebookjupyternbconvert

Jupyter (IPython) notebook: Convert an HTML notebook to ipynb


I have converted a Jupyter/IPython notebook to HTML format and subsequently lost the original ipynb file.

Is there a simple way to generate the original notebook file from the converted HTML file?


Solution

  • I recently used BeautifulSoup and JSON to convert html notebook to ipynb. the trick is to look at the JSON schema of a notebook and emulate that. The code selects only input code cells and markdown cells

    here is my code

    from bs4 import BeautifulSoup
    import json
    import urllib.request
    url = 'http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'
    response = urllib.request.urlopen(url)
    #  for local html file
    # response = open("/Users/note/jupyter/notebook.html")
    text = response.read()
    
    soup = BeautifulSoup(text, 'lxml')
    # see some of the html
    print(soup.div)
    dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}
    for d in soup.findAll("div"):
        if 'class' in d.attrs.keys():
            for clas in d.attrs["class"]:
                if clas in ["text_cell_render", "input_area"]:
                    # code cell
                    if clas == "input_area":
                        cell = {}
                        cell['metadata'] = {}
                        cell['outputs'] = []
                        cell['source'] = [d.get_text()]
                        cell['execution_count'] = None
                        cell['cell_type'] = 'code'
                        dictionary['cells'].append(cell)
    
                    else:
                        cell = {}
                        cell['metadata'] = {}
    
                        cell['source'] = [d.decode_contents()]
                        cell['cell_type'] = 'markdown'
                        dictionary['cells'].append(cell)
    open('notebook.ipynb', 'w').write(json.dumps(dictionary))
    

    here is part of print(soup.div) output

    div class="container">
    <div class="navbar-header">
    <button class="navbar-toggle collapsed" data-target=".navbar-collapse" data-toggle="collapse" type="button">
    <span class="sr-only">Toggle navigation</span>
    <i class="fa fa-bars"></i>
    </button>
    <a class="navbar-brand" href="/">
    <img src="/static/img/nav_logo.svg?v=479cefe8d932fb14a67b93911b97d70f" width="159"/>
    </a>
    </div>
    <div class="collapse navbar-collapse">
    <ul class="nav navbar-nav navbar-right">
    <li>
    <a class="active" href="http://jupyter.org">JUPYTER</a>
    </li>
    <li>
    <a href="/faq" title="FAQ">
    <span>FAQ</span>
    

    A screen shot of the resulting ipynb file, loaded on my local jupyter and after running all the cells

    enter image description here