How to share code in different Jupyter notebooks in subfolders?


I need to create a GitHub repository where I can organize Jupyter notebooks by topic as tutorials. Some notebooks will need to load larger data files, which I don't want to be part of the repository itself.

My idea is to provide all data files in a different online resource, and download the required files in the notebooks using some custom auxiliary method in some utils.py script.

Since I want to use different subfolders for organizing the notebooks, utils.py would need to reside in a parent folder. However, importing .py files from a parent folder within a notebook seems to require manually tweaking the module search path (sys.path) in the notebook.

I guess an alternative would be to put utils.py (and other shared code) into its own package that needs to be installed before using the notebooks. That kind of feels like overkill, though.

Is there a better alternative for handling this?


Solution

  • Use a single utils.py file at the repository root and organize the notebooks into topic-based folders:

    Christian_tutorials/
    │
    ├── utils.py                    
    │
    ├── data/                          
    └── notebooks/
        ├── topic1/
        │   ├── notebook1.ipynb
        │   └── notebook2.ipynb
        └── topic2/
            └── notebook3.ipynb
    

    Assumptions

    utils.py - assumed to hold the shared functions.
    data/ - assumed to be a local folder where downloaded files are stored.
    

    Accessing the utils

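    import sys
    
    import pandas as pd
    
    # Make the repository root importable. '../..' is resolved against the
    # working directory, which Jupyter normally sets to the notebook's folder.
    sys.path.append('../..')
    from utils import get_data
    
    # Download the file on first use, then load it:
    data_file = get_data(
        filename="example.csv",
        url="https://storage_url/example.csv"
    )
    df = pd.read_csv(data_file)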
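
If notebooks end up at different folder depths, the hard-coded '../..' has to be adjusted per notebook. A possible alternative, sketched here under the assumption that utils.py sits at the repository root, is to walk up from the working directory until utils.py is found. The names find_repo_root and add_repo_root_to_path are hypothetical helpers, not part of the layout above.

```python
import sys
from pathlib import Path


def find_repo_root(marker="utils.py", start=None):
    """Return the closest folder (start itself or an ancestor) containing marker."""
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found in {start} or any parent folder")


def add_repo_root_to_path(marker="utils.py", start=None):
    """Put the repository root on sys.path (once) so that `import utils` works."""
    root = find_repo_root(marker, start)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

In a notebook this would be called as add_repo_root_to_path() before from utils import get_data. The trade-off is that the helper itself has to be pasted into each notebook, since it cannot be imported before the path is set.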
    

    utils.py

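    from pathlib import Path
    
    import requests
    
    # Keep downloads in the data/ folder next to utils.py, regardless of
    # which notebook subfolder the function is called from.
    DATA_DIR = Path(__file__).resolve().parent / "data"
    
    def get_data(filename, url):
        """Download url into data/ unless the file already exists; return its path."""
        DATA_DIR.mkdir(exist_ok=True)
        file_path = DATA_DIR / filename
        if not file_path.exists():
            print(f"Downloading {filename}...")
            # A timeout keeps the notebook from hanging on an unresponsive server.
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            with open(file_path, "wb") as f:
                f.write(response.content)
            print("Download complete!")
        return file_path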
    

    The get_data function streamlines data handling: it creates the data folder if it doesn't exist, checks for the requested file, and downloads it only if necessary.
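
Since the question mentions larger data files, note that get_data holds the whole response in memory via response.content before writing it. The sketch below shows one way to stream the download to disk instead. It is an illustration under stated assumptions: it swaps requests for the standard library's urllib.request to avoid the extra dependency, and the name get_data_streamed and the data_dir and timeout parameters are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def get_data_streamed(filename, url, data_dir="data", timeout=30):
    """Download url to data_dir/filename in chunks; skip if already present."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    file_path = data_dir / filename
    if file_path.exists():
        return file_path

    # Write to a temporary ".part" file first, so an interrupted download
    # never leaves a truncated file under the final name.
    tmp_path = data_dir / (filename + ".part")
    # urlopen raises urllib.error.HTTPError for HTTP error responses.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(tmp_path, "wb") as f:
        shutil.copyfileobj(response, f)  # copies chunk by chunk
    tmp_path.replace(file_path)
    return file_path
```

The return value is a Path, as with get_data, so it can be passed to pd.read_csv in the same way.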