python pip setuptools pyproject.toml

How to access package data after specifying location in pyproject.toml


This is a follow-up to an earlier question on including package data using setuptools in pyproject.toml.

The file structure for my package is as follows:

project_root_directory
├── pyproject.toml
└── mypkg
    ├── models
    │   ├── __init__.py
    │   ├── model1.pkl
    │   └── model2.pkl
    ├── __init__.py
    ├── module1.py
    └── module2.py

In pyproject.toml, I include the package data with the following configuration, as described in the setuptools documentation:

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
include = ["mypkg"]

[tool.setuptools.package-data]
"mypkg.models" = ["*.pkl"]

Following the instructions in the setuptools documentation, I added the following code to module1.py to access the data at runtime:

import importlib.resources
import joblib

def load_model(package, filename):
    """Read a pickled model file from the given package."""
    # Resolve the packaged resource to a filesystem path that joblib can open
    with importlib.resources.path(package, filename) as file_path:
        return joblib.load(file_path)

saved_model = load_model('mypkg.models', 'model1.pkl')

But when I package the code and run it, I get this error:

ModuleNotFoundError: No module named 'mypkg.models'

How can I fix this so that I can load the models in module1.py?

Thank you in advance for any help with this!


Solution

  • In case it's helpful to anyone: I resolved this by adding a MANIFEST.in file at the same level as pyproject.toml.

    If you're new to MANIFEST.in files, the only line I needed in the file to get this to work was:

    include mypkg/models/*.pkl
    

    After that, I changed the package-data section of pyproject.toml to the following:

    [tool.setuptools.package-data]
    "models" = ["*.pkl"]
    

    With those changes, I was able to package and access both of the .pkl files within the script.
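
To check whether the .pkl files actually ended up in the installed package, a small helper along these lines may be useful. This is only a sketch: it assumes Python 3.9+ (for importlib.resources.files), and the function names list_package_files and read_package_bytes are hypothetical names chosen for illustration, not part of setuptools or of the code above.

```python
import importlib.resources


def list_package_files(package: str, suffix: str = "") -> list[str]:
    """List the names of files directly inside an importable package.

    Hypothetical helper: pass suffix=".pkl" to see only the model files.
    """
    # files() returns a Traversable rooted at the package's directory.
    root = importlib.resources.files(package)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    )


def read_package_bytes(package: str, filename: str) -> bytes:
    """Return the raw bytes of a data file shipped inside a package."""
    resource = importlib.resources.files(package).joinpath(filename)
    return resource.read_bytes()
```

After installing the package, list_package_files('mypkg.models', '.pkl') should list model1.pkl and model2.pkl if the data files were packaged. If the call itself raises ModuleNotFoundError, the mypkg.models subpackage was probably not installed at all, which is a separate problem from the data files. In that case it may be worth checking the include pattern under [tool.setuptools.packages.find]: setuptools treats these entries as glob patterns on package names, so "mypkg" matches only the top-level package while "mypkg*" would also match mypkg.models.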