This is a follow-up to an earlier question on including package data using setuptools in pyproject.toml.
The file structure for my package is as follows:
project_root_directory
├── pyproject.toml
└── mypkg
    ├── models
    │   ├── __init__.py
    │   ├── model1.pkl
    │   └── model2.pkl
    ├── __init__.py
    ├── module1.py
    └── module2.py
In pyproject.toml, I include the package data using the specification below, following the setuptools protocol:
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
include = ["mypkg"]

[tool.setuptools.package-data]
"mypkg.models" = ["*.pkl"]
Following the instructions in the setuptools protocol, I include the following code in module1.py to access the data at runtime:
import importlib.resources
import joblib


def load_model(package, filename):
    '''
    Reads in the model pickle files from the package
    '''
    with importlib.resources.path(package, filename) as file_path:
        return joblib.load(file_path)


saved_model = load_model('mypkg.models', 'model1.pkl')
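For comparison, here is a minimal sketch of the same lookup written with importlib.resources.files and importlib.resources.as_file, which are available from Python 3.9. It assumes the file is a plain pickle and uses the standard library's pickle module instead of joblib so that it stays self-contained; the function name load_pickled_resource is hypothetical. It still depends on the subpackage actually being installed and importable.

```python
import importlib.resources
import pickle


def load_pickled_resource(package, filename):
    """Load a pickled object that ships as data inside `package`.

    Hypothetical helper for illustration; `package` is a dotted package
    name and `filename` is a file stored directly inside that package.
    """
    # files() returns a Traversable pointing at the package contents.
    resource = importlib.resources.files(package).joinpath(filename)
    # as_file() yields a real filesystem path, extracting the resource to a
    # temporary file if the package is not stored as plain files on disk.
    with importlib.resources.as_file(resource) as file_path:
        with open(file_path, "rb") as handle:
            return pickle.load(handle)
```

Calling load_pickled_resource('mypkg.models', 'model1.pkl') would mirror the call above. For models saved with joblib.dump, joblib.load(file_path) can be used in place of the pickle call.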
However, when I package the code and run it, I get this error:
ModuleNotFoundError: No module named 'mypkg.models'
How can I fix this so that I can load the models in module1.py?
Thank you in advance for any help with this!
In case it's helpful to anyone, I was able to resolve this by adding a MANIFEST.in file at the same level as pyproject.toml.
If you're new to making a MANIFEST.in file, I only needed to add the following line to the file to get this to work:
include mypkg/models/*.pkl
Once I did that, I needed to modify the package-data section of pyproject.toml to the following:
[tool.setuptools.package-data]
"models" = ["*.pkl"]
With those changes, I was able to package and access both of the .pkl files within the script.
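A possible explanation for the original ModuleNotFoundError, offered as a hedged sketch and not a confirmed diagnosis: setuptools appears to match the include patterns under [tool.setuptools.packages.find] against each dotted package name using shell-style globbing. Under that assumption, include = ["mypkg"] selects mypkg but not the mypkg.models subpackage, whereas ["mypkg*"] selects both. The snippet below imitates that matching with the standard library's fnmatchcase; the helper name select_packages is made up for illustration and is not part of setuptools.

```python
from fnmatch import fnmatchcase


def select_packages(candidates, include):
    """Return the dotted package names that match any include pattern.

    Hypothetical helper: it only imitates glob-style matching on package
    names and does not call setuptools itself.
    """
    return [
        name
        for name in candidates
        if any(fnmatchcase(name, pattern) for pattern in include)
    ]


# Package names taken from the layout in the question.
candidates = ["mypkg", "mypkg.models"]

exact_only = select_packages(candidates, ["mypkg"])
with_subpackages = select_packages(candidates, ["mypkg*"])

print(exact_only)
print(with_subpackages)
```

If that is the cause, changing the pattern to include = ["mypkg*"] while keeping "mypkg.models" = ["*.pkl"] under package-data may be enough without a MANIFEST.in, though this has not been verified against this exact project.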