pythonpyinstallerspacy

Can't find SpaCy model when packaging with PyInstaller


I am using PyInstaller package a python script into an .exe. This script is using spacy to load up the following model: en_core_web_sm. I have already run python -m spacy download en_core_web_sm to download the model locally. The issue is when PyInstaller tries to package up my script it can't find the model. I get the following error: Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory. I thought maybe this meant that I needed to run the download command in my python script in order to make sure it has the model, but if I have my script download the model it just says the requirements are already satisfied. I also have a hook file that handles bringing in hidden imports and is supposed to bring in the model as well:

from PyInstaller.utils.hooks import collect_all, collect_data_files

datas = []
datas.extend(collect_data_files('en_core_web_sm'))

# ----------------------------- SPACY -----------------------------
data = collect_all('spacy')

datas.extend(data[0])
binaries = data[1]
hiddenimports = data[2]

# ----------------------------- THINC -----------------------------
data = collect_all('thinc')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- CYMEM -----------------------------
data = collect_all('cymem')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- PRESHED -----------------------------
data = collect_all('preshed')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- BLIS -----------------------------

data = collect_all('blis')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- STDNUM -----------------------------

data = collect_all('stdnum')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- OTHER -------------------------------

hiddenimports += ['srsly.msgpack.util']

I use the following code to download the model and then to package the script with PyInstaller:

os.system('python -m spacy download en_core_web_sm')
PyInstaller.__main__.run([path_to_script, '--onefile', '--additional-hooks-dir=.'])

The hook-spacy.py script is in the same directory as the script that is being packaged by PyInstaller.

All of this works if I run the script locally. It finds the model as it should. I only get this error if I try to package the script with PyInstaller and try to run the .exe.

I am using Python v3.8.7, PyInstaller v4.2, and spacy v3.0.3 with en_core_web_sm v3.0.0


Solution

  • When you use PyInstaller to collect data files into the bundle as you are doing here, the files are actually compiled into the resulting exe itself. This is transparently handled for Python code by PyInstaller when import statements are evaluated.

    However, for data files you must handle this yourself. For instance, spacy is likely looking for the model in the current working directory. It won’t find your model because it is compiled into the .exe instead and therefore isn’t present in the current working directory.

    You will need to use this API:

    https://pyinstaller.readthedocs.io/en/stable/spec-files.html#using-data-files-from-a-module

    This allows you to read a data file from the exe that PyInstaller creates. You can then write it to the current working directory and then spacy should be able to find it.