I'm trying to bundle spaCy into a pex file as detailed in their docs here. I have successfully built the pex file, and am now trying to run a simple Python script that runs spaCy against a passed-in text file. I import spaCy and try to load the en_core_web_trf model as shown below.
import spacy
nlp = spacy.load("en_core_web_trf")
I try to run the script with the spacy.pex executable output by their tool and am getting:
Can't find model 'en_core_web_trf'. It doesn't seem to be a Python package or a valid path to a data directory
I have tried to download the model by running:
./spacy.pex -m spacy download en_core_web_trf
but that doesn't seem to work either.
I have also tried downloading the model directly from their GitHub page here, extracting the tar archive into the same directory as the spacy.pex file, and renaming the resulting folder to en_core_web_trf. With that I instead get:
[E053] Could not read config file from en_core_web_trf/config.cfg
Looking through that model folder, I found that there is a nested config.cfg in the subfolder en_core_web_trf. I tried moving that folder up and still no dice.
How do I package the model so that I can use it with the spacy.pex file and run scripts with it?
First, make sure the model is installed by running this in your terminal:
python -m spacy download en_core_web_trf
After that you can use the following code to load the model:
import spacy
import en_core_web_trf
nlp = en_core_web_trf.load()
You will probably then get this error:
ValueError: [E002] Can't find factory for 'transformer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).
So make sure you also install spacy-transformers in your terminal using pip install spacy-transformers. The final code then looks like this:
import spacy
import spacy_transformers
import en_core_web_trf
nlp = en_core_web_trf.load()
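If it still fails, it may help to check which of the three packages the interpreter can actually see. Here is a minimal sketch using only the standard library; missing_packages and REQUIRED are names I made up for illustration, not part of spaCy. Run it the same way you run your own script, so that it inspects the same environment:

```python
import importlib.util

# Top-level packages the final code above imports.
REQUIRED = ("spacy", "spacy_transformers", "en_core_web_trf")


def missing_packages(names=REQUIRED):
    """Return the names that the current interpreter cannot import."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not importable here:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Anything it lists as not importable is not visible to that interpreter, even if it was installed somewhere else on the machine.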