
GitHub Actions fails when NLTK "punkt" is needed


I have written some code that uses NLTK's punkt resource. I have listed nltk in both requirements.txt and setup.py. However, when my project is built with GitHub Actions, the build fails with this error:

E       LookupError:   
E       **********************************************************************  
E         Resource punkt not found.  
E         Please use the NLTK Downloader to obtain the resource:  
E       
E         >>> import nltk  
E         >>> nltk.download('punkt') 

What is the standard way to tell GitHub Actions that it needs 'punkt' without hard-coding nltk.download('punkt') somewhere in the code? Should I add a line to the ci.yml file, and if so, what is the best way to do it?


Solution

  • In the ci.yml file, running the nltk.downloader command line right after installing the dependencies listed in requirements.txt worked for me:

    if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    python -m nltk.downloader punkt stopwords
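
For context, here is a rough sketch of how those two commands might sit inside a complete workflow file. The workflow name, job name, step names, Python version, and the pytest command are all assumptions for illustration, not taken from the original project. It also assumes pytest is installed through requirements.txt. The only part that matters is the ordering: the download step has to come after nltk itself is installed.

```yaml
# Sketch of .github/workflows/ci.yml.
# Names, Python version, and test command are illustrative assumptions.
name: CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: Download NLTK data
        # Must run after the install step, since it uses the nltk package.
        run: python -m nltk.downloader punkt
      - name: Run tests
        # Assumes pytest is listed in requirements.txt.
        run: pytest
```

Further resource names (such as stopwords) can be appended to the same downloader command, separated by spaces. If the LookupError names a different resource than punkt, pass the name shown in the error message instead.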