I am building a Docker image using the following Dockerfile:
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y python python-dev python-pip
ADD . /app
RUN apt-get install -y python-scipy
RUN pip install -r /arrc/requirements.txt
EXPOSE 5000
WORKDIR /app
CMD python app.py
Everything goes well until I run the image and get the following error:
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
**********************************************************************
I have had this problem before, and it is discussed here; however, I am not sure how to approach it using Docker. I have tried:
CMD python
CMD import nltk
CMD nltk.download()
as well as:
CMD python -m nltk.downloader -d /usr/share/nltk_data popular
But I am still getting the error.
In your Dockerfile, try adding instead:
RUN python -m nltk.downloader punkt
This will run the command and install the requested files to //nltk_data/
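As a rough sketch of where that line could go: the download has to happen after pip has installed NLTK, so it is placed after the requirements step. This assumes nltk is listed in requirements.txt (the question does not show that file). The requirements path is copied unchanged from the question, and the final check line is an optional extra, not part of the original suggestion.

```dockerfile
# ... FROM, apt-get and ADD lines from the question stay as they are ...

# Path copied as-is from the question's Dockerfile.
RUN pip install -r /arrc/requirements.txt

# Assumption: requirements.txt installs nltk, so the downloader module exists here.
# RUN executes during `docker build`, so the punkt data ends up in the image.
RUN python -m nltk.downloader punkt

# Optional sanity check: nltk.data.find raises LookupError if the resource
# is missing, which makes the build fail early instead of failing at run time.
RUN python -c "import nltk; nltk.data.find('tokenizers/punkt/english.pickle')"

EXPOSE 5000
WORKDIR /app
CMD python app.py
```

If other corpora are needed, they can be appended to the same downloader command, separated by spaces.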
The problem is most likely related to using CMD vs. RUN in the Dockerfile. Documentation for CMD:
The main purpose of a CMD is to provide defaults for an executing container.
That default is used during docker run <image>, not during the build. Only the last CMD in a Dockerfile takes effect, so the other CMD lines were probably overridden by the final CMD python app.py line.
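To see the difference in isolation, here is a minimal throwaway Dockerfile (not from the question; the echo messages and the image name cmd-demo below are made up for illustration):

```dockerfile
FROM ubuntu:14.04

# RUN executes while the image is being built; its result is saved in a layer.
RUN echo "printed during docker build"

# CMD only records a default command for the container.
# This first CMD is discarded because another CMD follows it.
CMD echo "never printed"

# Only this last CMD is kept, and it executes at `docker run`, not at build time.
CMD echo "printed during docker run"
```

Building it with docker build -t cmd-demo . shows the RUN message in the build output, and docker run cmd-demo prints only the message from the last CMD. The same thing explains the attempts in the question: CMD import nltk and CMD nltk.download() would each be treated as a separate shell command rather than as Python statements, and none of them were kept anyway once CMD python app.py came last.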