I'm trying to get NLTK and Wordnet working on a lambda via CodeBuild.
It looks like it installs fine in CloudFormation, but I get the following error in the Lambda:
START RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c Version: $LATEST
Unable to import module 'index': No module named 'nltk'
END RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c
REPORT RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c Duration: 2.10 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 21 MB
However when I check, it installed fine in CodeBuild:
[Container] 2018/11/06 12:45:06 Running command pip install -U nltk
Collecting nltk
Downloading https://files.pythonhosted.org/packages/50/09/3b1755d528ad9156ee7243d52aa5cd2b809ef053a0f31b53d92853dd653a/nltk-3.3.0.zip (1.4MB)
Requirement already up-to-date: six in /usr/local/lib/python2.7/site-packages (from nltk)
Building wheels for collected packages: nltk
Running setup.py bdist_wheel for nltk: started
Running setup.py bdist_wheel for nltk: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/d1/ab/40/3bceea46922767e42986aef7606a600538ca80de6062dc266c
Successfully built nltk
Installing collected packages: nltk
Successfully installed nltk-3.3
Here is the actual python code:
import json
import datetime
import nltk
from nltk.corpus import wordnet as wn
And here is the YML file:
version: 0.2
phases:
install:
commands:
# Upgrade AWS CLI to the latest version
- pip install --upgrade awscli
# Install nltk & WordNet
- pip install -U nltk
- python -m nltk.downloader wordnet
pre_build:
commands:
# Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
# - python -m unittest discover tests
build:
commands:
# Use AWS SAM to package the application by using AWS CloudFormation
- aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml
artifacts:
type: zip
files:
- template-export.yml
Any idea why it installs fine in CodeBuild but can't access the module NLTK in the Lambda? For reference the code runs fine in the lambda if you just remove NLTK.
I have a feeling this a YML file issue, but not sure what, given NLTK installs fine.
Ok, so thanks to laika for pointing me in the right direction.
This is a working deployment of NLTK & Wordnet to Lambda via CodeStar / CodeBuild. Some things to keep in mind:
1) You cannot use source venv/bin/activate
as it is not POSIX compliant. Use . venv/bin/activate
as below instead.
2) You must set the path for NLTK as shown in the define directories section.
buildspec.yml
version: 0.2
phases:
install:
commands:
# Upgrade AWS CLI & PIP to the latest version
- pip install --upgrade awscli
- pip install --upgrade pip
# Define Directories
- export HOME_DIR=`pwd`
- export NLTK_DATA=$HOME_DIR/nltk_data
pre_build:
commands:
- cd $HOME_DIR
# Create VirtualEnv to package for lambda
- virtualenv venv
- . venv/bin/activate
# Install Supporting Libraries
- pip install -U requests
# Install WordNet
- pip install -U nltk
- python -m nltk.downloader -d $NLTK_DATA wordnet
# Output Requirements
- pip freeze > requirements.txt
# Unit Tests
# - python -m unittest discover tests
build:
commands:
- cd $HOME_DIR
- mv $VIRTUAL_ENV/lib/python3.6/site-packages/* .
# Use AWS SAM to package the application by using AWS CloudFormation
- aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml
artifacts:
type: zip
files:
- template-export.yml
If anyone has any improvements LMK. It's working for me.