
Installing NLTK/WordNet on AWS Lambda via CodeBuild


I'm trying to get NLTK and WordNet working on a Lambda function deployed via CodeBuild.

It looks like it deploys fine through CloudFormation, but I get the following error in the Lambda:

START RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c Version: $LATEST
Unable to import module 'index': No module named 'nltk'

END RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c
REPORT RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c  Duration: 2.10 ms   Billed Duration: 100 ms     Memory Size: 128 MB Max Memory Used: 21 MB  

However, when I check the CodeBuild log, NLTK installed fine:

[Container] 2018/11/06 12:45:06 Running command pip install -U nltk
Collecting nltk
 Downloading https://files.pythonhosted.org/packages/50/09/3b1755d528ad9156ee7243d52aa5cd2b809ef053a0f31b53d92853dd653a/nltk-3.3.0.zip (1.4MB)
Requirement already up-to-date: six in /usr/local/lib/python2.7/site-packages (from nltk)
Building wheels for collected packages: nltk
 Running setup.py bdist_wheel for nltk: started
 Running setup.py bdist_wheel for nltk: finished with status 'done'
 Stored in directory: /root/.cache/pip/wheels/d1/ab/40/3bceea46922767e42986aef7606a600538ca80de6062dc266c
Successfully built nltk
Installing collected packages: nltk
Successfully installed nltk-3.3

Here is the actual Python code:

import json
import datetime
import nltk
from nltk.corpus import wordnet as wn

And here is the YML file:

version: 0.2

phases:
  install:
    commands:

      # Upgrade AWS CLI to the latest version
      - pip install --upgrade awscli

      # Install nltk & WordNet
      - pip install -U nltk
      - python -m nltk.downloader wordnet

  pre_build:
    commands:

      # Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
      # - python -m unittest discover tests

  build:
    commands:

      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml

artifacts:
  type: zip
  files:
    - template-export.yml

Any idea why NLTK installs fine in CodeBuild but the Lambda can't import it? For reference, the code runs fine in the Lambda if you just remove NLTK.

I have a feeling this is a YML file issue, but I'm not sure what, given that NLTK installs fine.


Solution

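  • OK, thanks to laika for pointing me in the right direction.

    The original buildspec ran pip install against the build container's own Python, so NLTK never ended up in the package that gets deployed to Lambda. The buildspec below is a working deployment of NLTK & WordNet to Lambda via CodeStar / CodeBuild: it installs into a virtualenv and then moves the virtualenv's site-packages into the project root, so they are zipped up with the code. Some things to keep in mind:

    1) You cannot use source venv/bin/activate, because source is not a POSIX shell command. Use . venv/bin/activate instead, as below.

    2) You must set the NLTK data path (NLTK_DATA) as shown in the "Define Directories" section, so that WordNet is downloaded into the project directory.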

    buildspec.yml

    version: 0.2
    
    phases:
      install:
        commands:
    
          # Upgrade AWS CLI & PIP to the latest version
          - pip install --upgrade awscli
          - pip install --upgrade pip
    
          # Define Directories
          - export HOME_DIR=`pwd`
          - export NLTK_DATA=$HOME_DIR/nltk_data
    
      pre_build:
        commands:
          - cd $HOME_DIR
    
          # Create VirtualEnv to package for lambda
          - virtualenv venv
          - . venv/bin/activate
    
          # Install Supporting Libraries
          - pip install -U requests
    
          # Install WordNet
          - pip install -U nltk
          - python -m nltk.downloader -d $NLTK_DATA wordnet
    
          # Output Requirements
          - pip freeze > requirements.txt
    
          # Unit Tests
          # - python -m unittest discover tests
    
      build:
        commands:
          - cd $HOME_DIR
          - mv $VIRTUAL_ENV/lib/python3.6/site-packages/* .
    
          # Use AWS SAM to package the application by using AWS CloudFormation
          - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml
    
    artifacts:
      type: zip
      files:
        - template-export.yml
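
Since the original failure was a package that silently lacked NLTK, it may help to check the project directory right before the aws cloudformation package step. The following is only a sketch: the function name missing_entries is hypothetical, and the corpora/wordnet location is an assumption about where the NLTK downloader places WordNet under NLTK_DATA.

```python
import os

# Paths are relative to the project root that gets zipped for Lambda.
REQUIRED = ["nltk/__init__.py"]
# Assumption: the downloader leaves WordNet as a directory and/or a zip here.
WORDNET_CANDIDATES = [
    "nltk_data/corpora/wordnet",
    "nltk_data/corpora/wordnet.zip",
]


def missing_entries(root):
    """Return the expected entries that are absent under root."""
    missing = [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]
    if not any(os.path.exists(os.path.join(root, p)) for p in WORDNET_CANDIDATES):
        missing.append("nltk_data/corpora/wordnet[.zip]")
    return missing
```

Calling missing_entries(".") from the project root after the mv step should return an empty list if both the library and the corpus are in place; a non-empty list names what is missing.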
    

    If anyone has any improvements, let me know. It's working for me.
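
The buildspec sets NLTK_DATA only inside the build container. If the Lambda itself cannot find WordNet at runtime, one option is to point NLTK at the bundled nltk_data directory before importing it. This is a minimal sketch, not part of the setup above: the helper names are hypothetical, and it assumes nltk_data sits at the root of the deployed package. NLTK reads the NLTK_DATA environment variable when it is imported, and LAMBDA_TASK_ROOT is set by the Lambda runtime.

```python
import os


def bundled_nltk_data_dir(base_dir=None):
    """Return the nltk_data directory assumed to be bundled with the code."""
    if base_dir is None:
        # Inside Lambda, LAMBDA_TASK_ROOT points at the unpacked package;
        # fall back to the current directory when running locally.
        base_dir = os.environ.get("LAMBDA_TASK_ROOT") or os.getcwd()
    return os.path.join(base_dir, "nltk_data")


def configure_nltk_data(base_dir=None):
    """Put the bundled directory first in NLTK_DATA, without duplicating it."""
    data_dir = bundled_nltk_data_dir(base_dir)
    existing = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    if data_dir not in existing:
        os.environ["NLTK_DATA"] = os.pathsep.join([data_dir] + existing)
    return data_dir


# In index.py, call this before importing nltk:
#   configure_nltk_data()
#   from nltk.corpus import wordnet as wn
```

Setting NLTK_DATA as an environment variable on the function in template.yml should achieve the same thing without code changes; the sketch just keeps the path next to the handler.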