Tags: python, amazon-web-services, aws-lambda, jupyter-notebook

Running Jupyter Notebook on AWS Lambda


I am trying to run a Jupyter notebook on AWS Lambda and have created a layer with all the dependencies. The notebook is simple: it pulls a CSV file from Amazon S3 and displays the data as a bar graph. Below is the Lambda function, which downloads the .ipynb file and executes it with papermill. I am not sure why it fails with a "No module named 'boto3'" error.

import json
import sys
import os
import boto3
# papermill to execute notebook
import papermill as pm
import pandas as pd
import logging
import matplotlib.pyplot as plt

sys.path.append("/opt/bin")
sys.path.append("/opt/python")
os.environ["PYTHONPATH"]='/var/task'
os.environ["PYTHONPATH"]='/opt/python/'
os.environ["MPLCONFIGDIR"] = '/tmp/'
# ipython needs a writeable directory
os.environ["IPYTHONDIR"]='/tmp/ipythondir'
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    s3.meta.client.download_file('test-boto', 'testing.ipynb', '/tmp/test.ipynb')
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
    s3.meta.client.upload_file('/tmp/juptest_output.ipynb', 'test-boto', 'temp/juptest_output.ipynb')
    logger.info(event)

Error output:

START RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06 Version: $LATEST
[INFO]  2020-08-07T17:55:16.602Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Input Notebook:  /tmp/test.ipynb
[INFO]  2020-08-07T17:55:16.603Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Output Notebook: /tmp/juptest_output.ipynb

Executing:   0%|          | 0/15 [00:00<?, ?cell/s][INFO]   2020-08-07T17:55:17.311Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Executing notebook with kernel: python3
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

Executing:   7%|▋         | 1/15 [00:01<00:14,  1.06s/cell]
Executing:   7%|▋         | 1/15 [00:01<00:20,  1.46s/cell]
[ERROR] PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [1]":
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-9c332490c231> in <module>
      1 import pandas as pd
      2 import os
----> 3 import boto3
      4 import matplotlib.pyplot as plt
      5 client = boto3.client('s3')

ModuleNotFoundError: No module named 'boto3'

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 28, in lambda_handler
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
  File "/opt/python/papermill/execute.py", line 110, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/python/papermill/execute.py", line 222, in raise_for_execution_errors
    raise error
END RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06
REPORT RequestId:c4da3406-c829-4f99-9fbf-b231a0d3dc06
    Duration: 1624.78 ms    Billed Duration: 1700 ms    Memory Size: 3008 MB    Max Memory Used: 293 MB

Jupyter Notebook:

import pandas as pd
import os
import boto3
import matplotlib.pyplot as plt
client = boto3.client('s3')

path = 's3://test-boto/aws-costs-Owner-Month-08.csv'
monthly_owner = pd.read_csv(path)
plt.bar(monthly_owner.Owner.head(6),monthly_owner.Amount.head(6))
plt.xlabel('Owner', fontsize=15)
plt.ylabel('Amount', fontsize=15)
plt.title('AWS Monthly Cost by Owner')
plt.show()

Solution

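  • It looks like the papermill kernel cannot find the boto3 package even though your Lambda handler can. In your handler you are overwriting PYTHONPATH rather than appending to it (the second assignment also replaces the first). This removes every other directory that PYTHONPATH previously told Python to search for packages. The papermill child process that runs the notebook inherits this PYTHONPATH, so any package that was only reachable through one of the removed directories can no longer be imported from the notebook.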

    You might also find this useful. It lets you deploy Jupyter notebooks directly as serverless functions, and it uses papermill behind the scenes.

    Disclaimer: I work for Clouderizer.
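
To make the PYTHONPATH point above concrete, here is a minimal sketch of appending to the variable instead of overwriting it, plus a quick check of what a child process can import. The helper names `extend_pythonpath` and `child_can_import` are hypothetical, and the directories in the usage note are the ones from the question. This illustrates the idea and is not a verified fix for every Lambda runtime.

```python
import os
import subprocess
import sys


def extend_pythonpath(extra_dirs, environ=None):
    """Append directories to PYTHONPATH, keeping whatever was already there.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if environ is None else environ
    # Split the existing value and drop empty entries.
    current = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    for directory in extra_dirs:
        if directory not in current:
            current.append(directory)
    env["PYTHONPATH"] = os.pathsep.join(current)
    return env["PYTHONPATH"]


def child_can_import(module_name):
    """Return True if a fresh Python child process can import module_name.

    The child inherits the current environment (including PYTHONPATH),
    which is roughly the situation a notebook kernel process is in.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
    )
    return result.returncode == 0
```

In the handler, this would replace the two `os.environ["PYTHONPATH"] = ...` assignments with something like `extend_pythonpath(["/var/task", "/opt/python"])`, and calling `child_can_import("boto3")` before `pm.execute_notebook(...)` would show whether a child process can see the package.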