I am trying to run a Jupyter Notebook on AWS Lambda. I created a layer with all the dependencies. The notebook is simple code that pulls a CSV file from Amazon S3 and displays the data as a bar graph. Below is the Lambda function I wrote to download the .ipynb file and execute the notebook with papermill. I'm not sure why it's failing with a "No module named 'boto3'" error.
import json
import sys
import os
import boto3
# papermill to execute notebook
import papermill as pm
import pandas as pd
import logging
import matplotlib.pyplot as plt
sys.path.append("/opt/bin")
sys.path.append("/opt/python")
os.environ["PYTHONPATH"]='/var/task'
os.environ["PYTHONPATH"]='/opt/python/'
os.environ["MPLCONFIGDIR"] = '/tmp/'
# ipython needs a writeable directory
os.environ["IPYTHONDIR"]='/tmp/ipythondir'
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    # download the notebook from S3 into Lambda's writable /tmp directory
    s3.meta.client.download_file('test-boto', 'testing.ipynb', '/tmp/test.ipynb')
    # execute the notebook with papermill (the kernel runs in a separate process)
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
    # upload the executed notebook using the resource's client (s3_client was never defined)
    s3.meta.client.upload_file('/tmp/juptest_output.ipynb', 'test-boto', 'temp/juptest_output.ipynb')
    logger.info(event)
Error output:
START RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06 Version: $LATEST
[INFO] 2020-08-07T17:55:16.602Z c4da3406-c829-4f99-9fbf-b231a0d3dc06 Input Notebook: /tmp/test.ipynb
[INFO] 2020-08-07T17:55:16.603Z c4da3406-c829-4f99-9fbf-b231a0d3dc06 Output Notebook: /tmp/juptest_output.ipynb
Executing: 0%| | 0/15 [00:00<?, ?cell/s]
[INFO] 2020-08-07T17:55:17.311Z c4da3406-c829-4f99-9fbf-b231a0d3dc06 Executing notebook with kernel: python3
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
Executing: 7%|▋ | 1/15 [00:01<00:14, 1.06s/cell]
Executing: 7%|▋ | 1/15 [00:01<00:20, 1.46s/cell]
[ERROR] PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [1]":
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-9c332490c231> in <module>
      1 import pandas as pd
      2 import os
----> 3 import boto3
      4 import matplotlib.pyplot as plt
      5 client = boto3.client('s3')

ModuleNotFoundError: No module named 'boto3'
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 28, in lambda_handler
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
  File "/opt/python/papermill/execute.py", line 110, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/python/papermill/execute.py", line 222, in raise_for_execution_errors
    raise error
END RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06
REPORT RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06 Duration: 1624.78 ms Billed Duration: 1700 ms Memory Size: 3008 MB Max Memory Used: 293 MB
Jupyter Notebook:
import pandas as pd
import os
import boto3
import matplotlib.pyplot as plt
client = boto3.client('s3')
path = 's3://test-boto/aws-costs-Owner-Month-08.csv'
monthly_owner = pd.read_csv(path)
plt.bar(monthly_owner.Owner.head(6),monthly_owner.Amount.head(6))
plt.xlabel('Owner', fontsize=15)
plt.ylabel('Amount', fontsize=15)
plt.title('AWS Monthly Cost by Owner')
plt.show()
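It looks like the papermill kernel cannot find the boto3 package even though your Lambda handler can. In your handler you are overriding (not appending to) PYTHONPATH, and the second assignment also overwrites the first, so the variable ends up containing only '/opt/python/'. Every directory that was previously on PYTHONPATH is dropped. The handler itself is unaffected, because its sys.path was already built when the interpreter started (and boto3 is imported at the top of the module anyway). The kernel that papermill launches, however, is a separate child process that inherits the modified PYTHONPATH, so it no longer searches those directories.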
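A minimal sketch of the fix, assuming boto3 reaches your handler from a directory that was already on PYTHONPATH (for example the Lambda runtime's own library path) rather than from your layer: extend PYTHONPATH instead of assigning it. `extend_pythonpath` and `child_can_import` are hypothetical helper names. The second one starts a fresh Python child process that inherits `os.environ`, which is the same way the papermill kernel picks up the environment, and reports whether a module is importable there.

```python
import os
import subprocess
import sys


def extend_pythonpath(*dirs: str) -> str:
    """Add dirs to PYTHONPATH while keeping every entry that is already there."""
    entries = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    for d in dirs:
        if d not in entries:  # skip directories that are already listed
            entries.append(d)
    # Assigning to os.environ also updates the environment that child processes inherit.
    os.environ["PYTHONPATH"] = os.pathsep.join(entries)
    return os.environ["PYTHONPATH"]


def child_can_import(module: str) -> bool:
    """Return True if a fresh Python child process (inheriting os.environ) can import module."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode == 0
```

In the handler you would replace the two `os.environ["PYTHONPATH"] = ...` lines with something like `extend_pythonpath("/var/task", "/opt/python")` before calling `pm.execute_notebook`, and you could call `child_can_import("boto3")` while debugging to see what the kernel process will be able to import.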
You might also find Clouderizer useful: it lets you deploy Jupyter Notebooks directly as serverless functions, and it uses papermill behind the scenes.
Disclaimer: I work for Clouderizer.