pandasamazon-web-servicesamazon-s3aws-lambda

Reading xlsx file into lambda function AWS


Trying to read an xlsx file from an S3 bucket. Lambda function is called but I get the following 'seek' error

[ERROR] UnsupportedOperation: seek
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 19, in lambda_handler
    df = pd.read_excel(xl_file\['Body'\], dtype_backend='pyarrow', header=\[1,2\])
File "/opt/python/pandas/io/excel/\_base.py", line 495, in read_excel
    io = ExcelFile(
File "/opt/python/pandas/io/excel/\_base.py", line 1550, in __init__
    ext = inspect_excel_format(
File "/opt/python/pandas/io/excel/\_base.py", line 1406, in inspect_excel_format
    stream.seek(0)

...and here is my lambda function (the part that is relevant). Header is spread across rows 1 and 2. Code runs fine in it's local form (i.e. when pulling xlsx file from a local directory)

import json
import boto3
import io
import csv
import pandas as pd

s3Client = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    xl_file = s3Client.get_object(Bucket = bucket, Key = key)
    df = pd.read_excel(xl_file['Body'], dtype_backend='pyarrow', header=[1,2])

Solution

  • Working code as follows

    import boto3
    import io
    import csv
    import pandas as pd
    from io import BytesIO
    
    s3Client = boto3.client('s3')
    
    def lambda_handler(event, context):
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = event['Records'][0]['s3']['object']['key']
    
        xl_file = s3Client.get_object(Bucket = bucket, Key = key)
        xl_file_body = io.BytesIO(xl_file['Body'].read()) 
        df = pd.read_excel(xl_file_body, dtype_backend='pyarrow', header=[1,2])