S3 Select CSV Headers


I am using S3 Select to read a CSV file from an S3 bucket and output the result as CSV. The output contains only the data rows, not the header. How do I get the header included in the output?

import boto3

s3 = boto3.client('s3')

r = s3.select_object_content(
        Bucket='demo_bucket',
        Key='demo.csv',
        ExpressionType='SQL',
        Expression="select * from s3object s",
        InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
        OutputSerialization={'CSV': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)

CSV

Name, Age, Status
Rob, 25, Single
Sam, 26, Married

Output from S3 Select

Rob, 25, Single
Sam, 26, Married

Solution

  • Change InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},

    TO InputSerialization={'CSV': {"FileHeaderInfo": "NONE"}},

    Then, it will print full content, including the header.

    Explanation:

    FileHeaderInfo accepts one of "NONE|USE|IGNORE".

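    Use NONE rather than USE. With NONE, S3 Select treats the first line as an ordinary record instead of a header, so it is returned along with the other rows. With USE, the first line is consumed as the header (which lets you reference columns by name) and is left out of the output. Note that with NONE, columns can only be referenced by position (_1, _2, ...).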

    Here is the reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.select_object_content

    I hope it helps.
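
For completeness, the change described above can be wrapped in a small helper. This is only a sketch: the function name `select_csv_with_header` and its parameters are made up for illustration, and the client is passed in so that the function itself does not depend on how boto3 is configured. It collects the raw bytes from every Records event and decodes them once at the end, since a record (or a multi-byte character) may be split across events.

```python
def select_csv_with_header(s3_client, bucket, key,
                           expression="select * from s3object s"):
    """Run S3 Select and return the result as text, header row included.

    Hypothetical helper: s3_client is expected to be boto3.client('s3').
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        Expression=expression,
        # NONE: the first line is treated as data, so it is returned as well.
        InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
        OutputSerialization={'CSV': {}},
    )

    chunks = []
    for event in response['Payload']:
        # Only Records events carry data; other event types are skipped.
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])

    # Join first, then decode, so a chunk boundary cannot break a character.
    return b''.join(chunks).decode('utf-8')
```

With the bucket and key from the question, it would be called as `print(select_csv_with_header(boto3.client('s3'), 'demo_bucket', 'demo.csv'))`. Keep in mind that because the header is now just another row, any WHERE clause you add is evaluated against it too, and columns have to be referenced by position, e.g. `s._1`.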