phpamazon-web-servicesamazon-s3aws-sdkamazon-s3-select

Missing Headers / Invalid Column Index errors with AWS S3 Select


I am trying to make a select request from data stored in CSV format with AWS S3 SelectObjectContent and an SQL expression. I am working with Laravel 6.

When I request all the data from the object ('Expression' => 'SELECT * from S3Object'), everything works fine. I retrieve the data and can work with it. The problem occurs when I want to use a WHERE-clause in the SQL expression.

This is my code:

$client = new S3Client([
            'region' => 'us-east-2',
            'version' => 'latest',
        ]);


$results = $client->selectObjectContent([
            'Bucket' => 'mybucketname', 
            'Key' => 'my_data_file.csv', 
            'ExpressionType' => 'SQL',
            'Expression' => 'SELECT * FROM S3Object s WHERE s.continent = "Europe"',
            'InputSerialization' => [
                'CSV' => [
                    'FileHeaderInfo' => 'USE', 
                    'RecordDelimiter' => "\n", 
                    'FieldDelimiter' => ',',
                ],
            ], 
            'OutputSerialization' => [
                'CSV' => [
                    'QuoteFields' => 'ASNEEDED',
                    'RecordDelimiter' => ",",
                ],
            ],
        ]);

I get the following error:

Error executing "SelectObjectContent"

<Error><Code>MissingHeaders</Code><Message>Some headers in the query are missing (truncated...) MissingHeaders (client): Some headers in the query are missing from the file. Please check the file and try again.

When I check the file (downloading it directly from the S3 bucket), in the first row I find the headers comma-separated, as I expected: continent,country,user_name,created_at

I also tried using positional headers and changed the expression to 'Expression' => 'SELECT * FROM S3Object s WHERE s._1 = "Europe"' and 'FileHeaderInfo' => 'IGNORE', but that gave me another error:

Error executing "SelectObjectContent"

<Error><Code>InvalidColumnIndex</Code><Message>The column index at line 1, column (truncated...) InvalidColumnIndex (client): The column index at line 1, column 39 is invalid. Please check the service documentation and try again.

I am clueless how to proceed. I already saw this post, but it didn't help. Maybe someone has a clue? Feel free to tell me what further information you could need for helping me solve this issue!

Thanks in advance!

EDIT

I tried querying the csv file via the AWS console and everything works fine: The SQL expression in which I use the file header info as well as the one where I use the positional headers instead. Transferring the SQL expression in my controller I get the same errors I mentioned above:

Missing Headers for the query 'SELECT * FROM S3Object WHERE continent = "Europe"' (and 'FileHeaderInfo' => 'USE')

Invalid Column Index for the query 'SELECT * FROM S3Object s WHERE s_1 = "Europe"' (and 'FileHeaderInfo' => 'NONE')


Solution

  • I just solved it. The problem was that apparently only single quotation marks are allowed in the WHERE clause of the SQL expression.

    So below you find all expressions that worked for me:

    1. With 'FileHeaderInfo' => 'USE'

    'Expression' => "SELECT * FROM S3object WHERE continent = 'Europe'"
    

    2. With 'FileHeaderInfo' => 'NONE' (positional headers & S3 object alias)

    'Expression' => "SELECT * FROM S3object s WHERE s._1 = 'Europe'"
    

    3. With 'FileHeaderInfo' => 'NONE' (positional header without S3 object alias)

    'Expression' => "SELECT * FROM S3object WHERE _1 = 'Europe'"
    

    NOTE: The important thing is that you wrap the actual query in double quotation marks so that you can put the string you want to query with the WHERE clause in single ones.

    Also I came across this site from the AWS docs which explains the S3 SELECT command in more detail!

    Hope that helps some!