I am using s3 select query along with where clause to retrieve data from s3. The query is working fine and returning the expected result when there's no where clause. Although when I am using where clause, the filtered data is correct, but the key in the object is the first row after the header and not the header.
Example : csv file
A B C
1 2 3
1 5 6
Query : select * from s3object s where s._1 = '1' limit 100
Expected Output : [{A : 1, B:2, C:3}, {A:1, B:5, C:6}]
Actual Output : [{1:1, 2:5, 3:6}]
This is the params object I am using to query :
let params = {
Bucket: S3_BUCKET,
Key: S3_PATH,
Expression: "select * from s3object s where s._1 = '1' limit 100"
ExpressionType: "SQL",
InputSerialization: {
CSV: {
FileHeaderInfo: "NONE",
RecordDelimiter: "\n",
FieldDelimiter: ","
}
},
OutputSerialization: {
CSV: {}
}
};
I get the same output even when I use FileHeaderInfo : "USE"
, and change the query to select * from s3object s where id = '22' and s.date > '2020-05-01' limit 100
AWS Doc : https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html
So it seems, while fetching the query results from s3, it is impossible to get the headers as well. We can query with headerNames, or with columnNumber, but if we use the where clause, then we should use headerNames, and in that case, the header row doesn't come in the results.
So, I have now hardcoded the headers in my api call from where I am calling s3 select query, and appending those in the results.