hive, compression, amazon-athena, presto, snappy

Unable to use AWS Athena with JSON + Snappy


Looking at AWS Athena's documentation on supported compression formats, I can see that Snappy is supported. However, when I try to use Snappy compression with the JSON data format, I get a multitude of errors.

I have tried creating tables in Athena with both available SerDes:

'org.apache.hive.hcatalog.data.JsonSerDe'
'org.openx.data.jsonserde.JsonSerDe'

I have tried uncompressed JSON and GZIP-compressed JSON. Both work fine.

I have tried creating the table with a multitude of TBLPROPERTIES and SERDEPROPERTIES, but none have helped.

Every attempt to query Snappy-compressed JSON ends the same way: the query returns 'Zero Records Returned'.

Has anyone seen this issue and overcome it?


Solution

  • For data in CSV, TSV, and JSON, Athena determines the compression type from the file extension. If no file extension is present, Athena treats the data as uncompressed plain text. If your data is compressed, make sure the file name ends with the compression extension, for example .json.snappy.
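
To check whether existing files follow that naming rule, something like the following could be used. It is only a minimal sketch: the helper names `missing_extension` and `with_extension` are made up for illustration, the keys are example values, and it only inspects key names (it does not talk to S3 or verify that the file contents are actually Snappy-compressed).

```python
from typing import Iterable, List

# Assumption: ".snappy" is the extension to check for, as in the answer above.
COMPRESSION_EXT = ".snappy"


def missing_extension(keys: Iterable[str], ext: str = COMPRESSION_EXT) -> List[str]:
    """Return the keys whose names do not end with the compression extension."""
    return [key for key in keys if not key.lower().endswith(ext)]


def with_extension(key: str, ext: str = COMPRESSION_EXT) -> str:
    """Return the key name with the compression extension appended if absent."""
    return key if key.lower().endswith(ext) else key + ext


if __name__ == "__main__":
    # Hypothetical object keys under the table's S3 location.
    keys = ["logs/a.json", "logs/b.json.snappy", "logs/c"]
    for key in missing_extension(keys):
        print(f"{key} -> {with_extension(key)}")
```

Keep in mind that S3 has no in-place rename, so applying the new names means copying each object to the new key and deleting the old one.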