json, amazon-athena, snappy, amazon-s3-select

How to read a .json.snappy file from Athena


I have an input file in an S3 bucket with .json.snappy compression, and I am trying to read it through an Athena table. I tried two different SerDes, 'org.apache.hive.hcatalog.data.JsonSerDe' and 'org.openx.data.jsonserde.JsonSerDe', but neither worked: the Athena table shows zero records when queried.

I also tried to view sample data with S3 Select, but that didn't work either. Is there a way to read this data?


Solution

  • After a few hours of debugging, I found that the input file had been compressed in raw Snappy format and then base64-encoded, which is why Athena could not read it. As a workaround, I decoded and decompressed the file with the python-snappy library and uploaded the result to the S3 bucket.

    import base64
    import snappy  # provided by the python-snappy package

    snappy_file = 'input.json.snappy'
    # read the base64 text as bytes; b64decode accepts bytes directly
    with open(snappy_file, "rb") as input_file:
        data = input_file.read()
    # decode the base64 layer
    data = base64.b64decode(data)
    # decompress the raw (unframed) snappy payload
    uncompressed = snappy.uncompress(data)
    print(uncompressed.decode('utf-8'))
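
For anyone hitting the same zero-records symptom, a quick check of how the file is wrapped may save some of that debugging time. The sketch below uses only the standard library; `describe_payload` is a hypothetical helper name, not part of any library. It looks for the stream identifier that opens files in the Snappy framing format, and otherwise tests whether the whole payload is valid base64. It is only a heuristic: short plain-text files can also pass the base64 test, and it does not recognise raw or Hadoop-style Snappy on its own.

```python
import base64
import binascii

# Stream identifier chunk that opens a file in the Snappy framing format.
SNAPPY_FRAMED_MAGIC = b"\xff\x06\x00\x00sNaPpY"


def describe_payload(data: bytes) -> str:
    """Guess how a .snappy payload is wrapped (hypothetical helper, heuristic only)."""
    if data.startswith(SNAPPY_FRAMED_MAGIC):
        return "snappy-framed"

    # Drop newlines and other ASCII whitespace that base64 tools often insert.
    stripped = b"".join(data.split())
    if not stripped:
        return "unknown"

    try:
        # validate=True rejects any byte outside the base64 alphabet.
        base64.b64decode(stripped, validate=True)
    except binascii.Error:
        return "unknown"
    return "base64"
```

Calling `describe_payload(open('input.json.snappy', 'rb').read())` on a file like mine should return "base64", which is the hint that a decoding step is needed before decompressing.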