json hadoop hive amazon-emr emr

How do you make a Hive table out of JSON data?


I want to create a Hive table from some nested JSON data and run queries on it. Is this even possible?

I've gotten as far as uploading the JSON file to S3 and launching an EMR instance, but I don't know what to type in the Hive console to turn the JSON file into a Hive table.

Does anyone have an example command to get me started? I can't find anything useful on Google.


Solution

  • You'll need to use a JSON SerDe so that Hive can map the fields in your JSON to the columns in your table.

    This article walks through a really good example:

    http://aws.amazon.com/articles/2855

    Unfortunately, the JSON SerDe used in that article doesn't handle nested JSON very well, so you might need to flatten your JSON before you can use it.

    Here's an example of the correct syntax from the article:

    create external table impressions (
        requestBeginTime string, requestEndTime string, hostname string
      )
      partitioned by (
        dt string
      )
      row format 
        serde 'com.amazon.elasticmapreduce.JsonSerde'
        with serdeproperties ( 
          'paths'='requestBeginTime, requestEndTime, hostname'
        )
      location 's3://my.bucket/' ;
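
    If flattening the files up front isn't practical, one alternative (not from the article) is to load each JSON document as a plain string and pull out nested fields with Hive's built-in get_json_object function. The following is only a rough sketch under stated assumptions: the table name raw_events, the column name json_line, the S3 prefix, and the nested request.beginTime field are all made up for illustration, and it assumes each line of the file holds exactly one JSON object.

    ```sql
    -- Hypothetical table: every row is one raw JSON document stored as a string.
    -- With the default text row format, the whole line lands in the single column
    -- (assuming the data contains no Ctrl-A field delimiter characters).
    create external table raw_events (
        json_line string
      )
      location 's3://my.bucket/raw/';

    -- get_json_object(json_string, path) returns the value at the given path
    -- as a string, or NULL when the path is missing or the JSON is invalid.
    -- The field names below are assumptions about the shape of your data.
    select
        get_json_object(json_line, '$.hostname')          as hostname,
        get_json_object(json_line, '$.request.beginTime') as request_begin_time
      from raw_events
      limit 10;
    ```

    This keeps the raw files untouched, but each call parses the JSON string again, so a flattened table with a SerDe may still perform better if you query many fields.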