hadoopstructhiveamazon-athenahiveddl

How to create an array of struct in aws athena - hive on parquet data


I tried creating an table on aws-athena with hive on parquet data with following :

CREATE TABLE IF NOT EXISTS db.test (
  country STRING ,
  day_part STRING ,
  dma STRING ,
  first_seen STRING, 
  geohash STRING ,
  last_seen STRING, 
  location_backfill ARRAY <
   element STRUCT <
    backfill_type: BIGINT, 
    brq: BIGINT ,
    first_seen: STRING, 
    last_seen: STRING ,
    num_days: BIGINT >>
  ) 
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://<location>'
TBLPROPERTIES (
  'parquet.compress'='SNAPPY', 
  'transient_lastDdlTime'='<sometime>')

i repeatedly get the error

line 9:12: mismatched input 'struct' expecting {'(', 'array', '>'} (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: )

The syntax seems fine and not sure. The data is stored in s3 path any idea what may be causing this problem?


Solution

  • Array elements are not named, specify only type (struct):

    location_backfill ARRAY <
        STRUCT <
        backfill_type: BIGINT, 
        brq: BIGINT ,
        first_seen: STRING, 
        last_seen: STRING ,
        num_days: BIGINT >>