amazon-web-servicesaws-glueamazon-athenapresto

Athena Partition Projection Custom Date Format


I have partitions in S3 in hive-style format that look like s3://prefix/year=yyyy/month=MM/day=dd. I'm trying to create a partitioned projected table in Athena with one "date" partition.

I've keep running into parsing issues or creating tables that do not recognize any partitions. Below is the table that I would expect to work.

PARTITIONED BY (`date` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://prefix'
TBLPROPERTIES (
  'classification'='csv', 
  'has_encrypted_data'='true', 
  'skip.header.line.count'='1', 
  'projection.enabled'='true',
  'projection.date.type'="date",
  'projection.date.format'='yyyy/MM/dd',
  'projection.date.partition.format'="\'year=\'yyyy/\'month=\'MM/\'day=\'dd", 
  'projection.date.range'='2000/01/01,NOW',
  'projection.date.interval'='1',
  'projection.date.interval.unit'='DAYS',
  'storage.location.template'='s3://prefix/${date}', 
  'transient_lastDdlTime'='1697473018')

It seems to me that assigning a projection.date.partition.format is the way to go, but I can't find any other examples of anyone using this.

Can anyone tell me what I'm missing here?


Solution

  • Late update, but this format does work. As far as I can tell, the 'projection.date.partition.format' property just needed all single quotes rather than the doubles:

    PARTITIONED BY ( 
      `partition_date` string)
    ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY ',' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      's3://prefix'
    TBLPROPERTIES (
      'classification'='csv', 
      'has_encrypted_data'='true', 
      'projection.enabled'='true', 
      'projection.partition_date.format'='\'year=\'yyyy/\'month=\'MM/\'day=\'dd', 
      'projection.partition_date.interval'='1', 
      'projection.partition_date.interval.unit'='DAYS', 
      'projection.partition_date.range'='NOW-3YEARS,NOW', 
      'projection.partition_date.type'='date', 
      'skip.header.line.count'='1', 
      'storage.location.template'='s3://prefix/${partition_date}', 
      'transient_lastDdlTime'='1734535763')