I have partitions in S3 in hive-style format that look like s3://prefix/year=yyyy/month=MM/day=dd. I'm trying to create a partitioned projected table in Athena with one "date" partition.
I've keep running into parsing issues or creating tables that do not recognize any partitions. Below is the table that I would expect to work.
PARTITIONED BY (`date` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://prefix'
TBLPROPERTIES (
'classification'='csv',
'has_encrypted_data'='true',
'skip.header.line.count'='1',
'projection.enabled'='true',
'projection.date.type'="date",
'projection.date.format'='yyyy/MM/dd',
'projection.date.partition.format'="\'year=\'yyyy/\'month=\'MM/\'day=\'dd",
'projection.date.range'='2000/01/01,NOW',
'projection.date.interval'='1',
'projection.date.interval.unit'='DAYS',
'storage.location.template'='s3://prefix/${date}',
'transient_lastDdlTime'='1697473018')
It seems to me that assigning a projection.date.partition.format
is the way to go, but I can't find any other examples of anyone using this.
Can anyone tell me what I'm missing here?
Late update, but this format does work. As far as I can tell, the 'projection.date.partition.format'
property just needed all single quotes rather than the doubles:
PARTITIONED BY (
`partition_date` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://prefix'
TBLPROPERTIES (
'classification'='csv',
'has_encrypted_data'='true',
'projection.enabled'='true',
'projection.partition_date.format'='\'year=\'yyyy/\'month=\'MM/\'day=\'dd',
'projection.partition_date.interval'='1',
'projection.partition_date.interval.unit'='DAYS',
'projection.partition_date.range'='NOW-3YEARS,NOW',
'projection.partition_date.type'='date',
'skip.header.line.count'='1',
'storage.location.template'='s3://prefix/${partition_date}',
'transient_lastDdlTime'='1734535763')