amazon-web-services amazon-s3 aws-glue amazon-athena aws-glue-data-catalog

How to create Athena tables for dynamic S3 paths using AWS Glue Crawler?


Below are my S3 paths, each of which contains multiple folders. Each folder holds one CSV file, and each file has a different schema.

The values within the curly braces {} will be dynamic.

s3://test_bucket/{val1}/data/{val2}/input/latest/

s3://test_bucket/{val1}/data/{val2}/input/archived/timestamp={val3}/

I want to create the Athena tables using an AWS Glue Crawler. The input data can go into separate databases, one for the current (latest) data and one for the archived data.

The resulting tables should be partitioned on val1 and val2 for both the current and the archived data. The archived tables should have one additional partition, val3.

Is there an approach I can take to configure the crawler so that these tables are created dynamically? I would really appreciate your time. Please let me know if more information is needed.


Solution

  • My suggestion: use the Glue API to create the crawlers programmatically, giving each one the specific S3 path to read from and the name of the database to write to.
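
A minimal sketch of that idea with boto3, not a tested solution. The database names (input_data_current, input_data_archive), the crawler naming scheme, the table prefix, and the role ARN are all assumptions made for illustration, and the sketch assumes both databases already exist and that you already know the (val1, val2) pairs. It creates one crawler per pair and per kind (latest or archived), so val1 and val2 end up in the table name rather than as partition columns; the timestamp={val3} folders use the key=value layout that the crawler normally picks up as a partition.

```python
from typing import Dict, Iterable, List, Tuple


def build_crawler_requests(
    bucket: str,
    pairs: Iterable[Tuple[str, str]],
    role_arn: str,
) -> List[Dict]:
    """Build keyword arguments for glue.create_crawler, one per S3 path."""
    # Hypothetical database names: one for current data, one for archived data.
    kinds = (("latest", "input_data_current"), ("archived", "input_data_archive"))
    requests = []
    for val1, val2 in pairs:
        base = f"s3://{bucket}/{val1}/data/{val2}/input"
        for kind, database in kinds:
            requests.append({
                "Name": f"{val1}-{val2}-{kind}",      # assumed naming scheme
                "Role": role_arn,
                "DatabaseName": database,
                "Targets": {"S3Targets": [{"Path": f"{base}/{kind}/"}]},
                "TablePrefix": f"{val1}_{val2}_",     # keeps tables distinguishable
            })
    return requests


def create_crawlers(requests: List[Dict]) -> None:
    """Create and start each crawler (needs AWS credentials and boto3)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    for req in requests:
        glue.create_crawler(**req)
        glue.start_crawler(Name=req["Name"])
```

For example, build_crawler_requests("test_bucket", [("a", "b")], role_arn) yields two requests, one targeting s3://test_bucket/a/data/b/input/latest/ and one targeting s3://test_bucket/a/data/b/input/archived/. Passing that list to create_crawlers would register and start both crawlers.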