amazon-s3external-tablesstarburst

Starburst unmanaged tables


I'm building a database on Starburst using unmanaged (or external) tables that read CSV files stored on S3.

The file path structure will be as follows:

bucket_name/file_name/year/month/day

The CSV files are loaded weekly, and my tables need to read only the most recent file, i.e., the one with the latest date in the path.

Is there a way to define a parameterized external location in the table definition so that only the latest file is read?

For example, something like:

bucket_name/file_name/<year>/<month>/<day>

Solution

  • No, that's not possible, but since you are also using year/month/day as what looks to be partitioning you can put the partition_by definition in the create table statement then you could add the dates you are searching for in your where clause and thereby limiting what is being read.