amazon-s3amazon-athenaprestotrino

In SQL Athena, how can I get a CSV's creation date and time?


Based on the creation date and time of each CSV file, I want to build a table showing how fresh the data is. In SQL Athena, how can I get a CSV's creation date and time?


Solution

  • You can try using the "hidden" $file_modified_time metadata column:

    select "$file_modified_time" 
    from your_table;
    

    Presto/Trino supports it in several connectors, like the Hive one:

    Metadata columns

    In addition to the defined columns, the Hive connector automatically exposes metadata in a number of hidden columns in each table:

    • $bucket: Bucket number for this row
    • $path: Full file system path name of the file for this row
    • $file_modified_time: Date and time of the last modification of the file for this row
    • $file_size: Size of the file for this row
    • $partition: Partition name for this row

    And at least some of this metadata columns are mentioned in AWS docs (like Getting the file locations for source data in Amazon S3 or Considerations and limitations for SQL queries in Amazon Athena)