amazon-redshiftamazon-redshift-spectrum

Is there a way to get list of files scanned for a RedShift query referencing tables in Spectrum?


We have a query which is performing an aggregation, like: SELECT t.date, COUNT(*) AS rec_count FROM our_schema.log_data t WHERE t.date BETWEEN '2011-01-01' AND '2012-01-01' GROUP BY t.date;

I know we can get the quantity of data scanned by this query, but I would like to get the list of DISTINCT files that this query has accessed in S3.

I've been digging around for a bit, but can't find a system view that would give that information, and I'm not sure how we could use $path with the query to get the info.

Thoughts?


Solution

  • STL_S3LIST should have what you need (per comment exchange).