Before all thank you for your help!
I want to find out which tables in the database are most heavily used, i.e. the amount of users that query the table, the amount of times it was queried, the resources that where consumed by users per table, the total time the tables where queried, and any other useful data. For now I would limit the analysis to 9 specific tables. I'd tried using stl_scan and pg_user using the next two querys:
SELECT
s.perm_table_name AS table_name,
count(*) AS qty_query,
count(DISTINCT s.userid) AS qty_users
FROM stl_scan s
JOIN pg_user b
ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;
SELECT
b.usename AS user_name,
count(*) AS qty_scans,
count(DISTINCT s.tbl) AS qty_tables,
count(DISTINCT trunc(starttime)) AS qty_days
FROM stl_scan s
JOIN pg_user b
ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;
The temp_mone_tables is a temporal table that contains the id and name of the tables I'm interested.
With this queries I'm able to get some information but I need more details. Surprisingly there's not much data online about this kind of statistics.
Again thank you all beforehand!
Nice work! You are on the right track using the stl_scan
table. I'm not clear what further details you're looking for.
For detailed metrics on resource usage you may want to use the SVL_QUERY_METRICS_SUMMARY
view. Note that this data is summarized by query not table because a query is the primary way resources are utilized.
Generally, have a look at the admin queries (and views) in our Redshift Utils library on GitHub, particularly v_get_tbl_scan_frequency.sql