[SOLVED] Determining relations hit by a query

Determining relations hit by a query

I have a PostgreSQL query constructed by a templating mechanism. What I want to do is to determine the relations actually hit by the query when it is run and record them in a relation. So this a very rudimentary lineage problem. Simply looking at the relation names appearing in the query (or parsing the query) would not easily solve the problem as the queries are somewhat complex and the templating mechanism inserts expressions like WHERE FALSE.

I can of course do it by using EXPLAIN on the query and insert the relation names I find manually. However this has two drawbacks:

EXPLAIN actually runs the query. Unfortunately running the query takes a lot of time so it is not ideal to run the query twice, once for the result and once for the EXPLAIN.
It is manual.

After reading a few documents I found out that on can log the result of an EXPLAIN automatically to a CSV file and read it back to a relation. But, as far as I understand, this means logging everything to the CSV which is not an option for me. Also, automatic logging seems to be triggered only when the execution takes longer then a predetermined threshold and I want to do this for a few specific queries, not for all time consuming ones.

PS: This does not need to be implemented fully at database layer. For instance, once I have the result of EXPLAIN in a relation, I can parse it and extract the relations it hits at the application layer.

Solution

EXPLAIN does not execute the query.

You can run EXPLAIN (FORMAT JSON) SELECT ..., which will return the execution plan as a JSON. Simply extract all Relation Name attributes, and you have a list of the tables scanned.