I have a PostgreSQL query constructed by a templating mechanism. What I want to do is to determine the relations actually hit by the query when it is run and record them in a relation. So this a very rudimentary lineage problem. Simply looking at the relation names appearing in the query (or parsing the query) would not easily solve the problem as the queries are somewhat complex and the templating mechanism inserts expressions like WHERE FALSE
.
I can of course do it by using EXPLAIN
on the query and insert the relation names I find manually. However this has two drawbacks:
EXPLAIN
actually runs the query. Unfortunately running the query takes a lot of time so it is not ideal to run the query twice, once for the result and once for the EXPLAIN
.After reading a few documents I found out that on can log the result of an EXPLAIN
automatically to a CSV file and read it back to a relation. But, as far as I understand, this means logging everything to the CSV which is not an option for me. Also, automatic logging seems to be triggered only when the execution takes longer then a predetermined threshold and I want to do this for a few specific queries, not for all time consuming ones.
PS: This does not need to be implemented fully at database layer. For instance, once I have the result of EXPLAIN
in a relation, I can parse it and extract the relations it hits at the application layer.
EXPLAIN
does not execute the query.
You can run EXPLAIN (FORMAT JSON) SELECT ...
, which will return the execution plan as a JSON. Simply extract all Relation Name
attributes, and you have a list of the tables scanned.