I'm learning about PostgreSQL's CLUSTER command and would like to compare the performance of the same query with the table unclustered and with the table clustered.
I generated 25 million user events and ran the query before and after clustering.
However, running EXPLAIN ANALYSE doesn't give consistent results in either case, and it's hard to compare the values: before clustering the query takes ~100-200 ms, and after clustering the timings are roughly the same, although in that case I do see Heap Fetches: 0.
My question is: how do I benchmark the query before and after clustering so that I can compare them? Are there any tools that allow me to do this? Maybe I can collect statistics from multiple query runs and visualize the percentiles for each case?
I have seen that it's possible to collect a sum of the values and compare that, but is it also possible to get percentiles somehow? Do you perhaps use some data visualization tool for that?
To answer your question as it is asked: you benchmark your queries by running them. To get dependable values, you have to run them repeatedly, ideally with different constants; using different constants reduces the effect of caching (otherwise the statement becomes faster and faster simply because more and more of its data end up cached in RAM). One tool you can use for that purpose is the built-in pgbench with a custom script.
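For example, a minimal custom script might look like the sketch below. The table user_events, the column user_id and the id range are just placeholders for your own schema:

    -- bench.sql: a minimal pgbench custom script (table, column and
    -- id range are placeholders for your own schema)
    -- run it e.g. with:  pgbench -n -f bench.sql -c 4 -T 60 -r --log mydb
    \set uid random(1, 1000000)
    SELECT count(*) FROM user_events WHERE user_id = :uid;

The \set line draws a fresh random constant for every execution, so you are not just measuring a fully cached lookup over and over. pgbench reports the average latency (and per-statement latencies with -r), and the -l/--log option writes the latency of every single transaction to a log file, from which you can compute percentiles and feed whatever plotting tool you like, which also covers the percentile and visualization part of your question.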
To answer the unasked question that I suspect is your actual problem: your query is performing an index-only scan, and your attempt to improve its performance by running CLUSTER on the underlying table failed. That is hardly surprising: an index-only scan that doesn't perform any heap fetches is independent of the physical order of the heap table, because the heap table is not even accessed. CLUSTER rewrites the table, but the index will look pretty much the same afterwards and won't perform any differently.
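If you want to construct an experiment where CLUSTER can make a measurable difference, you need a query that actually fetches rows from the heap, for example one that returns a column that is not stored in the index. A rough sketch, again assuming a hypothetical user_events table with columns user_id, created_at and payload:

    -- index that both the query and CLUSTER will use
    CREATE INDEX IF NOT EXISTS user_events_uid_ts_idx
        ON user_events (user_id, created_at);

    -- "payload" is not in the index, so every matching row needs a
    -- heap fetch; here the physical row order can actually matter
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT payload FROM user_events
    WHERE user_id = 42
    ORDER BY created_at;

    -- rewrite the table in index order, then refresh the statistics
    CLUSTER user_events USING user_events_uid_ts_idx;
    ANALYZE user_events;

    -- run the same EXPLAIN again and compare the "Buffers: shared
    -- hit/read" numbers rather than only the raw timing

When the matching rows are scattered all over the table, the first plan has to read many heap pages; after CLUSTER they sit on a few adjacent pages, which is the effect you were presumably hoping to measure.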