I have some PostgreSQL tables with columns of type: timestamp without time zone interval text numeric
For example assume a table like:
CREATE TABLE IF NOT EXISTS table_foo(
"timestamp" timestamp without time zone NOT NULL,
granularity interval NOT NULL,
dimension_1 TEXT NOT NULL,
dimension_2 TEXT NOT NULL,
CONSTRAINT table_foo_pk PRIMARY KEY ("timestamp", granularity, dimension_1, dimension_2),
data_volume_down NUMERIC NOT NULL default 0,
data_volume_up NUMERIC NOT NULL default 0
)
and the type of queries that want to perform is the following:
Select
timestamp,
granularity,
dimension_1,
dimension_2,
SUM(data_volume_down+data_volume_up) as total_volume
From table_foo
WHERE timestamp > '2024-09-30 00:00'
AND granularity = '1 day'
GROUP BY timestamp, granularity, dimension_1, dimension_2
ORDER BY total_volume
LIMIT 10;
I read in the manual the following:
It might be useful to do that for columns that are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner will have no use for statistics on such columns.
Is it safe to set statistics equal to 0 for numeric columns when a value that is computed from them is used in "order by" clause?
If that's the only statement that uses the columns data_volume_down
and data_volume_up
, you might as well disable statistics for these columns.
However, that will only speed up ANALYZE
and will probably not affect the performance of the query. It is probably a micro-optimization that is not worth the effort.
You would probably see a bigger performance gain if you define these columns as double precision
rather than numeric
.