I have a table with more than 500 million rows. Aggregations like sum() or avg() over these rows are very fast, but uniqExact() on a high-cardinality column is very slow. How can I speed up uniqExact()? I haven't found any blog posts about this.
I tried using uniq() function as suggested in the docs, but it didn't make much difference.
Create a materialized view for uniq (avoid uniqExact, which will never be fast). Something like:
-- Destination table holding the partial aggregation state per key
CREATE TABLE dest_table
(
    key <data_type>,
    uniq_column_count AggregateFunction(uniq, <data_type_of_column>)
)
ENGINE = AggregatingMergeTree
PRIMARY KEY key;
CREATE MATERIALIZED VIEW mv
TO dest_table
AS
SELECT
    key,
    uniqState(column) AS uniq_column_count
FROM source_table
GROUP BY key;
You will need to backfill the MV with the data already in the source table (a materialized view only captures rows inserted after it is created), so run the following query once:
INSERT INTO dest_table
SELECT
    key,
    uniqState(column) AS uniq_column_count
FROM source_table
GROUP BY key;
And of course you will need uniqMerge to read the results:
SELECT
    key,
    uniqMerge(uniq_column_count) AS uniqCount
FROM dest_table
GROUP BY key;
Your source table now has an MV that keeps a "running" unique count of the values of whatever column you're counting. When you insert new rows into the source table, dest_table is updated automatically.
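As a concrete sketch of the pattern above (the table and column names `events`, `country`, and `user_id` here are hypothetical, not from your schema), counting unique users per country might look like:

```sql
-- Hypothetical schema: events(country String, user_id UInt64)
CREATE TABLE country_uniq_users
(
    country String,
    uniq_users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
PRIMARY KEY country;

CREATE MATERIALIZED VIEW country_uniq_users_mv
TO country_uniq_users
AS
SELECT
    country,
    uniqState(user_id) AS uniq_users
FROM events
GROUP BY country;

-- Query the precomputed states instead of scanning 500M rows
SELECT
    country,
    uniqMerge(uniq_users) AS uniq_user_count
FROM country_uniq_users
GROUP BY country;
```

Note that uniq gives an approximate count. If you truly need exact results, the same pattern works with uniqExactState/uniqExactMerge, but the stored states are larger and merging is slower.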