clickhouse

Improving performance of UniqExact in ClickHouse


I have a table with more than 500 million rows. When I perform aggregation like sum() or avg() on these rows, it's very fast. But when I use uniqExact() function on a columns with higher cardinality, it is very slow. How to speed up the performance of uniqExact function? I don't see blogs related to this.

I tried using uniq() function as suggested in the docs, but it didn't make much difference.


Solution

  • Create a materialized view for uniq (avoid uniqExact - which will never be fast). Something like:

    CREATE TABLE dest_table 
    (
          key <data_type>,
          uniq_column_count AggregateFunction(uniq, <data_type_of_column>) 
    )
    ENGINE = AggregatingMergeTree 
    PRIMARY KEY key;
    
    
    CREATE MATERIALIZED VIEW mv
    TO dest_table
       AS 
       SELECT
          key,
          uniqState(column) AS uniq_column_count
       FROM source_table
       GROUP BY key;
    

    You will need to backfill the MV with the existing data in the source table, so a one-time running of the following query:

    INSERT INTO dest_table
       SELECT
          key,
          uniqState(column) AS uniq_column_count
       FROM source_table
       GROUP BY key;
    

    And of course you will need uniqMerge to read the results:

    SELECT
        key,
        uniqMerge(uniq_column_count) AS uniqCount
    FROM dest_table
    GROUP BY key;
    

    Your source table now has a MV that keeps a "running" unique count of the values of whatever column you're counting. When you insert new rows into the source table, the dest_table will get updated.