columnstoreclickhousehyperloglog

Is it possible in clickhouse to store a HyperLogLog / uniqState() state directly trough an insert query?


We can use an AggregatedMergeTree table engine, which can be used for a aggregating rows.

Generally in aggregated data we are not interested in storing all unique identifiers and still want to do a count distinct. Still we want to have the ability to do another aggregation to get unique count on these rows afterwards (trough grouping rows in a select query). This is where HyperLogLog comes in handy, which is implemented as the uniqState function in clickhouse.

I would like to store a hyperloglog directly trough an insert query and offer it to a clickhouse table from my client application. Is this possible?


Solution

  • So I achieved this feat using just a clickhouse query. Its working very well!

    CREATE TABLE demo_db.aggregates
    (
        name String,
        date Date,
        ids AggregateFunction(uniq, UInt8)
    ) ENGINE = MergeTree(date, date, 8192)
    
    //So here the declaration of a set of ids in the insert query will lead to a binary hash tree being stored    
    INSERT INTO aggregates SELECT
        'Demo',
        toDate('2016-12-03'),
        uniqState(arrayJoin([1, 5, 6, 7])) 
    
    SELECT
        name,
        date,
        uniqMerge(ids) //our hashtree can be grouped and give us unique count over the grouped rows
    FROM aggregates
    GROUP BY name, date