We can use an AggregatedMergeTree table engine, which can be used for a aggregating rows.
Generally in aggregated data we are not interested in storing all unique identifiers and still want to do a count distinct. Still we want to have the ability to do another aggregation to get unique count on these rows afterwards (trough grouping rows in a select query). This is where HyperLogLog comes in handy, which is implemented as the uniqState function in clickhouse.
I would like to store a hyperloglog directly trough an insert query and offer it to a clickhouse table from my client application. Is this possible?
So I achieved this feat using just a clickhouse query. Its working very well!
CREATE TABLE demo_db.aggregates
(
name String,
date Date,
ids AggregateFunction(uniq, UInt8)
) ENGINE = MergeTree(date, date, 8192)
//So here the declaration of a set of ids in the insert query will lead to a binary hash tree being stored
INSERT INTO aggregates SELECT
'Demo',
toDate('2016-12-03'),
uniqState(arrayJoin([1, 5, 6, 7]))
SELECT
name,
date,
uniqMerge(ids) //our hashtree can be grouped and give us unique count over the grouped rows
FROM aggregates
GROUP BY name, date