arrayssql-likeclickhousebloom-filter

How to use Clickhouse Token Bloomfilters with Arrays


How can I leverage bloomfilters on array(string) in Clickhouse.

For example, I have this data structure:

create table test_schema.test_table (
partition_time            DateTime64(9
,keys                     Array(String)
) ENGINE = ReplicatedReplacingMergeTree('some/path', '{replica}')
PARTITION BY toYYYYMMDD(partition_time)
TTL toDateTime(partition_time) + toIntervalDay(1)
SETTINGS index_granularity = 8192;

and I build a token bloomfilter on keys

alter table test_schema.test_table INDEX test_bf_idx `keys` TYPE tokenbf_v1(4096, 2, 0) GRANULARITY 1;

When I use the condition has(keys,'some_key'), I can see it leveraging the bloom filter. But I could not use the bloomfilter for like queries. I have tried array filter and array join, but still it is not using bloomfilter:

where notEmpty(arrayFilter(x -> x LIKE '%some_key%', keys) )

and

select * from 
(select arrayJoin(keys) as val from schema.table)
where val like '%some_key%'

Can someone show me how I can use the token bloomfilter built on an array for LIKE statements, (i.e. use bloomfilter for like statement on an array)


Solution

  • maybe an index over arrayStringConcat ?

    create table test (
        x Int64,
        k Array(String),
        INDEX test_bf_idx arrayStringConcat(k, ',') TYPE tokenbf_v1(4096, 2, 0) GRANULARITY 1
    ) ENGINE = MergeTree
    order by x as 
    select number, ['a', 'b'] k
    from numbers(1e6);
    
    select * from test
    where hasToken( arrayStringConcat(k, ',') , 'xxx')