
ClickHouse: Does it make sense to use LowCardinality on UInt8 fields used as Boolean?


LowCardinality fields in ClickHouse are an optimization where the values are dictionary-encoded for faster lookups and smaller storage. As per documentation:

The efficiency of using LowCardinality data type depends on data diversity. If a dictionary contains less than 10,000 distinct values, then ClickHouse mostly shows higher efficiency of data reading and storing. If a dictionary contains more than 100,000 distinct values, then ClickHouse can perform worse in comparison with using ordinary data types.

What about UInt8 values used as Boolean? The cardinality is 2, but with such a small footprint (8 bits), would LowCardinality actually provide any benefit in queries?


Solution

  • LowCardinality mostly makes sense for the String type.

    LowCardinality(UInt8) is always worse than UInt8.

    There are very rare cases where LowCardinality makes sense for numeric types, but I would not even test it because it is a waste of time. The pointer into an LC dictionary takes 1 to 4 bytes per row (Int8 to Int32) in the .bin file, so it is cheaper in both disk space and CPU to store the numeric value itself in the .bin file.
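The size argument above can be checked with some back-of-the-envelope arithmetic. The Python sketch below is only a simplified model, not ClickHouse's actual storage code: it assumes the per-row index is 1, 2, or 4 bytes depending on dictionary size, and it ignores compression, string length prefixes, and per-part overhead. The function names are hypothetical and exist only for this illustration.

```python
def index_width_bytes(distinct_values: int) -> int:
    """Assumed model: smallest index width (in bytes) that can address the dictionary."""
    if distinct_values <= 2**8:
        return 1
    if distinct_values <= 2**16:
        return 2
    return 4


def plain_bytes(rows: int, value_width: int) -> int:
    """Uncompressed size of an ordinary column: one value per row."""
    return rows * value_width


def low_cardinality_bytes(rows: int, distinct_values: int, value_width: int) -> int:
    """Uncompressed size of a dictionary-encoded column: one index per row plus the dictionary."""
    return rows * index_width_bytes(distinct_values) + distinct_values * value_width


if __name__ == "__main__":
    rows = 1_000_000

    # Boolean stored as UInt8: the value is already 1 byte, so the index saves nothing.
    print("UInt8 plain:", plain_bytes(rows, 1))
    print("UInt8 LC:   ", low_cardinality_bytes(rows, 2, 1))

    # Strings averaging 20 bytes with 1,000 distinct values: the index is much smaller.
    print("String plain:", plain_bytes(rows, 20))
    print("String LC:   ", low_cardinality_bytes(rows, 1000, 20))
```

Under these assumptions the LowCardinality(UInt8) column can never be smaller than the plain UInt8 column, because every row still stores at least a 1-byte index and the dictionary is extra. The String case shrinks roughly tenfold, which is in line with the answer's point that LowCardinality pays off mostly for strings.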