clickhouseminmaxsecondary-indexes

Clickhouse: how to use `Data Skipping Indexes` and `Manipulations With Data Skipping Indices` features in clickhouse?


I'm using the Data Skipping Indexes feature in clickhouse and i got confused about its usage. If i add a data skip index when i create the table like this:

CREATE TABLE MyTable
(
    ...
    INDEX index_time TimeStamp TYPE minmax GRANULARITY 1
)
ENGINE =MergeTree()
...

When i query with TimeStamp filter condition the 'index_time' works. But if i didn't add index when creating table, alternatively, i added the index with Manipulations With Data Skipping Indicesfeature like this:

ALTER TABLEE MyTable ADD INDEX index_time TimeStamp TYPE minmax GRANULARITY 1

Then the index 'index_time' doesn't work.

My database is running on production so i can't recreate the table. I have to use the second way. Can anyone explain why it does not work or i used the feature in a wrong way?


Solution

  • The reason your queries don't use the index after an ALTER TABLE ADD INDEX is because the index does not exist yet. (!)

    Any new data will be properly indexed, which is why your index works when you put it in CREATE TABLE. ClickHouse builds the index as you load data. If you created the table, ran ALTER TABLE ADD INDEX, and loaded data you would see the same behavior.

    When the data already exist, things are different. ALTER TABLE updates the metadata for the table, but at this point all your data have been written to parts in the table. ClickHouse does not rewrite parts automatically to implement new indexes. However, you should be able to force rewriting to include the index by running:

    OPTIMIZE TABLE MyTable FINAL 
    

    See the Github issue https://github.com/yandex/ClickHouse/issues/6561 referenced by Ruijang for more information.