[SOLVED] Clickhouse table data deleting

Clickhouse table data deleting

I have clustered clickhouse instance and observing following behavior: when I delete data on every node of clickhouse using for example alter table db.tb on cluster cl1 delete where event_date = 20231212 , when I run this script a delete statement is performed very fast but when I select data from table with event_date = 20231212 I see that data is still in table (despite it is slowly reduced when i repeat check of row count right after previous check). Right after deletion I need to load data with the same event_date = 20231212 but I don't see any mechanism to avoid me from data that is being deleted but that not deleted yet so this can end with mixed data from deleted/inserted transactions. Is there to avoid such potential problem ?

Solution

Assuming you are using the *MergeTree table engine, Deletions and Updates (aka mutations) in ClickHouse are performance asynchronously in the background upon merging [https://clickhouse.com/docs/en/sql-reference/statements/alter#mutations]

If you expect the ALTER TABLE...DELETE to occur immediately, then using the mutations_sync option would probably be recommended [https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync]

Use the SETTINGS mutations_sync = 1; or SETTINGS mutation_sync = 2; (depending on single replica or multiple replicas) option with your DELETE statement to ensure that deletions are being performed synchronously

If running asynchronous mutations, you can also check the system.mutations table to determine if the mutations are complete:

SELECT mutation_id,*
FROM clusterAllReplicas('default',system.mutations)
WHERE is_done = 0;