clickhouse

ClickHouse ReplacingMergeTree Not Merging Parts


I have a large ClickHouse table (chatprogram.messages) with approximately 10 billion rows using the ReplacingMergeTree engine. The table is partitioned by month (PARTITION BY toYYYYMM(ts)), with data going back to December 2013.

I'm experiencing an issue where parts within partitions aren't being merged, and consequently, duplicate rows aren't being eliminated. My table structure is:

CREATE TABLE chatprogram.messages 
(
    `ts` DateTime64(3) CODEC(Delta(8), ZSTD(3)),
    `channel` LowCardinality(String),
    `author` String CODEC(ZSTD(5)),
    `content` String CODEC(ZSTD(5)),
    `author_id` Nullable(UInt64) CODEC(ZSTD(1)),
    PROJECTION channel_log_dates
    (
        SELECT
            channel,
            toDateTime(toStartOfDay(ts)) AS date
        GROUP BY
            channel,
            date
    )
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (channel, author, ts)
TTL toDateTime(ts) + toIntervalYear(4) TO VOLUME 's3_disk_cached'
SETTINGS index_granularity = 8192, storage_policy = 's3_tiered_policy'

When I try to optimize a specific partition (202201, which is about 1 GB in size), the command returns instantly:

OPTIMIZE TABLE chatprogram.messages PARTITION '202201' FINAL;
Ok. (returns in 0.003 sec)

But when I check system.parts, I still see 46 active parts for this partition. I've verified there are duplicate rows by using SELECT FINAL which removes them at query time, but I want the duplicates to be permanently eliminated through merges (that is my understanding, at least).
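For reference, this is roughly how I checked the part count and verified the duplicates (database and table names as in the schema above):

-- Count active parts and rows in the partition
SELECT count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE database = 'chatprogram' AND table = 'messages'
  AND partition = '202201' AND active;

-- Compare raw vs. deduplicated row counts for the same partition
SELECT
    (SELECT count() FROM chatprogram.messages WHERE toYYYYMM(ts) = 202201) AS raw_rows,
    (SELECT count() FROM chatprogram.messages FINAL WHERE toYYYYMM(ts) = 202201) AS deduped_rows;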

Relevant settings:

max_parts_to_merge_at_once = 100
max_bytes_to_merge_at_max_space_in_pool = 161061273600 (about 161GB)
min_bytes_for_wide_part = 10485760 (10MB)

I've confirmed there are no ongoing merges or mutations. The parts are stored on local SSD, so I/O shouldn't be a limiting factor (they were originally on S3, but I have since moved them to local SSD with ALTER TABLE chatprogram.messages MOVE PARTITION '202201' TO VOLUME 'default';).
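The check I used for in-flight merges was along these lines (any rows returned would indicate an active merge on the table):

SELECT elapsed, progress, num_parts, result_part_name
FROM system.merges
WHERE database = 'chatprogram' AND table = 'messages';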

Why won't ClickHouse merge these parts despite the OPTIMIZE FINAL command? How can I force complete merges to eliminate duplicates without recreating my entire 10 billion row table?


Solution

  • After enabling:

    SETTINGS optimize_throw_if_noop=1
    

    It turned out that the OPTIMIZE query had been failing silently with an exception:

    DB::Exception: Cannot OPTIMIZE table: Parts have different projection sets: {} in and {} in 202201_5559717_5569657_4_6520992. (CANNOT_ASSIGN_OPTIMIZE)
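    For reference, the setting can also be applied per query rather than session-wide, which makes a no-op OPTIMIZE fail loudly instead of returning Ok:

    OPTIMIZE TABLE chatprogram.messages PARTITION '202201' FINAL
    SETTINGS optimize_throw_if_noop = 1;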

    So, I ran

    ALTER TABLE chatprogram.messages MATERIALIZE PROJECTION channel_log_dates IN PARTITION '202201';

    I waited for it to complete by monitoring the system.mutations table, then ran the OPTIMIZE query again. After that, there were no duplicate rows in that partition.
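    The monitoring query was along these lines (an empty result means the mutation has finished):

    SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
    FROM system.mutations
    WHERE database = 'chatprogram' AND table = 'messages' AND NOT is_done;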