I have aggregating merge tree table
-- app_events.events_v2_b definition
CREATE TABLE app_events.events_v2_b on cluster 'cluster_name'
(
`event_date` Date,
`user_pseudo_id` String,
`event_params_ga_session_id` String,
`event_params_ga_session_number` String,
`min_time` AggregateFunction(min, String)
)
ENGINE = Distributed('cluster_name',
'app_events',
'events_v2_b_local',
rand());
-- app_events.events_v2_b_local definition
CREATE TABLE app_events.events_v2_b_local on cluster 'cluster_name'
(
`event_date` Date,
`user_pseudo_id` String,
`event_params_ga_session_id` String,
`event_params_ga_session_number` String,
`min_time` AggregateFunction(min,String)
)
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{layer}-{shard}/events_v2_b_local',
'{replica}')
PARTITION BY event_date
ORDER BY (event_date,
user_pseudo_id,
event_params_ga_session_id,
event_params_ga_session_number)
SETTINGS index_granularity = 8192;
When I load data to app_events.events_v2_b using select :
INSERT INTO app_events.events_v2_b
(
`event_date` ,
`user_pseudo_id` ,
`event_params_ga_session_id` ,
`event_params_ga_session_number` ,
`min_time`
)
SELECT
event_date,
user_pseudo_id,
event_params_ga_session_id,
event_params_ga_session_number,
minState(event_timestamp) AS min_time
FROM app_events.events_v2
and event_date = '2025-01-01'
GROUP BY
event_date,
user_pseudo_id,
event_params_ga_session_id,
event_params_ga_session_number
and then trying to calculate row count :
select 'events_v2_b' ,count(*) from (
SELECT event_date,
user_pseudo_id,
event_params_ga_session_id,
event_params_ga_session_number,
minMerge(min_time) as min_time
FROM app_events.events_v2_b
where event_date = '2025-01-01'
group by
event_date,
user_pseudo_id,
event_params_ga_session_id,
event_params_ga_session_number
)
I get result 100 123 rows
If I repeat proccess of loading to event_v2_b same dataset based on select (I truncate table befoore new load ) and then again select row count from event_v2_b using the same provided select mentioned above I getting 100 115 rows. Why results can be different ? Per my undestanding despite on rand() shardkey and (probably) unmerged parts I should get same results with every loading.
Issue happened because of before of insertion I had started drop partition sql command. This command didnt performed completely in time of I started insertion. While syncronization is not completed I had started insertion , when insertion have been completed a truncation command completed after the process insertion and deleted some of my inserted rows. Using following hint for drop partition process I have resolved the issue , every run of sequence drop - insert commands do the same and return same results every time.
Setting alter_sync = 1
forces the command to wait until the operation is fully complete before returning.