what is the effect of distributed_group_by_no_merge

I know that the distributed node does not combine intermediate results from shards by using distributed_group_by_no_merge.

The following SQL

select sum(xxxxx),xxxxx from (
    select sum(xxxx),xxxx 
    from (
        select count(xxx),xxx 
        from distributed_table group by xxx )  
    group by xxxx SETTINGS distributed_group_by_no_merge = 1
) group by xxxxx

I want to know that which part of sql will be sent to MergeTree node to execute by using distributed_group_by_no_merge？ is it？select count(xxx),xxx from distributed_table group by xxx ) group by xxxx SETTINGS distributed_group_by_no_merge = 1

how does the parameter of distributed_group_by_no_merge change the behavior of distributed query？which part of sql execute on MergeTree node and which part of sql execute on distributed node?

Solution

distributed_group_by_no_merge-param affects the way how the initiator-node (it is a node which runs distributed query) will form the final result of a distributed query:

either by merging aggregated intermediate states coming from shards by itself (it required copying full aggregated intermediate states from shards to initiator-node) [distributed_group_by_no_merge = 0 (default mode)]
or get already final result from shards (when each shard merges an intermediate aggregation state on its side and send to initiator-node only the final result). It provides a significant improvement in performance and resource consumption but requires the right selection of the sharding key [distributed_group_by_no_merge = 1]

I would put distributed_group_by_no_merge at the same level of subquery where defined distributed table to explicitly define your intention and avoid confusion when there are several distributed-subqueries.

Let's look at the way how to check the differences between the two modes (will use _shard_num-virtual column):

distributed_group_by_no_merge=0

SELECT
    groupUniqArray(_shard_num) AS shards,
    ..
FROM table
WHERE ..
GROUP BY ..
SETTINGS distributed_group_by_no_merge = 0

/* Aggregated states were merged into ONE result set on initiator-node.
┌─shards────┬─ ..
│ [2, 1, 3] │  ..
└───────────┴─ ..
*/

distributed_group_by_no_merge=1

SELECT
    groupUniqArray(_shard_num) AS shards,
    ..
FROM table
WHERE ..
GROUP BY ..
SETTINGS distributed_group_by_no_merge = 1

/* Get a set of final results (not aggregated states) from each shard. They should be unioned manually.
┌─shards─┬─ ..
│ [2]    │  ..
│ [1]    │  ..
│ [3]    │  ..
└────────┴─ ..
*/

See https://clickhouse.com/docs/en/operations/settings/settings#distributed_group_by_no_merge.

How to avoid merging high cardinality sub-select aggregations on distributed tables