Tags: memory, load, buffer, clickhouse

How does ClickHouse use memory in a clustered environment?


I have a question about ClickHouse memory distribution during query execution.

For example, I have a table TABLE_TEST that is distributed across shards with a rand() sharding key, so every replica under a shard holds approximately the same number of rows.

The table has two columns (col1, col2) and ORDER BY col1.

Now I want to retrieve the whole data set in a specific order, in my case "SELECT * FROM TABLE_TEST ORDER BY col2". Since col2 is not the table's ORDER BY column, ClickHouse (as I understand it) needs to allocate memory on every replica node to sort its data by col2, with each node allocating memory according to the size of the data on that replica. Once every replica has sorted its data set, each one sends its chunk to the node the query came from, which assembles the final data set ordered by col2 and returns the result to the user.
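A sketch of the setup described above (the cluster name, local-table name, and column types are hypothetical, just to make the example concrete):

```sql
-- Hypothetical local table on each replica; my_cluster and
-- table_test_local are made-up names.
CREATE TABLE table_test_local ON CLUSTER my_cluster
(
    col1 UInt64,
    col2 String
)
ENGINE = MergeTree
ORDER BY col1;

-- Distributed table sharded by rand(), so rows spread evenly across shards.
CREATE TABLE TABLE_TEST ON CLUSTER my_cluster AS table_test_local
ENGINE = Distributed(my_cluster, currentDatabase(), table_test_local, rand());

-- The query in question: sorts by col2, which is not the table's ORDER BY key.
SELECT * FROM TABLE_TEST ORDER BY col2;
```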

Is this logic correct?

I mean, for example, I have 5 shards with 1 replica under each shard, and each replica server has 10 GB of RAM.

To perform my query, which returns the data set in an order different from the table's original ORDER BY clause, I need to allocate memory. Suppose a successful execution of the query requires 50 GB of memory (for example, I observed this on another server that has more than 50 GB of RAM and can handle such a query). But I have 5 replicas with only 10 GB each, so when ClickHouse builds the final result the query will fail, because no single replica has 50 GB of RAM to accommodate the final result.

So we should not treat the sum of memory across all replicas as the memory available for handling a query that requires 50 GB; rather, at least one node must have 50 GB of memory to handle such a query.

Sorry if I've named something incorrectly, but I believe the main question is clear.

I'm just new to ClickHouse.

I know I haven't given many specific details; I just want to understand the main principle of memory distribution in such a case.


Solution

  • Whether memory for sorting is allocated on the replicas depends on whether clickhouse-server pushes the ORDER BY clause down into the rewritten subqueries it sends to each replica. For a GROUP BY clause, the push-down always happens, and each replica allocates memory for the <aggregatingFunction>State.

    In any case, on the initiator node you need enough memory to fully sort the data that arrives from each shard, plus memory to finalize the aggregation states.
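    Conceptually (this is an assumption about the rewrite, not a literal trace; table_test_local is a hypothetical local-table name), the per-shard subquery when ORDER BY is pushed down looks like:

    ```sql
    -- Assumed subquery each shard executes if ORDER BY is pushed down:
    SELECT col1, col2 FROM table_test_local ORDER BY col2;

    -- Each shard sorts only its own data; the initiator then merges the
    -- already-sorted streams, which needs less memory than a full re-sort.
    ```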

    You can fall back to much slower disk-based merge sorting and control it via SETTINGS max_bytes_before_external_sort=XXX, max_bytes_before_external_group_by=XXX

    See the details at https://clickhouse.com/docs/en/operations/settings/query-complexity#settings-max_bytes_before_external_sort

    and https://clickhouse.com/docs/en/operations/settings/query-complexity#settings-max_bytes_before_external_group_by
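    For example, to let the query from the question spill sort data to disk once it exceeds roughly 5 GB per node (the threshold value here is arbitrary, chosen to fit under the 10 GB RAM per replica):

    ```sql
    SELECT *
    FROM TABLE_TEST
    ORDER BY col2
    SETTINGS
        max_bytes_before_external_sort = 5000000000,     -- spill sort data to disk past ~5 GB
        max_bytes_before_external_group_by = 5000000000  -- same idea for GROUP BY states
    ```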

    You can check memory allocations via the system.query_log table; see the documentation at https://clickhouse.com/docs/en/operations/system-tables/query_log and look at the initial_query_id and memory_usage fields.

    Use something like this:

        SELECT
            hostName(),
            query_id,
            initial_query_id,
            memory_usage,
            ProfileEvents
        FROM clusterAllReplicas('your-cluster', system.query_log)
        WHERE event_date = today()
          AND (query_id = ? OR initial_query_id = ?)
        FORMAT Vertical