I am experiencing performance issues with an
INSERT INTO target_table
SELECT ...
FROM source_table
[WHERE ...]
[GROUP BY ...]
query in my Apache Doris cluster (version 2.0.8).
The SELECT part of the query, when run independently, executes reasonably fast (e.g., in N seconds/minutes), but when combined with the INSERT INTO statement, the overall execution time increases dramatically (e.g., M times slower or takes an excessively long time).
If insert into ... select is slow, you can troubleshoot it with the following process:
Segment the problem by setting the session variable dry_run_query = true:
a. If the statement becomes much faster with dry_run_query = true, then the data distribution or the storage nodes are slow;
b. If the statement is still very slow or never finishes with dry_run_query = true, then the query itself is slow.
With dry_run_query = true, only the query part is run, not the data distribution and storage, so it can be used to quickly check whether the query is the slow part.
dry_run_query is supported after 2.0.2-rc05.
Note: do not set dry_run_query globally, and set it back to false once you have finished testing; otherwise later queries will not return any data.
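The check described above could look roughly like the following in a MySQL client session connected to Doris. This is only a sketch: target_table, source_table and the elided SELECT ... are the placeholders from the question, not real object names, and it assumes the variable is toggled with an ordinary session-level SET.

```sql
-- Enable dry-run for the current session only (do not use SET GLOBAL).
SET dry_run_query = true;

-- Re-run the slow statement and compare its run time with the normal run.
-- "SELECT ..." is the placeholder from the question; use your real query.
INSERT INTO target_table
SELECT ...
FROM source_table;

-- Set the variable back so that later statements in this session
-- write and return data as usual.
SET dry_run_query = false;
```

If the dry run finishes in about the same time as the standalone SELECT, the extra time is being spent in data distribution and storage rather than in the query.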
If the data distribution or the storage nodes are slow:
use top -H and I/O utilization (ioutil) information to determine whether the bottleneck is CPU or I/O. For details, please refer to the Doris Slack.