database sql-insert olap apache-doris

Apache Doris INSERT INTO ... SELECT ... query is unexpectedly slow - how to troubleshoot?


I am experiencing performance issues with an INSERT INTO target_table SELECT ... FROM source_table [WHERE ...] [GROUP BY ...] query in my Apache Doris cluster (version 2.0.8). The SELECT part of the query, when run independently, executes reasonably fast (e.g., in N seconds/minutes), but when combined with the INSERT INTO statement, the overall execution time increases dramatically (e.g., M times slower or takes an excessively long time).


Solution

  • If INSERT INTO ... SELECT is slow, work through the following steps:

    1. Isolate the problem by setting the session variable dry_run_query = true and rerunning the statement:

    a. If the statement becomes much faster with dry_run_query = true, the slow part is data distribution and writing on the storage nodes;

    b. If the statement is still very slow or never finishes with dry_run_query = true, the slow part is the query itself.

    -- With dry_run_query = true, only the query part runs; data distribution and storage are skipped, so this is a quick way to check whether the query is the bottleneck.

    -- Supported after 2.0.2-rc05

    Note: Do not set dry_run_query globally, and set it back to false once you have finished testing; otherwise later statements will appear to find no data.

    2. If data distribution or the storage nodes are slow:

    Use top -H and I/O utilization figures (for example, %util from iostat) on the storage nodes to determine whether the bottleneck is CPU or I/O.

    For details, please refer to the Doris Slack
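
The dry-run check from step 1 can be sketched as the session below. This is only an illustration: the column names k1 and v1 are hypothetical, target_table and source_table stand in for your own tables, and the SELECT should be replaced with the one from your slow statement.

```sql
-- Enable dry run for this session only (do not use SET GLOBAL).
SET dry_run_query = true;

-- Rerun the slow statement and note the elapsed time.
-- k1 and v1 are hypothetical columns used only for illustration.
INSERT INTO target_table
SELECT k1, SUM(v1)
FROM source_table
GROUP BY k1;

-- Restore the default before doing anything else in this session.
SET dry_run_query = false;

-- Confirm the variable is back to false.
SHOW VARIABLES LIKE 'dry_run_query';
```

Compare the elapsed time of the dry run with that of the normal run: a large gap points to data distribution and storage (step 2), while similar timings point to the query itself.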