mysqltime-complexityb-tree

What is the time complexity of MySql SELECT to find an interval that contains a number?


I am wondering if the following MySql SELECT query takes O(N) or O(logN).

Let us we have a table represents 4 integer intervals [startNum, endNum]. And the table is indexed by the startNum and endNum columns.

startNum, endNum
3, 8
10, 15
16, 21
28, 42

Query:

SELECT * from table
where startNum <= 19 AND endNum >= 19

I think MySql will take O(N), because it will

 1. find the first 3 rows using the "startNum"; then 
 2. go through each of them and use the "endNum" to identify the 3rd row; then 
 3. return the 3rd row [16, 21] as the result.

Is MySql "smart" enough to do the following?

1. binary search on the startNum to find the position of the 3rd row, since "startNum" is sorted; then
2. binary search on the endNum to find the 3rd row again, since "endNum" is also sorted; then
3. return the 3rd row [16, 21] as the result.

From this documentation: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html

If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts.

I don't think MySql is doing the "smart" binary searches.

Am I right? Is there any configuration that can enable MySql do the binary searches?


Solution

  • You are correct. It's O(n). If you have indexes on both startNum and endNum, the query planner will choose one index. Based on table statistics it will try to choose the index that is more selective.

    Then it will random-access that index to the first eligible row and proceed to scan the rest of the table to satisfy the other inequality predicate. This is in the nature of BTREE indexing. The situation is the same in every make of table server that uses BTREE indexing, not just MySql / Mariadb.

    If your indexes are compound indexes, like this

    ALTER TABLE `table`
      ADD INDEX start_end (startNum, endNum),
      ADD INDEX end_start (endNum, startNum);
    

    the query planner will probably choose to scan the index instead of the whole table. That's usually faster, but still O(n).

    Keep in mind that it's an antipattern to use SELECT * in performance critical queries unless you know for certain you need every column in the table.