mysqlindexingcomposite-index

Confused about mysql compsite index and leftmost prefix matching


My MySQL version is 5.7 and I created a test table with 1,332,660 records:

CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` varchar(500) DEFAULT NULL,
`data_time` varchar(100) DEFAULT NULL,
`data_value` decimal(50,8) DEFAULT NULL,
`data_code` varchar(100) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_name_time_value` (`data_name`,`data_time`,`data_value`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

there is an index bydata_name,data_time,data_value. I have three SQL and there results :

explain select  *  from test    where data_name ='abc' and  data_time = '2022-06-15 00:00:00' and data_value=75.1

enter image description here

explain select  *  from test    where   data_time = '2022-06-15 00:00:00' and data_value=75.1

enter image description here

explain select  data_name  from test    where   data_time = '2022-06-15 00:00:00' and data_value=75.1

enter image description here

According to the leftmost prefix matching the second sql does not use index, but the third one actually used an index.

My question is why can the thired sql use index, and why does a full table scan occur even when indexes are used (The seconde one's Explain result rows is equal to the thired one's rows).

Can someone help me ? Thans a lot!!!!!!


Solution

  • Query 3 -- Why it uses the index but is slow...

    select  data_name
        from  test
        where  data_time = '2022-06-15 00:00:00'
          and  data_value=75.1
    

    needs 3 columns, and all are in a single index. So using that index is probably faster than scanning the table. This is called a "covering index". It is indicated in EXPLAIN with Using index. "Leftmost" is not relevant; "covering" is.

    Still, it will have to scan the entire index, as indicated by "Rows" being approximately the number of rows in the table.

    Your datetime string looks like MySQL's datetime; it would be better to declare the column DATETIME instead of VARCHAR.

    Query 1 is likely to be fast since you are testing all 3 columns of the index with =.

    Query 2 is likely to be very slow since the columns in the WHERE are not "leftmost".

    select  *
        from  test
        where  data_time = '2022-06-15 00:00:00'
          and  data_value=75.1
    

    would benefit either of these:

    INDEX(data_time, data_value)
    INDEX(data_value, data_time)
    

    Query 4:

        where  data_name ='abc'
          and  data_time = '2022-06-15 00:00:00'
          and  data_value >= 75
          and  data_value <  76
    

    This will be fast because "leftmost" lets the index be used.

    Query 5:

        where  data_name ='abc'
          and  data_time >= '2022-06-15'
          and  data_value = 75.1
    

    This will be only somewhat fast because "leftmost" will stop with the data_time. That is, it stopped after the first "range" test. Adding this would make query 5 fast and fully use the index:

       INDEX(data_name, data_value,   -- tested with "="
             data_time)               -- tested with "range"
    

    Note that the order of columns in the index important; the order of tests in WHERE is not.