mysqlindexing

Why mysql uses index with cardinality=1?


I created table categories:

 CREATE TABLE `categories` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `type` int unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `type` (`type`)
) ENGINE=InnoDB AUTO_INCREMENT=1100001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Then I filled table with random data:

<?php

$pdo = new PDO('mysql:host=localhost;dbname=test_perf', 'login', 'password');

$pdo->beginTransaction();
for ($i = 0; $i < 1000000; $i++) {
    $pdo->exec("INSERT INTO categories (name, type) VALUES ('" . uniqid() . "', 1)");
}
$pdo->commit();

Statistics for table indexes:

mysql> SHOW INDEXES FROM categories;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| categories |          0 | PRIMARY  |            1 | id          | A         |     1097250 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| categories |          1 | type     |            1 | type        | A         |           1 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0,01 sec)

It`s right, index type has only one value (1).

Then I try select by type with ignore index(type):

mysql> select MAX(name) FROM categories ignore index(type) WHERE type=1;
+---------------+
| MAX(name)     |
+---------------+
| 6769b0dfec5e7 |
+---------------+
1 row in set (0,23 sec)

mysql> explain select MAX(name) FROM categories ignore index(type) WHERE type=1;
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | categories | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1097250 |   100.00 | Using where |
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

And without it:

mysql> select MAX(name) FROM categories WHERE type=1;
+---------------+
| MAX(name)     |
+---------------+
| 6769b0dfec5e7 |
+---------------+
1 row in set (0,80 sec)

mysql> explain select MAX(name) FROM categories WHERE type=1;
+----+-------------+------------+------------+------+---------------+------+---------+-------+--------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+-------+--------+----------+-------+
|  1 | SIMPLE      | categories | NULL       | ref  | type          | type | 4       | const | 548625 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+------+---------+-------+--------+----------+-------+
1 row in set, 1 warning (0,00 sec)

As you can see, mysql uses index type, but it only degrades performance. At the same time, the statistics are correct.

Why mysql uses index type? Mysql version: 8.0.35-0ubuntu0.22.04.1.

I expecting mysql should not use this index.


Solution

  • Short answer for question Why mysql uses index type? - DBMS assume that

    Your explain select ... shows this.

    I think for such a bad case, when there is 1 value per 1M rows, cardinality does not give a proper prediction and therefore it is simply assumed to be fifty-fifty ;)

    Update2. There is possible answer to question way index:

    (https://dev.mysql.com/doc/refman/8.4/en/where-optimization.html)

    Third of above cases cause using index in your query.

    If you change data type column type from int unsigned to varchar(5), query not use index in both cases (with ignore index and without hint).
    See here example

    Test model
    Query model (with 100K rows) shows that it preferably performs an index scan if the cardinality of the type is >15.

    Insert 100K rows.

    INSERT INTO categories (name, type) VALUES (uuid() , 1);
    set @@cte_max_recursion_depth=1000001;
    INSERT INTO categories (name, type) 
    with recursive r as(
      select 1 idx, uuid() name
        ,1 type
      union all
      select idx+1 ,uuid() name
        , 1 type                             -- model version 1
      -- , cast(rand()*15 as unsigned) type   -- version 2
      from r
      where idx<100000
    )
    select name,type from r;
    
    SHOW VARIABLES LIKE 'innodb_stats_on_metadata';
    SET GLOBAL innodb_stats_on_metadata=ON;
    SHOW INDEXES FROM categories;
    
    Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment Visible Expression
    categories 0 PRIMARY 1 id A 1 null null BTREE YES null
    categories 1 type 1 type A 1 null null BTREE YES null
    SET profiling = 1;
    

    Execute 2 queries

    select MAX(name) FROM categories ignore index(type) WHERE type=1;
    
    MAX(name)
    37ee9640-c211-11ef-b3c7-d720191c40ee
    select MAX(name) FROM categories WHERE type=1;
    
    MAX(name)
    37ee9640-c211-11ef-b3c7-d720191c40ee
    SHOW PROFILES;
    

    We can see that table scan duration is 0.04764225, index scan - 0.16639675.

    Query_ID Duration Query
    1 0.04764225 select MAX(name) FROM categories ignore index(type) WHERE type=1
    2 0.16639675 select MAX(name) FROM categories WHERE type=1
    explain 
    select MAX(name) FROM categories ignore index(type) WHERE type=1;
    
    id select_type table partitions type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE categories null ALL null null null null 100001 0.00 Using where
    explain 
    select MAX(name) FROM categories WHERE type=1;
    
    id select_type table partitions type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE categories null ref type type 4 const 50000 100.00 null

    Execution statistics

    explain analyze 
    select MAX(name) FROM categories ignore index(type) WHERE type=1;
    
    EXPLAIN
    -> Aggregate: max(categories.`name`) (cost=10000 rows=1) (actual time=59.1..59.1 rows=1 loops=1)
        -> Filter: (categories.`type` = 1) (cost=10000 rows=1) (actual time=0.0541..38.3 rows=100001 loops=1)
            -> Table scan on categories (cost=10000 rows=100001) (actual time=0.0524..30.8 rows=100001 loops=1)

    For index scan expected rows=50K, fact rows=100K.

    explain analyze 
    select MAX(name) FROM categories WHERE type=1;
    
    EXPLAIN
    -> Aggregate: max(categories.`name`) (cost=10001 rows=1) (actual time=142..142 rows=1 loops=1)
        -> Index lookup on categories using type (type=1) (cost=5001 rows=50000) (actual time=0.136..122 rows=100001 loops=1)

    Exeqution time statistics

    type Cardinality without index with index
    1 0.04764225 0.16639675
    5 0.03393750 0.05473025
    10 0.03168700 0.04383450
    15 0.03087850 0.02723575
    50 0.03003600 0.00542825
    500 0.03531725 0.00101600
    10000 0.02948400 0.00046225

    fiddle

    Update1. For interest
    With create index ix_type_name on categories(type,name);

    Execution time

    Query_ID Duration Query
    1 0.00073550 select MAX(name) FROM categories ignore index(type) WHERE type=1
    2 0.00020950 select MAX(name) FROM categories WHERE type=1

    Execution plans for both queries

    EXPLAIN
    -> Rows fetched before execution (cost=0..0 rows=1) (actual time=165e-6..230e-6 rows=1 loops=1)
    -> Rows fetched before execution (cost=0..0 rows=1) (actual time=126e-6..196e-6 rows=1 loops=1)

    fiddle