I created table categories
:
CREATE TABLE `categories` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`type` int unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `type` (`type`)
) ENGINE=InnoDB AUTO_INCREMENT=1100001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Then I filled table with random data:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test_perf', 'login', 'password');
$pdo->beginTransaction();
for ($i = 0; $i < 1000000; $i++) {
$pdo->exec("INSERT INTO categories (name, type) VALUES ('" . uniqid() . "', 1)");
}
$pdo->commit();
Statistics for table indexes:
mysql> SHOW INDEXES FROM categories;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| categories | 0 | PRIMARY | 1 | id | A | 1097250 | NULL | NULL | | BTREE | | | YES | NULL |
| categories | 1 | type | 1 | type | A | 1 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0,01 sec)
It`s right, index type
has only one value (1).
Then I try select by type with ignore index(type)
:
mysql> select MAX(name) FROM categories ignore index(type) WHERE type=1;
+---------------+
| MAX(name) |
+---------------+
| 6769b0dfec5e7 |
+---------------+
1 row in set (0,23 sec)
mysql> explain select MAX(name) FROM categories ignore index(type) WHERE type=1;
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | categories | NULL | ALL | NULL | NULL | NULL | NULL | 1097250 | 100.00 | Using where |
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0,00 sec)
And without it:
mysql> select MAX(name) FROM categories WHERE type=1;
+---------------+
| MAX(name) |
+---------------+
| 6769b0dfec5e7 |
+---------------+
1 row in set (0,80 sec)
mysql> explain select MAX(name) FROM categories WHERE type=1;
+----+-------------+------------+------------+------+---------------+------+---------+-------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+-------+--------+----------+-------+
| 1 | SIMPLE | categories | NULL | ref | type | type | 4 | const | 548625 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+------+---------+-------+--------+----------+-------+
1 row in set, 1 warning (0,00 sec)
As you can see, mysql uses index type
, but it only degrades performance. At the same time, the statistics are correct.
Why mysql uses index type
?
Mysql version: 8.0.35-0ubuntu0.22.04.1.
I expecting mysql should not use this index.
Short answer for question Why mysql uses index type?
- DBMS assume that
Your explain select ... shows this.
I think for such a bad case, when there is 1 value per 1M rows, cardinality does not give a proper prediction and therefore it is simply assumed to be fifty-fifty ;)
Update2. There is possible answer to question way index
:
Each table index is queried, and the best index is used unless the optimizer believes that it is more efficient to use a table scan. At one time, a scan was used based on whether the best index spanned more than 30% of the table, but a fixed percentage no longer determines the choice between using an index or a scan.
The optimizer now is more complex and bases its estimate on additional factors such as table size, number of rows, and I/O block size.
In some cases, MySQL can read rows from the index without even consulting the data file. If all columns used from the index are numeric, only the index tree is used to resolve the query.
(https://dev.mysql.com/doc/refman/8.4/en/where-optimization.html)
Third of above cases cause using index in your query.
If you change data type column type
from int unsigned
to varchar(5)
, query not use index in both cases (with ignore index
and without hint).
See here example
Test model
Query model (with 100K rows) shows that it preferably performs an index scan if the cardinality of the type is >15.
Insert 100K rows.
INSERT INTO categories (name, type) VALUES (uuid() , 1);
set @@cte_max_recursion_depth=1000001;
INSERT INTO categories (name, type)
with recursive r as(
select 1 idx, uuid() name
,1 type
union all
select idx+1 ,uuid() name
, 1 type -- model version 1
-- , cast(rand()*15 as unsigned) type -- version 2
from r
where idx<100000
)
select name,type from r;
SHOW VARIABLES LIKE 'innodb_stats_on_metadata';
SET GLOBAL innodb_stats_on_metadata=ON;
SHOW INDEXES FROM categories;
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
categories | 0 | PRIMARY | 1 | id | A | 1 | null | null | BTREE | YES | null | |||
categories | 1 | type | 1 | type | A | 1 | null | null | BTREE | YES | null |
SET profiling = 1;
Execute 2 queries
select MAX(name) FROM categories ignore index(type) WHERE type=1;
MAX(name) |
---|
37ee9640-c211-11ef-b3c7-d720191c40ee |
select MAX(name) FROM categories WHERE type=1;
MAX(name) |
---|
37ee9640-c211-11ef-b3c7-d720191c40ee |
SHOW PROFILES;
We can see that table scan duration is 0.04764225, index scan - 0.16639675.
Query_ID | Duration | Query |
---|---|---|
1 | 0.04764225 | select MAX(name) FROM categories ignore index(type) WHERE type=1 |
2 | 0.16639675 | select MAX(name) FROM categories WHERE type=1 |
explain
select MAX(name) FROM categories ignore index(type) WHERE type=1;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | categories | null | ALL | null | null | null | null | 100001 | 0.00 | Using where |
explain
select MAX(name) FROM categories WHERE type=1;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | categories | null | ref | type | type | 4 | const | 50000 | 100.00 | null |
Execution statistics
explain analyze
select MAX(name) FROM categories ignore index(type) WHERE type=1;
EXPLAIN |
---|
-> Aggregate: max(categories.`name`) (cost=10000 rows=1) (actual time=59.1..59.1 rows=1 loops=1) -> Filter: (categories.`type` = 1) (cost=10000 rows=1) (actual time=0.0541..38.3 rows=100001 loops=1) -> Table scan on categories (cost=10000 rows=100001) (actual time=0.0524..30.8 rows=100001 loops=1) |
For index scan expected rows=50K, fact rows=100K.
explain analyze
select MAX(name) FROM categories WHERE type=1;
EXPLAIN |
---|
-> Aggregate: max(categories.`name`) (cost=10001 rows=1) (actual time=142..142 rows=1 loops=1) -> Index lookup on categories using type (type=1) (cost=5001 rows=50000) (actual time=0.136..122 rows=100001 loops=1) |
Exeqution time statistics
type Cardinality |
without index | with index |
---|---|---|
1 | 0.04764225 | 0.16639675 |
5 | 0.03393750 | 0.05473025 |
10 | 0.03168700 | 0.04383450 |
15 | 0.03087850 | 0.02723575 |
50 | 0.03003600 | 0.00542825 |
500 | 0.03531725 | 0.00101600 |
10000 | 0.02948400 | 0.00046225 |
Update1. For interest
With create index ix_type_name on categories(type,name);
Execution time
Query_ID | Duration | Query |
---|---|---|
1 | 0.00073550 | select MAX(name) FROM categories ignore index(type) WHERE type=1 |
2 | 0.00020950 | select MAX(name) FROM categories WHERE type=1 |
Execution plans for both queries
EXPLAIN |
---|
-> Rows fetched before execution (cost=0..0 rows=1) (actual time=165e-6..230e-6 rows=1 loops=1) |
-> Rows fetched before execution (cost=0..0 rows=1) (actual time=126e-6..196e-6 rows=1 loops=1) |