mysqldateindexing

Index not hit in mysql


When I select all columns, the indexing does not work, but if I select just a few columns, it does.

mysql> explain SELECT  *  FROM fatura  USE INDEX (datapago_serv_pago)   WHERE id_servidor = 10 AND pago = '1' AND data_pago <= '2021-05-05';
+----+-------------+--------+------------+------+--------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys      | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+--------+------------+------+--------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | fatura | NULL       | ALL  | datapago_serv_pago | NULL | NULL    | NULL | 10199216 |     0.00 | Using where |
+----+-------------+--------+------------+------+--------------------+------+---------+------+----------+----------+-------------+

mysql> explain SELECT  uid_  FROM fatura  USE INDEX (datapago_serv_pago)   WHERE id_servidor = 10 AND pago = '1' AND data_pago <= '2021-05-05';
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+---------+----------+--------------------------+
| id | select_type | table  | partitions | type  | possible_keys      | key                | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | fatura | NULL       | range | datapago_serv_pago | datapago_serv_pago | 5       | NULL | 5099608 |     0.01 | Using where; Using index |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+---------+----------+--------------------------+
 CREATE TABLE `fatura` (
  `uid_` bigint unsigned NOT NULL,
  `uid_cliente` bigint unsigned NOT NULL,
  `uid_cliente_servico` bigint unsigned NOT NULL,
  `id_servidor` tinyint unsigned NOT NULL,
  `id` int unsigned NOT NULL DEFAULT '0',
  `data_cadastro` date NOT NULL,
  `valor` decimal(12,2) NOT NULL DEFAULT '0.00',
  `vencimento` date NOT NULL DEFAULT (0),
  `pago` tinyint unsigned NOT NULL DEFAULT '0',
  `data_pago` date NOT NULL DEFAULT (0),
  `valor_pago` decimal(12,2) unsigned NOT NULL DEFAULT '0.00',
  `historico` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
  `id_cliente` int unsigned NOT NULL DEFAULT (0),
  `id_servico` int unsigned NOT NULL DEFAULT (0),
  `nome` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
  `desativada` tinyint unsigned NOT NULL DEFAULT '0',
  `operador_inclusao` varchar(50) NOT NULL,
  `operador_liquidacao` varchar(50) NOT NULL,
  `forma_pago` varchar(50) NOT NULL,
  `status_banco` tinyint NOT NULL DEFAULT (0),
  PRIMARY KEY (`uid_`),
  KEY `vencimento` (`vencimento`) USING BTREE,
  KEY `id_cliente` (`id_cliente`),
  KEY `id_servico` (`id_servico`),
  KEY `pago` (`pago`),
  KEY `id_servidor` (`id_servidor`),
  KEY `id` (`id`) USING BTREE,
  KEY `uid_cliente` (`uid_cliente`),
  KEY `data_pago` (`data_pago`),
  KEY `data_cadastro` (`data_cadastro`),
  KEY `historico` (`historico`),
  KEY `uid_cliente_servico` (`uid_cliente_servico`),
  KEY `desativada` (`desativada`),
  KEY `operador_inclusao` (`operador_inclusao`),
  KEY `operador_liquidacao` (`operador_liquidacao`),
  KEY `venc_serv_pago` (`vencimento`,`id_servidor`,`pago`),
  KEY `forma_pago` (`forma_pago`),
  KEY `status_banco` (`status_banco`),
  KEY `datapago_serv_pago` (`data_pago`,`id_servidor`,`pago`),
  KEY `vencimento_serv_pago` (`data_pago`,`id_servidor`,`pago`)
)

If I select a range of dates, it works normally:

explain SELECT  *  FROM fatura  USE INDEX (datapago_serv_pago)   WHERE id_servidor = 10 AND pago = '1' AND data_pago >= '2021-04-01'  AND data_pago <= '2021-05-05';
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+--------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys      | key                | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | fatura | NULL       | range | datapago_serv_pago | datapago_serv_pago | 5       | NULL | 158342 |     0.01 | Using index condition |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+--------+----------+-----------------------+

I've searched a lot throughout the documentation but I couldn't find the reason or solution, does anyone know?


Solution

  • MySQL won't use an index if the cost-based optimizer estimates that it would be more work to use the index than to skip the index and just read all the rows.

    Keep in mind that when making a secondary index lookup in InnoDB, it's really two lookups. One to find an entry in the index, and also a primary key lookup to get the other columns of the table that are not part of the index.

    So depending on how much of the table is matched by the search, it might be more costly to use the index. In my experience, if the query condition matches more than 20% of the table, the optimizer usually chooses not to use that index. I don't think that's an official or documented threshold, it's just based on observation.

    So I'd guess your condition on the first column of the index:

    data_pago <= '2021-05-05'
    

    matches 20% or more of the table.

    And when your conditions were for a more narrow range of dates:

    data_pago >= '2021-04-01'  AND data_pago <= '2021-05-05'
    

    this persuaded the optimizer that it would match a lesser portion of the table, so it was worth using the index.

    The column referenced in this range condition, data_pago, is the first column of the index, which means the subsequent columns of that index won't be used for the search. Basically, you can have several columns in the index help a search if and only if the leftmost columns are all used in equality conditions. Once you have one column used in any kind of inequality or range condition, that's the last column of the index that will be used for searching or sorting.

    For example in your case, the index is on (data_pago,id_servidor,pago), but the conditions are data_pago <= '2021-05-05' AND id_servidor = 10 AND pago = '1' (rearranging these terms to match the order of columns of the index). The first of these columns is a range condition, so the other two columns can't be used.

    If you had defined the index with data_pago in the third position, then all three columns could be used, and that might narrow down the search better, so it matches substantially less than 20% of the table.

    These rules change when you select only the single column, instead of SELECT *. Then the optimizer reasons that it's an index-only query, and it doesn't bear the extra cost of reading the table rows. This means the columns it needs to fetch are all from the index itself (the primary key column is always part of that index in InnoDB). So it changes its mind, and uses the index.

    Note the filtered: 0.01 reported by the second EXPLAIN. This means of the rows it examines, it will discard 99.99% of them by applying non-indexed conditions. This is poor optimization, because it means it must examine a lot of rows only to discard them. Ideal optimization would be filtered: 100.0, which would indicate every row it examined through index lookups are in fact rows that belong in the result set.

    Apparently when doing a table-scan (type: ALL) the filtered column isn't calculated, so it reports 0.0. This is not mentioned in the documentation.