mysqlmysql-jsonfunctional-index

Why is MySQL not considering a (JSON) functional index in a join, but an index on a generated column is considered?


I am using JSON columns a lot in MySQL (version 8.0.35). Usually, I create functional indexes on JSON properties to speed up queries. According to the MySQL docs, Functional indexes are implemented as hidden virtual generated columns. However, they seem to behave differently from indexes on generated columns in joins. I will use a somehow constructed example to explain my point.

Example: Given are two tables product and purchase. purchase has a JSON property $.productUuid that references a product.

CREATE TABLE IF NOT EXISTS product (
    id         BINARY(16) NOT NULL,
    payload    JSON       NOT NULL,
    
    CONSTRAINT pk_product PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE IF NOT EXISTS purchase (
    id         BINARY(16) NOT NULL,
    payload    JSON       NOT NULL,
    
    CONSTRAINT pk_product PRIMARY KEY (id),
    INDEX `i_purchase_product` (
        ( CAST(payload->>'$.productUuid' AS CHAR(36)) COLLATE utf8mb4_bin )
    )
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO product (id, payload)
VALUES
  ( UUID_TO_BIN(UUID(), TRUE), '{ "name": "random drink" }' ),
  ( UUID_TO_BIN(UUID(), TRUE), '{ "name": "random dish" }' ),
  ( UUID_TO_BIN(UUID(), TRUE), '{ "name": "random tool" }' )
;

INSERT INTO purchase (id, payload)
SELECT
  UUID_TO_BIN(UUID(), TRUE), JSON_SET(payload, '$.productUuid', BIN_TO_UUID(id))
FROM product
;

For the following query

SELECT * FROM product a INNER JOIN purchase b ON BIN_TO_UUID(a.id) = b.payload->>'$.productUuid';

MySQL generates the following plan:

+----+-------------+-------+---------------+------+--------------------------------------------+
| id | select_type | table | possible_keys | key  | Extra                                      |
+----+-------------+-------+---------------+------+--------------------------------------------+
|  1 | SIMPLE      | a     | NULL          | NULL | NULL                                       |
|  1 | SIMPLE      | b     | NULL          | NULL | Using where; Using join buffer (hash join) |
+----+-------------+-------+---------------+------+--------------------------------------------+

The plan shows that the functional index is not even considered. The case is different if I create the table with a generated column and a regular index on it.

CREATE TABLE IF NOT EXISTS purchase (
    id           BINARY(16)  NOT NULL,
    payload      JSON        NOT NULL,
    product_uuid VARCHAR(36) GENERATED ALWAYS AS (payload->>'$.productUuid') STORED NOT NULL,
    
    CONSTRAINT pk_purchase PRIMARY KEY (id),
    INDEX `i_purchase_product` (product_uuid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Now, the plan generated by MySQL is showing that the index is considered:

+----+-------------+-------+--------------------+--------------------+-----------------------+
| id | select_type | table | possible_keys      | key                | Extra                 |
+----+-------------+-------+--------------------+--------------------+-----------------------+
|  1 | SIMPLE      | a     | NULL               | NULL               | NULL                  |
|  1 | SIMPLE      | b     | i_purchase_product | i_purchase_product | Using index condition |
+----+-------------+-------+--------------------+--------------------+-----------------------+

Is there an explanation, preferrably documented, for this behavior?


Solution

  • This is a known bug: https://bugs.mysql.com/bug.php?id=98937

    It's not even about the use of JSON. Any functional index cannot be used in a JOIN, even if the same index can be used in another condition.

    I modified your example to use an arbitrary string function instead of using JSON functions:

    CREATE TABLE IF NOT EXISTS product (
        id         BINARY(16) NOT NULL,
        abc CHAR(3) NOT NULL,
        CONSTRAINT pk_product PRIMARY KEY (id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
    
    CREATE TABLE IF NOT EXISTS purchase (
        id         BINARY(16) NOT NULL,
        abc CHAR(3) NOT NULL, 
        CONSTRAINT pk_product PRIMARY KEY (id),
        INDEX `i_abc` ((REVERSE(abc)))
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
    
    INSERT INTO product (id, abc)
    VALUES
      ( UUID_TO_BIN(UUID(), TRUE), 'abc' ),
      ( UUID_TO_BIN(UUID(), TRUE), 'abc' ),
      ( UUID_TO_BIN(UUID(), TRUE), 'abc' );
    
    INSERT INTO purchase (id, abc)
    SELECT
      UUID_TO_BIN(UUID(), TRUE),  REVERSE(abc)
    FROM product;
    

    Searching using the expression of the functional index works as expected, it utilizes the index:

    EXPLAIN 
    SELECT * FROM purchase b WHERE REVERSE(b.abc) = 'abc';
    
    +----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------+
    
    | id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra |
    +----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------+
    |  1 | SIMPLE      | b     | NULL       | ref  | i_abc         | i_abc | 15      | const |    3 |   100.00 | NULL  |
    +----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------+
    

    But joining using the expression of the functional index does not use the index:

    EXPLAIN
    SELECT * FROM product a
    INNER JOIN purchase b
    ON REVERSE(b.abc) = a.abc;
    
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
    |  1 | SIMPLE      | a     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | NULL                                       |
    |  1 | SIMPLE      | b     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Using where; Using join buffer (hash join) |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
    

    The bug was reported in March 2020 (over three years ago as we write this). The only resolution was that they agreed to "file a documentation bug to clarify the current limitations." But then they linked to an internal bug (not visible to me).

    So far, I have not found that they document this limitation.