mysqljsondatabasedatabase-normalizationfirst-normal-form

Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?


In MySQL 5.7 a new data type for storing JSON data in MySQL tables has been added. It will obviously be a great change in MySQL. They listed some benefits

Document Validation - Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.

Efficient Access - More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.

Performance - Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with “functional indexes” on virtual columns.

Convenience - The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column): SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;

WOW ! they include some great features. Now it is easier to manipulate data. Now it is possible to store more complex data in column. So MySQL is now flavored with NoSQL.

Now I can imagine a query for JSON data something like

SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN 
( 
SELECT JSON_EXTRACT(data,"$.inverted") 
FROM t1 | {"series": 3, "inverted": 8} 
WHERE JSON_EXTRACT(data,"$.inverted")<4 );

So can I store huge small relations in few json colum? Is it good? Does it break normalization. If this is possible then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature. Pros and cons of MySQL JSON data type.


Solution

  • SELECT * FROM t1
    WHERE JSON_EXTRACT(data,"$.series") IN ...
    

    Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.

    The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.

    By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.

    MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.

    ALTER TABLE t1
      ADD COLUMN series AS (JSON_EXTRACT(data, '$.series')),
      ADD INDEX (series);
    

    Then if you query the virtual column, it can use the index and avoid the table-scan.

    SELECT * FROM t1
    WHERE series IN ...
    

    This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.

    But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:

    SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
    WHERE <other conditions>
    

    I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.

    When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.

    I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.

    There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.

    MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people against migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.

    You should choose to use JSON when JSON makes your queries more efficient.

    Don't choose a technology just because it's new, or for the sake of fashion.


    Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))

    SELECT * FROM t1
    WHERE JSON_EXTRACT(data,"$.series") IN ...
    

    Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions. UPDATE: this reportedly works, finally, in MySQL 5.7.33.