mysqljson

MySQL JSON datatype count and groupby


Since MySQL 5.7 I am now able to use the JSON datatype which is nice when your working with frameworks like Angular.js.

Now since I'm fairly new to this (and this datatype is also fairly new) I was wondering if I could make some simple operations on that dataset.

For instance I have saved a JSON in my database where each object contains the following fields:

name
country
phone_num

I wish to sum how many that lives in each country normally a select like this would work

select country, count(*) as num_residents from mytable

How can I use this sort of select statement on a table that has a JSON datatype?

Update

My table looks like this:

id  int(11) AI PK
date_created    timestamp
data    json
schema_id   int(11)

The data looks like this:

[{"A": 1, "B": "Debra", "C": "Peters", "D": "dpeters0@accuweather.com", "E": "Female", "F": "Tsagaan-Ovoo"}, {"A": 2, "B": "Marc", "C": "Rasmussen", "D": "Marc@test.dk", "E": "Male", "F": "Copenhagen"}]

And I am trying the following SQL statement:

SELECT
  data,
  JSON_EXTRACT(data, "$.F")        AS country,
  count(JSON_EXTRACT(data, "$.F")) AS num_residents
FROM kpi_data
WHERE schema_id = 39
GROUP BY country, data

But sadly I get this result:

enter image description here


Solution

  • Firstly you are storing all the JSON in one row, if you want to query the data like this each resident should have it's own row of his/her own JSON:

    *other_fields*, {"A": 1, "B": "Debra", "C": "Peters", "D": "dpeters0@accuweather.com", "E": "Female", "F": "Tsagaan-Ovoo"}
    *other_fields*, {"A": 2, "B": "Marc", "C": "Rasmussen", "D": "Marc@test.dk", "E": "Male", "F": "Copenhagen"}
    

    Secondly you don't need to count the extracted data, you just need to count the rows for each country extracted. This will do:

    SELECT JSON_EXTRACT(data, "$.F") AS country,
           COUNT(*) AS num_residents
    

    Thirdly, you don't want to GROUP BY data at all as this is presumably unique per resident. This will leave you with a group for each row. I think you just want:

    GROUP BY country
    

    Putting it all together:

      SELECT JSON_EXTRACT(data, "$.F") AS country,
             COUNT(*) AS num_residents
        FROM kpi_data
       WHERE schema_id = 39
    GROUP BY country
    

    For decent performance, you may consider placing an secondary index on the extracted country.. see DOCS