Since MySQL 5.7 I am now able to use the JSON datatype which is nice when your working with frameworks like Angular.js.
Now since I'm fairly new to this (and this datatype is also fairly new) I was wondering if I could make some simple operations on that dataset.
For instance I have saved a JSON in my database where each object contains the following fields:
name
country
phone_num
I wish to sum how many that lives in each country normally a select like this would work
select country, count(*) as num_residents from mytable
How can I use this sort of select statement on a table that has a JSON datatype?
My table looks like this:
id int(11) AI PK
date_created timestamp
data json
schema_id int(11)
The data looks like this:
[{"A": 1, "B": "Debra", "C": "Peters", "D": "dpeters0@accuweather.com", "E": "Female", "F": "Tsagaan-Ovoo"}, {"A": 2, "B": "Marc", "C": "Rasmussen", "D": "Marc@test.dk", "E": "Male", "F": "Copenhagen"}]
And I am trying the following SQL statement:
SELECT
data,
JSON_EXTRACT(data, "$.F") AS country,
count(JSON_EXTRACT(data, "$.F")) AS num_residents
FROM kpi_data
WHERE schema_id = 39
GROUP BY country, data
But sadly I get this result:
Firstly you are storing all the JSON in one row, if you want to query the data like this each resident should have it's own row of his/her own JSON:
*other_fields*, {"A": 1, "B": "Debra", "C": "Peters", "D": "dpeters0@accuweather.com", "E": "Female", "F": "Tsagaan-Ovoo"}
*other_fields*, {"A": 2, "B": "Marc", "C": "Rasmussen", "D": "Marc@test.dk", "E": "Male", "F": "Copenhagen"}
Secondly you don't need to count the extracted data, you just need to count the rows for each country extracted. This will do:
SELECT JSON_EXTRACT(data, "$.F") AS country,
COUNT(*) AS num_residents
Thirdly, you don't want to GROUP BY data
at all as this is presumably unique per resident. This will leave you with a group for each row. I think you just want:
GROUP BY country
Putting it all together:
SELECT JSON_EXTRACT(data, "$.F") AS country,
COUNT(*) AS num_residents
FROM kpi_data
WHERE schema_id = 39
GROUP BY country
For decent performance, you may consider placing an secondary index on the extracted country.. see DOCS