I have two tables. ing_master is a master list of ingredients; pdet is a product table, with an ingredients field formatted as json array.
For each product's ingredient list in pdet, I want to extract all individual ingredients that match a value in the ing_master table.
So far, I've found that JSON_ARRAYAGG()(plus a regexp_replace() to remove line breaks) works to convert the ing_master table into a json array for the comparison, and I've gotten the two tables into the same query via a left join on a fake ID field and grouping on product.
I'm attempting to use json_search to find all the path locations in the pdet ingredients field that match values in the ingredients_master table array, plus json_extract to take those path locations and return the matched ingredients:
SELECT sync_sku,a.ingAll,
json_extract(json_search(JSON_ARRAYAGG(regexp_replace(s.ingName,'\\r','')),'all',a.ingAll),a.ingAll) as ingMatch
FROM pdet a
LEFT JOIN ing_master s
ON s.fakeMatch = 0
GROUP BY a.sync_sku
This query is running without errors, but I'm getting all NULL on the matched ingredients, even for rows where I know there should be an exact match.
pdet
| sync_sku | ingAll |
|---|---|
| 1 | ["potato","lemon"] |
| 2 | ["apple","banana","orange"] |
| 3 | ["potato","banana","orangutan"] |
ing_master
| fakeMatch | ingName |
|---|---|
| 0 | lemon |
| 0 | banana |
| 0 | orange |
| 0 | apple |
query result
| sync_sku | ingAll | ingMatch |
|---|---|---|
| 1 | ["potato","lemon"] | ["lemon"] |
| 2 | ["apple","banana","orange"] | ["apple","banana","orange"] |
| 3 | ["potato","banana","orangutan"] | ["banana"] |
I've been reading through the documentation and while there are a lot of tools and examples for finding and extracting json values with static search keys (I've read through the oft-recommended "how do I extract values from a json array in mariadb or mysql" post a couple times)
--I can't find much on extracting matched values between two arrays like this. There's the json_overlaps function, but I don't want a true/false on the field, I want an array of the specific matched values by product.
Is there a way to get the info I need?
To perform well with many rows in ing_master, you'll want an index on ingName and to use json_table to expand the array and join ing_master for each ingredient, then reaggregate:
select sync_sku, json_arrayagg(im.ingName order by ing.order_id) ings
from pdet
cross join
json_table(
pdet.ingAll,
'$[*]' columns (
order_id for ordinality,
ingName varchar(16) path '$'
)
) ing
inner join ing_master im using (ingName)
group by sync_sku
That will lose any rows with no ingredients found. To report them with an empty array, you need a left join of the inner join, and extra code in the select to avoid returning [null]:
select sync_sku, if(count(ingName),json_arrayagg(ingName order by ing.order_id),json_array()) ings
from pdet
left join (
json_table(
pdet.ingAll,
'$[*]' columns (
order_id for ordinality,
ingName varchar(16) path '$'
)
) ing
inner join ing_master im using (ingName)
) on im.ingName is not null
group by sync_sku
Note that MySQL's json_arrayagg doesn't have an order by clause (or limit or distinct) like MariaDB does. And MariaDB doesn't allow it as a window function yet, though MySQL does. Just another example of how they have sometimes implemented the same new feature in excitingly incompatible ways since the fork.