arraysjsonmariadb

How do I use a json array to search and extract matched values from a different json array in mariaDB?


Situation

I have two tables. ing_master is a master list of ingredients; pdet is a product table, with an ingredients field formatted as json array.

For each product's ingredient list in pdet, I want to extract all individual ingredients that match a value in the ing_master table.


Attempts and data

So far, I've found that JSON_ARRAYAGG()(plus a regexp_replace() to remove line breaks) works to convert the ing_master table into a json array for the comparison, and I've gotten the two tables into the same query via a left join on a fake ID field and grouping on product.

I'm attempting to use json_search to find all the path locations in the pdet ingredients field that match values in the ingredients_master table array, plus json_extract to take those path locations and return the matched ingredients:

SELECT sync_sku,a.ingAll, 
 json_extract(json_search(JSON_ARRAYAGG(regexp_replace(s.ingName,'\\r','')),'all',a.ingAll),a.ingAll) as ingMatch

FROM pdet a
LEFT JOIN ing_master s
        ON s.fakeMatch = 0
GROUP BY a.sync_sku

This query is running without errors, but I'm getting all NULL on the matched ingredients, even for rows where I know there should be an exact match.

Example tables:

pdet

sync_sku ingAll
1 ["potato","lemon"]
2 ["apple","banana","orange"]
3 ["potato","banana","orangutan"]

ing_master

fakeMatch ingName
0 lemon
0 banana
0 orange
0 apple
Goal:

query result

sync_sku ingAll ingMatch
1 ["potato","lemon"] ["lemon"]
2 ["apple","banana","orange"] ["apple","banana","orange"]
3 ["potato","banana","orangutan"] ["banana"]

I've been reading through the documentation and while there are a lot of tools and examples for finding and extracting json values with static search keys (I've read through the oft-recommended "how do I extract values from a json array in mariadb or mysql" post a couple times)

--I can't find much on extracting matched values between two arrays like this. There's the json_overlaps function, but I don't want a true/false on the field, I want an array of the specific matched values by product.

Is there a way to get the info I need?


Solution

  • To perform well with many rows in ing_master, you'll want an index on ingName and to use json_table to expand the array and join ing_master for each ingredient, then reaggregate:

    select sync_sku, json_arrayagg(im.ingName order by ing.order_id) ings
    from pdet
    cross join
        json_table(
            pdet.ingAll,
            '$[*]' columns (
                order_id for ordinality,
                ingName varchar(16) path '$'
            )
        ) ing
    inner join ing_master im using (ingName)
    group by sync_sku
    

    That will lose any rows with no ingredients found. To report them with an empty array, you need a left join of the inner join, and extra code in the select to avoid returning [null]:

    select sync_sku, if(count(ingName),json_arrayagg(ingName order by ing.order_id),json_array()) ings
    from pdet
    left join (
        json_table(
            pdet.ingAll,
            '$[*]' columns (
                order_id for ordinality,
                ingName varchar(16) path '$'
            )
        ) ing
        inner join ing_master im using (ingName)
    ) on im.ingName is not null
    group by sync_sku
    

    fiddle

    Note that MySQL's json_arrayagg doesn't have an order by clause (or limit or distinct) like MariaDB does. And MariaDB doesn't allow it as a window function yet, though MySQL does. Just another example of how they have sometimes implemented the same new feature in excitingly incompatible ways since the fork.