distinctflattensinglestore

how to do set union operations on arrays?


Here's what I want to accomplish -

  1. I want to have an array[string] column in a singlestore columnar table
  2. Do a set union operation on this column over some time range/conditions.

I don't know how to express this in Singlestore sql.
Singlestore doesn't support array columns, so I've resorted to using json col.
Any pointers ?

Sample table

| key   | locations                         |
| ----- | --------------------------------- |
| alice | ["sanjose", "sancarlos", "miami"] |
| alice | ["sanjose","milpitas"]            |
| alice | ["miami"]                         |
| alice | ["sanmateo","sanfrancisco"]       |
| alice | ["redwoodshores","sanfrancisco"]  |
| bob   | ["sanjose","milpitas"]            |
| bob   | ["freemont","onioncity"]          |
| bob   | ["onioncity"]                     |
| bob   | ["newark","milpitas"]             |
| bob   | ["sanjose"]                       |
| bob   | ["santacruz"]                     |

Expected output

| key   | array_agg(distinct elem)   OR   array_distinct(flatten(array_agg(locations)))  |
| ----- | ------------------------------------------------------------------------------ |
| alice | [ miami, milpitas, redwoodshores, sancarlos, sanfrancisco, sanjose, sanmateo ] |
| bob   | [ freemont, milpitas, newark, onioncity, sanjose, santacruz ]                  |

Any pointers on how I can accomplish this with Singlestore ?

How I would do this in other DBs/Frameworks:


Solution

  • You can use JSON_AGG function to achieve this. This function does not support distinct like json_agg(distinct(key_col)), so using the distinct clause in CTE itself.

    Docs: https://docs.singlestore.com/managed-service/en/reference/sql-reference/json-functions/json_agg.html

    Sample Code

    create table json_array_example (key_col text, locations json);
    
    
    insert into json_array_example values('alice', '["sanjose", "sancarlos", "miami"]');
    insert into json_array_example values('alice', '["sanjose","milpitas"]');
    insert into json_array_example values('alice', '["miami"]');
    insert into json_array_example values('alice', '["sanmateo","sanfrancisco"]');
    insert into json_array_example values('alice', '["redwoodshores","sanfrancisco"]');
    insert into json_array_example values('bob'  , '["sanjose","milpitas"]');
    insert into json_array_example values('bob'  , '["freemont","onioncity"]');
    insert into json_array_example values('bob'  , '["onioncity"]');
    insert into json_array_example values('bob'  , '["newark","milpitas"]');
    insert into json_array_example values('bob'  , '["sanjose"]');
    insert into json_array_example values('bob'  , '["santacruz"]');
    
    
    WITH t AS (SELECT distinct key_col, table_col AS locations
               FROM json_array_example
                        JOIN TABLE(JSON_TO_ARRAY(locations)))
    select key_col, json_agg(locations) as locations
    from t
    group by key_col;