I have a simple table in redshift with the below schema:
Column | Type | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+---------
request_id | character varying(256) | | not null |
user_segments | super | | |
I am using user_segments as a JSON ARRAY. This is how I INSERT a sample record
insert into test_table (request_id,json_parse('[4,5,6]'));
Now I want to write an IN query on user_segments,something like this:
select * from test_table where user_segments in (5,6);
From Redshift documentation it looks like one has to use PartiQL syntax which makes use of [...] and dot notation
The above obviously does not work. What works is:
select * from test_table where user_segments[0] in (5,6);
But this only looks for the value 5 OR 6 in the 0th index of the array. How do I write the above query to make it check the complete array ? Something like
select * from test_table where user_segments[x] in (5,6);
You are trying to do a many to many comparison - the super array can be arbitrarily long and so can the IN comparison list. This is complex and no single function will do this.
You can unnest the array (see: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html) and then simply use an IN clause like you are trying. This basically unrolls all the arrays and does a row by row comparison which could get expensive for large tables with large arrays.
You can alternatively use json_serialize() to convert the super data into a json string then use text functions to see if the values in question exist in the arrays. This clearly has limitation and will only work if there are some constraints on how the super values are constructed. For your example it could look like:
select * from test_table
where translate(json_serialize(user_segments),'][',',,') LIKE '%,5,%'
OR translate(json_serialize(user_segments),'][',',,') LIKE '%,6,%';