I'm having some trouble formulating the correct recursive behaviour to get what I want in SQL. I'm limited to the BigQuery environment and I want to avoid using any JavaScript if I can, so I wanted to try and recreate this behaviour just using CTEs if possible. I can do exactly what I want in Python, but have been unable to do it in SQL. Yes I know SQL is not really designed for this, but I do have a reason for it so would like to try and get it to work. I want to be able to provide any length array and return a list of permutations. For example:
INPUT [1, 1, 1] OUTPUT [0, 0, 0]
INPUT [2, 1, 1] OUTPUT [[0, 0, 0], [1, 0, 0]]
INPUT [2, 1, 2] OUTPUT [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
INPUT [1, 5] OUTPUT [[0,0], [0,1], [0,2], [0,3], [0,4]]
I don't mind how the inputs and outputs are formatted as long as I can parse them one way or another. E.g. returning rows of strings are fine, each element in a separate row is fine as I can use array_agg, etc.
In Python doing this recursively or using for-loops is pretty trivial. I start with a list of zeroes and work my way along each index, incrementing the value at that index until it reaches input value-1 and store each permutation before moving on.
In SQL this is what I have so far, which is only really the first step:
WITH RECURSIVE
OrigSeq AS (SELECT [2, 1, 2] as some_numbers), -- input sequence
BaseSeq AS (SELECT ARRAY(SELECT 0 as a FROM UNNEST(OrigSeq.some_numbers)) AS base FROM OrigSeq), -- get array of 0s of same size as input sequence
Sequences AS (
(SELECT 0 AS perm_id, 0 as idx, BaseSeq.base[0] as iter, OrigSeq.some_numbers[0]-1 AS orig, BaseSeq.base as base, OrigSeq.some_numbers as num FROM OrigSeq, BaseSeq)
UNION ALL
-- increment index
SELECT perm_id, idx+1 as idx, base[idx+1] as iter, num[idx+1]-1 as orig, base, num FROM Sequences WHERE idx < ARRAY_LENGTH(num)-1
),
Seq2 AS (
SELECT * FROM Sequences
UNION ALL
-- parse iters - not doing this quite right and my brain stops functioning around here
SELECT perm_id+1, idx, base[idx]+1 as iter, orig, base, num FROM Sequences where iter <= orig
),
Seq3 AS (
Select * From Seq2
UNION ALL
SELECT perm_id, idx, iter, orig, base, num from Sequences
)
SELECT perm_id, idx, orig, iter
FROM Seq3
ORDER BY perm_id, idx
Any guidance appreciated.
A first recursive loop can compute all possible values at each position,
then a second recursive one will append to an array starting from the empty array.
WITH RECURSIVE
OrigSeq AS (SELECT ARRAY[2, 1, 2] AS a),
-- List the indices of our array (which will give us the number of cells too):
Pos AS (SELECT n, pos FROM OrigSeq, UNNEST(a) AS n WITH OFFSET AS pos),
-- For each position list its possible values:
Poss AS
(
SELECT n - 1 AS n, Pos.pos FROM Pos -- Per specification, "2" should give a range from 0 to 1 (2 is the count, not the max), so start one step beyond.
UNION ALL
SELECT n - 1, pos FROM Poss WHERE n > 0
),
-- Start from an empty array, generate as many arrays of 1 element as there are possibilities for cell 1, and so on:
Combi AS
(
SELECT [] a, 0 next
UNION ALL
SELECT ARRAY_CONCAT(a, [ n ]), next + 1
FROM Combi, Poss
WHERE Poss.pos = next
)
-- Keep only the completed arrays:
SELECT * from Combi WHERE next = (SELECT COUNT(1) FROM Pos);
a | lvl |
---|---|
{1,0,1} | 3 |
{0,0,1} | 3 |
{1,0,0} | 3 |
{0,0,0} | 3 |
See the PostgreSQL equivalent in a fiddle:
There is a way to first generate all positions as posToDecrement
,
then recursively starting from the initial numbers array as a
, joining unnest(a)
to posToDecrement
to generate the different combinations, and array_agg(a.value) over (order by pos rows between unbounded preceding and unbounded following)
, discarding rows where any value falls under 0 (min() < 0).
You'll find a fiddle proof-of-concept for PostgreSQL.
However my intuition tells there is room for improvement:
[3,2,4]
, iteration 1 should produce [2,2,4],[3,1,4],[3,2,3]
); relying on a UNION non-ALL
to discard duplicates (at iteration 2 two paths lead to [2,1,4]
: [2,2-1,4]
and [3-1,1,4]
).UNNEST
peudo-table.ARRAY_AGG(prev.n - CASE WHEN v.pos = CURRENT_ROW THEN 1 ELSE 0 END) FROM UNNEST(…) v
DISTINCT
after having joined Pos
(listing all possible indices)ARRAY_CONCAT(ARRAY_AGG(v.n) OVER (… TO 1 PRECEDING), v.n - 1, ARRAY_AGG(v.n) OVER (… FROM 1 FOLLOWING))