sqlgoogle-bigquerycommon-table-expression

SQL recursive CTEs to perm array


I'm having some trouble formulating the correct recursive behaviour to get what I want in SQL. I'm limited to the BigQuery environment and I want to avoid using any JavaScript if I can, so I wanted to try and recreate this behaviour just using CTEs if possible. I can do exactly what I want in Python, but have been unable to do it in SQL. Yes I know SQL is not really designed for this, but I do have a reason for it so would like to try and get it to work. I want to be able to provide any length array and return a list of permutations. For example:

INPUT [1, 1, 1] OUTPUT [0, 0, 0]
INPUT [2, 1, 1] OUTPUT [[0, 0, 0], [1, 0, 0]]
INPUT [2, 1, 2] OUTPUT [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
INPUT [1, 5] OUTPUT [[0,0], [0,1], [0,2], [0,3], [0,4]]

I don't mind how the inputs and outputs are formatted as long as I can parse them one way or another. E.g. returning rows of strings are fine, each element in a separate row is fine as I can use array_agg, etc.

In Python doing this recursively or using for-loops is pretty trivial. I start with a list of zeroes and work my way along each index, incrementing the value at that index until it reaches input value-1 and store each permutation before moving on.

In SQL this is what I have so far, which is only really the first step:

WITH RECURSIVE
  OrigSeq AS (SELECT [2, 1, 2] as some_numbers), -- input sequence
  BaseSeq AS (SELECT ARRAY(SELECT 0 as a FROM UNNEST(OrigSeq.some_numbers)) AS base FROM OrigSeq), -- get array of 0s of same size as input sequence
  Sequences AS (
    (SELECT 0 AS perm_id, 0 as idx, BaseSeq.base[0] as iter, OrigSeq.some_numbers[0]-1 AS orig, BaseSeq.base as base, OrigSeq.some_numbers as num FROM OrigSeq, BaseSeq)
    UNION ALL
    -- increment index
    SELECT perm_id, idx+1 as idx, base[idx+1] as iter, num[idx+1]-1 as orig, base, num FROM Sequences WHERE idx < ARRAY_LENGTH(num)-1
  ),
  Seq2 AS (
    SELECT * FROM Sequences
    UNION ALL
    -- parse iters - not doing this quite right and my brain stops functioning around here
    SELECT perm_id+1, idx, base[idx]+1 as iter, orig, base, num FROM Sequences where iter <= orig
  ),
  Seq3 AS (
    Select * From Seq2
    UNION ALL
    SELECT perm_id, idx, iter, orig, base, num from Sequences
  )
SELECT perm_id, idx, orig, iter
FROM Seq3
ORDER BY perm_id, idx

Any guidance appreciated.


Solution

  • Recursivity to append

    A first recursive loop can compute all possible values at each position,
    then a second recursive one will append to an array starting from the empty array.

    WITH RECURSIVE
      OrigSeq AS (SELECT ARRAY[2, 1, 2] AS a),
      -- List the indices of our array (which will give us the number of cells too):
      Pos AS (SELECT n, pos FROM OrigSeq, UNNEST(a) AS n WITH OFFSET AS pos),
      -- For each position list its possible values:
      Poss AS
      (
        SELECT n - 1 AS n, Pos.pos FROM Pos -- Per specification, "2" should give a range from 0 to 1 (2 is the count, not the max), so start one step beyond.
        UNION ALL
        SELECT n - 1, pos FROM Poss WHERE n > 0
      ),
      -- Start from an empty array, generate as many arrays of 1 element as there are possibilities for cell 1, and so on:
      Combi AS
      (
        SELECT [] a, 0 next
        UNION ALL
        SELECT ARRAY_CONCAT(a, [ n ]), next + 1
        FROM Combi, Poss
        WHERE Poss.pos = next
      )
    -- Keep only the completed arrays:
    SELECT * from Combi WHERE next = (SELECT COUNT(1) FROM Pos);
    
    a lvl
    {1,0,1} 3
    {0,0,1} 3
    {1,0,0} 3
    {0,0,0} 3

    See the PostgreSQL equivalent in a fiddle:

    Recursivity to decrement

    There is a way to first generate all positions as posToDecrement,
    then recursively starting from the initial numbers array as a, joining unnest(a) to posToDecrement to generate the different combinations, and array_agg(a.value) over (order by pos rows between unbounded preceding and unbounded following), discarding rows where any value falls under 0 (min() < 0).

    You'll find a fiddle proof-of-concept for PostgreSQL.

    However my intuition tells there is room for improvement: