I have a Postgres query where I'm trying to aggregate arrays of different dimensions into a single array. A simplified form of the query is:
SELECT n, ARRAY(SELECT UNNEST(a) FROM (SELECT ARRAY_AGG(a) AS a) a) AS a, SUM(q) AS q
FROM (
SELECT 'foo' AS n, '{1,2}'::integer[] AS a, 1 AS q
UNION ALL
SELECT 'foo' AS n, '{3,4,5}'::integer[] AS a, 1 AS q
) results
GROUP BY n
This results in an error:
Query 1 ERROR at Line 1:
ERROR: cannot accumulate arrays of different dimensionality
The result I'm hoping for is:
n | a | q |
---|---|---|
'foo' | {1,2,3,4,5} | 2 |
However, if the arrays have the same dimensions, it works fine, e.g.:
SELECT n, ARRAY(SELECT UNNEST(a) FROM (SELECT ARRAY_AGG(a) AS a) a) AS a, SUM(q) AS q
FROM (
SELECT 'foo' AS n, '{1,2,3}'::integer[] AS a, 1 AS q
UNION ALL
SELECT 'foo' AS n, '{4,5,6}'::integer[] AS a, 1 AS q
) results
GROUP BY n
Output:
n | a | q |
---|---|---|
'foo' | {1,2,3,4,5,6} | 2 |
I have tried asking every LLM on the internet and none of them can answer it, so it's down to you humans now. Please help.
The problem is the intermediate result of ARRAY_AGG(a) in your nested subquery. That builds a multi-dimensional array, which requires that all input arrays have identical cardinalities.
The basic difficulty for Postgres is that an array type can contain arbitrary array dimensions, but multidimensional arrays must have matching extents for each dimension.
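A minimal way to see that restriction in isolation (a sketch, with throwaway sample values):
SELECT array_agg(a) FROM (VALUES ('{1,2}'::int[]), ('{3,4}')) t(a);   -- OK, builds {{1,2},{3,4}}
SELECT array_agg(a) FROM (VALUES ('{1,2}'::int[]), ('{3,4,5}')) t(a); -- ERROR: cannot accumulate arrays of different dimensionality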
There is a lot of fine print to this: NULL values? Empty arrays? Duplicates? Preserve array positions? ...
Simpler cases than yours can be solved by simply unnesting first. But you also want to aggregate a sum. You could run two separate aggregations:
WITH results(n, a, q) AS (
   VALUES
      ('foo', '{1,2}'::int[], 1)
    , ('foo', '{3,4,5}', 1)
   )
SELECT n, a, q
FROM  (
   SELECT n, sum(q) AS q
   FROM   results
   GROUP  BY n
   ) q
FULL JOIN (
   SELECT n, array_agg(elem) AS a
   FROM   results, unnest(a) AS elem
   GROUP  BY n
   ) a USING (n);
Complicates the query, but uses only basic tools.
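The split into two aggregations matters: unnesting before summing would multiply q by the number of array elements in each row. A sketch of that pitfall, using the same sample data:
SELECT n, array_agg(elem) AS a, sum(q) AS q  -- q comes out as 5 here, not 2
FROM results, unnest(a) AS elem
GROUP BY n;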
Alternatively, create a custom aggregate function that simply concatenates arrays - with matching dimensions!
Create once per database:
CREATE AGGREGATE array_concat (anycompatiblearray) (
sfunc = array_cat
, stype = anycompatiblearray
, initcond = '{}'
);
Then:
SELECT n, array_concat(a) AS a, sum(q) AS q
FROM (
SELECT 'foo' AS n, '{1,2}'::int[] AS a, 1 AS q
UNION ALL
SELECT 'foo' AS n, '{3,4,5}'::int[] AS a, 1 AS q
) results
GROUP BY n;
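This should return exactly the result you asked for:
n | a | q |
---|---|---|
'foo' | {1,2,3,4,5} | 2 |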
Note: Demands matching array dimensions in the input. I added a violating example in the fiddle.
(Works fine for all 1-dimensional arrays like in your example.)
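For multi-dimensional input with mismatched extents, the aggregate still raises an error, this time from array_cat. A sketch of such a violating case:
SELECT array_concat(a)
FROM (VALUES ('{{1,2}}'::int[]), ('{{3,4,5}}')) t(a);
-- should fail: array_cat cannot concatenate 2-dimensional arrays with differing inner extents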
In a quick test on a big table with large arrays, the custom aggregate function was 10 - 100 times faster than any workaround here that has to unnest arrays first.
jsonb
You posted a related solution. Here is another version with jsonb_path_query():
SELECT n
, ARRAY(SELECT jsonb_path_query(jsonb_agg(a), '$[*][*]')::int) AS a
, sum(q) AS q
FROM results
GROUP BY n;
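jsonb_agg(a) nests the integer arrays inside a single jsonb array, and the path '$[*][*]' unwraps both levels, so each element comes back as a jsonb number that is then cast to int.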
But it performed poorly in a test.