Moving to Redshift from another provider, I see that GROUP BY ALL
(which groups by all non-aggregate columns in the SELECT statement) syntax is not supported. This was super useful for me when I had loads of non-aggregate columns and didn't want to manually type out GROUP BY 1,2,3,4...
etc.
I was wondering if there is in fact a similar command on Redshift or is it simply a limitation of PostgreSQL?
There is a group by
all
in both Redshift and Postgres but in PostgreSQL, it means something different. You might also be interested in group by
cube
, rollup
or grouping sets
. Postgres has them, Redshift has them too.
As mentioned by @John Rotenstein, on February 5th 2025 Redshift released patch 188 adding GROUP BY ALL
syntax for grouping by everything except aggregate expressions detected in the select list:
ALL
indicates to group by all columns specified in the SELECT list except those that are aggregated. For example, consider the following query which groups bycol1
andcol2
without having to specify them individually in the GROUP BY clause. The columncol3
is the argument of theSUM
function and thus not grouped.SELECT col1, col2 sum(col3) FROM testtable GROUP BY ALL
If you EXCLUDE a column in the SELECT list, the GROUP BY ALL clause does not group the results based on that specific column.
SELECT * EXCLUDE col3 FROM testtable GROUP BY ALL
PostgreSQL's version of the group by
all
syntax is actually the default group by
behaviour, the alternative being group by
distinct
which removes duplicate grouping sets. Quoting the doc:
When specifying multiple grouping items together, the final set of grouping sets might contain duplicates. For example:
GROUP BY ROLLUP (a, b), ROLLUP (a, c)
is equivalent to
GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, b), (a, c), (a), (a), (a, c), (a), () )
If these duplicates are undesirable, they can be removed using the
DISTINCT
clause directly on theGROUP BY
. Therefore:GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c)
is equivalent to
GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, c), (a), () )
It's similar to how select
is actually select all
unless you make it select distinct
. Again, the doc:
If
SELECT DISTINCT
is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates).SELECT ALL
specifies the opposite: all rows are kept; that is the default.
If there's a join, you can use a group by tablename.*
to make all columns of the table your grouping set but
join
conditions. The exception is if all their names match and you're willing to risk a natural join
.grouping()
function, but it'll just show that all rows belong to the 0th grouping set.select sum(test_vals.v)
from test
natural join test_vals
group by all test.*;
sum |
---|
774.07 |
2114.39 |
806.75 |
966.23 |
1644.51 |
The scenario I think @Lajos Arpad is referring to is this:
demo2 at db-fiddle
select test.*,sum(v)
from test
join test_vals using(a)
group by test.a;
Error: column "test.b" must appear in the GROUP BY clause or be used in an aggregate function
Suggested addition of b
to the group by
list makes that go away only to be replaced by a request to do something about c
and all following columns. That's no longer the case if a
is a primary key
- you can group by only the PK, skipping the rest:
alter table test add primary key(a);
select test.*,sum(v)
from test
join test_vals using(a)
group by test.a;
a | b | c | sum |
---|---|---|---|
1 | 7 | 3 | 4.42 |
2 | 8 | 1 | 3.50 |
3 | 1 | 2 | 5.18 |
You can find this, once more, in the doc:
If the products table is set up so that, say,
product_id
is the primary key, then it would be enough to group byproduct_id
in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.