I have the following code in PostgreSQL, which I need to reproduce on Amazon Redshift:
select *
from
( select distinct
a.id
, sum(case when a.is_batch_empty then 1 else 0 end)
over (partition by a.client_id order by a.id) as empty_count
from my_temp_table a
) a
where a.id = 111
In Amazon Redshift, this code execution fails with the following error:
Aggregate window functions with an
ORDER BY
clause require a frame clause
So, on Redshift, I need to add ROWS BETWEEN
section.
The problem is that I cannot find the formula which gives the same result as on PostgreSQL.
What I tried so far:
Without ORDER BY
With a frame ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW
With a frame ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
I have different result with every of these options, and no one of them matches the PostgreSQL solution.
So, what I have to do with this formula to get identical results on PostgreSQL and Redshift?
In PostgreSQL order by
in window spec implies between unbounded preceding and current row
frame clause. If you remove order by
, the frame spans the whole partition. Quoting the doc:
With
ORDER BY
, this sets the frame to be all rows from the partition start up through the current row's lastORDER BY
peer. WithoutORDER BY
, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
And that's mentioned right after what you were looking for:
The default framing option is
RANGE UNBOUNDED PRECEDING
, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
Your second attempt was the closest, but note that the default is RANGE
, not ROWS
and I don't see RANGE
in Redshift window function doc. If a.id
isn't unique, the results will differ as demonstrated here.
You could try to emulate the RANGE
frame clause behaviour with a correlated scalar subquery:
select distinct a.id,
(select sum(case when t2.is_batch_empty then 1 else 0 end)
from my_temp_table t2
where a.client_id=t2.client_id
and a.id>=t2.id)
from my_temp_table a