amazon-web-servicesamazon-redshift

Moving RedShift Query to Sub Query Causes Problem


When the following query runs in RedShift it returns 659 rows very quickly.

select
-- a list of fields
from
Table1 as t1
inner join Table2 as t2 on t2.id=t1.accountid
inner join Table3 as t3 on t3.subscriptionid=t1.id
inner join Table4 as t4 on t4.id=t3.orderid
inner join Table5 as t5 on t3.id=t5.actionid
inner join Table6 as t6 on t5.rateid=t6.id
inner join Table7 as t7 on t7.rateid=t6.id
where
t4.status='Completed'
and TRUNC(t4.createddate) >= TRUNC(SYSDATE) - 1
and t7.islastvalue='TRUE'
and t7.effectivestartdate <= t3.effectivedate
and t7.effectiveenddate >= t3.effectivedate
and t1.iscurrentversion ='true'

When it gets wrapped as a sub query like below it takes many hours to run. (NOTE: I do not have control over the tool wrapping the query so I cannot prevent it. The question is why does RedShift take much longer to complete this second version?

select top 1 * from 
(
select
-- a list of fields
from
Table1 as t1
inner join Table2 as t2 on t2.id=t1.accountid
inner join Table3 as t3 on t3.subscriptionid=t1.id
inner join Table4 as t4 on t4.id=t3.orderid
inner join Table5 as t5 on t3.id=t5.actionid
inner join Table6 as t6 on t5.rateid=t6.id
inner join Table7 as t7 on t7.rateid=t6.id
where
t4.status='Completed'
and TRUNC(t4.createddate) >= TRUNC(SYSDATE) - 1
and t7.islastvalue='TRUE'
and t7.effectivestartdate <= t3.effectivedate
and t7.effectiveenddate >= t3.effectivedate
and t1.iscurrentversion ='true'
) as "_"

I thought that the queries would run nearly the same, and only the amount of data returned would be affected.

--- EDIT 1 --- Below is the query plan for the failing query. It looks like the sub-query has completed, but the parent query just doesn't want to return.

enter image description here


Solution

  • top in redshift is equivalent to limit and order:

    TOP number TOP takes a positive integer as its argument, which defines the number of rows that are returned to the client. The behavior with the TOP clause is the same as the behavior with the LIMIT clause. The number of rows that is returned is fixed, but the set of rows isn't. To return a consistent set of rows, use TOP or LIMIT in conjunction with an ORDER BY clause.

    https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html#r_SELECT_list-parameters

    So in order to "revert" the discrepancy - just use limit instead of top - it should speed up.