sql subquery amazon-redshift correlated-subquery

ERROR: This type of correlated subquery pattern is not supported yet

I have a subquery that runs without problems on MySQL and I wanted to use the same query on AWS-Redshift, but I am getting this error: [0A000] ERROR: This type of correlated subquery pattern is not supported yet, this is the query:

SELECT 
    COALESCE(r.id, r2.id) AS region_id,
FROM 
    search s
LEFT JOIN 
    region r ON s.region_id = r.id
LEFT JOIN 
    region r2 ON r2.id = (SELECT id
                          FROM region AS r
                          WHERE (acos(sin(radians(s.latitude))
                                      * sin(radians(r.latitude))
                                 + cos(radians(s.latitude))
                                    * cos(radians(r.latitude))
                                    * cos(radians(r.longitude) - radians(s.longitude))
                                ) * 3959 < :dis
                            )
                            AND type IN (1)
                            ORDER BY
                            (
                                acos
                                (
                                    sin(radians(s.latitude))
                                    * sin(radians(r.latitude))
                                    + cos(radians(s.latitude))
                                    * cos(radians(r.latitude))
                                    * cos(radians(r.longitude) - radians(s.longitude))
                                )
                                * 3959
                            ) ASC LIMIT 1)
WHERE 
    s.user IS NOT NULL
ORDER BY
    s.date_created DESC;

So far, what I have found is that this part of the code is the problem:

                            ( acos
                                (
                                    sin(radians(s.latitude))
                                    * sin(radians(r.latitude))
                                    + cos(radians(s.latitude))
                                    * cos(radians(r.latitude))
                                    * cos(radians(r.longitude) - radians(s.longitude))
                                ) * 3959 < :dis
                            )
                            AND type IN (1)
                            ORDER BY
                            (
                                acos
                                (
                                    sin(radians(s.latitude))
                                    * sin(radians(r.latitude))
                                    + cos(radians(s.latitude))
                                    * cos(radians(r.latitude))
                                    * cos(radians(r.longitude) - radians(s.longitude))
                                )
                                * 3959
                            )

But I do not know how to do it without a subquery.

Solution

The problem is that the subquery in the ON clause needs to be re-evaluated for each join possibility. On a clustered db this is prohibitively expensive. So you need to flatten this out to an additional set of JOIN information that has all the needed information to join r and r2.

I can take a stab at this but I don't know your data and I'm just guessing on what is important in the query.

SELECT 
    COALESCE(r.id, c.region_id) AS region_id,
FROM 
    search s
LEFT JOIN 
    region r ON s.region_id = r.id
LEFT JOIN ( SELECT search_id, region_id
            FROM ( SELECT s.id as search_id, x.id as region_id, 
                     acos(sin(radians(t.latitude))
                     * sin(radians(x.latitude))
                     + cos(radians(t.latitude))
                     * cos(radians(x.latitude))
                     * cos(radians(x.longitude) - radians(t.longitude))
                     ) * 3959 as calc,
                     ROW_NUMBER() OVER (partition by id1 order by calc asc) as rn
                   FROM region AS x
                   CROSS JOIN search t 
                   WHERE calc < :dis
                     AND type IN (1)
                     AND t.user IS NOT NULL)
            WHERE rn = 1) c 
ON c.search_id = s.id
WHERE 
    s.user IS NOT NULL
ORDER BY
    s.date_created DESC;

I can't test this so hopefully this change gives you a place to start from.

Note the CROSS JOIN. This is a slightly less expensive replacement for the correlated subquery. Either way you are calculating a distance(?) for every row combination between region and search and then finding the smallest value.