sqlpostgresqlcross-joinredundancy

reducing redundant product from SQL cross join


I was stepping through a common SQL exercise and ran into an annoyance. Question is whether you have a better solve than mine below. The initial problem was "find all the coordinate sets on a plain that have the shortest overall distance." That's straightforward. However, when you select all those rows that have the shortest distance, you get (for example):

x y x2 y2 dist globalmin
-2 0 -1 -2 2.23606797749979 2.23606797749979
-2 0 0 1 2.23606797749979 2.23606797749979
-1 -2 -2 0 2.23606797749979 2.23606797749979
0 1 -2 0 2.23606797749979 2.23606797749979

You see the problem -- I'm pulling redundant rows because an earlier cross join has mirrored the x and y the values.

The solution I found was to to distinct a least, greatest swap like so:

select
distinct
least(t1.x,t1.x2) x1, greatest(t1.x,t1.x2) x2, least(t1.y,t1.y2) y1, greatest(t1.y,t1.y2) y2
,dist,globalmin
from dists2 t1
where t1.dist=t1.globalmin
;

Result:

x1 x2 y1 y2 dist globalmin
-2 -1 -2 0 2.23606797749979 2.23606797749979
-2 0 0 1 2.23606797749979 2.23606797749979

...so that approach will fix the redundancy issue, but it feels inelegant, like there's a classic maneuver I'm missing. I welcome any advice on a better approach.

Thanks!


Solution

  • When you have 2 distinct points A and B, your query says:

    Armed with that knowledge, all we have to do is create an order between points. One case that would work:

    You can show that it is indeed an order relation (to be precise, a lexicographic order), although the way I described it makes it a strict order (no points is "before" itself).
    Also, the set of points is not just partially ordered: every pair of points can be compared. It is said to be totally ordered.

    Now that we have a way to order every point against every other point, we can work with a non-equi join.


    Quick example: the query below works on 4 points placed on the 4 corners of a square.

    WITH Points(x, y) AS (
    VALUES (0,0), (1,0), (0,1), (1,1)
    )
    SELECT p1.x as x1, p1.y as y1, p2.x as x2, p2.y as y2
    FROM Points p1
    JOIN Points p2 ON p1.x < p2.x OR (p1.x = p2.x AND p1.y < p2.y)
    ORDER BY x1, y1, x2, y2
    

    From there, it is easy to turn a point_table p1 CROSS JOIN point_table p2 into the proper point_table p1 INNER JOIN point_table p2 ON ... on the model shown above.
    However, contrary to your question title, you seem to have a single dists2 table/view.

    SELECT x1, x2, y1, y2 ,dist,globalmin
    FROM dists2
    WHERE x1 < x2 OR (x1 = x2 AND y1 < y2)
    AND dist=globalmin