sqlpostgresqlduplicatesgreatest-n-per-grouppostgresql-9.0

SQL two criteria from one group-by


I have a table with some "functionally duplicate" records - different IDs, but the 4 columns of "user data" (of even more columns) are identical. I've got a query working that will select all records that have such duplicates.

Now I want to select, from each group of duplicates, first any of them that have column A not null - and I've verified from the data that there are at most 1 such rows per group - and if there are none in this particular group, then the minimum of column ID.

How do I select that? I can't exactly use a non-aggregate in the THEN of a CASE and an aggregate in the ELSE. E.g. this doesn't work:

SELECT CASE
           WHEN d.A IS NULL THEN d.ID
           ELSE MIN(d.ID) END,
       d.B,
       d.C,
       d.E,
       d.F
FROM TABLE T
JOIN (my duplicate query here) D ON T.B=D.B
AND T.C=D.C
AND T.E=D.E
AND T.F=D.F
GROUP BY T.B,
         T.C,
         T.E,
         T.F

Error being:

column A must appear in the GROUP BY clause or be used in an aggregate function.


Solution

  • This can be radically simpler:

    SELECT DISTINCT ON (b, c, e, f)
           b, c, e, f, id   -- add more columns freely
    FROM   (<duplicate query here>) sub
    ORDER  BY b, c, e, f, (a IS NOT NULL), id