sqlpostgresqlduplicatesrow

Remove rows based on duplicated values in one column


I have the below sample table and I need a second table without duplicate grouped values from column 'Values'

+----+--------+
| Id | Values |
+----+--------+
| 1  | A      |
| 2  | B      |
| 3  | B      | <--remove
| 4  | A      |
| 5  | A      | <--remove
| 6  | A      | <--remove
| 7  | B      |
| 8  | A      |
| 9  | A      | <--remove
| 11 | B      |
| 12 | B      | <--remove
| 13 | B      | <--remove
| 14 | B      | <--remove
| 15 | C      | 
+----+--------+

The result table must be:

+----+--------+
| Id | Values |
+----+--------+
| 1  | A      |
| 2  | B      |
| 4  | A      |
| 7  | B      |
| 8  | A      |
| 11 | B      |
| 15 | C      |
+----+--------+

I've been trying with GROUP BY, DISTINCT, and ROW_NUMBER() statements without success. I can't figure out how construct a working SQL query.


Solution

  • As suggested in comment, you can use a LAG function.

    SELECT id,values, LAG(values) OVER (ORDER BY id) AS previous
    FROM t
    

    This gives you all the rows along with the previous value,for example

    id  values  previous
    1   A       null
    2   B       A
    3   B       B
    

    and so on

    previous for id 1 is null as this is the first row,after this a filter can be applied where value of current row is not equal to value of previous row.Another condition is added to handle the first row previous IS NULL

    WITH lag AS (
    SELECT id,values, LAG(values) OVER (ORDER BY id) AS previous
    FROM t
    )
    SELECT id, values
    FROM lag
    WHERE values != previous OR previous IS NULL
    ORDER BY id;
    

    Output

    id values
    1 A
    2 B
    4 A
    7 B
    8 A
    11 B
    15 C