I have this table of fruits:
name1 name2 year1 year2
apple pear 2010 2001
apple pear 2011 2002
pear apple 2010 2003
pear apple 2011 2004
apple null 2009 2005
pear orange 2008 2006
apple pear 2010 2007
apple grape 2010 2008
Problem: In each year1, I only want names to appear once ... e.g. apple pear 2010 is the same as pear apple 2010 . That is, when there are duplicates... I only want to keep the first occurrence of each duplicate (e.g. first occurence)
I think the correct output should look like this:
name1 name2 year1 year2
apple pear 2010 2001
apple pear 2011 2002
apple null 2009 2005
pear orange 2008 2006
apple grape 2010 2008
I tried the following code:
SELECT
name1,
name2,
year1,
year2,
ROW_NUMBER() OVER (PARTITION BY name1, name2, year1 ORDER BY year2) AS rn
FROM
fruits
)
SELECT
name1,
name2,
year1,
year2
FROM
ranked_names
WHERE
rn = 1;
But this is not producing the correct results:
name1 name2 year1 year2
apple grape 2010 2008
apple null 2009 2005
apple pear 2010 2001
apple pear 2011 2002
pear apple 2010 2003
pear apple 2011 2004
pear orange 2008 2006
E.g. (apple pear 2010 2001) and ( pear apple 2010 2003) appear twice even though only one of them should appear.
Can someone please show me how to correct this? Is it possible to do this without GREATEST and LEAST functions?
Thanks!
Alternative Idea?
WITH ranked_names AS (
SELECT
name1,
name2,
year1,
year2,
ROW_NUMBER() OVER (PARTITION BY year1,
CASE WHEN name1 < name2 THEN name1 ELSE name2 END,
CASE WHEN name1 < name2 THEN name2 ELSE name1 END
ORDER BY year2) AS rn
FROM
fruits
)
SELECT
name1,
name2,
year1,
year2
FROM
ranked_names
WHERE
rn = 1;
USING LEAST and GREATEST - Method 1:
WITH ranked_names AS (
SELECT
name1,
name2,
year1,
year2,
ROW_NUMBER() OVER (PARTITION BY LEAST(name1, name2), GREATEST(name1, name2), year1 ORDER BY year2) AS rn
FROM
fruits
)
SELECT
name1,
name2,
year1,
year2
FROM
ranked_names
WHERE
rn = 1;
Demo: https://dbfiddle.uk/dQk17bSS
Go throught the LEAST and GREATES Cncept in SQL.
USING CASE- Method 2:
WITH ranked_names AS (
SELECT
name1,
name2,
year1,
year2,
ROW_NUMBER() OVER (PARTITION BY
CASE WHEN name1 < name2 THEN name1 ELSE name2 END,
CASE WHEN name1 < name2 THEN name2 ELSE name1 END,
year1
ORDER BY year2) AS rn
FROM
fruits
)
SELECT
name1,
name2,
year1,
year2
FROM
ranked_names
WHERE
rn = 1;