sqlnetezza

Keeping One Fruit Combination by Year


I have this table of fruits:

 name1  name2 year1 year2
 apple   pear  2010  2001
 apple   pear  2011  2002
  pear  apple  2010  2003
  pear  apple  2011  2004
 apple   null  2009  2005
  pear orange  2008  2006
 apple   pear  2010  2007
 apple  grape  2010  2008

Problem: In each year1, I only want names to appear once ... e.g. apple pear 2010 is the same as pear apple 2010 . That is, when there are duplicates... I only want to keep the first occurrence of each duplicate (e.g. first occurence)

I think the correct output should look like this:

 name1  name2 year1 year2
 apple   pear  2010  2001
 apple   pear  2011  2002
 apple   null  2009  2005
  pear orange  2008  2006
 apple  grape  2010  2008

I tried the following code:

  SELECT 
    name1,
    name2,
    year1,
    year2,
    ROW_NUMBER() OVER (PARTITION BY name1, name2, year1 ORDER BY year2) AS rn
  FROM 
    fruits
)
SELECT 
  name1,
  name2,
  year1,
  year2
FROM 
  ranked_names
WHERE 
  rn = 1;

But this is not producing the correct results:

 name1  name2 year1 year2
 apple  grape  2010  2008
 apple   null  2009  2005
 apple   pear  2010  2001
 apple   pear  2011  2002
  pear  apple  2010  2003
  pear  apple  2011  2004
  pear orange  2008  2006

E.g. (apple pear 2010 2001) and ( pear apple 2010 2003) appear twice even though only one of them should appear.

Can someone please show me how to correct this? Is it possible to do this without GREATEST and LEAST functions?

Thanks!

Alternative Idea?

WITH ranked_names AS (
    SELECT 
        name1,
        name2,
        year1,
        year2,
        ROW_NUMBER() OVER (PARTITION BY year1, 
                                       CASE WHEN name1 < name2 THEN name1 ELSE name2 END, 
                                       CASE WHEN name1 < name2 THEN name2 ELSE name1 END 
                          ORDER BY year2) AS rn
    FROM 
        fruits
)
SELECT 
    name1,
    name2,
    year1,
    year2
FROM 
    ranked_names
WHERE 
    rn = 1;

Solution

  • USING LEAST and GREATEST - Method 1:

    WITH ranked_names AS (
      SELECT 
        name1,
        name2,
        year1,
        year2,
        ROW_NUMBER() OVER (PARTITION BY LEAST(name1, name2), GREATEST(name1, name2), year1 ORDER BY year2) AS rn
      FROM 
        fruits
    )
    SELECT 
      name1,
      name2,
      year1,
      year2
    FROM 
      ranked_names
    WHERE 
      rn = 1;
    

    Demo: https://dbfiddle.uk/dQk17bSS

    Go throught the LEAST and GREATES Cncept in SQL.

    USING CASE- Method 2:

    WITH ranked_names AS (
      SELECT 
        name1,
        name2,
        year1,
        year2,
        ROW_NUMBER() OVER (PARTITION BY 
                              CASE WHEN name1 < name2 THEN name1 ELSE name2 END, 
                              CASE WHEN name1 < name2 THEN name2 ELSE name1 END, 
                              year1 
                           ORDER BY year2) AS rn
      FROM 
        fruits
    )
    SELECT 
      name1,
      name2,
      year1,
      year2
    FROM 
      ranked_names
    WHERE 
      rn = 1;
    

    Demo :https://dbfiddle.uk/m8fp-TEu