sqldb2

Keeping the entire set of duplicates with the highest year (per name)


I have this table in SQL:

CREATE TABLE books_returned 
(
    name VARCHAR(50),
    book_id VARCHAR(50),
    year_book_returned INT
);

INSERT INTO books_returned (name, book_id, year_book_returned) 
VALUES
('john', 'julius ceasar', 2010),
('john', 'julius caesar', 2010),
('john', 'hamlet', 2010),
('john', 'hamlet', 2010),
('john', 'othello', 2009),
('john', 'othello', 2009),
('kevin', 'macbeth', 2015),
('kevin', 'tempest', 2020),
('david', 'romeojuliet', 2010),
('david', 'romeojuliet', 2010),
('david', 'romeojuliet', 2010),
('david', 'king lear', 2005);

 name       book_id           year_book_returned
 ------------------------------------------------
  john     julius ceasar      2010
  john     julius caesar      2010
  john     hamlet             2010
  john     hamlet             2010
  john     othello            2009
  john     othello            2009
 kevin     macbeth            2015
 kevin     tempest            2020
 david     romeojuliet        2010
 david     romeojuliet        2010
 david     romeojuliet        2010
 david     king lear          2005

For each name, I want to keep all columns and all rows for the book with the largest year.

Since there are ties, I do not care which set of duplicates are picked. Both of the following options are OK for me:

Output #1: acceptable

 name      book_id      year_book_returned
 ------------------------------------------
 john      hamlet               2010
 john      hamlet               2010
 evin      tempest              2020
 david     romeojuliet          2010
 david     romeojuliet          2010
 david     romeojuliet          2010

Output #2: acceptable

 name       book_id year_book_returned
 --------------------------------------------------
  john julius ceasar               2010
  john julius caesar               2010
 kevin       tempest               2020
 david   romeojuliet               2010
 david   romeojuliet               2010
 david   romeojuliet               2010

I tried this query that first finds the max year in a subquery and joins it back to the original table:

SELECT br.*
FROM books_returned br
JOIN (
    SELECT name, MAX(year_book_returned) as max_year
    FROM books_returned
    GROUP BY name
) as subquery
ON br.name = subquery.name AND br.year_book_returned = subquery.max_year;

But this is incorrect: this shows all books for John :

 name       book_id year_book_returned
  john julius ceasar               2010
  john julius caesar               2010
  john        hamlet               2010
  john        hamlet               2010
 kevin       tempest               2020
 david   romeojuliet               2010
 david   romeojuliet               2010
 david   romeojuliet               2010

Can someone please show me how to do this correctly? I was thinking of using the partition row_number order by random() statements to create a rank variable and then select the lowest rank?


Solution

  • One option is to use Max() Over() analytic function for year (used as join condition in ON clause) and Row_Number() Over() analytic functions twice (per name and book_id both ordered by year descending) in a subquery and then filter the result using those row numbers.

    --      S Q L :
    SELECT    br.name, br.book_id, br.year_book_returned
    FROM      books_returned br
    INNER JOIN ( Select   name, book_id, 
                          Max(year_book_returned) Over(Partition By name, book_id) as max_yr,
                          Row_Number() Over(Partition By name 
                                            Order By year_book_returned desc) as rn, 
                          Row_Number() Over(Partition By book_id 
                                            Order By year_book_returned desc) as rn_book_id
                 From     books_returned
               ) yr ON( yr.name = br.name And 
                        yr.book_id = br.book_id and
                        yr.max_yr = br.year_book_returned) 
    WHERE      yr.rn = 1 and yr.rn_book_id = 1;
    
    /*    R e s u l t :
    NAME    BOOK_ID        YEAR_BOOK_RETURNED
    ------- -------------- ------------------
    john    hamlet                       2010
    john    hamlet                       2010
    kevin   tempest                      2020
    david   romeojuliet                  2010
    david   romeojuliet                  2010
    david   romeojuliet                  2010    */