sqldb2

Calculating average number of days between two events


I have this table in SQL (called "myt") about library books that are borrowed by different names:

CREATE TABLE myt (
    name VARCHAR(10),
    date_library_book_borrowed DATE
);


INSERT INTO myt (name, date_library_book_borrowed) VALUES
('red', '2010-01-01'),
('red', '2010-05-05'),
('red', '2011-01-01'),
('blue', '2015-01-01'),
('blue', '2015-09-01'),
('green', '2020-01-01'),
('green', '2021-01-01'),
('yellow', '2012-01-01');

Based on this table, I am trying to answer the following question:

I tried to do this using LEAD and LAG functions:

WITH RankedBorrowings AS (
  SELECT
    name,
    date_library_book_borrowed,
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS borrowing_rank
  FROM
    myt
),
BorrowingPairs AS (
  SELECT
    name,
    borrowing_rank AS from_rank,
    LEAD(borrowing_rank) OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS to_rank,
    date_library_book_borrowed AS current_borrowing_date,
    LEAD(date_library_book_borrowed) OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS next_borrowing_date
  FROM
    RankedBorrowings
)
SELECT
  from_rank,
  to_rank,
  AVG(next_borrowing_date - current_borrowing_date) AS avg_days_between_borrowings
FROM
  BorrowingPairs
WHERE
  next_borrowing_date IS NOT NULL
GROUP BY
  from_rank,
  to_rank
ORDER BY
  from_rank,
  to_rank;

Can someone please tell me if this is the correct way to approach this problem? Or is it better to join the table to itself and then perform the same calculations?

Thanks!


Solution

  • Your approach is correct. You don't need to join the table to itself, it's not efficient.

    But your solution is overcomplicated. to_rank always equals to from_rank + 1 So, you can calculate all analytical functions in one step:

    WITH BorrowingPairs AS (
      SELECT
        name,
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS from_rank,
        date_library_book_borrowed AS current_borrowing_date,
        LEAD(date_library_book_borrowed) OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS next_borrowing_date
      FROM
        myt
    )
    SELECT
      from_rank,
      from_rank + 1 AS to_rank,
      AVG(next_borrowing_date - current_borrowing_date) AS avg_days_between_borrowings
    FROM
      BorrowingPairs
    WHERE
      next_borrowing_date IS NOT NULL
    GROUP BY
      from_rank
    ORDER BY
      from_rank