The users to submit an interval of starting and ending pages that he/she read in a specific book. Note that the user can submit multiple intervals for the same book. I need the query announce the most recommended five books in the system, which are picked based on how many unique pages have been read for all the users that submitted their intervals in the first operation (sorted by books with the most read pages to books with the least read pages).
The book_user table is the pivot table which I need the query on so how to get the result below for the inserted records which are:
Reading intervals:
User 1 read from page 10 to page 30 in Book 1
User 2 read from page 2 to page 25 in Book 1
User 1 read from page 40 to page 50 in Book 2
User 3 read from page 1 to page 10 in Book 2
The most read books results:
Book 1 -> 28 pages
Book 2 -> 20 pages
I try this query:
select 'book_id',books.name as book_name,SUM(end_page - start_page) AS num_of_read_pages FROM book_user JOIN books ON books.id=book_user.book_id GROUP BY book_id ORDER BY num_of_read_pages DESC;
But it's not get the unique page for overlap intervals
And when I ask chatgpt it gives me this query which is recursive cte but it not work it just looping
WITH RECURSIVE cte AS (
SELECT book_id, MIN(start_page) AS start_page, MAX(end_page) AS end_page
FROM book_user
GROUP BY book_id, start_page
UNION ALL
SELECT cte.book_id, cte.start_page, cte.end_page
FROM cte
JOIN book_user ON cte.book_id = book_user.book_id AND cte.start_page <= book_user.start_page AND cte.end_page >= book_user.end_page
)
SELECT book_id, SUM(end_page - start_page + 1) AS total_pages
FROM cte
GROUP BY book_id
ORDER BY total_pages DESC;
See example.
with recursive t as(
-- join ranges with same start_page
-- and row_number() for sequence join
select book_id,start_page,max(end_page)end_page
,row_number()over(partition by book_id order by start_page) rn
from book_user
group by book_id,start_page
)
,r as( -- recursive join
-- anchor - ranges with a "free" start_page
select 0 lvl,bu.book_id,bu.start_page,bu.end_page,bu.rn
from t bu
where not exists(select 1 from t bu2
where bu2.book_id=bu.book_id and bu2.rn<bu.rn
and bu.start_page between bu2.start_page and bu2.end_page)
union all
select lvl+1,r.book_id,r.start_page,t.end_page,t.rn
from r inner join t on t.book_id=r.book_id and t.rn>r.rn
and r.end_page between t.start_page and r.end_page
)
select book_id,sum(end_page-start_page+1) total_pages
from ( -- again, we group segments with the same start_page and different end_page
select book_id,start_page,max(end_page) end_page
from r
group by book_id,start_page
) gr
group by book_id
Details here demo
Output
book_id | total_pages | path |
---|---|---|
1 | 38 | 2-2:25,26,2:25,26:31,33-33:40 |
2 | 25 | 1-1:10,1:10:11,1:10:11:14,40-40:50 |
3 | 31 | 1-1:10,20-20:40 |
Test data
create table books(id int,book_name varchar(20));
insert into books values(1,'Book 1'),(2,'Book 2'),(3,'Book 3');
create table users(id int,user_name varchar(20));
insert into users values(10,'User 10'),(20,'User 20'),(30,'User 40'),(30,'User 40');
create table book_user(user_id int,book_id int,start_page int,end_page int);
insert into book_user values
(10,1, 10,30)
,(20,1, 2,25)
,(30,1, 2,26)
,(30,1, 10,31)
,(40,1, 33,40)
,(10,2, 40,50)
,(30,2, 1,10)
,(40,2, 10,11)
,(20,2, 11,14)
,(10,3, 1,10)
,(20,3, 20,40)
;