I have a comments table with id, parent_id, content, created_at, and other fields. I want to be able to select the root comments first (i.e., comments where the parent_id IS NULL), and limit them to the 5 most recent ones.Then, for each selected root comment, I want to limit the number of second-level child comments to 5. And so on for all deeper levels. The issue I'm facing is how to limit the child comments at every level of the recursion without scanning the entire table. So far, I have only managed to do that for the top-level comments using a workaround. How can I add to that recursive query so that it respects the 5-child-per-parent rule during recursion without scanning the entire table and filtering afterward?
The table schema and source data look like this:
-- create
CREATE TABLE comments (
id INT NOT NULL PRIMARY KEY,
user_id BIGINT NOT NULL,
post_id BIGINT NOT NULL,
parent_id INT DEFAULT NULL,
content VARCHAR(10000) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_comments_parent FOREIGN KEY (parent_id)
REFERENCES comments(id) ON DELETE CASCADE ON UPDATE CASCADE
);
-- Root Comments (Level 1)
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(1, 101, 1, NULL, 'Root Comment 1', '2025-04-10 10:00:00'),
(2, 102, 1, NULL, 'Root Comment 2', '2025-04-10 10:05:00'),
(3, 103, 1, NULL, 'Root Comment 3', '2025-04-10 10:10:00'),
(4, 104, 1, NULL, 'Root Comment 4', '2025-04-10 10:15:00'),
(5, 105, 1, NULL, 'Root Comment 5', '2025-04-10 10:20:00'),
(18, 105, 1, NULL, 'Root Comment 6', '2025-04-11 10:20:00');
-- Second-level Comments (Level 2) for Root Comments 1, 2, and 3
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(6, 106, 1, 1, 'Second-level Comment 1 (Child of Root 1)', '2025-04-10 10:25:00'),
(7, 107, 1, 1, 'Second-level Comment 2 (Child of Root 1)', '2025-04-10 10:30:00'),
(8, 108, 1, 2, 'Second-level Comment 3 (Child of Root 2)', '2025-04-10 10:35:00'),
(9, 109, 1, 3, 'Second-level Comment 4 (Child of Root 3)', '2025-04-10 10:40:00'),
(10, 110, 1, 3, 'Second-level Comment 5 (Child of Root 3)', '2025-04-10 10:45:00');
-- Third-level Comments (Level 3) for second-level comments
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(11, 111, 1, 6, 'Third-level Comment 1 (Child of Second-level 1)', '2025-04-10 10:50:00'),
(12, 112, 1, 6, 'Third-level Comment 2 (Child of Second-level 1)', '2025-04-10 10:55:00'),
(13, 113, 1, 7, 'Third-level Comment 3 (Child of Second-level 2)', '2025-04-10 11:00:00'),
(14, 114, 1, 8, 'Third-level Comment 4 (Child of Second-level 3)', '2025-04-10 11:05:00'),
(15, 115, 1, 9, 'Third-level Comment 5 (Child of Second-level 4)', '2025-04-10 11:10:00');
-- Fourth-level Comments (Level 4) for third-level comments
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(16, 116, 1, 11, 'Fourth-level Comment 1 (Child of Third-level 1)', '2025-04-10 11:15:00'),
(17, 117, 1, 12, 'Fourth-level Comment 2 (Child of Third-level 2)', '2025-04-10 11:20:00');
And the query I have written, which only fetches the 5 most recent root comments and all their children:
WITH RECURSIVE comment_tree AS (
-- Base case: Selecting the root comments
SELECT id, user_id, parent_id, content, created_at, 1 AS level
FROM comments
WHERE post_id = 1 AND parent_id IS NULL
AND id IN (SELECT * FROM (
SELECT id FROM comments
WHERE post_id = 1 AND parent_id IS NULL
ORDER BY created_at DESC
LIMIT 5)temp_tab)
UNION ALL
-- Recursive case: Selecting child comments
SELECT c.id, c.user_id, c.parent_id, c.content, c.created_at, ct.level + 1 AS level
FROM comments c
JOIN comment_tree ct ON c.parent_id = ct.id
-- Limiting the recursion to some level
WHERE ct.level < 4
)
SELECT * FROM comment_tree
ORDER BY created_at;
Table of the desired result (irrelevant columns have been omitted, every parent_id has at most 5 children):
id | parent_id | level | created_at |
---|---|---|---|
1 | NULL | 1 | 2024-04-10 10:00:00 |
2 | NULL | 1 | 2024-04-10 09:58:00 |
3 | NULL | 1 | 2024-04-10 09:55:00 |
4 | NULL | 1 | 2024-04-10 09:54:00 |
5 | NULL | 1 | 2024-04-10 09:53:00 |
6 | NULL | 1 | 2024-04-10 09:52:00 |
11 | 1 | 2 | 2024-04-10 10:01:00 |
12 | 1 | 2 | 2024-04-10 10:00:30 |
13 | 1 | 2 | 2024-04-10 09:59:00 |
14 | 1 | 2 | 2024-04-10 09:58:30 |
15 | 1 | 2 | 2024-04-10 09:57:00 |
21 | 11 | 3 | 2024-04-10 10:02:00 |
22 | 11 | 3 | 2024-04-10 10:01:30 |
23 | 11 | 3 | 2024-04-10 10:01:15 |
24 | 11 | 3 | 2024-04-10 10:01:10 |
25 | 11 | 3 | 2024-04-10 10:01:05 |
31 | 12 | 3 | 2024-04-10 10:00:40 |
Before recursive query, set row_number() for childs in every level (every parent_id).
Then, in anchor and recursive part of query use limitation by rn<N.
There for example, I use created_at for numbering. And limited anchor part with rn<5, recursive part rn<2.
with recursive
ordComments as(
select *
,row_number()over(partition by coalesce(parent_id,0) order by created_at) rn
from comments
)
,r as(
select 0 as lvl,id as root, t.*
from ordComments t
where parent_id is null and rn<5
union all
select lvl+1 as lvl,r.root,t.*
from r inner join ordComments t on t.parent_id=r.id
and t.rn<2 -- count of childs <2
)
select *
from r
order by root,lvl
lvl | root | id | user_id | post_id | parent_id | content | created_at | rn |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 101 | 1 | null | Root Comment 1 | 2025-04-10 10:00:00 | 1 |
1 | 1 | 6 | 106 | 1 | 1 | Second-level Comment 1 (Child of Root 1) | 2025-04-10 10:25:00 | 1 |
2 | 1 | 11 | 111 | 1 | 6 | Third-level Comment 1 (Child of Second-level 1) | 2025-04-10 10:50:00 | 1 |
3 | 1 | 16 | 116 | 1 | 11 | Fourth-level Comment 1 (Child of Third-level 1) | 2025-04-10 11:15:00 | 1 |
0 | 2 | 2 | 102 | 1 | null | Root Comment 2 | 2025-04-10 10:05:00 | 2 |
1 | 2 | 8 | 108 | 1 | 2 | Second-level Comment 3 (Child of Root 2) | 2025-04-10 10:35:00 | 1 |
2 | 2 | 14 | 114 | 1 | 8 | Third-level Comment 4 (Child of Second-level 3) | 2025-04-10 11:05:00 | 1 |
0 | 3 | 3 | 103 | 1 | null | Root Comment 3 | 2025-04-10 10:10:00 | 3 |
1 | 3 | 9 | 109 | 1 | 3 | Second-level Comment 4 (Child of Root 3) | 2025-04-10 10:40:00 | 1 |
2 | 3 | 15 | 115 | 1 | 9 | Third-level Comment 5 (Child of Second-level 4) | 2025-04-10 11:10:00 | 1 |
0 | 4 | 4 | 104 | 1 | null | Root Comment 4 | 2025-04-10 10:15:00 | 4 |
Test data
id | user_id | post_id | parent_id | content | created_at |
---|---|---|---|---|---|
1 | 101 | 1 | null | Root Comment 1 | 2025-04-10 10:00:00 |
2 | 102 | 1 | null | Root Comment 2 | 2025-04-10 10:05:00 |
3 | 103 | 1 | null | Root Comment 3 | 2025-04-10 10:10:00 |
4 | 104 | 1 | null | Root Comment 4 | 2025-04-10 10:15:00 |
5 | 105 | 1 | null | Root Comment 5 | 2025-04-10 10:20:00 |
6 | 106 | 1 | 1 | Second-level Comment 1 (Child of Root 1) | 2025-04-10 10:25:00 |
7 | 107 | 1 | 1 | Second-level Comment 2 (Child of Root 1) | 2025-04-10 10:30:00 |
8 | 108 | 1 | 2 | Second-level Comment 3 (Child of Root 2) | 2025-04-10 10:35:00 |
9 | 109 | 1 | 3 | Second-level Comment 4 (Child of Root 3) | 2025-04-10 10:40:00 |
10 | 110 | 1 | 3 | Second-level Comment 5 (Child of Root 3) | 2025-04-10 10:45:00 |
11 | 111 | 1 | 6 | Third-level Comment 1 (Child of Second-level 1) | 2025-04-10 10:50:00 |
12 | 112 | 1 | 6 | Third-level Comment 2 (Child of Second-level 1) | 2025-04-10 10:55:00 |
13 | 113 | 1 | 7 | Third-level Comment 3 (Child of Second-level 2) | 2025-04-10 11:00:00 |
14 | 114 | 1 | 8 | Third-level Comment 4 (Child of Second-level 3) | 2025-04-10 11:05:00 |
15 | 115 | 1 | 9 | Third-level Comment 5 (Child of Second-level 4) | 2025-04-10 11:10:00 |
16 | 116 | 1 | 11 | Fourth-level Comment 1 (Child of Third-level 1) | 2025-04-10 11:15:00 |
17 | 117 | 1 | 12 | Fourth-level Comment 2 (Child of Third-level 2) | 2025-04-10 11:20:00 |
18 | 105 | 1 | null | Root Comment 6 | 2025-04-11 10:20:00 |