sqlmysql

MySQL query to retrieve a hierarchical structure while limiting each level of child comments to a maximum of 5 per parent


I have a comments table with id, parent_id, content, created_at, and other fields. I want to be able to select the root comments first (i.e., comments where the parent_id IS NULL), and limit them to the 5 most recent ones.Then, for each selected root comment, I want to limit the number of second-level child comments to 5. And so on for all deeper levels. The issue I'm facing is how to limit the child comments at every level of the recursion without scanning the entire table. So far, I have only managed to do that for the top-level comments using a workaround. How can I add to that recursive query so that it respects the 5-child-per-parent rule during recursion without scanning the entire table and filtering afterward?

The table schema and source data look like this:

-- create
CREATE TABLE comments (
    id         INT NOT NULL PRIMARY KEY,
    user_id    BIGINT NOT NULL,
    post_id    BIGINT NOT NULL,
    parent_id  INT DEFAULT NULL,
    content    VARCHAR(10000) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT fk_comments_parent FOREIGN KEY (parent_id) 
      REFERENCES comments(id) ON DELETE CASCADE ON UPDATE CASCADE
);

-- Root Comments (Level 1)
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(1, 101, 1, NULL, 'Root Comment 1', '2025-04-10 10:00:00'),
(2, 102, 1, NULL, 'Root Comment 2', '2025-04-10 10:05:00'),
(3, 103, 1, NULL, 'Root Comment 3', '2025-04-10 10:10:00'),
(4, 104, 1, NULL, 'Root Comment 4', '2025-04-10 10:15:00'),
(5, 105, 1, NULL, 'Root Comment 5', '2025-04-10 10:20:00'),
(18, 105, 1, NULL, 'Root Comment 6', '2025-04-11 10:20:00');

-- Second-level Comments (Level 2) for Root Comments 1, 2, and 3
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(6, 106, 1, 1, 'Second-level Comment 1 (Child of Root 1)', '2025-04-10 10:25:00'),
(7, 107, 1, 1, 'Second-level Comment 2 (Child of Root 1)', '2025-04-10 10:30:00'),
(8, 108, 1, 2, 'Second-level Comment 3 (Child of Root 2)', '2025-04-10 10:35:00'),
(9, 109, 1, 3, 'Second-level Comment 4 (Child of Root 3)', '2025-04-10 10:40:00'),
(10, 110, 1, 3, 'Second-level Comment 5 (Child of Root 3)', '2025-04-10 10:45:00');

-- Third-level Comments (Level 3) for second-level comments
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(11, 111, 1, 6, 'Third-level Comment 1 (Child of Second-level 1)', '2025-04-10 10:50:00'),
(12, 112, 1, 6, 'Third-level Comment 2 (Child of Second-level 1)', '2025-04-10 10:55:00'),
(13, 113, 1, 7, 'Third-level Comment 3 (Child of Second-level 2)', '2025-04-10 11:00:00'),
(14, 114, 1, 8, 'Third-level Comment 4 (Child of Second-level 3)', '2025-04-10 11:05:00'),
(15, 115, 1, 9, 'Third-level Comment 5 (Child of Second-level 4)', '2025-04-10 11:10:00');

-- Fourth-level Comments (Level 4) for third-level comments
INSERT INTO comments (id, user_id, post_id, parent_id, content, created_at) VALUES
(16, 116, 1, 11, 'Fourth-level Comment 1 (Child of Third-level 1)', '2025-04-10 11:15:00'),
(17, 117, 1, 12, 'Fourth-level Comment 2 (Child of Third-level 2)', '2025-04-10 11:20:00');

And the query I have written, which only fetches the 5 most recent root comments and all their children:

WITH RECURSIVE comment_tree AS (
    -- Base case: Selecting the root comments
    SELECT id, user_id, parent_id, content, created_at, 1 AS level
    FROM comments
    WHERE post_id = 1 AND parent_id IS NULL
    AND id IN (SELECT * FROM (
        SELECT id FROM comments 
        WHERE post_id = 1 AND parent_id IS NULL
        ORDER BY created_at DESC
        LIMIT 5)temp_tab)
    
    UNION ALL

    -- Recursive case: Selecting child comments
    SELECT c.id, c.user_id, c.parent_id, c.content, c.created_at, ct.level + 1 AS level
    FROM comments c
    JOIN comment_tree ct ON c.parent_id = ct.id
    -- Limiting the recursion to some level
    WHERE ct.level < 4
)

SELECT * FROM comment_tree
ORDER BY created_at;

Table of the desired result (irrelevant columns have been omitted, every parent_id has at most 5 children):

id parent_id level created_at
1 NULL 1 2024-04-10 10:00:00
2 NULL 1 2024-04-10 09:58:00
3 NULL 1 2024-04-10 09:55:00
4 NULL 1 2024-04-10 09:54:00
5 NULL 1 2024-04-10 09:53:00
6 NULL 1 2024-04-10 09:52:00
11 1 2 2024-04-10 10:01:00
12 1 2 2024-04-10 10:00:30
13 1 2 2024-04-10 09:59:00
14 1 2 2024-04-10 09:58:30
15 1 2 2024-04-10 09:57:00
21 11 3 2024-04-10 10:02:00
22 11 3 2024-04-10 10:01:30
23 11 3 2024-04-10 10:01:15
24 11 3 2024-04-10 10:01:10
25 11 3 2024-04-10 10:01:05
31 12 3 2024-04-10 10:00:40

Solution

  • Before recursive query, set row_number() for childs in every level (every parent_id).
    Then, in anchor and recursive part of query use limitation by rn<N.

    There for example, I use created_at for numbering. And limited anchor part with rn<5, recursive part rn<2.

    with recursive 
      ordComments as(
      select *
        ,row_number()over(partition by coalesce(parent_id,0) order by created_at) rn
      from comments
    )
    ,r as(
      select 0 as lvl,id as root, t.* 
      from ordComments t
      where parent_id is null and rn<5
      union all
      select lvl+1 as lvl,r.root,t.* 
      from  r inner join ordComments t on t.parent_id=r.id 
         and t.rn<2  -- count of childs <2
    )
    select * 
    from r
    order by root,lvl
    
    lvl root id user_id post_id parent_id content created_at rn
    0 1 1 101 1 null Root Comment 1 2025-04-10 10:00:00 1
    1 1 6 106 1 1 Second-level Comment 1 (Child of Root 1) 2025-04-10 10:25:00 1
    2 1 11 111 1 6 Third-level Comment 1 (Child of Second-level 1) 2025-04-10 10:50:00 1
    3 1 16 116 1 11 Fourth-level Comment 1 (Child of Third-level 1) 2025-04-10 11:15:00 1
    0 2 2 102 1 null Root Comment 2 2025-04-10 10:05:00 2
    1 2 8 108 1 2 Second-level Comment 3 (Child of Root 2) 2025-04-10 10:35:00 1
    2 2 14 114 1 8 Third-level Comment 4 (Child of Second-level 3) 2025-04-10 11:05:00 1
    0 3 3 103 1 null Root Comment 3 2025-04-10 10:10:00 3
    1 3 9 109 1 3 Second-level Comment 4 (Child of Root 3) 2025-04-10 10:40:00 1
    2 3 15 115 1 9 Third-level Comment 5 (Child of Second-level 4) 2025-04-10 11:10:00 1
    0 4 4 104 1 null Root Comment 4 2025-04-10 10:15:00 4

    fiddle

    Test data

    id user_id post_id parent_id content created_at
    1 101 1 null Root Comment 1 2025-04-10 10:00:00
    2 102 1 null Root Comment 2 2025-04-10 10:05:00
    3 103 1 null Root Comment 3 2025-04-10 10:10:00
    4 104 1 null Root Comment 4 2025-04-10 10:15:00
    5 105 1 null Root Comment 5 2025-04-10 10:20:00
    6 106 1 1 Second-level Comment 1 (Child of Root 1) 2025-04-10 10:25:00
    7 107 1 1 Second-level Comment 2 (Child of Root 1) 2025-04-10 10:30:00
    8 108 1 2 Second-level Comment 3 (Child of Root 2) 2025-04-10 10:35:00
    9 109 1 3 Second-level Comment 4 (Child of Root 3) 2025-04-10 10:40:00
    10 110 1 3 Second-level Comment 5 (Child of Root 3) 2025-04-10 10:45:00
    11 111 1 6 Third-level Comment 1 (Child of Second-level 1) 2025-04-10 10:50:00
    12 112 1 6 Third-level Comment 2 (Child of Second-level 1) 2025-04-10 10:55:00
    13 113 1 7 Third-level Comment 3 (Child of Second-level 2) 2025-04-10 11:00:00
    14 114 1 8 Third-level Comment 4 (Child of Second-level 3) 2025-04-10 11:05:00
    15 115 1 9 Third-level Comment 5 (Child of Second-level 4) 2025-04-10 11:10:00
    16 116 1 11 Fourth-level Comment 1 (Child of Third-level 1) 2025-04-10 11:15:00
    17 117 1 12 Fourth-level Comment 2 (Child of Third-level 2) 2025-04-10 11:20:00
    18 105 1 null Root Comment 6 2025-04-11 10:20:00