sqlruby-on-railsdatabasepostgresqlself-referencing-table

How to query a self-referential comments table to find the comments with replies, ordered by the latest replies?


In an app I'm working in, I've got a Comments table in my Postgres DB that which can be simplified to something like this:

+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id |     created_at      |
+----+-----------+-----------------+---------------------+
| 1  | NULL      | 1               | 2023-08-01 12:00:00 |
| 2  | NULL      | 2               | 2023-08-01 12:00:01 |
| 3  | NULL      | 2               | 2023-08-01 12:00:02 |
| 4  | 3         | 1               | 2023-08-01 12:00:03 |
| 5  | 2         | 1               | 2023-08-01 12:00:04 |
| 6  | 1         | 1               | 2023-08-01 12:00:05 |
| 7  | 2         | 2               | 2023-08-01 12:00:06 |
| 8  | 2         | 1               | 2023-08-01 12:00:07 |
+----+-----------+-----------------+---------------------+

The parent_id column, when populated, refers to another row in the comment table. This sets up a parent/child relationship. In other words, a comment can have other comments that are replies. Any parent comment can have many replies. For the purposes of this app, the comment parent/child relationship is only one level deep.

There's also a Group Members table that is referenced by the comments table via a foreign key. This represents each comment's author, and can be reduced to this:

+----+---------------+
| id |     name      |
+----+---------------+
|  1 | Johnny Tables |
|  2 | Susan Select  |
+----+---------------+

For any given group member, I want to find the latest distinct root comments to which they've replied, in the order they've replied to them. For instance, Johnny Tables (id 1) would get:

+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id |     created_at      |
+----+-----------+-----------------+---------------------+
|  2 | NULL      |               2 | 2023-08-01 12:00:01 |
|  1 | NULL      |               1 | 2023-08-01 12:00:00 |
|  3 | NULL      |               2 | 2023-08-01 12:00:02 |
+----+-----------+-----------------+---------------------+

And Susan Select (id 2) would get:

+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id |     created_at      |
+----+-----------+-----------------+---------------------+
|  2 | NULL      |               2 | 2023-08-01 12:00:01 |
+----+-----------+-----------------+---------------------+

Here is a fiddle that sets up the table structure with sample data.

I've tried various incantations with subqueries and DISTINCT and ORDER BY, but I'm missing something. With this query, for example, I seem to get the correct replies, but they don't actually get ordered by the created_at date. They're ordered by ascending parent_ids

SELECT DISTINCT ON (parent_id)
    parent_id,
    created_at
FROM
    comments
WHERE
    comments.group_member_id = 1
    AND comments.parent_id IS NOT NULL
ORDER BY
    comments.parent_id,
    comments.created_at DESC

And once I have that, I don't quite know how to leverage that to get the root comments and keep them in the chronological order of the replies. What am I missing?

Note: While I'm asking for general SQL guidance here, since the app I'm working on is a Rails app, an Active Record or Arel interpretation would also be welcome.

EDIT: Added sample tables and expected results.


Solution

  • Your fiddle data looks to have the parent_id and group_member_id columns transposed.

    Since there is only a single-level in the root/child comment hierarchy, you have overcomplicated your query.

    This is a simple group by with a join back into the root comment:

    select gm.name, r.id, r.group_member_id, r.parent_id, r.created_at,
           max(c.created_at) as last_reply_at
      from comments c
           join group_members gm on gm.id = c.group_member_id
           join comments r on r.id = c.parent_id
     group by gm.name, r.id, r.group_member_id, r.parent_id, r.created_at
     order by gm.name, max(c.created_at) desc;
    

    Updated fiddle