In an app I'm working in, I've got a Comments table in my Postgres DB that which can be simplified to something like this:
+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id | created_at |
+----+-----------+-----------------+---------------------+
| 1 | NULL | 1 | 2023-08-01 12:00:00 |
| 2 | NULL | 2 | 2023-08-01 12:00:01 |
| 3 | NULL | 2 | 2023-08-01 12:00:02 |
| 4 | 3 | 1 | 2023-08-01 12:00:03 |
| 5 | 2 | 1 | 2023-08-01 12:00:04 |
| 6 | 1 | 1 | 2023-08-01 12:00:05 |
| 7 | 2 | 2 | 2023-08-01 12:00:06 |
| 8 | 2 | 1 | 2023-08-01 12:00:07 |
+----+-----------+-----------------+---------------------+
The parent_id
column, when populated, refers to another row in the comment table. This sets up a parent/child relationship. In other words, a comment can have other comments that are replies. Any parent comment can have many replies. For the purposes of this app, the comment parent/child relationship is only one level deep.
There's also a Group Members table that is referenced by the comments table via a foreign key. This represents each comment's author, and can be reduced to this:
+----+---------------+
| id | name |
+----+---------------+
| 1 | Johnny Tables |
| 2 | Susan Select |
+----+---------------+
For any given group member, I want to find the latest distinct root comments to which they've replied, in the order they've replied to them. For instance, Johnny Tables
(id 1) would get:
+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id | created_at |
+----+-----------+-----------------+---------------------+
| 2 | NULL | 2 | 2023-08-01 12:00:01 |
| 1 | NULL | 1 | 2023-08-01 12:00:00 |
| 3 | NULL | 2 | 2023-08-01 12:00:02 |
+----+-----------+-----------------+---------------------+
And Susan Select
(id 2) would get:
+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id | created_at |
+----+-----------+-----------------+---------------------+
| 2 | NULL | 2 | 2023-08-01 12:00:01 |
+----+-----------+-----------------+---------------------+
Here is a fiddle that sets up the table structure with sample data.
I've tried various incantations with subqueries and DISTINCT
and ORDER BY
, but I'm missing something. With this query, for example, I seem to get the correct replies, but they don't actually get ordered by the created_at
date. They're ordered by ascending parent_id
s
SELECT DISTINCT ON (parent_id)
parent_id,
created_at
FROM
comments
WHERE
comments.group_member_id = 1
AND comments.parent_id IS NOT NULL
ORDER BY
comments.parent_id,
comments.created_at DESC
And once I have that, I don't quite know how to leverage that to get the root comments and keep them in the chronological order of the replies. What am I missing?
Note: While I'm asking for general SQL guidance here, since the app I'm working on is a Rails app, an Active Record or Arel interpretation would also be welcome.
EDIT: Added sample tables and expected results.
Your fiddle data looks to have the parent_id
and group_member_id
columns transposed.
Since there is only a single-level in the root/child comment hierarchy, you have overcomplicated your query.
This is a simple group by
with a join back into the root comment:
select gm.name, r.id, r.group_member_id, r.parent_id, r.created_at,
max(c.created_at) as last_reply_at
from comments c
join group_members gm on gm.id = c.group_member_id
join comments r on r.id = c.parent_id
group by gm.name, r.id, r.group_member_id, r.parent_id, r.created_at
order by gm.name, max(c.created_at) desc;
Updated fiddle