I have a table "messages", that stores messages sent to people over time, regarding some items.
message_id user_id date_sent created_at
For each user, I can have multiple tuples in the table. Some of these messages are already sent, and some are not sent yet.
I'm trying to get the last created message for each user. I'm using max(created_at) and a group_by(user_id), but the associated message_id is not the one associated with the max(created_id) tuple.
Table data:
message_id | user_id | date_sent | created_at
----------------------------------------------
1 1 2021-07-01 2021-07-01
2 1 2021-07-02 2021-07-02
3 2 2021-07-01 2021-07-01
4 3 2021-07-04 2021-07-04
5 1 2021-07-22 2021-07-22
6 1 NULL 2021-07-23
7 2 NULL 2021-07-29
8 1 NULL 2021-07-29
9 3 2021-07-29 2021-07-29
My Select:
select * from messages ma right join
( SELECT max(mb.created_at), message_id
FROM `messages` mb WHERE mb.created_at <= '2021-07-24'
group by user_id)
mc on ma.message_id=mc.message_id
the result is
message_id | user_id | date_sent | created_at
----------------------------------------------
5 1 2021-07-22 2021-07-23
3 2 2021-07-01 2021-07-01
4 3 2021-07-04 2021-07-04
I don't know why but for user 1, the message_id returned is not the one associated with the tuple that has the max(created_at).
I was expecting to be: (get the tuple with the max(date_sent) of the select grouped by user_id)
message_id | user_id | date_sent | created_at
----------------------------------------------
6 1 NULL 2021-07-23
3 2 2021-07-01 2021-07-01
4 3 2021-07-04 2021-07-04
Any idea? Any help? thank you.
You're stumbling over MySQL's notorious nonstandard extension to GROUP BY. It gives you the illusion you can do things you can't. Example
SELECT max(created_at), message_id
FROM messages
GROUP BY user_id
actually means
SELECT max(created_at), ANY_VALUE(message_id)
FROM messages
GROUP BY user_id
where ANY_VALUE() means MySQL can choose any message_id it finds most convenient from among that user's messages. That's not what you want.
To solve your problem, you need first to use a subquery to find the latest created_at
date for each user_id
. Fiddle.
SELECT user_id, MAX(created_at) created_at
FROM messages
WHERE created_at <= '2021-07-24'
GROUP BY user_id
Then, you need to find the message for the particular user_id created on that date. Use the subquery for that. Fiddle
SELECT a.*
FROM messages a
JOIN (
SELECT user_id, MAX(created_at) created_at
FROM messages
WHERE created_at <= '2021-07-24'
GROUP BY user_id
) b ON a.user_id = b.user_id AND a.created_at = b.created_at
See how that JOIN works? It pulls out the rows matching the latest date for each user.
There's a possible optimization. If
then the most recent message for each user_id is also the message with the largest message_id. In that case you can use this query instead. Fiddle
SELECT a.*
FROM messages a
JOIN (
SELECT user_id, MAX(message_id) message_id
FROM messages
WHERE created_at <= '2021-07-24'
GROUP BY user_id
) b ON a.message_id=b.message_id
Due to the way primary key indexes work, this can be faster.
You want an ordinary JOIN rather than a RIGHT or LEFT JOIN here: the ordinary JOIN only returns rows that match the ON condition.
Pro tip almost nobody actually uses RIGHT JOIN. When you want that kind of JOIN, use LEFT JOIN. You don't want that kind of join to solve this problem.