I want to return all rows that were public in May (2019-05), so if a row was turned to draft (and not back to public) at any point before the end of May, I don't want it. For example:
id | post_id | status | date
-------------------------
1 | 1 | draft | 2019-03-25
2 | 1 | public | 2019-04-02
3 | 1 | draft | 2019-05-25
4 | 2 | draft | 2019-03-10
5 | 2 | public | 2019-04-01
6 | 2 | draft | 2019-06-01
The desired result for the above would return post_id 2
because its last status change prior to the end of May was to public
.
post_id 1
was put back in draft before the end of May, so it would not be included.
I'm not sure how to use the correct join or sub-queries to do this as efficiently as possible.
You seem to want the status as of 2019-05-31. A correlated subquery seems like the simplest solution:
select t.*
from t
where t.date = (select max(t2.date)
from t t2
where t2.post_id = t.post_id and
t2.date <= '2019-05-31'
);
To get the ones that are public, just add a WHERE
condition:
select t.*
from t
where t.date = (select max(t2.date)
from t t2
where t2.post_id = t.post_id and
t2.date <= '2019-05-31'
) and
t.status = 'public';
For performance, you want an index on (post_id, date)
.
You can also phrase this using a JOIN
:
select t.*
from t join
(select t2.post_id, max(t2.date) as max_date
from t t2
where t2.date <= '2019-05-31'
group by t2.post_id
) t2
on t2.max_date = t.date
where t.status = 'public';
I would expect the correlated subquery to have better performance with the right indexes. However, sometimes MySQL surprises me.