mysqlmariadbsubquerygroupwise-maximum

Optimize correlated subquery for order events table


I have a MariaDB table with order events:

EVENTID ORDERID DATA TIMESTAMP
1 1 'stuff1' 2021-12-17 11:48:00.000
2 1 'newstuff1' 2021-12-17 11:49:00.000
3 1 'newerstuff1' 2021-12-17 11:49:30.000
4 2 'stuff2' 2021-12-17 11:50:00.000
5 3 'stuff3' 2021-12-17 11:51:10.000
6 3 'newstuff3' 2021-12-17 11:52:00.000

And I need to get the most recent event for every order. So for this dataset the result should be

EVENTID ORDERID DATA TIMESTAMP
3 1 'newerstuff1' 2021-12-17 11:49:30.000
4 2 'stuff2' 2021-12-17 11:50:00.000
6 3 'newstuff3' 2021-12-17 11:52:00.000

I am achieving this using a correlated subquery:

SELECT *
FROM MESSAGES m1
WHERE TIMESTAMP = (SELECT MAX(TIMESTAMP) 
FROM MESSAGES m2 WHERE m1.ORDERID = m2.ORDERID);

But this is slow for large volumes so I'd like to know if there is a way to improve the query using a join or grouping. Otherwise I'm open to splitting the data into different tables if that would help.


Solution

  • SELECT m3.*
        FROM ( SELECT ORDERID, MAX(TIMESTAMP) AS TIMESTAMP
                  FROM MESSAGES m1
                  GROUP BY ORDERID ) m2
        JOIN MESSAGES m3 USING(ORDERID, TIMESTAMP);
    

    and have

    INDEX(ORDERID, TIMESTAMP)
    

    The query is a "groupwise-max" pattern.

    The derived table (subquery) will hop through the index, then look up the rest of the columns from the table.

    If there are any duplicate timestamps for a given order id, it will deliver multiple rows. If that is not desired, see this for more options.