mysqlsqlgroup-bydatabase-optimization

Optimising SQL query using group by and conditions


I have two tables fiches and fiches_actions, the entries in the fiches_actions table are actions made on entries in fiches table, every action has an action_id representing the action.

Here is the schema of the two tables

schema of the two tables

Problem

For every fiche, get the last action made on it (action_id), then get the count of every action made.

Another formulation: Get the count of every last action made on the fiches

I get the last action made on a fiche by getting the last id inserted in fiches_actions table for that fiche max(fiches_actions.id)

Those fiches have to verify some conditions

`fiches`.`created_at` >= '2016-01-01 00:00:00'
    AND `fiches`.`created_at` <= '2017-02-01 00:00:00'
    AND `fiches`.`status` = 0 
    AND `fiches`.`agent_id` = '51'

My Solution

I did get a result by using this method :

First I created a view to get for every fiche the action made on it

 CREATE VIEW v_fiches_actions AS 
  SELECT max(fiches_actions.id) as id, 
    `fiches_actions.action_id`,
     fiches_actions.fiche_id 
 FROM fiches_actions group by fiche_id;

Then, I select the count from this view

select v_fiches_actions.action_id, count(*) from v_fiches_actions where fiche_id in 
( select `fiches`.`id` from fiches where
    `fiches`.`created_at` >= '2016-01-01 00:00:00'
    AND `fiches`.`created_at` <= '2017-02-01 00:00:00'
    AND `fiches`.`status` = 0 
    AND `fiches`.`agent_id` = '51'
) group by action_id;

And this is the result I get : which seems correct

    | action_id | count(*)
    | 3         | 6
    | 7         | 1

My Questions

1- Is my method correct and I am getting the correct results

2- Is there a way to do this in a single query (without using the view)


Solution

  • Your approach is not incorrect, but it is more verbose and requires more work than necessary.

    Here is another method:

    select fa.action_id, count(*)
    from fiches_actions fa join
         fiches f
         on fa.fiche_id = f.id
    where f.created_at >= 2016-01-01' and
          f.created_at <= '2017-02-01' and
          f.status = 0 and
          f.agent_id = 51 and
          fa.created_at = (select max(fa2.created_at)
                           from fiches_actions fa2
                           where fa2.fiche_id = f.id
                          )
    group by action_id;
    

    For performance, indexes on fiches(agent_id, status, created_at, id) and fiches_actions(fiche_id, created_at).

    The correlated subquery (particularly with the right indexes) should be much faster than the aggregation. Why? The correlated subquery is only run on the rows that remain after the filter. On the other hand, the aggregation has to aggregate all rows in the fiche_actions table.

    Notes: