sqlmysqlindexinggreatest-n-per-group

How to choose index for complex sql queries


I'm working with a complex SQL query that involves multiple joins, subqueries, grouping, and ordering. The main goal is to improve the performance of this query, especially by choosing the right indexes. While I have a general understanding of how to approach index selection for simpler queries, I'm not sure how to prioritize different parts of a query like this one. Should I focus first on optimizing subqueries, grouping, or ORDER BY clauses? Additionally, I’m unsure if there are redundant filters or if certain joins could be optimized for better performance.

SELECT
    S.id AS salesperson_id,
    S.name AS salesperson_name,
    S.mobile AS salesperson_phone_number,
    B.name AS brand_name,
    ci.id AS collection_invoice_id,
    ci.invoice_no AS invoice_number,
    ci.status as status,
    ci.invoice_date AS invoice_date,
    DATE_FORMAT(ci.collection_date,'%d-%m-%Y') AS collection_date,
    DAYOFWEEK(ci.collection_date) AS collection_weekday_number,
    ci.invoice_amount AS invoice_value,
    ci.invoice_assigned_by AS invoice_assigned_by,
    ci.invoice_updated_by AS invoice_updated_by,
    ci.verified_by_cashier_id AS verified_by_cashier_id,
    ci.verified_by_segregator_id AS verified_by_segregator_id,
    ci.invoice_verification_status AS invoice_verification_status,
    DATEDIFF(CURDATE(), DATE(ci.invoice_date)) AS invoice_age,
    ci.invoice_amount AS invoice_value,
    (SELECT COALESCE( SUM(amount), 0) from payments WHERE collection_invoice_id = ci.id) AS collected_amount,
    ci.initial_outstanding_amount AS outstanding_amount,
    ci.current_outstanding_amount AS new_outstanding,
    ci.fc_id AS collection_fc_id,
    ci.brand_id AS collection_brand_id,
    B.id AS brand_id,
    B.name AS brand_name,
    B.code AS brand_code,
    ci.store_id AS collection_store_id,
    store.id AS store_id,
    store.name AS store_name,
    store.code AS store_code,
    ci.salesman_id AS collection_salesman_id,
    ci.handover_old_salesman_id  AS old_collection_salesman_id,
    co.beat_name as beat_name,
    (SELECT name FROM Salesmen WHERE id = ci.handover_old_salesman_id) AS old_collection_salesperson_name,
    (SELECT mobile FROM Salesmen WHERE id = ci.handover_old_salesman_id) AS old_collection_salesperson_phone_number
FROM collection_invoices AS ci
JOIN Brands AS B ON ci.brand_id = B.id
JOIN Stores AS store ON ci.store_id = store.id
JOIN Salesmen AS S ON ci.salesman_id = S.id
LEFT JOIN Orders AS Orde ON ci.order_id = Orde.id
LEFT JOIN Allocations AS alcn ON Orde.allocation_id = alcn.id
JOIN ChampOutstandingInvoices co on ci.invoice_no = co.invoice_no and co.fc_id = ci.fc_id
WHERE ci.brand_id IN (35,32,24,21,5,4,3,1,37) 
    AND ci.fc_id IN (1) AND ci.salesman_id != 0 
    AND (ci.collection_date >= DATE_ADD(DATE(CURDATE()), INTERVAL -3 DAY) AND ci.collection_date <= DATE_ADD(DATE(CURDATE()), INTERVAL 3 DAY)  ) 
    AND (ci.salesman_id = 16552 ) 
    AND ci.fc_id IN (1) 
    AND ci.brand_id IN (21) 
    AND (DAYOFWEEK(ci.collection_date) IN ( 4 )) 
    AND (ci.brand_id IN ( 21 )) 
    AND ci.id IN (
        SELECT max(id) AS id
        FROM collection_invoices AS ci
        WHERE ci.brand_id IN (35,32,24,21,5,4,3,1,37) 
            AND ci.fc_id IN (1) 
            AND ci.salesman_id != 0 
            AND (ci.collection_date >= DATE_ADD(DATE(CURDATE()), INTERVAL -3 DAY) 
            AND ci.collection_date <= DATE_ADD(DATE(CURDATE()), INTERVAL 3 DAY)  ) 
            AND (ci.salesman_id = 16552 ) 
            AND ci.fc_id IN (1) 
            AND ci.brand_id IN (21) 
            AND (DAYOFWEEK(ci.collection_date) IN ( 4 )) 
            AND (ci.brand_id IN ( 21 ))
        GROUP BY invoice_no, fc_id, brand_id
        ) 
    AND ci.invoice_assigned_by IS NULL 
    AND ci.initial_outstanding_amount > 0 
    AND (Orde.status IN ('DL','PD') 
    AND alcn.return_status = 'Complete')  
ORDER BY ci.invoice_no ASC
limit 50 offset 0

Solution

  • Please do not re-use the same alias (ci); it makes the query hard to parse.

    These might be useful for that query (and some similar queries)

    collection_invoices:  INDEX(salesman_id, collection_date) 
    collection_invoices:  INDEX(brand_id,    collection_date)
    

    It seems obvious that your queries may have lots of variations. There is no good solution. I like to have 2- or 3-column composite indexes where the first column(s) are tested with = and, optionally, the last column being a range.

    That looks like a groupwise-max problem, but I am not sure. I added a tag.

    More discussion on choosing an index for a query: Index Cookbook