mysqlsqljoinquery-optimizationmysql-dependent-subquery

optimizing SQL counts


I have to select a list of Catalogs from one table, and perform counts in two other tables: Stores and Categories. The counters should show how many Stores and Categories are linked to each Catalog. I have managed to get the functionality I need using this SQL query:

 SELECT `catalog`.`id` AS `id`,
       `catalog`.`name` AS `name`,
       (
            SELECT COUNT(*)
              FROM `category`
              WHERE `category`.`catalog_id` = `catalog`.`id`
               AND `category`.`is_archive` = 0
               AND `category`.`company_id` = 2
       ) AS `category_count`,
       (
            SELECT COUNT(*)
              FROM `store`
              WHERE `store`.`catalog_id` = `catalog`.`id`
               AND `store`.`is_archive` = 0
               AND `store`.`company_id` = 2
       ) AS `store_count`
  FROM `catalog`
 WHERE `catalog`.`company_id` = 2
   AND `catalog`.`is_archive` = 0
 ORDER BY `catalog`.`id` ASC;

This works as expected. But I don't like to perform sub-queries, as they are slow and this query may perform badly on LARGE lists.. Is there any method of optimizing this SQL using JOINs? Thanks in advance.


Solution

  • You can make this a lot faster by refactoring the dependent subqueries in your SELECT clause into, as you mention, JOINed aggregate subqueries.

    The first subquery you can write this way.

                    SELECT COUNT(*) num, catalog_id, company_id
                      FROM category
                     WHERE is_archive = 0
                     GROUP BY catalog_id, company_id
    

    The second one like this.

                    SELECT COUNT(*) num, catalog_id, company_id
                      FROM store
                     WHERE is_archive = 0
                     GROUP BY catalog_id, company_id
    

    Then, use those in your main query aas if they were tables containing the counts you want.

    SELECT catalog.id,
           catalog.name,
           category.num category_count,
           store.num store_count
      FROM catalog
      LEFT JOIN (
                SELECT COUNT(*) num, catalog_id, company_id
                  FROM category
                 WHERE is_archive = 0
                 GROUP BY catalog_id, company_id
           ) category  ON catalog.id = category.catalog_id
                      AND catalog.company_id = category.company_id
      LEFT JOIN (
                SELECT COUNT(*) num, catalog_id, company_id
                  FROM store
                 WHERE is_archive = 0
                 GROUP BY catalog_id, company_id
           ) store  ON catalog.id = store.catalog_id
                   AND catalog.company_id = store.company_id
     WHERE catalog.is_archive = 0
       AND catalog.company_id = 2
     ORDER BY catalog.id ASC;
    

    This is faster than your example because each subquery need only run once, rather than once per catalog entry. It also has the nice feature that you only need say WHERE catalog.company_id = 2 once. The MySQL optimizer knows what to do with that.

    I suggest LEFT JOIN operations so you'll still see catalog entries even if they're not mentioned in your category or store tables.