sqlmysqlgroup-byhavingsql-in

SQL query for items joined to multiple categories


I've got these 2 tables, content item and a join table to categories. I simply need a query to select items that are in multiple categories (in ALL categories, not in any category).

-- content items
CREATE TABLE `items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  [...]
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- a join table to a categories table
CREATE TABLE `item_categories` (
  `item_id` bigint(20) DEFAULT NULL,
  `category_id` bigint(20) DEFAULT NULL,
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Here is a query to select entries in the join table in both categories:

SELECT `item_categories`.* FROM `item_categories`
  WHERE `item_categories`.`category_id` IN (11811, 11911)

Now joining that with the items table gives me items in ANY category. Simple:

SELECT DISTINCT `items`.* FROM `items`
  INNER JOIN `item_categories` ON `item_categories`.`item_id` = `items`.`id`
  WHERE `item_categories`.`category_id` IN (11811, 11911)

This is a little silly, yes, but an attempt with an AND (0 results every time):

SELECT DISTINCT `items`.* FROM `items`
  INNER JOIN `item_categories` ON `item_categories`.`item_id` = `items`.`id`
  WHERE `item_categories`.`category_id` = 11811
    AND `item_categories`.`category_id` = 11911

This is the closest I've gotten. It seems overly complicated, a sort of manual join with a WHERE ID IN for each category:

SELECT DISTINCT `items`.* FROM `items`
  WHERE ID IN (
    SELECT `item_categories`.item_id FROM `item_categories`
      WHERE `item_categories`.`category_id` = 11811
  )
  AND ID IN (
    SELECT `item_categories`.item_id FROM `item_categories`
      WHERE `item_categories`.`category_id` = 11911
  )

Is there a better (simpler) way to achieve this?

I'm thinking there must be a way to put a condition on the join with a self join or group by to select only item_categories in all categories.


Solution

  • Use aggregation in item_categories to get the item_ids that belong to all the categories:

    SELECT *
    FROM items
    WHERE id IN (
      SELECT item_id 
      FROM item_categories
      WHERE category_id IN (11811, 11911)
      GROUP BY item_id
      HAVING COUNT(DISTINCT category_id) = 2 -- the number of categories
    );
    

    I don't see in the definition of item_categories any unique constraint for the combination of item_id and category_id.
    If there is one, you may remove DISTINCT from the HAVING clause.