I have two tables in a this is Postgres database representing simple orders from a market. A master table with information about the order, and a detail table with containing specifics of the purchase, with a foreign key back to master. Easy enough.
Over thousands of orders from the market, I'd like to find some specific orders based on what was purchased and in what quantity.
I have two more tables, in similar fashion, a master and a child where I create a "pack" and detail items from the market.
For example: Pack A contains 2 Apples and 3 Oranges. I define that in the tables. Now I'd like to find how many orders, and which orders from the market match that specific combination exactly.
It's important that it is an exact match. An order containing additional products or with any different quantity does not match.
In the SQL Fiddle, I've setup the simple example with data. The raw DDL is below. Two of the orders in the table should match Pack A.
http://sqlfiddle.com/#!17/b4f55
CREATE TABLE customer_order(
order_id serial PRIMARY KEY NOT NULL,
customer_name VARCHAR(100) NOT NULL
);
CREATE TABLE order_detail(
id serial PRIMARY KEY,
order_id INTEGER,
item_sku VARCHAR(50),
item_quantity INTEGER,
FOREIGN KEY(order_id) REFERENCES customer_order(order_id)
);
INSERT INTO customer_order (customer_name) VALUES ('John');
INSERT INTO customer_order (customer_name) VALUES ('Mary');
INSERT INTO customer_order (customer_name) VALUES ('Bill');
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (1, 'APPLE', 2);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (1, 'ORANGE', 3);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (2, 'ORANGE', 5);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (3, 'APPLE', 2);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (3, 'ORANGE', 3);
CREATE TABLE pack_master(
pack_id serial PRIMARY KEY NOT NULL,
name VARCHAR(100) NOT NULL
);
CREATE TABLE pack_child(
id serial PRIMARY KEY,
pack_id INTEGER,
item_sku VARCHAR(50),
item_quantity INTEGER,
FOREIGN KEY(pack_id) REFERENCES pack_master(pack_id)
);
INSERT INTO pack_master (name) VALUES ('Pack A');
INSERT INTO pack_master (name) VALUES ('Pack B');
INSERT INTO pack_child (pack_id, item_sku, item_quantity) VALUES (1, 'APPLE', 2);
INSERT INTO pack_child (pack_id, item_sku, item_quantity) VALUES (1, 'ORANGE', 3);
INSERT INTO pack_child (pack_id, item_sku, item_quantity) VALUES (2, 'GRAPES', 5);
Assuming that pack_child (pack_id, item_sku)
, as well as order_detail (order_id, item_sku)
are defined UNIQUE
, this would work:
SELECT pc.pack_id, od.order_id
FROM pack_child pc
LEFT JOIN order_detail od USING (item_sku, item_quantity)
GROUP BY 1, 2
HAVING count(*) = count(od.id) -- every item of the pack has a match
AND NOT EXISTS (
SELECT
FROM order_detail od1
LEFT JOIN pack_child pc1 ON pc1.item_sku = od1.item_sku
AND pc1.item_quantity = od1.item_quantity
AND pc1.pack_id = pc.pack_id
WHERE od1.order_id = od.order_id
AND pc1.id IS NULL -- and order has no additional item
);
Returns all pairs of pack_id
and order_id
that match exactly.
db<>fiddle here
There are a hundred-and-one alternative ways to write the query. Which is the fastest depends on cardinalities, data distribution, constraints and, most importantly, available indexes.
It's a special application of relational-division. Here is an arsenal of techniques:
One alternative, probably faster: create views or materialized views of the parent tables including the item count:
CREATE MATERIALIZED VIEW v_pack_master AS
SELECT *
FROM pack_master
JOIN (
SELECT pack_id, count(*) AS items
FROM pack_child
GROUP BY 1
) c USING (pack_id);
CREATE MATERIALIZED VIEW v_customer_order AS
SELECT *
FROM customer_order
JOIN (
SELECT order_id, count(*) AS items
FROM order_detail
GROUP BY 1
) c USING (order_id);
(Orders typically don't change later, so might be viable candidates for a materialized view.)
Only if there can be many order items, an index might pay (index expressions in this order):
CREATE INDEX foo ON v_customer_order (items, order_id);
The query now only considers orders with a matching item count to begin with:
SELECT * -- pack_id, order_id
FROM v_pack_master pm
LEFT JOIN v_customer_order co USING (items)
JOIN LATERAL (
SELECT count(*) AS items
FROM pack_child pc
JOIN order_detail od USING (item_sku, item_quantity)
WHERE pc.pack_id = pm.pack_id
AND od.order_id = co.order_id
) x USING (items);
.. then, if all items match, we don't have to rule out additional items any more. And we have all columns from the parent table at our disposal right away, to return whatever it is you want to return ...