INPUT: this is the Order_Tbl
table with each row equivalent to a record of a transaction:
ORDER_DAY | ORDER_ID | PRODUCT_ID | QUANTITY | PRICE |
---|---|---|---|---|
2015-05-01 | ODR1 | PROD1 | 5 | 5 |
2015-05-01 | ODR2 | PROD2 | 2 | 10 |
2015-05-01 | ODR3 | PROD3 | 10 | 25 |
2015-05-01 | ODR4 | PROD1 | 20 | 5 |
2015-05-02 | ODR5 | PROD3 | 5 | 25 |
2015-05-02 | ODR6 | PROD4 | 6 | 20 |
2015-05-02 | ODR7 | PROD1 | 2 | 5 |
2015-05-02 | ODR8 | PROD5 | 1 | 50 |
2015-05-02 | ODR9 | PROD6 | 2 | 50 |
2015-05-02 | ODR10 | PROD2 | 4 | 10 |
EXPECTED OUTPUT: the task is to write a T-SQL query to get products that was ordered on 02-May-2015
but not on any other days before that:
ORDER_DAY | ORDER_ID | PRODUCT_ID | QUANTITY | PRICE |
---|---|---|---|---|
2015-05-02 | ODR6 | PROD4 | 6 | 20 |
2015-05-02 | ODR8 | PROD5 | 1 | 50 |
2015-05-02 | ODR9 | PROD6 | 2 | 50 |
I did try 2 solutions with a same approach: using a (completely identical) subquery to get a list of distinct products that were ordered before 02-May-2015
, and then somehow putting it after the NOT IN
operator inside the WHERE
clause of the main query.
Solution 1: the subquery was passed in as an CTE
. It throws a syntax error ...
WITH CTE AS
(
SELECT DISTINCT PRODUCT_ID
FROM Order_Tbl
WHERE ORDER_DAY < '2015-05-02'
)
SELECT *
FROM Order_Tbl
WHERE ORDER_DAY = '2015-05-02'
AND PRODUCT_ID NOT IN CTE
Solution 2: the subquery was embedded into the WHERE
clause of the main query. This worked!
SELECT *
FROM Order_Tbl
WHERE ORDER_DAY = '2015-05-02'
AND PRODUCT_ID NOT IN (SELECT DISTINCT PRODUCT_ID
FROM Order_Tbl
WHERE ORDER_DAY < '2015-05-02')
What was the nuance that made SQL behave and return different results? I would appreciate it if you guys could give me a clear explanation as well as some useful notes for further SQL implementations.
The task is to write a T-SQL query to get products that was ordered on 02-May-2015 but not on any other days before that
It has been explained by Thom A in the comments that NOT IN
does not accept the first syntax you tried, and that, even with the right syntax, NOT EXISTS
is in general preferable to NOT IN
:
select *
from order_tbl o
where order_day = '2015-05-02'
and not exists (
select 1
from order_tbl o1
where o1.product_id = o.product_id and o1.order_day < o.order_day
)
The query ensures that there is no row in the table for the same product and an erlier order date. For performance, consider an index on order_tbl(product_id, order_day)
.
But overall, it is probably simpler and more efficient to use window functions ; the subquery can just be replaced with a window min()
:
select *
from (
select o.*, min(order_day) over(partition by product_id) min_order_day
from order_tbl o
) o
where order_day = '2015-05-02' and min_order_day = order_day