sqlsql-serversubquerywhere-clausenotin

Can't combine 'NOT IN' with CTE in a 'WHERE' clause


INPUT: this is the Order_Tbl table with each row equivalent to a record of a transaction:

ORDER_DAY ORDER_ID PRODUCT_ID QUANTITY PRICE
2015-05-01 ODR1 PROD1 5 5
2015-05-01 ODR2 PROD2 2 10
2015-05-01 ODR3 PROD3 10 25
2015-05-01 ODR4 PROD1 20 5
2015-05-02 ODR5 PROD3 5 25
2015-05-02 ODR6 PROD4 6 20
2015-05-02 ODR7 PROD1 2 5
2015-05-02 ODR8 PROD5 1 50
2015-05-02 ODR9 PROD6 2 50
2015-05-02 ODR10 PROD2 4 10

EXPECTED OUTPUT: the task is to write a T-SQL query to get products that was ordered on 02-May-2015 but not on any other days before that:

ORDER_DAY ORDER_ID PRODUCT_ID QUANTITY PRICE
2015-05-02 ODR6 PROD4 6 20
2015-05-02 ODR8 PROD5 1 50
2015-05-02 ODR9 PROD6 2 50

I did try 2 solutions with a same approach: using a (completely identical) subquery to get a list of distinct products that were ordered before 02-May-2015, and then somehow putting it after the NOT IN operator inside the WHERE clause of the main query.

Solution 1: the subquery was passed in as an CTE. It throws a syntax error ...

WITH CTE AS
(
    SELECT DISTINCT PRODUCT_ID
    FROM Order_Tbl
    WHERE ORDER_DAY < '2015-05-02'
)
SELECT *
FROM Order_Tbl
WHERE ORDER_DAY = '2015-05-02'
  AND PRODUCT_ID NOT IN CTE

Solution 2: the subquery was embedded into the WHERE clause of the main query. This worked!

SELECT *
FROM Order_Tbl
WHERE ORDER_DAY = '2015-05-02'
  AND PRODUCT_ID NOT IN (SELECT DISTINCT PRODUCT_ID
                         FROM Order_Tbl
                         WHERE ORDER_DAY < '2015-05-02')

What was the nuance that made SQL behave and return different results? I would appreciate it if you guys could give me a clear explanation as well as some useful notes for further SQL implementations.


Solution

  • The task is to write a T-SQL query to get products that was ordered on 02-May-2015 but not on any other days before that

    It has been explained by Thom A in the comments that NOT IN does not accept the first syntax you tried, and that, even with the right syntax, NOT EXISTS is in general preferable to NOT IN:

    select *
    from order_tbl o
    where order_day = '2015-05-02'
      and not exists (
        select 1 
        from order_tbl o1 
        where o1.product_id = o.product_id and o1.order_day < o.order_day 
    )
    

    The query ensures that there is no row in the table for the same product and an erlier order date. For performance, consider an index on order_tbl(product_id, order_day).

    But overall, it is probably simpler and more efficient to use window functions ; the subquery can just be replaced with a window min():

    select *
    from (
        select o.*, min(order_day) over(partition by product_id) min_order_day
        from order_tbl o
    ) o
    where order_day = '2015-05-02' and min_order_day = order_day