I have this query in Snowflake (which runs on a generator so it doesn't need a table if you want to just run it):
SELECT
DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) AS date_column,
DATEADD(week, -98, CURRENT_DATE) as anchor_date,
current_date,
DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) BETWEEN DATEADD(week, -98, CURRENT_DATE) AND CURRENT_DATE as truthy
FROM
TABLE(GENERATOR(ROWCOUNT => 5000))
WHERE
DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) BETWEEN DATEADD(week, -98, CURRENT_DATE) AND CURRENT_DATE
order by date_column
Which returns these records (and more):
date_column | anchor_date | current_date | truthy |
---|---|---|---|
2022-11-13 | 2022-12-13 | 2024-10-29 | FALSE |
2022-11-20 | 2022-12-13 | 2024-10-29 | FALSE |
2022-11-27 | 2022-12-13 | 2024-10-29 | FALSE |
2022-12-04 | 2022-12-13 | 2024-10-29 | FALSE |
2022-12-11 | 2022-12-13 | 2024-10-29 | FALSE |
2022-12-18 | 2022-12-13 | 2024-10-29 | TRUE |
2022-12-25 | 2022-12-13 | 2024-10-29 | TRUE |
2023-01-01 | 2022-12-13 | 2024-10-29 | TRUE |
2023-01-08 | 2022-12-13 | 2024-10-29 | TRUE |
2023-01-15 | 2022-12-13 | 2024-10-29 | TRUE |
Clearly it knows that the WHERE clause is FALSE for the first five records, but it returns those records anyway. Why? I'm guessing the table-free / generator setup may have something to do with it, but I can't figure out why.
I know I could wrap this in a SELECT * FROM () where truthy;
- solving the problem isn't so much the issue as understanding why this doesn't work. What's going on?
so first to prove the SEQ4
are stable in the select as implied:
SELECT
SEQ4() * 7 as a,
SEQ4() * 7 as b,
FROM TABLE(GENERATOR(ROWCOUNT => 5000))
WHERE A != b
Query produced no results
good
now to use the results of the select
region in the where
:
SELECT
DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) AS date_column,
DATEADD(week, -98, CURRENT_DATE) as anchor_date,
current_date,
date_column BETWEEN anchor_date AND CURRENT_DATE as truthy
FROM TABLE(GENERATOR(ROWCOUNT => 5000))
WHERE truthy
order by date_column
change back to using a seperate expression in the where clause (how you have it):
SELECT
DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) AS date_column,
DATEADD(week, -98, CURRENT_DATE) as anchor_date,
current_date,
date_column BETWEEN anchor_date AND CURRENT_DATE as truthy
FROM TABLE(GENERATOR(ROWCOUNT => 5000))
WHERE DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) BETWEEN DATEADD(week, -98, CURRENT_DATE) AND CURRENT_DATE
order by date_column
I am going to guess the independant code is the same as explicitly using a sub-select:
SELECT
DATEADD(day, SEQ4() * 7, '2022-11-13'::DATE) AS date_column,
DATEADD(week, -98, CURRENT_DATE) as anchor_date,
current_date,
date_column BETWEEN anchor_date AND CURRENT_DATE as truthy
FROM (
select SEQ4() as sub_seq
from TABLE(GENERATOR(ROWCOUNT => 5000))
) WHERE DATEADD(day, sub_seq * 7, '2022-11-13'::DATE) BETWEEN DATEADD(week, -98, CURRENT_DATE) AND CURRENT_DATE
order by date_column
which has unwanted results which can be seen like this:
SELECT
SEQ4() as a,
a > 1000 as assumptions
FROM (
select SEQ4() as b
from TABLE(GENERATOR(ROWCOUNT => 5000))
)
WHERE b > 1000
order by 2;
Now if the where clause here was a != b
there are no unwanted results, as the rows are only processed once.