Found what I believe to be a bug. Posting here for posterity. Comes from consecutive filter operations on a dataframe.
Expected output is the empty dataframe (no rows meet the A=0
) condition. However, I believe the consecutive filter is being parsed as A=1 or (B=3 and A=0)
However, I believe the consecutive filter is being parsed as
A=1 or (B=3 and A=0)
Correct. It is parsed as A = 1 OR B=3 AND A = 0
(AND has precedence over OR).
It can be easily confirmed by using snowflake.snowpark.Session.query_history:
import snowflake.snowpark
session = snowflake.snowpark.context.get_active_session()
with session.query_history(True) as query_history:
df = session.sql("SELECT 1 AS a, 2 AS B")
df.filter('A = 1 OR B=3') \
.filter('A = 0') \
.show()
query_history.queries[1]
Output:
Chained filter conditions are connected with AND
, therefore for correct evaluation the condition with OR
it should be wrapped with parenthesis:
df.filter('(A = 1 OR B=3)') \
.filter('A = 0') \
.show()
Sidenote: Usage of verbatim SQL text in the condition can be replaced with col('<name>')
syntax:
import snowflake.snowpark
from snowflake.snowpark.functions import col
session = snowflake.snowpark.context.get_active_session()
with session.query_history(True) as query_history:
df = session.sql("SELECT 1 AS A, 2 AS B")
df.filter((col('A') == 1) | (col('B') == 3)) \
.filter(col('A') == 0) \
.show()
query_history.queries[1]
It generates:
SELECT *
FROM (SELECT 1 AS A, 2 AS B)
WHERE ((("A" = 1) OR ("B" = 3)) AND ("A" = 0))
LIMIT 10