pysparkdatabricksspark-window-function

Window function ignore nulls not working in Databricks


I am new to Databricks and was required to implement the snowflake code in Databricks.

The snowflake table, code and output look like below:

table:

id col1 hn
ee1 null 1
ee1 null 2
ee1 test 3
ee1 test 4
ee1 test2 5

Query used:

SELECT ID, FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS last_value FROM table

Output:

id first_value last_value
ee1 test test2
ee1 test test2
ee1 test test2
ee1 test test2
ee1 test test2

When I tried the same query in Databricks using Spark SQL, ignore nulls did not work properly.

Can anyone provide the equivalent query for this in Databricks?


Solution

  • The key point is the window frame specification:

    SELECT ID, 
      FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, 
      LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn 
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_value 
    FROM table;
    

    If not defined explicitly the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW