I am new to Databricks and was required to implement the snowflake code in Databricks.
The snowflake table, code and output look like below:
table:
id | col1 | hn |
---|---|---|
ee1 | null | 1 |
ee1 | null | 2 |
ee1 | test | 3 |
ee1 | test | 4 |
ee1 | test2 | 5 |
Query used:
SELECT ID, FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS last_value FROM table
Output:
id | first_value | last_value |
---|---|---|
ee1 | test | test2 |
ee1 | test | test2 |
ee1 | test | test2 |
ee1 | test | test2 |
ee1 | test | test2 |
When I tried the same query in Databricks using Spark SQL, ignore nulls did not work properly.
Can anyone provide the equivalent query for this in Databricks?
The key point is the window frame specification:
SELECT ID,
FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value,
LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_value
FROM table;
If not defined explicitly the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW