I'm facing an issue when inserting records into a struct field in Azure Databricks using SQL.
In my `INSERT INTO` statement I have a condition based on which I want to insert either `NULL` or a struct.
```sql
INSERT INTO my_table (target_column_name)
SELECT
  CASE
    WHEN condition = true THEN NAMED_STRUCT(<some_fields>)
    ELSE NULL
  END AS target_column_name
FROM source_table
```
When I run the `SELECT` alone, I get a struct when the condition is met and `NULL` when it isn't.
However, when I run it as an `INSERT INTO` and check the target table, I get a struct with all fields set to `null` instead of a single `NULL` value.
When I run two separate `INSERT` statements (one for the met condition and one for the unmet condition), I get a single `NULL` value.
I want to avoid having two separate `INSERT` statements; how can I force the `NULL` value?
Your `SELECT` query behaves correctly: it returns either a valid struct or `NULL`. But when used inside an `INSERT INTO`, the `NULL` becomes a `STRUCT` with all fields set to `null` (not a true `NULL`).

This is happening because Spark infers the result type of the `CASE` expression as a `STRUCT`. When `NULL` is used without a cast, it's interpreted as a struct with null fields, not an actual `NULL`.

To force the insertion of an actual `NULL`, explicitly cast the `NULL` to the struct type, like this:
```sql
INSERT INTO my_table (target_column_name)
SELECT
  CASE
    WHEN condition = true THEN NAMED_STRUCT('field1', value1, 'field2', value2)
    ELSE CAST(NULL AS STRUCT<field1: TYPE1, field2: TYPE2>)
  END AS target_column_name
FROM source_table;
```
Casting the `NULL` explicitly tells Spark/Databricks that you want a true SQL `NULL` of type `STRUCT`, not a `STRUCT` object with all fields `null`.
So, I tested this whole scenario and here's the result after inserting the data using the explicit cast method:
Creating the source and target tables:
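A minimal sketch with a hypothetical schema; the column names (`condition`, `value1`, `value2`) and the struct fields (`field1`, `field2`) are assumptions standing in for whatever the real tables use:

```sql
-- Hypothetical source table: a boolean flag plus the values
-- that will populate the struct fields (schema assumed for this test).
CREATE TABLE source_table (
  condition BOOLEAN,
  value1    STRING,
  value2    INT
);

-- Target table with a single struct column.
CREATE TABLE my_table (
  target_column_name STRUCT<field1: STRING, field2: INT>
);

-- One row that meets the condition, one that doesn't.
INSERT INTO source_table VALUES
  (true,  'a', 1),
  (false, 'b', 2);
```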
Running the INSERT query:
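Using the explicit cast from above, applied to the sketched schema:

```sql
INSERT INTO my_table (target_column_name)
SELECT
  CASE
    WHEN condition = true THEN NAMED_STRUCT('field1', value1, 'field2', value2)
    -- The cast is what forces a true NULL instead of a struct of nulls.
    ELSE CAST(NULL AS STRUCT<field1: STRING, field2: INT>)
  END AS target_column_name
FROM source_table;
```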
Verifying the result with this `SELECT`:
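A query along these lines (reconstructed, since the original isn't shown); `IS NULL` on the struct column itself is true only for a real `NULL`, not for a struct whose fields are all null:

```sql
SELECT
  target_column_name,
  target_column_name IS NULL AS is_struct_null
FROM my_table;
```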
`is_struct_null = true` confirms a proper `NULL` was inserted instead of a struct with null fields.