pyspark | databricks | delta-lake | data-ingestion | aws-databricks

Delta Lake Data Load Datatype mismatch


I am loading data from SQL Server into Delta Lake tables. Recently I had to repoint the source to another table (same columns), but the data types are different in the new table. This causes an error while loading data into the Delta table. I get the following error:

Failed to merge fields 'COLUMN1' and 'COLUMN1'. Failed to merge incompatible data types LongType and DecimalType(32,0)

The command I use to write data to the Delta table:

DF.write.mode("overwrite").format("delta").option("mergeSchema", "true").save("s3 path")

The only option I can think of right now is to set overwriteSchema to true.

But that will rewrite my target schema completely. My concern is that any sudden change in the source schema would then replace the existing target schema without any notification or alert.
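For reference, this is roughly what that would look like (the S3 path below is a placeholder, not my actual location); overwriteSchema replaces the target table's schema with the incoming DataFrame's schema on overwrite, which is exactly the silent change I want to avoid:

DF.write.mode("overwrite") \
    .format("delta") \
    .option("overwriteSchema", "true") \
    .save("s3://bucket/path/to/table")  # placeholder path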

Also, I can't explicitly convert these columns, because the Databricks notebook I am using is a parametrized one used to load data from source to target (we read all the details about the target table, source table, partition key, etc. from a CSV file).

Is there a better way to tackle this issue? Any help is much appreciated!


Solution

  • Did a workaround for now: wrote a custom function that compares the source and target schemas and casts the source datatypes to the target datatypes. (Only columns common to both source and target are considered.) A sketch of the idea follows below.
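A minimal sketch of that workaround, assuming the target Delta table already exists at the given path and that spark is the ambient SparkSession of the Databricks notebook. The function and variable names (align_source_to_target, target_path) are illustrative, not the actual notebook code:

from pyspark.sql import DataFrame
from pyspark.sql.functions import col

def align_source_to_target(source_df: DataFrame, target_path: str) -> DataFrame:
    # Read only the schema of the existing Delta target
    target_schema = spark.read.format("delta").load(target_path).schema
    # Keep only the columns common to source and target,
    # cast each one to the target's datatype
    common_fields = [f for f in target_schema.fields if f.name in source_df.columns]
    return source_df.select([col(f.name).cast(f.dataType) for f in common_fields])

Usage (the path is a placeholder), keeping the original mergeSchema write unchanged:

aligned_df = align_source_to_target(DF, "s3://bucket/path/to/table")
aligned_df.write.mode("overwrite").format("delta") \
    .option("mergeSchema", "true").save("s3://bucket/path/to/table")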