I have the following code. Previously I had a Delta table with 180 columns at my_path. I select a single column and try to overwrite the table:
columns_to_select = ["one_column"]
df_one_column = df.select(*columns_to_select)
df_one_column.write.format("delta").mode("overwrite").option("mergeSchema", "true").save(my_path)
new_schema = spark.read.format("delta").load(my_path).schema
target_column = [field.name for field in new_schema.fields]
print(len(target_column))  # prints 180
I expected this to return 1, since I selected only one column from my DataFrame, but it returns all 180 columns.
You need to use option("overwriteSchema", "true") while writing. Here is a sample example:
df.write.format("delta").mode("overwrite").save(my_path)  # write the full DataFrame first
df_first = spark.read.format("delta").load(my_path)
print(df_first.columns, len(df_first.columns))  # all original columns
columns_to_select = ["firstname"]
df_one_column = df.select(*columns_to_select)
df_one_column.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(my_path)  # overwriteSchema replaces the table schema; mergeSchema is not needed here
df_second = spark.read.format("delta").load(my_path)
print(df_second.columns, len(df_second.columns))  # now only the selected column
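Conceptually, mergeSchema takes the union of the existing table schema and the incoming DataFrame's schema, so columns missing from the new data are kept rather than dropped; overwriteSchema discards the old schema and replaces it with the incoming one. A minimal pure-Python sketch of that difference (the dict-based schemas and helper functions are illustrative only, not Delta Lake internals):

```python
# Illustrative sketch: schemas modeled as dicts of column name -> type.
# These helpers mimic the *semantics* of the two write options.

def merge_schema(existing, incoming):
    """mergeSchema: union of columns; existing columns are never dropped."""
    merged = dict(existing)   # keep every existing column
    merged.update(incoming)   # add/refresh incoming columns
    return merged

def overwrite_schema(existing, incoming):
    """overwriteSchema: the incoming schema simply replaces the old one."""
    return dict(incoming)

existing = {f"col_{i}": "string" for i in range(180)}  # the 180-column table
incoming = {"col_0": "string"}  # writing one column selected from that table

print(len(merge_schema(existing, incoming)))      # 180: nothing is dropped
print(len(overwrite_schema(existing, incoming)))  # 1: schema replaced
```

This is why the original write with only mergeSchema still showed 180 columns: selecting one existing column and merging adds nothing and removes nothing.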
Please refer to the following links for more info:
mergeSchema: https://delta.io/blog/2023-02-08-delta-lake-schema-evolution/
overwriteSchema: https://docs.databricks.com/en/delta/update-schema.html#explicitly-update-schema-to-change-column-type-or-name