apache-spark, pyspark, azure-databricks, delta-lake, table-partitioning

How to perform insert overwrite dynamically on partitions of a Delta table using PySpark?


I'm new to PySpark and looking to overwrite a Delta table partition dynamically. From other resources available online, I can see that Spark supports dynamic partition overwrite by setting the conf below to "dynamic":

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

However, when I try overwriting the partitioned table with a DataFrame, the line of PySpark code below (on Databricks) overwrites the entire table instead of a single partition of the Delta table:

data.write.insertInto("partitioned_table", overwrite=True)

I did come across the option of using a Hive external table, but it is not straightforward in my case since partitioned_table is backed by a Delta table.

Please let me know what I am missing here. Thanks in advance!


Solution

  • Look at this issue for details regarding dynamic overwrite on Delta tables: https://github.com/delta-io/delta/issues/348

    You can use replaceWhere to overwrite only the rows that match a predicate, leaving the rest of the table untouched; a sketch follows.
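
    A minimal sketch of the replaceWhere approach, assuming the table is partitioned by a column named date (the column name and filter value are placeholders for your own):

    # data is the DataFrame from the question; only rows matching the
    # replaceWhere predicate are replaced, the rest of the table is kept.
    (data.write
        .format("delta")
        .mode("overwrite")
        .option("replaceWhere", "date = '2021-01-01'")
        .saveAsTable("partitioned_table"))

    Note that Delta validates the write: every row in data must satisfy the replaceWhere predicate, otherwise the write fails with an error.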