I'm new to PySpark and trying to overwrite a Delta table partition dynamically. From resources available online, I can see that Spark supports dynamic partition overwrite by setting the conf below to "dynamic":
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
However, when I try to overwrite the partitioned table with a DataFrame, the line of PySpark code below (on Databricks) overwrites the entire table instead of just the matching partition of the Delta table:
data.write.insertInto("partitioned_table", overwrite=True)
I did come across the option of using a Hive external table, but that is not straightforward in my case since partitioned_table is backed by Delta files.
Please let me know what I am missing here. Thanks in advance!
Have a look at this issue for the details on dynamic partition overwrite support for Delta tables: https://github.com/delta-io/delta/issues/348
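For context, at the time of the question Delta Lake did not honor spark.sql.sources.partitionOverwriteMode=dynamic; that issue tracks the feature, and newer Delta Lake releases (2.0 and up) support it, both via the session conf you already set and as a per-write option. A minimal sketch, assuming a Delta version with that support (the option name comes from the Delta documentation, not from your snippet):

data.write \
    .format("delta") \
    .mode("overwrite") \
    .option("partitionOverwriteMode", "dynamic") \
    .saveAsTable("partitioned_table")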
On versions without that support, you can use replaceWhere, which atomically replaces only the rows matching a predicate and leaves the rest of the table untouched.
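A minimal sketch, assuming the table is partitioned by a date column; the column name and value are placeholders for whatever partition you want to replace:

data.write \
    .format("delta") \
    .mode("overwrite") \
    .option("replaceWhere", "date = '2021-01-01'") \
    .saveAsTable("partitioned_table")

Note that by default Delta validates the write: if data contains rows that do not match the replaceWhere predicate, the write fails, so filter the DataFrame down to the target partition first.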