databricksdelta-lake

Databricks - How to change a partition of an existing Delta table?


I have a table in Databricks delta which is partitioned by transaction_date. I want to change the partition column to view_date. I tried to drop the table and then create it with a new partition column using PARTITIONED BY (view_date).

However my attempt failed since the actual files reside in S3 and even if I drop a hive table the partitions remain the same. Is there any way to change the partition of an existing Delta table? Or the only solution will be to drop the actual data and reload it with a newly indicated partition column?


Solution

  • There's actually no need to drop tables or remove files. All you need to do is read the current table, overwrite the contents AND the schema, and change the partition column:

    val input = spark.read.table("mytable")
    
    input.write.format("delta")
      .mode("overwrite")
      .option("overwriteSchema", "true")
      .partitionBy("colB") // different column
      .saveAsTable("mytable")
    

    UPDATE: There previously was a bug with time travel and changes in partitioning that has now been fixed.