databricks, azure-databricks, delta-lake, timedelta

Databricks Delta Lake: Cannot time travel Delta table to version 1. Available versions: [3, 23]


I have been using the following code to read the latest version of a table with the Databricks time travel feature for the past few years without any issues. I recently added a new row to the table I run this code on, but now I'm getting the error:

AnalysisException: Cannot time travel Delta table to version 1. Available versions: [3, 23].

It's very strange that this has only started happening now.

The code is as follows:

from delta.tables import DeltaTable
from pyspark.sql.functions import col, max as max_

dt = DeltaTable.forPath(spark, saveloc)

# Highest version recorded in the table history
latest_version = int(dt.history().select(max_(col("version"))).collect()[0][0])

# Latest snapshot -> temp view 'maxversion'
spark.read.format("delta").option("versionAsOf", latest_version).load(saveloc).createOrReplaceTempView("maxversion")

# Version 1 snapshot -> temp view 'allprior'
spark.read.format("delta").option("versionAsOf", 1).load(saveloc).createOrReplaceTempView("allprior")

I appreciate that Databricks has determined the earliest available version is now 3, but I don't understand why it's no longer possible to read version 1.

My dt history is as follows:

[screenshot of the dt.history() output]


Solution

  • If predictive optimization is enabled on your account, then VACUUM operations can run automatically. VACUUM removes the data files that older snapshots depend on, which is why version 1 can no longer be reconstructed. You can check the system table to see which VACUUMs predictive optimization has performed (a sketch that adapts the question's code to the surviving versions follows after this list):

    SELECT *
    FROM system.storage.predictive_optimization_operations_history
    WHERE operation_type = 'VACUUM'
    

    Predictive optimization system table reference
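
One way to make the reading side resilient is to derive both the oldest and the newest versions still listed in the table history, rather than hard-coding version 1. The sketch below adapts the question's code under that assumption; note that a version can occasionally still appear in the history after its data files have been vacuumed, so treat the lower bound as best-effort:

    from delta.tables import DeltaTable
    from pyspark.sql.functions import col, min as min_, max as max_

    dt = DeltaTable.forPath(spark, saveloc)

    # history() only lists commits whose log entries still exist, so its
    # min/max approximate the usable time-travel bounds after VACUUM/log cleanup.
    bounds = dt.history().select(
        min_(col("version")).alias("earliest"),
        max_(col("version")).alias("latest"),
    ).collect()[0]

    # Latest surviving snapshot -> temp view 'maxversion'
    spark.read.format("delta").option("versionAsOf", int(bounds["latest"])) \
        .load(saveloc).createOrReplaceTempView("maxversion")

    # Earliest surviving snapshot -> temp view 'allprior'
    spark.read.format("delta").option("versionAsOf", int(bounds["earliest"])) \
        .load(saveloc).createOrReplaceTempView("allprior")

If older snapshots need to remain queryable, consider raising the table's retention properties (delta.deletedFileRetentionDuration for data files, delta.logRetentionDuration for the commit log), at the cost of extra storage.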