I have been using the following code to determine the latest version of a table with the Databricks time travel feature for the past few years without any issues. I recently added a new row to the table that I run this code on, but now I'm getting the error:
AnalysisException: Cannot time travel Delta table to version 1. Available versions: [3, 23].
It is very strange that this should start happening now. The code is as follows:
from delta.tables import DeltaTable
from pyspark.sql.functions import col, max
dt = DeltaTable.forPath(spark, saveloc)
latest_version = int(dt.history().select(max(col("version"))).collect()[0][0])
latest_table_dropped = spark.read.format("delta").option("versionAsOf", latest_version).load(saveloc).createOrReplaceTempView('maxversion')
start_table_dropped = spark.read.format("delta").option("versionAsOf", 1).load(saveloc).createOrReplaceTempView('allprior')
I understand that Databricks has determined that the earliest available version is now 3, but I don't understand why it is no longer possible to time travel to version 1.
My dt history is as follows:
AnalysisException: Cannot time travel Delta table to version 1. Available versions: [3, 23].
The error you are getting indicates that the Delta table no longer has version 1 available for time travel. This happens because of the retention policies and VACUUM operations that manage the storage of old versions of the table.

Delta Lake retains table versions based on the retention threshold for transaction log files and on the frequency and configured retention of VACUUM operations. If you run VACUUM with the default settings, you will only be able to time travel up to the last 7 days. Any older versions, like version 1 in your case, may have been removed because they are beyond the retention period.
Set delta.checkpointRetentionDuration to X days to retain checkpoints longer, allowing access to older versions.
Next, execute the following command on your Delta table:
spark.sql(f"""
ALTER TABLE delta.`path`
SET TBLPROPERTIES (
delta.logRetentionDuration = 'interval X days',
delta.deletedFileRetentionDuration = 'interval X days',
delta.checkpointRetentionDuration = 'X days'
)
""")
References: Remove unused data files with vacuum; Work with Delta Lake table history