databricksdelta-lakedeltatime-travel

Impact of VACUUM and retention settings on Delta Lake


I have a table that needs to support time travel for up to 6 months. To preserve the necessary metadata and data files, I’ve already configured the table with the following properties:

ALTER TABLE table_x SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 180 days',
  'delta.deletedFileRetentionDuration' = 'interval 180 days'
);

However, there are delta lake maintanance job which runs

VACUUM table_x RETAIN 720 HOURS; -- 30 days

Will running VACUUM table_x RETAIN 720 HOURS ignore the 180-day table properties and potentially delete files needed for time travel?

Thanks


Solution

  • Yes. The VACUUM command will ignore your 180-day table setting and delete files needed for time travel.

    Here is the simple rule of precedence:

    1. The RETAIN clause in the VACUUM command always wins. It overrides any other setting for that specific run.

    2. The table property (delta.deletedFileRetentionDuration) is only used as a default when you run VACUUM without a RETAIN clause.

    The result: Your maintenance job, VACUUM table_x RETAIN 720 HOURS, will permanently delete all data files that are older than 30 days, breaking your 6-month time travel capability.

    The Fix

    Modify your maintenance job to respect your 180-day policy. The best way is to remove the RETAIN clause entirely:

          -- This command will now use the 180-day property you already set on the table.
    VACUUM table_x;
        
    

    Alternatively, you can make the command explicit:

          -- This also works and is very clear.
    VACUUM table_x RETAIN 180 DAYS;