amazon-redshiftvacuumamazon-redshift-serverless

Redshift VACUUM effect on concurrent queries


I have a process that runs VACUUM manually on a list of redshift tables on a daily basis to maintain consistent query performance. But sometimes, vacuuming one table takes about 2 hours. Is this normal? I was thinking of the effect of the vacuum runtime on the concurrent queries.


Solution

  • This is likely normal but some more info will be needed to be sure. Redshift vacuum sorts the table and reclaims space. The sort process is limited by the threshold percentage which determines if sorting needs to be done. The default for this is 95% so if 95% or more of the blocks in the table are marked as sorted the sorting will be skipped. If skipped the vacuum will run much faster.

    If this is a large table sorting after more than 5% of blocks have been changed may be a lot of work and take a few hours or more. Since you are running vacuum regularly you likely want it to sort the table each time so that the work doesn’t pile up. You can do this by setting the threshold. If you set it to 100 percent then Redshift will resort the table every time vacuum runs.

    VACUUM <table> to 100 percent;