databaseamazon-redshiftquery-optimizationvacuum

Why Redshift vacuum + analyse do not update stats_off value in svv_table_info?


Checking stats_off in svv_table_info I saw that some tables have around 10.0 of this stats off.

In AWS docs the description for that field is:

Number that indicates how stale the table's statistics are; 0 is current, 100 is out of date.

In other docs I found that to update statistics you have to run ANALYZE statement.

I run several times VACUUM FULL and ANALYZE and stats_off didn't change at all. On the other hand the unsorted row values decreased to almost.

Why is this happening? Maybe is because 10 is a valid value and Redshift do not update stats despite forcing it?


Solution

  • First off Vacuum reorders rows / removed unneeded dead rows and doesn't updates stats. Analyze updates table stats.

    Analyze has a "threshold" value that it uses to determine if it should update stats or skip a table. The default value for this is 10 percent. So by default if 10% or less of the rows are unchanged it won't do anything. This threshold can be configured so you can run analyze to any threshold you like. For example run this before analyze:

    set analyze_threshold_percent to 1;
    

    and analyze will update stats if 1% of rows have changed.

    See - https://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE.html

    As state above vacuum reorders tables so it will cause row changes if it needs to perform work on a table. However, it doesn't change things if it doesn't need to. So you cannot be sure that vacuum will create enough changes to cross the analyze threshold.

    Just remember that repeatedly running analyze with a low threshold may waste compute resources updating stats in minor ways that will have no positive impact on database performance.