
Tableau - Extract Size on Disk 100x Larger than "Stats for Space Usage" Admin Viz Shows


I was about to upgrade from 10.4 to 10.5.3 when I saw that the drive that we installed Tableau onto was almost full - 115GB out of 127GB was used. I ran tabadmin cleanup and now it is at 110GB used.

I searched to see what was using the space and found that "T:\Program Files\Tableau\Tableau Server\data\tabsvc\dataengine\extract\" is consuming 100GB. I checked the "Stats for Space Usage" admin view across all sites, and all data source extracts come to about 900MB. Adding in some workbooks that have been saved as .TWBX files, it still comes to less than 1GB across all sites.

  1. What is the cause of this discrepancy please?
  2. What is consuming this space?
  3. How do I recover this space?

I am postponing my upgrade until there is sufficient free space, so that the upgrade doesn't fail. I am also concerned about the server crashing should it run out of space - I am quite glad that I caught this issue in time!

After the initial tabadmin cleanup I executed the following commands, but they haven't helped:

  1. tabadmin cleanup
  2. tabadmin stop
  3. tabadmin clearcache
  4. tabadmin cleanup
  5. tabadmin start

Thank you, Andrew


Clarification: Where the space is used:

The directory in question.

The details of the data extract directory. There are 256 sub-directories, from 00 to FF.
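To confirm where the space is actually going, the 256 hash sub-directories can be sized up with a short script like the one below. This is only an illustrative sketch (not how the original diagnosis was done); the path is the one from the question, so adjust it to your install.

```python
import os

def dir_size(path):
    """Total size in bytes of all files under path (recursive)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def report_extract_usage(extract_root):
    """Print the size of each hash sub-directory ("00".."FF"), largest first."""
    sizes = {}
    for sub in sorted(os.listdir(extract_root)):
        full = os.path.join(extract_root, sub)
        if os.path.isdir(full):
            sizes[sub] = dir_size(full)
    for sub, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{sub}  {size / 1024**2:10.1f} MB")
    print(f"total {sum(sizes.values()) / 1024**3:10.1f} GB")

# Path taken from the question - change it for your installation.
root = r"T:\Program Files\Tableau\Tableau Server\data\tabsvc\dataengine\extract"
if os.path.isdir(root):
    report_extract_usage(root)
```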


Solution

  • OK, time to answer my own question with what happened so that future Andrews who might also have this problem know how to solve it:

    The hardware that we were running Tableau Server on was quite old (budget constraints) so the scheduled cleanup jobs weren't running properly. Extracts that were old and were supposed to get deleted weren't getting deleted.

    I discovered this by querying the PostgreSQL database on the Tableau server itself, seeing which extracts were present in the database, and comparing that to the files that existed on the hard drive. IIRC, the files were named with GUIDs and stored the way Squid stores its files, in subdirectories "00" to "FF" under the \data\TabSvc\DataEngine\Extract directory.

    So I created a C# program to query the internal Postgres database, get a list of extracts, iterate over the files in the extract subdirectories, and delete whichever GUIDs weren't in the database. Please note that when we migrated to newer hardware with a faster CPU, this was no longer an issue: old extracts were being harvested correctly (yes, Tableau calls deleting old extracts "harvesting").
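    The C# program is long gone, but the logic was roughly the following. This is a Python sketch, not the original code: the query string and the descriptor/GUID column are assumptions to verify against the data dictionary for your version, and the function only returns candidates rather than deleting anything.

```python
import os

# Hypothetical query against Tableau's internal "workgroup" repository;
# the actual table and column names varied by version.
EXTRACT_QUERY = "SELECT descriptor FROM public.extracts"

def find_orphans(db_descriptors, extract_root):
    """Return paths of extract files whose GUID file name does not appear
    in the set of descriptors pulled from the repository database."""
    known = {d.lower() for d in db_descriptors}
    orphans = []
    for sub in os.listdir(extract_root):          # the "00".."FF" hash dirs
        subdir = os.path.join(extract_root, sub)
        if not os.path.isdir(subdir):
            continue
        for fname in os.listdir(subdir):
            guid = os.path.splitext(fname)[0].lower()
            if guid not in known:
                orphans.append(os.path.join(subdir, fname))
    return orphans
```

    If you go this route, print the orphan list and sanity-check it first, stop Tableau Server, and take a backup before deleting anything.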

    For reference, the table was "public.extracts" (or something like that, consult Tableau's documentation for your specific version):
    https://tableau.github.io/tableau-data-dictionary/2025.1/data_dictionary.htm#public.extracts_anchor

    Edit: No, migrating to a newer version didn't solve the problem. We did several version changes and the issue remained. A faster CPU (with the same number of cores and threads, BTW) fixed it.