sql-serversql-server-2008t-sqltable-statistics

What is the format of TSQL STATS_STREAM (undocumented feature)? Does it contain all the Statistics Data?


I have a huge table on one server, and need to copy the table to the production server. I can do this, but then I need to generate statistics, which will take a long time on a system that I don't want to ask to handle a huge job. If I move the entire database, I need to take down the system in order to remove the existing database, which is a no-no.

Pinal Dave has a post, http://blog.sqlauthority.com/2011/01/05/sql-server-copy-statistics-from-one-server-to-another-server/ In the post, he outlines how to copy statistics from one table to a new one using Microsoft's "Generate Scripts," which allows copying via the undocumented STATS_STREAM feature. This is for a new table. I'm not sure if I can create a table with the statistics and then write in all of the data, since that should make the system want to update the statistics - possibly even if I have autogenerate statistics off.

I am using the following code: (The Table is InvTbl, and the statistics on that table I am updating is GammaTheta. The statistics exist, and the data is there.)

Update STATISTICS dbo.InvTbl [GammaTheta]  WITH STATS_STREAM = 0x0100002000...

First, does anyone know if Stats_Stream contains all of the statistics data? Then, importantly, how is it being stored in binary?

Note: The table stores a statistical distribution of two variables which we look up based on a join, and has some really important features about the distribution of the inputs. It is static, and I will never need to recalculate the values or the stats once loaded. Not using fullscan is not an option, based on testing, as join speed matters and is affected.

Note 2: No, the number does not end with an ellipsis. I left the data off.


Solution

  • If I understand your question, the stats stream is a binary representation of the statistics data and can be used to fully re-create the statistics, even on another table with different data (though the same schema). Kevin Kline blogged about that here. I took a swing at better automation of it here.