Friday, June 12, 2009

SQL Server Auto Statistics

I was asked a question, on the MSDN forums, regarding a custom function to automatically update statistics when a threshold is breached.  I went on to say that SQL Server already handles this for us automatically. Also that he should use sp_updatestats or UPDATE STATISTICS, if manual statistic updates were needed.  I wanted to briefly discuss some of the SQL Server built-in functionality regarding statistics. SQL Server has two options enabled by default, which help the database engine create and manage statistics.  These two options are AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS. 

First let’s talk a about what statistics are and how the optimizer uses them.  Statistics are the information used to represent the distribution of column values, within a table or indexed view.  The optimizer uses this statistical information to make cardinality estimates.  So what does this mean…  it means the optimizer uses the information to make a estimate of the number of rows, in the query result. The more accurate the estimate, the better the query plan is.  In some cases, good statistics can make the difference between getting an index seek and getting an index scan.

Now that we know more about what statistics are, let’s talk about how SQL Server creates statistics.  The AUTO_CREATE_STATISTICS option allows the database engine to automatically create statistics on individual columns that appear in query predicates.  You can check statistics by querying the sys.stats catalog view.  When statistics are automatically created they are prefixed with “_WA”, but also have the flag auto_created set to 1.

SELECT 
    OBJECT_NAME(s.object_id) AS object_name,
    COL_NAME(sc.object_id, sc.column_id) AS column_name,
    s.name AS statistics_name
FROM sys.stats s 
INNER JOIN sys.stats_columns sc
    ON  s.stats_id = sc.stats_id 
        AND s.OBJECT_ID = sc.Object_id    
INNER JOIN sys.tables t
    ON t.OBJECT_ID = s.object_id    
WHERE 
    s.name like '_WA%'
    AND s.auto_created = 1
    AND t.TYPE = 'U'
    AND t.is_ms_shipped = 0
ORDER BY 
    s.name;

You should note that auto created statistics only are present on a single column, to obtain more dense statistical data you will need to create an index or manual statistics. Now that we have statistics, how does this help our query run faster? In short, if a query predicate contains a column with statistics, the optimizer does not have to guess at the number of rows affected by the query, thus the optimizer has enough information to intelligently create the query plan.

Now that we have statistics, we can rely on the database engine to keep them up. SQL Server uses AUTO_UPDATE_STATISTICS (enabled by default) to manage statistics.  Statistics are checked before query compilation or before executing a cached query plan. If a table has just been issued a large insert/delete transaction,the next query may experience slowness or delay because the statistics  have to be checked and updated. In this type of situation it is best to update the statistics manually, after the DML operation.  So how does the optimizer know the statistics are out-of-date? When checking statistics, the optimizer compares the number of data modifications, since the last update, against a threshold.  If the number of data modifications is greater than the threshold, the statistics are updated.  The technical threshold limits are listed below.

Statistics are considered out-of-date when:

  1. The table size has gone from 0 to >0 rows.

  2. The number of rows in the table when the statistics were gathered was 500 or less, and the column modification counter of the leading column of the statistics object has changed by more than 500 since then.

  3. The table had more than 500 rows when the statistics were gathered, and the column modification counter of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.

  4. If the statistics object is defined on a temporary table, it is out of date as discussed above, except that there is an additional threshold for recomputation at 6 rows, with a test otherwise identical to test 2 in the previous list.

You can view the statistical information for an object by executing DBCC SHOW_STATISTICS.  Here is a sample using the Adventure Works database:

DBCC show_statistics('[Production].ProductProductPhoto','_WA_Sys_00000002_01142BA1')

DBCC SHOW_STATISTICS: http://msdn.microsoft.com/en-us/library/ms174384.aspx

This is how SQL Server creates and manages statistics.  Remember that just because SQL Server does this for you, does not mean you can forget about it.  You always want to stay on top of statistics, like indexes. In some cases, AUTO_UPDATE_STATISTICS is not enough and using sp_updatestats every so often may help alleviate query performance issues. The bottom line is, if your statistics are bad, there is a good chance the optimizer may miss index seeks or choose a sub optimal joins.There is a lot more to know about statistics and I have hardly scratched the surface here.  If you would like to learn more about SQL Server Statistics, you should take a look at the referenced links, as these have a wealth of information.

References:

http://msdn.microsoft.com/en-us/library/ms190397.aspx

http://technet.microsoft.com/en-us/library/cc966419.aspx

2 comments:

daspeac said...

I have heard about another way of repair a pdf file. Besides, you can visit my blogs at: http://daspeac.livejournal.com/ or http://daspeac.blogspot.com/ where I’m trying to share my experience with regard to data corruption issues.

Mark Willium said...

Repair and restore corrupt sql server database by advanced sql recovery application.