Udostępnij za pośrednictwem


UPDATE STATISTICS undocumented options

If you read the Books Online page describing the UPDATE STATISTICS command, you will see that there are some undocumented options.

UPDATE STATISTICS table | view
[
{
{ index | statistics_name }
| ( { index |statistics_name } [ ,...n ] )
}
]
[ WITH
[
[ FULLSCAN ]
| SAMPLE number { PERCENT | ROWS } ]
| RESAMPLE
| <update_stats_stream_option> [ ,...n ]
]
[ [ , ] [ ALL | COLUMNS | INDEX ]
[ [ , ] NORECOMPUTE ]
] ;

<update_stats_stream_option> ::=
[ STATS_STREAM = stats_stream ]
[ ROWCOUNT = numeric_constant ]
[ PAGECOUNT = numeric contant ]

 

<update_stats_stream_option>

    This syntax is for internal use only and is not supported. Microsoft reserves the right to change this syntax at any time.

 

There is a very good reason why these options are undocumented. They are meant for testing and debugging purposes, and should never ever be used on production systems.

 

However, these options can also be extremely helpful to create examples and sample scripts to demonstrate various features and behaviors of the Query Optimizer and the related query plan shapes. We decided to explain what ROWCOUNT and PAGECOUNT do, so that to be able to use these commands in examples we’ll be posting in some future. Feel free to use these options on development systems for experimental and educational purposes, but please do not play with fire and do not use them in production!

 

As the name of these options suggest, ROWCOUNT and PAGECOUNT alter the internal metadata of the specified table or index by overriding the counters containing the row and page counts of the object. These counters are in turn read by the Query Optimizer when processing queries that access the table and/or index in question. These commands can basically cheat the Optimizer into thinking that a table or index is extremely large.

 

SQL Server’s Query Optimization process is structured in multiple stages. Further optimization stages consider a progressively larger and more sophisticated set of possible tree transformations and query optimizations. Later stages of optimization are only entered when the estimated cost of the query is sufficiently high, in order to avoid wasting precious CPU cycles against simple queries that do not need that level of sophistication anyway. The multiple optimization stages are a mean to produce efficient query plans without consuming excessive amounts of CPU. Typically, in order to make the Optimizer “think” a lot and enter these later stages it is necessary to have big tables with a large number of rows and pages, which in turn take time and space to populate. Using ROWCOUNT and PAGECOUNT allows us to exercise these code paths with relatively simple scripts that do not require an extremely complex setup phase.

 

Here is an example. When running this simple script on your SQL 2005 instance you will likely see a different query plan for the two selects before and after updating the statistics. The recompile option is used to ensure that the query plans are regenerated. From the statistics profile, you’ll also see very different estimated row counts and consequently costs.

 

use tempdb

go

create table t1(i int, j int)

go

create table t2(h int, k int)

go

set statistics profile on

go

select distinct(i) from t1

go

select * from t1, t2 where i = k order by j + k

go

update statistics t1 with rowcount = 10000, pagecount = 10000

update statistics t2 with rowcount = 100000, pagecount = 100000

go

select distinct(i) from t1 option (recompile)

go

select * from t1, t2 where i = k order by j + k option (recompile)

go

 

Enjoy!

Stefano

Comments

  • Anonymous
    July 24, 2006
    These kind of posts are exactly why I love blogs.  This could save us hours of data generation effort.  Most of the DBA's here tend to 'forget' testing with a realistic amount of data.

    DTA (ITW) also allows you to fool SQL Server into having more rows, is this based on the same principle?

  • Anonymous
    July 25, 2006
    Hi Wesley - thank you for the interest and feedback.

    I believe that DTA uses these same interfaces, but am not 100% sure.

    About testing with realistic amounts of data: i want to be perfectly clear about the fact that UPDATE STATISTICS WITH ROWCOUNT and PAGECOUNT will only fool the Optimizer into believing that the table is actually bigger (or smaller for that matter) than in reality. However, the content of the actual tables and indexes will remain perfectly intact. So while it is true that these commands will let you see what kind of query plans the Optimizer would generate against larger amounts of data, actually running these plans would not provide particularly interesting information. All query plans will complete extremely quickly when run against empty or very small tables.

    What you can do, however, is to look at the progression of the estimated query cost based on the size of the tables. If the cost grows significantly with the size – i.e. the query performances do not seem to scale with the amount of data - it might be a hint that maybe some tuning is required.

    Stefano

  • Anonymous
    July 25, 2006
    Thanks for the extra explanation.

    I do understand that nothing can compete with realistic data.  But I'd rather see people try some first optimizations with artificial statistics than not optimizing at all because they have no time to generate test data, especially since I'm typically the one who has to solve it once it goes to production :-)

    I still love the black art of query tuning though ;-)

    Wesley

  • Anonymous
    July 30, 2006
    The comment has been removed

  • Anonymous
    July 31, 2006
    Unfortunately, at the moment there is no way to change the statistics for internal hidden tables like those used for Full Text Search. This is a limitation we should definitely attempt to remove in some future release.

    Thanks you for the great feedback!
    Stefano

  • Anonymous
    November 30, 2006
    PingBack from http://www.dosql.com/blog/?p=87

  • Anonymous
    March 09, 2007
    these are undocumented for a reason as they will affect performance and you will break your sql production

  • Anonymous
    March 12, 2007
    This is very cool - an undocumented option to UPDATE STATISTICS that allows you to fool the server into

  • Anonymous
    March 17, 2007
    In SQL Server designing efficient indexes is paramount to achieving good database and application performance.

  • Anonymous
    June 09, 2009
    PingBack from http://quickdietsite.info/story.php?id=4048

  • Anonymous
    June 15, 2009
    PingBack from http://edebtsettlementprogram.info/story.php?id=24748

  • Anonymous
    June 18, 2009
    PingBack from http://thestoragebench.info/story.php?id=9514