sys.dm_db_stats_histogram (Transact-SQL)

Άρθρο
11/18/2022

Applies to: SQL Server 2016 (13.x) and later Azure SQL Database Azure SQL Managed Instance

Returns the statistics histogram for the specified database object (table or indexed view) in the current SQL Server database. Similar to DBCC SHOW_STATISTICS WITH HISTOGRAM.

Note

This DMF is available starting with SQL Server 2016 (13.x) SP1 CU2

Syntax

sys.dm_db_stats_histogram (object_id, stats_id)

Arguments

object_id
Is the ID of the object in the current database for which properties of one of its statistics is requested. object_id is int.

stats_id
Is the ID of statistics for the specified object_id. The statistics ID can be obtained from the sys.stats dynamic management view. stats_id is int.

Table Returned

Column name	Data type	Description
object_id	int	ID of the object (table or indexed view) for which to return the properties of the statistics object.
stats_id	int	ID of the statistics object. Is unique within the table or indexed view. For more information, see sys.stats (Transact-SQL).
step_number	int	The number of step in the histogram.
range_high_key	sql_variant	Upper bound column value for a histogram step. The column value is also called a key value.
range_rows	real	Estimated number of rows whose column value falls within a histogram step, excluding the upper bound.
equal_rows	real	Estimated number of rows whose column value equals the upper bound of the histogram step.
distinct_range_rows	bigint	Estimated number of rows with a distinct column value within a histogram step, excluding the upper bound.
average_range_rows	real	Average number of rows with duplicate column values within a histogram step, excluding the upper bound (`RANGE_ROWS / DISTINCT_RANGE_ROWS` for `DISTINCT_RANGE_ROWS > 0`).

Remarks

The resultset for sys.dm_db_stats_histogram returns information similar to DBCC SHOW_STATISTICS WITH HISTOGRAM and also includes object_id, stats_id, and step_number.

Because the column range_high_key is a sql_variant data type, you may need to use CAST or CONVERT if a predicate does comparison with a non-string constant.

Histogram

A histogram measures the frequency of occurrence for each distinct value in a data set. The query optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view. If the histogram is created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers.

To create the histogram, the query optimizer sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps. Each step includes a range of column values followed by an upper bound column value. The range includes all possible column values between boundary values, excluding the boundary values themselves. The lowest of the sorted column values is the upper boundary value for the first histogram step.

The following diagram shows a histogram with six steps. The area to the left of the first upper boundary value is the first step.

Image of how a histogram is calculated from sampled column values.

For each histogram step:

Bold line represents the upper boundary value (range_high_key) and the number of times it occurs (equal_rows)
Solid area left of range_high_key represents the range of column values and the average number of times each column value occurs (average_range_rows). The average_range_rows for the first histogram step is always 0.
Dotted lines represent the sampled values used to estimate total number of distinct values in the range (distinct_range_rows) and total number of values in the range (range_rows). The query optimizer uses range_rows and distinct_range_rows to compute average_range_rows and does not store the sampled values.

The query optimizer defines the histogram steps according to their statistical significance. It uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. The maximum number of steps is 200. The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. For example, a column with 100 distinct values can have a histogram with fewer than 100 boundary points.

Permissions

Requires that the user has select permissions on statistics columns or the user owns the table or the user is a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role.

Examples

A. Simple example

The following example creates and populates a simple table. Then creates statistics on the Country_Name column.

CREATE TABLE Country
(Country_ID int IDENTITY PRIMARY KEY,
Country_Name varchar(120) NOT NULL);
INSERT Country (Country_Name) VALUES ('Canada'), ('Denmark'), ('Iceland'), ('Peru');

CREATE STATISTICS Country_Stats  
    ON Country (Country_Name) ;

The primary key occupies stat_id number 1, so call sys.dm_db_stats_histogram for stat_id number 2, to return the statistics histogram for the Country table.

SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('Country'), 2);

B. Useful query:

SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE s.[name] = N'<statistic_name>';

C. Useful query:

The following example selects from table Country with a predicate on column Country_Name.

SELECT * FROM Country 
WHERE Country_Name = 'Canada';

The following example looks at the previously created statistic on table Country and column Country_Name for the histogram step matching the predicate in the query above.

SELECT ss.name, ss.stats_id, shr.steps, shr.rows, shr.rows_sampled, 
    shr.modification_counter, shr.last_updated, sh.range_rows, sh.equal_rows
FROM sys.stats ss
INNER JOIN sys.stats_columns sc 
    ON ss.stats_id = sc.stats_id AND ss.object_id = sc.object_id
INNER JOIN sys.all_columns ac 
    ON ac.column_id = sc.column_id AND ac.object_id = sc.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr
CROSS APPLY sys.dm_db_stats_histogram(ss.object_id, ss.stats_id) sh
WHERE ss.[object_id] = OBJECT_ID('Country') 
    AND ac.name = 'Country_Name'
    AND sh.range_high_key = CAST('Canada' AS CHAR(8));

Next steps

DBCC SHOW_STATISTICS (Transact-SQL)
Object Related Dynamic Management Views and Functions (Transact-SQL)
sys.dm_db_stats_properties (Transact-SQL)

Κοινή χρήση μέσω