檢查專用 SQL 集區上的統計資料正確性

發行項
11/20/2024

適用於：Azure Synapse Analytics

更新的統計資料對於產生最佳執行計畫非常重要。您應該評估兩個不同的檢視方塊，因為它與判斷統計數據精確度有關：

步驟 1：驗證控制節點資料列計數精確度

在專用 SQL 集區中，建立分散式查詢計劃的主要引擎必須持續更新計算節點上的數據列數目。執行下列查詢，以識別數據列計數之間有差異的數據表：

SELECT objIdsWithStats.[object_id]
    ,actualRowCounts.[schema]
    ,actualRowCounts.logical_table_name
    ,statsRowCounts.stats_row_count
    ,actualRowCounts.actual_row_count
    ,row_count_difference = CASE 
        WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count
            THEN actualRowCounts.actual_row_count - statsRowCounts.stats_row_count
        ELSE statsRowCounts.stats_row_count - actualRowCounts.actual_row_count
        END
    ,percent_deviation_from_actual = CASE 
        WHEN actualRowCounts.actual_row_count = 0
            THEN statsRowCounts.stats_row_count
        WHEN statsRowCounts.stats_row_count = 0
            THEN actualRowCounts.actual_row_count
        WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count
            THEN CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (actualRowCounts.actual_row_count - statsRowCounts.stats_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
        ELSE CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (statsRowCounts.stats_row_count - actualRowCounts.actual_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
        END
    ,'UPDATE STATISTICS ' + quotename(actualRowCounts.[schema]) + '.' + quotename(actualRowCounts.logical_table_name) + ';' as update_stats_stmt
FROM (
    SELECT DISTINCT object_id
    FROM sys.stats
    WHERE stats_id > 1
    ) objIdsWithStats
LEFT JOIN (
    SELECT object_id
        ,sum(rows) AS stats_row_count
    FROM sys.partitions
    GROUP BY object_id
    ) statsRowCounts ON objIdsWithStats.object_id = statsRowCounts.object_id
LEFT JOIN (
    SELECT sm.name [schema]
        ,tb.name logical_table_name
        ,tb.object_id object_id
        ,SUM(rg.row_count) actual_row_count
    FROM sys.schemas sm
    INNER JOIN sys.tables tb ON sm.schema_id = tb.schema_id
    INNER JOIN sys.pdw_table_mappings mp ON tb.object_id = mp.object_id
    INNER JOIN sys.pdw_nodes_tables nt ON nt.name = mp.physical_name
    INNER JOIN sys.dm_pdw_nodes_db_partition_stats rg ON rg.object_id = nt.object_id
        AND rg.pdw_node_id = nt.pdw_node_id
        AND rg.distribution_id = nt.distribution_id
    INNER JOIN sys.indexes ind on tb.object_id = ind.object_id
    WHERE rg.index_id < 2 -- In case this condition removed the number of rows will gets duplicated based on the number of index.
    AND ind.type_desc IN ('CLUSTERED COLUMNSTORE', 'HEAP') -- Switch between the CCI (Column store) and HEAP, You should at least keep one value or else the total number of rows will gets duplicated based on the number of indexes.
    GROUP BY sm.name
        ,tb.name
        ,tb.object_id
    ) actualRowCounts ON objIdsWithStats.object_id = actualRowCounts.object_id

步驟 2：確定統計數據是最新的

更新數據可能會大幅影響用來產生有效執行計劃的統計數據直方圖。執行下列查詢，以判斷統計數據的最後一個更新日期是否符合數據表的修改模式：

SELECT ob.[object_id],max(sm.[name]) AS [schema_name]
    ,max(tb.[name]) AS [table_name]
    ,st.[stats_id]
    ,max(st.[name]) AS [stats_name]
    ,string_agg(co.[name], ',') AS [stats_column_names]
    ,STATS_DATE(ob.[object_id], st.[stats_id]) AS [stats_last_updated_date]
    ,'UPDATE STATISTICS ' + quotename(max(sm.[name])) + '.' + quotename(max(tb.[name])) + ';' as [update_stats_stmt]
FROM sys.objects ob
JOIN sys.stats st ON ob.[object_id] = st.[object_id]
JOIN sys.stats_columns sc ON st.[stats_id] = sc.[stats_id]
    AND st.[object_id] = sc.[object_id]
JOIN sys.columns co ON sc.[column_id] = co.[column_id]
    AND sc.[object_id] = co.[object_id]
JOIN sys.types ty ON co.[user_type_id] = ty.[user_type_id]
JOIN sys.tables tb ON co.[object_id] = tb.[object_id]
JOIN sys.schemas sm ON tb.[schema_id] = sm.[schema_id]
WHERE st.[stats_id] > 1
GROUP BY ob.[object_id], st.[stats_id]
ORDER BY stats_last_updated_date

步驟 3：更新所識別資料表的統計資料

在識別先前步驟中的候選數據表之後，請執行查詢數據行中 update_stats_stmt 產生的語句來更新統計數據。

注意

即使使用者建立，也不建議您更新個別統計數據。藉由執行 UPDATE STATISTICS 而不指定統計資料名稱，將會更新與資料表相關聯的所有統計資料，以及控制節點資料列的計數。您可以考慮使用 WITH FULLSCAN 或來覆寫默認掃描百分比，以 WITH SAMPLE <SamplePercent> PERCENT 達到適合大型數據表的正確性。如需完整語法，請參閱更新統計資料 (Transact-SQL)。

例如：

UPDATE STATISTICS [dbo].[MyAwesomeTable];

更新統計資料之後，請重新執行問題查詢，以判斷統計資料更新是否已改善執行持續時間。

共用方式為

檢查專用 SQL 集區上的統計資料正確性

步驟 1：驗證控制節點資料列計數精確度

步驟 2：確定統計數據是最新的

步驟 3：更新所識別資料表的統計資料

更多統計數據維護資源

意見反應

其他資源