檢查專用 SQL 集區上的統計資料正確性
適用於:Azure Synapse Analytics
更新的統計資料對於產生最佳執行計畫非常重要。 您應該評估兩個不同的檢視方塊,因為它與判斷統計數據精確度有關:
步驟 1:驗證控制節點資料列計數精確度
在專用 SQL 集區中,建立分散式查詢計劃的主要引擎必須持續更新計算節點上的數據列數目。 執行下列查詢,以識別數據列計數之間有差異的數據表:
SELECT objIdsWithStats.[object_id]
,actualRowCounts.[schema]
,actualRowCounts.logical_table_name
,statsRowCounts.stats_row_count
,actualRowCounts.actual_row_count
,row_count_difference = CASE
WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count
THEN actualRowCounts.actual_row_count - statsRowCounts.stats_row_count
ELSE statsRowCounts.stats_row_count - actualRowCounts.actual_row_count
END
,percent_deviation_from_actual = CASE
WHEN actualRowCounts.actual_row_count = 0
THEN statsRowCounts.stats_row_count
WHEN statsRowCounts.stats_row_count = 0
THEN actualRowCounts.actual_row_count
WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count
THEN CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (actualRowCounts.actual_row_count - statsRowCounts.stats_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
ELSE CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (statsRowCounts.stats_row_count - actualRowCounts.actual_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
END
,'UPDATE STATISTICS ' + quotename(actualRowCounts.[schema]) + '.' + quotename(actualRowCounts.logical_table_name) + ';' as update_stats_stmt
FROM (
SELECT DISTINCT object_id
FROM sys.stats
WHERE stats_id > 1
) objIdsWithStats
LEFT JOIN (
SELECT object_id
,sum(rows) AS stats_row_count
FROM sys.partitions
GROUP BY object_id
) statsRowCounts ON objIdsWithStats.object_id = statsRowCounts.object_id
LEFT JOIN (
SELECT sm.name [schema]
,tb.name logical_table_name
,tb.object_id object_id
,SUM(rg.row_count) actual_row_count
FROM sys.schemas sm
INNER JOIN sys.tables tb ON sm.schema_id = tb.schema_id
INNER JOIN sys.pdw_table_mappings mp ON tb.object_id = mp.object_id
INNER JOIN sys.pdw_nodes_tables nt ON nt.name = mp.physical_name
INNER JOIN sys.dm_pdw_nodes_db_partition_stats rg ON rg.object_id = nt.object_id
AND rg.pdw_node_id = nt.pdw_node_id
AND rg.distribution_id = nt.distribution_id
INNER JOIN sys.indexes ind on tb.object_id = ind.object_id
WHERE rg.index_id < 2 -- In case this condition removed the number of rows will gets duplicated based on the number of index.
AND ind.type_desc IN ('CLUSTERED COLUMNSTORE', 'HEAP') -- Switch between the CCI (Column store) and HEAP, You should at least keep one value or else the total number of rows will gets duplicated based on the number of indexes.
GROUP BY sm.name
,tb.name
,tb.object_id
) actualRowCounts ON objIdsWithStats.object_id = actualRowCounts.object_id
步驟 2:確定統計數據是最新的
更新數據可能會大幅影響用來產生有效執行計劃的統計數據直方圖。 執行下列查詢,以判斷統計數據的最後一個更新日期是否符合數據表的修改模式:
SELECT ob.[object_id],max(sm.[name]) AS [schema_name]
,max(tb.[name]) AS [table_name]
,st.[stats_id]
,max(st.[name]) AS [stats_name]
,string_agg(co.[name], ',') AS [stats_column_names]
,STATS_DATE(ob.[object_id], st.[stats_id]) AS [stats_last_updated_date]
,'UPDATE STATISTICS ' + quotename(max(sm.[name])) + '.' + quotename(max(tb.[name])) + ';' as [update_stats_stmt]
FROM sys.objects ob
JOIN sys.stats st ON ob.[object_id] = st.[object_id]
JOIN sys.stats_columns sc ON st.[stats_id] = sc.[stats_id]
AND st.[object_id] = sc.[object_id]
JOIN sys.columns co ON sc.[column_id] = co.[column_id]
AND sc.[object_id] = co.[object_id]
JOIN sys.types ty ON co.[user_type_id] = ty.[user_type_id]
JOIN sys.tables tb ON co.[object_id] = tb.[object_id]
JOIN sys.schemas sm ON tb.[schema_id] = sm.[schema_id]
WHERE st.[stats_id] > 1
GROUP BY ob.[object_id], st.[stats_id]
ORDER BY stats_last_updated_date
步驟 3:更新所識別資料表的統計資料
在識別先前步驟中的候選數據表之後,請執行查詢數據行中 update_stats_stmt
產生的語句來更新統計數據。
注意
即使使用者建立,也不建議您更新個別統計數據。 藉由執行 UPDATE STATISTICS
而不指定統計資料名稱,將會更新與資料表相關聯的所有統計資料,以及控制節點資料列的計數。 您可以考慮使用 WITH FULLSCAN
或 來覆寫默認掃描百分比,以 WITH SAMPLE <SamplePercent> PERCENT
達到適合大型數據表的正確性。 如需完整語法,請參閱更新統計資料 (Transact-SQL)。
例如:
UPDATE STATISTICS [dbo].[MyAwesomeTable];
更新統計資料之後,請重新執行問題查詢,以判斷統計資料更新是否已改善執行持續時間。