Query insights in Fabric data warehousing

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

In Microsoft Fabric, the query insights feature is a scalable, sustainable, and extendable solution to enhance the SQL analytics experience. With historical query data, aggregated insights, and access to actual query text, you can analyze and tune your query performance. QI provides information on queries run in a user's context only, system queries aren't considered.

The query insights feature provides a central location for historic query data and actionable insights for 30 days, helping you to make informed decisions to enhance the performance of your Warehouse or SQL analytics endpoint. When a SQL query runs in Microsoft Fabric, the query insights feature collect and consolidates its execution data, providing you with valuable information. You can view complete query text for Admin, Member, and Contributor roles.

  • Historical Query Data: The query insights feature stores historical data about query executions, enabling you to track performance changes over time. System queries aren't stored in query insights.
  • Aggregated Insights: The query insights feature aggregates query execution data into insights that are more actionable, such as identifying long-running queries or most active users. These aggregations are based on the query shape. For more information, see How are similar queries aggregated to generate insights?

Before you begin

You should have access to a SQL analytics endpoint or Warehouse within a Premium capacity workspace with contributor or higher permissions.

When do you need query insights?

The query insights feature addresses several questions and concerns related to query performance and database optimization, including:

Query Performance Analysis

  • What is the historical performance of our queries?
  • Are there any long-running queries that need attention?
  • Can we identify the queries causing performance bottlenecks?
  • Was cache utilized for my queries?
  • Which queries are consuming the most CPU?

Query Optimization and Tuning

  • Which queries are frequently run, and can their performance be improved?
  • Can we identify queries that have failed or been canceled?
  • Can we track changes in query performance over time?
  • Are there any queries that consistently perform poorly?

User Activity Monitoring

  • Who submitted a particular query?
  • Who are the most active users or the users with the most long-running queries?

There are three system views to provide answers to these questions:

Where can you see query insights?

Autogenerated views are under the queryinsights schema in SQL analytics endpoint and Warehouse. In the Fabric Explorer of a Warehouse for example, find query insights views under Schemas, queryinsights, Views.

Screenshot from the Fabric Explorer showing where to find query insights views under Schemas, queryinsights, Views.

After your query completes execution, you see its execution data in the queryinsights views of the Warehouse or SQL analytics endpoint you were connected to. If you run a cross-database query while in the context of WH_2, your query appears in the query insights of WH_2. Completed queries can take up to 15 minutes to appear in query insights depending on the concurrent workload being executed. The time taken for queries to appear in query insights increases with increase in concurrent queries being executed.

How are similar queries aggregated to generate insights?

Queries are considered the same by the Query Insights if the queries have the same shape, even if the predicates may be different.

You can utilize the query hash column in the views to analyze similar queries and drill down to each execution.

For example, the following queries are considered the same after their predicates are parameterized:

SELECT * FROM Orders
WHERE OrderDate BETWEEN '1996-07-01' AND '1996-07-31';

and

SELECT * FROM Orders
WHERE OrderDate BETWEEN '2000-07-01' AND '2006-07-31';

Examples

Identify queries run by you in the last 30 minutes

The following query uses queryinsights.exec_requests_history and the built-in USER_NAME() function, which returns your current session user name.

SELECT * FROM queryinsights.exec_requests_history 
WHERE start_time >= DATEADD(MINUTE, -30, GETUTCDATE())
AND login_name = USER_NAME();

Identify top CPU consuming queries by CPU time

The following query returns the top 100 queries by allocated CPU time.

SELECT TOP 100 distributed_statement_id, query_hash, allocated_cpu_time_ms, label, command
FROM queryinsights.exec_requests_history
ORDER BY allocated_cpu_time_ms DESC;

Identify which queries are scanning most data from remote rather than cache

You can determine if the large data scanning during query execution is slowing down your query and make decisions to tweak your query code accordingly. This analysis allows you to compare different query executions and identify if the variance in the amount of data scanned is the reason for performance changes.

Furthermore, you can assess the use of cache by examining the sum of data_scanned_memory_mb and data_scanned_disk_mb, and comparing it to the data_scanned_remote_storage_mb for past executions.

Note

The data scanned values might not account the data moved during the intermediate stages of query execution. In some cases, the size of the data moved and CPU required to process may be larger than the data scanned value indicates.

SELECT distributed_statement_id, query_hash, data_scanned_remote_storage_mb, data_scanned_memory_mb, data_scanned_disk_mb, label, command
FROM queryinsights.exec_requests_history
ORDER BY data_scanned_remote_storage_mb DESC;

Identify the most frequently run queries using a substring in the query text

The following query returns the most recent queries that match a certain string, ordered by the number of successful executions descending.

SELECT * FROM queryinsights.frequently_run_queries
WHERE last_run_command LIKE '%<some_label>%'
ORDER BY number_of_successful_runs DESC;

Identify long-running queries using a substring in the query text

The following query returns the queries that match a certain string, ordered by the median query execution time descending.

SELECT * FROM queryinsights.long_running_queries
WHERE last_run_command LIKE '%<some_label>%'
ORDER BY median_total_elapsed_time_ms DESC;