Query history system table reference
Important
This system table is in Public Preview. To access the table, the schema must be enabled in your system
catalog. For more information, see Enable system table schemas.
This article includes information on the query history system table, including an outline of the table’s schema.
Important
To access the query history system table, you must enable the query
schema. For instructions on enabling system schemas, see Enable system table schemas.
Table path: This system table is located at system.query.history
.
Using the query history table
The query history table includes records for queries run using SQL warehouses or serverless compute for notebooks and jobs. The table includes account-wide records from all workspaces in the same region from which you access the table.
By default, only admins have access to the system table. If you would like to share the table’s data with a user or group, Databricks recommends creating a dynamic view for each user or group. See Create a dynamic view.
Query history system table schema
The query history table uses the following schema:
Column name | Data type | Description | Example |
---|---|---|---|
account_id |
string | ID of the account. | 11e22ba4-87b9-4cc2 -9770-d10b894b7118 |
workspace_id |
string | The ID of the workspace where the query was run. | 1234567890123456 |
statement_id |
string | The ID that uniquely identifies the execution of the statement. You can use this ID to find the statement execution in the Query History UI. | 7a99b43c-b46c-432b -b0a7-814217701909 |
session_id |
string | The Spark session ID. | 01234567-cr06-a2mp -t0nd-a14ecfb5a9c2 |
execution_status |
string | The statement termination state. Possible values are: - FINISHED : execution was successful- FAILED : execution failed with the reason for failure described in the accompanying error message- CANCELED : execution was canceled |
FINISHED |
compute |
struct | A struct that represents the type of compute resource used to run the statement and the ID of the resource where applicable. The type value will be either WAREHOUSE or SERVERLESS_COMPUTE . |
{ type: WAREHOUSE, cluster_id: NULL, warehouse_id: ec58ee3772e8d305 } |
executed_by_user_id |
string | The ID of the user who ran the statement. | 2967555311742259 |
executed_by |
string | The email address or username of the user who ran the statement. | example@databricks.com |
statement_text |
string | Text of the SQL statement. If you have configured customer-managed keys, statement_text is empty. |
SELECT 1 |
statement_type |
string | The statement type. For example: ALTER , COPY , andINSERT . |
SELECT |
error_message |
string | Message describing the error condition. If you have configured customer-managed keys, error_message is empty. |
[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission SELECT on table 'default.nyctaxi_trips'. |
client_application |
string | Client application that ran the statement. For example: Databricks SQL Editor, Tableau, and Power BI. This field is derived from information provided by client applications. While values are expected to remain static over time, this cannot be guaranteed. | Databricks SQL Editor |
client_driver |
string | The connector used to connect to Azure Databricks to run the statement. For example: Databricks SQL Driver for Go, Databricks ODBC Driver, Databricks JDBC Driver. | Databricks JDBC Driver |
total_duration_ms |
bigint | Total execution time of the statement in milliseconds ( excluding result fetch time ). | 1 |
waiting_for_compute_duration_ms |
bigint | Time spent waiting for compute resources to be provisioned in milliseconds. | 1 |
waiting_at_capacity_duration_ms |
bigint | Time spent waiting in queue for available compute capacity in milliseconds. | 1 |
execution_duration_ms |
bigint | Time spent executing the statement in milliseconds. | 1 |
compilation_duration_ms |
bigint | Time spent loading metadata and optimizing the statement in milliseconds. | 1 |
total_task_duration_ms |
bigint | The sum of all task durations in milliseconds. This time represents the combined time it took to run the query across all cores of all nodes. It can be significantly longer than the wall-clock duration if multiple tasks are executed in parallel. It can be shorter than the wall-clock duration if tasks wait for available nodes. | 1 |
result_fetch_duration_ms |
bigint | Time spent, in milliseconds, fetching the statement results after the execution finished. | 1 |
start_time |
timestamp | The time when Databricks received the request. Timezone information is recorded at the end of the value with +00:00 representing UTC. |
2022-12-05T00:00:00.000+0000 |
end_time |
timestamp | The time the statement execution ended, excluding result fetch time. Timezone information is recorded at the end of the value with +00:00 representing UTC. |
2022-12-05T00:00:00.000+00:00 |
update_time |
timestamp | The time the statement last received a progress update. Timezone information is recorded at the end of the value with +00:00 representing UTC. |
2022-12-05T00:00:00.000+00:00 |
read_partitions |
bigint | The number of partitions read after pruning. | 1 |
pruned_files |
bigint | The number of pruned files. | 1 |
read_files |
bigint | The number of files read after pruning. | 1 |
read_rows |
bigint | Total number of rows read by the statement. | 1 |
produced_rows |
bigint | Total number of rows returned by the statement. | 1 |
read_bytes |
bigint | Total size of data read by the statement in bytes. | 1 |
read_io_cache_percent |
int | The percentage of bytes of persistent data read from the IO cache. | 50 |
from_result_cache |
boolean | TRUE indicates that the statement result was fetched from the cache. |
TRUE |
spilled_local_bytes |
bigint | Size of data, in bytes, temporarily written to disk while executing the statement. | 1 |
written_bytes |
bigint | The size in bytes of persistent data written to cloud object storage. | 1 |
shuffle_read_bytes |
bigint | The total amount of data in bytes sent over the network. | 1 |
query_source |
struct | A struct that contains key-value pairs representing one or more Databricks entities that were involved in the execution of this statement, such as jobs, notebooks, or dashboards. This field only records Databricks entities. | { job_info: { job_id: 64361233243479 job_run_id: 887406461287882 job_task_key: “job_task_1” job_task_run_id: 110378410199121 } |
executed_as |
string | The name of the user or service principal whose privilege was used to run the statement. | example@databricks.com |
executed_as_user_id |
string | The ID of the user or service principal whose privilege was used to run the statement. | 2967555311742259 |
View the query profile for a record
To navigate to a query’s query profile based on a record in the query history table, do the following:
- Identify the record of interest, then copy the record’s
statement_id
. - Reference the record’s
workspace_id
to ensure you are logged in to the same workspace as the record. - Click Query History in the workspace sidebar.
- In the Statement ID field, paste the
statement_id
on the record. - Click the name of a query. An overview of query metrics appears.
- Click See query profile.
Materialize the query history from your metastore
The following code can be used to create a job running hourly, daily, or weekly to materialize the query history from a metastore. Adjust the HISTORY_TABLE_PATH
and LOOKUP_PERIOD_DAYS
variables accordingly.
from delta.tables import *
from pyspark.sql.functions import *
from pyspark.sql.types import *
HISTORY_TABLE_PATH = "jacek.default.history"
# Adjust the lookup period according to your job schedule
LOOKUP_PERIOD_DAYS = 1
def table_exists(table_name):
try:
spark.sql(f"describe table {table_name}")
return True
except Exception:
return False
def save_as_table(table_path, df, schema, pk_columns):
deltaTable = (
DeltaTable.createIfNotExists(spark)
.tableName(table_path)
.addColumns(schema)
.execute()
)
merge_statement = " AND ".join([f"logs.{col}=newLogs.{col}" for col in pk_columns])
result = (
deltaTable.alias("logs")
.merge(
df.alias("newLogs"),
f"{merge_statement}",
)
.whenNotMatchedInsertAll()
.whenMatchedUpdateAll()
.execute()
)
result.show()
def main():
df = spark.read.table("system.query.history")
if table_exists(HISTORY_TABLE_PATH):
df = df.filter(f"update_time >= CURRENT_DATE() - INTERVAL {LOOKUP_PERIOD_DAYS} days")
else:
print(f"Table {HISTORY_TABLE_PATH} does not exist. Proceeding to copy the whole source table.")
save_as_table(
HISTORY_TABLE_PATH,
df,
df.schema,
["workspace_id", "statement_id"]
)
main()