Predictive optimization system table reference

Important

This system table is in Public Preview. To access the table, the schema must be enabled in your system catalog. For more information, see Enable system table schemas.

Note

To have access to this table, you must enable the storage schema (see Enable system table schemas) and your region must support predictive optimization (see Azure Databricks regions).

This article outlines the predictive optimization operation history table schema and provides sample queries. Predictive optimization optimizes your data layout for peak performance and cost efficiency. The system table tracks the operation history of this feature. For information on predictive optimization, see Predictive optimization for Unity Catalog managed tables.

Table path: This system table is located at system.storage.predictive_optimization_operations_history.

Delivery considerations

  • Data can take up to 24 hours to be populated.
  • Predictive optimization might run multiple operations on the same cluster. If so, the share of DBUs attributed to each of the multiple operations is approximated. This is why the usage_unit is set to ESTIMATED_DBU. Still, the total number of DBUs spent on the cluster will be accurate.

Predictive optimization table schema

The predictive optimization operation history system table uses the following schema:

Column name Data type Description Example
account_id string ID of the account. 11e22ba4-87b9-4cc2-9770-d10b894b7118
workspace_id string The ID of the workspace in which predictive optimization ran the operation. 1234567890123456
start_time timestamp The time at which the operation started. Timezone information is recorded at the end of the value with +00:00 representing UTC. 2023-01-09 10:00:00.000+00:00
end_time timestamp The time at which the operation ended. Timezone information is recorded at the end of the value with +00:00 representing UTC. 2023-01-09 11:00:00.000+00:00
metastore_name string The name of the metastore to which the optimized table belongs. metastore
metastore_id string The ID of the metastore to which the optimized table belongs. 5a31ba44-bbf4-4174-bf33-e1fa078e6765
catalog_name string The name of the catalog to which the optimized table belongs. catalog
schema_name string The name of the schema to which the optimized table belongs. schema
table_id string The ID of the optimized table. 138ebb4b-3757-41bb-9e18-52b38d3d2836
table_name string The name of the optimized table. table1
operation_type string The optimization operation which was performed. The value will be COMPACTION, VACUUM, ANALYZE, or CLUSTERING. COMPACTION
operation_id string The ID for the optimization operation. 4dad1136-6a8f-418f-8234-6855cfaff18f
operation_status string The status of the optimization operation. The value will be SUCCESSFUL or FAILED: INTERNAL_ERROR. SUCCESSFUL
operation_metrics map[string, string] Additional details about the specific optimization that was performed. See Operation metrics. {"number_of_output_files":"100","number_of_compacted_files":"1000","amount_of_output_data_bytes":"4000","amount_of_data_compacted_bytes":"10000"}
usage_unit string The unit of usage that this operation incurred. Can only be one value: ESTIMATED_DBU. ESTIMATED_DBU
usage_quantity decimal The amount of the usage unit that was used by this operation. 2.12

Operation metrics

The metrics recorded in the operation_metrics column vary depending on the operation type:

  • COMPACTION: number_of_compacted_files, amount_of_data_compacted_bytes, number_of_output_files, amount_of_output_data_bytes
  • VACUUM: number_of_deleted_files, amount_of_data_deleted_bytes
  • ANALYZE: amount_of_scanned_bytes, number_of_scanned_files, staleness_percentage_reduced
  • CLUSTERING: number_of_removed_files, number_of_clustered_files, amount_of_data_removed_bytes, amount_of_clustered_data_bytes

Example queries

The following sections include sample queries you can use to gain insights into the predictive optimization system table. For these queries to work, you need to replace the parameter values with your own values.

This article includes the following example queries:

How many estimated DBUs has predictive optimization used in the last 30 days?

SELECT SUM(usage_quantity)
FROM system.storage.predictive_optimization_operations_history
WHERE
     usage_unit = "ESTIMATED_DBU"
     AND  timestampdiff(day, start_time, Now()) < 30

On which tables did predictive optimization spend the most in the last 30 days (estimated cost)?

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     SUM(usage_quantity) as totalDbus
FROM system.storage.predictive_optimization_operations_history
WHERE
    usage_unit = "ESTIMATED_DBU"
    AND timestampdiff(day, start_time, Now()) < 30
GROUP BY ALL
ORDER BY totalDbus DESC

On which tables is predictive optimization performing the most operations?

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     operation_type,
     COUNT(DISTINCT operation_id) as operations
FROM system.storage.predictive_optimization_operations_history
GROUP BY ALL
ORDER BY operations DESC

For a given catalog, how many total bytes have been compacted?

SELECT
     schema_name,
     table_name,
     SUM(operation_metrics["amount_of_data_compacted_bytes"]) as bytesCompacted
FROM system.storage.predictive_optimization_operations_history
WHERE
    metastore_name = :metastore_name
    AND catalog_name = :catalog_name
    AND operation_type = "COMPACTION"
GROUP BY ALL
ORDER BY bytesCompacted DESC

What tables had the most bytes vacuumed?

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     SUM(operation_metrics["amount_of_data_deleted_bytes"]) as bytesVacuumed
FROM system.storage.predictive_optimization_operations_history
WHERE operation_type = "VACUUM"
GROUP BY ALL
ORDER BY bytesVacuumed DESC

What is the success rate for operations run by predictive optimizations?

WITH operation_counts AS (
     SELECT
           COUNT(DISTINCT (CASE WHEN operation_status = "SUCCESSFUL" THEN operation_id END)) as successes,
           COUNT(DISTINCT operation_id) as total_operations
    FROM system.storage.predictive_optimization_operations_history
 )
SELECT successes / total_operations as success_rate
FROM operation_counts