Use materialized views in Databricks SQL

Artikkeli
03/04/2025

This article describes how to create and use materialized views in Databricks SQL to improve performance and reduce the cost of your data processing and analysis workloads.

Note

If you need to use an Azure Private Link connection with your materialized view, contact your Databricks representative.

Important

Materialized views created in Databricks SQL are backed by a serverless DLT pipeline. Your workspace must support serverless pipelines to use this functionality.

What are materialized views?

In Databricks SQL, materialized views are Unity Catalog managed tables that allow users to precompute results based on the latest version of data in source tables. Materialized views on Azure Databricks differ from other implementations as the results returned reflect the state of data when the materialized view was last refreshed rather than always updating results when the materialized view is queried. You can manually refresh materialized views or schedule refreshes.

Materialized views are powerful for data processing workloads such as extract, transform, and load (ETL) processing. Materialized views provide a simple, declarative way to process data for compliance, corrections, aggregations, or general change data capture (CDC). Materialized views reduce cost and improve query latency by pre-computing slow queries and frequently used computations. Materialized views also enable easy-to-use transformations by cleaning, enriching, and denormalizing base tables. Materialized views can reduce costs while providing a simplified end-user experience because, in some cases, they can incrementally compute changes from the base tables.

Materialized views were first supported in Azure Databricks with the launch of DLT. When you create a materialized view in a Databricks SQL warehouse, a serverless pipeline is created to process refreshes to the materialized view. You can monitor the status of refresh operations in the DLT UI or the pipelines API. See View the status of a materialized view refresh.

Requirements

To create or refresh materialized views:

You must use a Unity Catalog-enabled pro or serverless SQL warehouse.
To refresh a materialized view, you must be in the workspace that created it.

Your workspace must be in a region that supports serverless SQL warehouses.

To query materialized views:

You must be the owner of the materialized view, or have SELECT on the materialized view, along with USE SCHEMA and USE CATALOG on its parents.
You must use one of the following compute resources:
- SQL warehouse
- DLT interfaces
- Standard access mode compute (formerly shared access mode)
- Dedicated access mode (formerly single user access mode) on Databricks Runtime 15.4 and above, as long as the workspace is enabled for serverless compute. See Fine-grained access control on dedicated compute (formerly single user compute).
- Only if you are the materialized view owner: a Dedicated access mode compute resource that is running Databricks Runtime between 14.3 and 15.3.

To learn about other restrictions on using materialized views, see Limitations.

Create a materialized view

Databricks SQL materialized view CREATE operations use a Databricks SQL warehouse to create and load data in the materialized view. Creating a materialized view is a synchronous operation, which means that the CREATE MATERIALIZED VIEW command blocks until the materialized view is created and the initial data load finishes. A serverless DLT pipeline is automatically created for every Databricks SQL materialized view. When the materialized view is refreshed the DLT pipeline processes the refresh.

To create a materialized view, use the CREATE MATERIALIZED VIEW statement. To submit a create statement, use the SQL editor in the Azure Databricks UI, the Databricks SQL CLI, or the Databricks SQL API.

Note

The user who creates a materialized view is the materialized view owner and needs to have the following permissions:

SELECT privilege on the base tables referenced by the materialized view.
USE CATALOG and USE SCHEMA privileges on the catalog and schema containing the source tables for the materialized view.
USE CATALOG and USE SCHEMA privileges on the target catalog and schema for the materialized view.
CREATE TABLE and CREATE MATERIALIZED VIEW privileges on the schema containing the materialized view.

The following example creates the materialized view mv1 from the base table base_table1:

CREATE MATERIALIZED VIEW mv1
AS SELECT
  date,
  sum(sales) AS sum_of_sales
FROM
  base_table1
GROUP BY
  date;

Column comments on a base table are automatically propagated to the new materialized view. To add a schedule, table constraints, or other properties, modify the materialized view definition. To learn syntax details for defining a materialized view, see CREATE MATERIALIZED VIEW.

Set the runtime channel

Materialized views created using SQL warehouses are automatically refreshed using a DLT pipeline. DLT pipelines use the runtime in the current channel by default. See DLT release notes and the release upgrade process to learn about the release process.

Databricks recommends using the current channel for production workloads. New features are first released to the preview channel. You can set a pipeline to the preview DLT channel to test new features by specifying preview as a table property. You can specify this property when you create the table or after the table is created using an ALTER statement.

The following code example shows how to set the channel to preview in a CREATE statement:

CREATE OR REPLACE MATERIALIZED VIEW foo.default.bar
TBLPROPERTIES ('pipelines.channel' = 'preview') as
SELECT
  *
FROM
  range(5)

Load data from external systems

Databricks recommends loading external data using Lakehouse Federation for supported data sources. For information on loading data from sources not supported by Lakehouse Federation, see Data format options.

Refresh a materialized view

The REFRESH operation refreshes the materialized view to reflect the latest changes to the base table. The operation is synchronous by default, meaning that the command blocks until the refresh operation is complete. To refresh a materialized view, use the REFRESH MATERIALIZED VIEW statement. See REFRESH (MATERIALIZED VIEW or STREAMING TABLE) for details on the SQL syntax and parameters for this command. To learn more about the types of materialized views that can be incrementally refreshed, see Incremental refresh for materialized views.

To submit a refresh statement, use the SQL editor in the Azure Databricks UI, a notebook attached to a SQL warehouse, the Databricks SQL CLI, or the Databricks SQL API.

Only the owner can REFRESH the materialized view.

The following example refreshes the mv1 materialized view:

REFRESH MATERIALIZED VIEW mv1;

How are Databricks SQL materialized views refreshed?

Materialized views automatically create and use serverless DLT pipelines to process refresh operations. The refresh is managed by the DLT pipeline and the update is monitored by the Databricks SQL warehouse used to create the materialized view. Materialized views can be updated using a DLT pipeline that runs on a schedule. See Triggered vs. continuous pipeline mode.

Note

The DLT runtime cannot detect changes in non-Delta data sources. The table is still updated regularly but with a higher default trigger interval to prevent excessive recomputation from slowing down any incremental processing happening on compute.

By default, refresh operations are performed synchronously. You can also set a refresh operation to occur asynchronously. This can be set using the refresh command. See REFRESH (MATERIALIZED VIEW or STREAMING TABLE) The behavior associated with each approach is as follows:

Synchronous: A synchronous refresh prevents other operations from proceeding until the refresh is complete. If the result is needed for the next step, such as when sequencing refresh operations in orchestration tools like Databricks Jobs, use a synchronous refresh. To orchestrate materialized views with a job, use the SQL task type. See Overview of orchestration on Databricks.
Asynchronous: An asynchronous refresh starts a background job on DLT compute when a materialized view refresh begins, allowing the command to return before the data load completes. This refresh type can save on cost because the operation doesn’t necessarily hold compute capacity in the warehouse where the command is initiated. If the refresh becomes idle and no other tasks are running, the warehouse can shut down while the refresh uses other available compute. Additionally, asynchronous refreshes support starting multiple operations in parallel.

Some queries can be incrementally refreshed. See Incremental refresh for materialized views. If an incremental refresh cannot be performed, a full refresh is performed instead.

Schedule materialized view refreshes

You can configure a Databricks SQL materialized view to refresh automatically based on a defined schedule. To set a schedule, do one of the following:

Configure the schedule with the SCHEDULE clause when you create the materialized view
Add a schedule with the ALTER MATERIALIZED VIEW statement.

When a schedule is created, a new Databricks job is automatically configured to process the update.

To view the schedule, do one of the following:

Run the DESCRIBE EXTENDED statement from the SQL editor in the Azure Databricks UI.
Use Catalog Explorer to view the materialized view. The schedule is listed on the Overview tab, under Refresh status. See What is Catalog Explorer?.

View the status of a materialized view refresh

Note

Because a DLT pipeline manages materialized view refreshes, there is latency incurred by the startup time for the pipeline. This time might be in the seconds to minutes, in addition to the time required to perform the refresh.

You can view the status of a materialized view refresh by viewing the pipeline that manages the materialized view in the DLT UI or by viewing the Refresh Information returned by the DESCRIBE EXTENDED command for the materialized view.

You can also view the refresh history of a materialized view by querying the DLT event log. See View the refresh history for a materialized view.

Monitor runs using query history

You can use the query history page to access query details and query profiles that can help you identify poorly performing queries and bottlenecks in the DLT pipeline used to run your streaming table updates. For an overview of the kind of information available for query histories and query profiles, see Query history and Query profile.

Important

This feature is in Public Preview. Workspace admins can enable this feature from the Previews page. See Manage Azure Databricks Previews.

All statements related to materialized views appear in the query history. You can use the Statement drop-down filter to select any command and inspect the related queries. All CREATE statements are followed by a REFRESH statement that executes asynchronously on a DLT pipeline. The REFRESH statements typically include detailed query plans that provide insights into optimizing performance.

To access REFRESH statements in the query history UI, use the following steps:

Click in the left sidebar to open the Query History UI.
Select the REFRESH checkbox from the Statement drop-down filter.
Click the name of the query statement to view summary details like the duration of the query and aggregated metrics.
Click See query profile to open the query profile. For details about navigating the query profile, see Query profile.
Optionally, use the links in the Query Source section to open the related query or pipeline.

See CREATE MATERIALIZED VIEW.

View the refresh status in the DLT UI

By default, the DLT pipeline that manages a materialized view is not visible in the DLT UI. To view the pipeline in the DLT UI, you must directly access the link to the pipeline’s Pipeline details page. To access the link:

Copy and paste the link shown in the Latest Refresh row of the table returned by the DESCRIBE EXTENDED statement.
On the lineage tab for the materialized view, click Pipelines and then click the pipeline link.

For asynchronous REFRESH commands submitted using the SQL editor in the Azure Databricks UI, you can view the refresh status by following the link shown in the Results panel.

Stop an active refresh

To stop an active refresh in the DLT UI, in the Pipeline details page click Stop to stop the pipeline update. You can also stop the refresh with the Databricks CLI or the POST /api/2.0/pipelines/{pipeline_id}/stop operation in the Pipelines API.

Update the definition of a materialized view

To update the definition of a materialized view, you must first drop, then re-create the materialized view.

Permanently delete records from a materialized view with deletion vectors enabled

Important

Support for the REORG statement with materialized views is in Public Preview.

Note

Using a REORG statement with a materialized view requires Databricks Runtime 15.4 and above.
Although you can use the REORG statement with any materialized view, it’s only required when deleting records from a materialized view with deletion vectors enabled. The command has no effect when used with a materialized view without deletion vectors enabled.

To physically delete records from the underlying storage for a materialized view with deletion vectors enabled, such as for GDPR compliance, additional steps must be taken to ensure that a VACUUM operation runs on the materialized view‘s data.

The following describes these steps in more detail:

Run a REORG statement against the materialized view, specifying the APPLY (PURGE) parameter. For example REORG TABLE <materialized-view-name> APPLY (PURGE);. See REORG TABLE.
Wait for the materialized view‘s data retention period to pass. The default data retention period is seven days, but it can be configured with the delta.deletedFileRetentionDuration table property. See Configure data retention for time travel queries.
REFRESH the materialized view. See Refresh a materialized view. Within 24 hours of the REFRESH operation, DLT maintenance tasks, including the VACUUM operation which is required to ensure records are permanently deleted, are run automatically. See Maintenance tasks performed by DLT.

Drop a materialized view

Note

To submit the command to drop a materialized view, you must be the owner of that materialized view or have the MANAGE privilege on the materialized view.

To drop a materialized view, use the DROP VIEW statement. To submit a DROP statement, you can use the SQL editor in the Azure Databricks UI, the Databricks SQL CLI, or the Databricks SQL API. The following example drops the mv1 materialized view:

DROP MATERIALIZED VIEW mv1;

Describe a materialized view

To retrieve the columns and data types for a materialized view, use the DESCRIBE statement. To retrieve the columns, data types, and metadata such as owner, location, creation time, and refresh status for a materialized view, use DESCRIBE EXTENDED. To submit a DESCRIBE statement, use the SQL editor in the Azure Databricks UI, the Databricks SQL CLI, or the Databricks SQL API.

Change the owner of a materialized view

You can change the owner of a materialized view if you are a both a metastore admin and a workspace admin. Materialized views automatically create and use DLT pipelines to process changes. Use the following steps to change a materialized views owner:

On the lineage tab for the materialized view, click Pipelines and then click the pipeline link.
Click Share. The Permissions Settings dialog appears.
Click x to the right of the current owner’s name to remove the current owner.
Start typing to filter the list of available users. Click the user who should be the new pipeline owner.
Click Save to save your changes and close the dialog.

All pipeline assets, including materialized views defined in the pipeline, are owned by the new pipeline owner. All future updates are run using the new owner’s identity.

Control access to materialized views

Materialized views support rich access controls to support data-sharing while avoiding exposing potentially private data. A materialized view owner or a user with the MANAGE privilege can grant SELECT privileges to other users. Users with SELECT access to the materialized view do not need SELECT access to the tables referenced by the materialized view. This access control enables data sharing while controlling access to the underlying data.

Grant privileges to a materialized view

To grant access to a materialized view, use the GRANT statement:

GRANT
  privilege_type [, privilege_type ] ...
  ON <mv_name> TO principal;

The privilege_type can be:

SELECT - the user can SELECT the materialized view.
REFRESH - the user can REFRESH the materialized view. Refreshes are run using the owner’s permissions.

The following example creates a materialized view and grants select and refresh privileges to a user:

CREATE MATERIALIZED VIEW <mv_name> AS SELECT * FROM <base_table>;
GRANT SELECT ON <mv_name> TO user;
GRANT REFRESH ON <mv_name> TO user;

Revoke privileges from a materialized view

To revoke access from a materialized view, use the REVOKE statement:

REVOKE
  privilege_type [, privilege_type ]
  ON <name> FROM principal;

When SELECT privileges on a base table are revoked from the materialized view owner or any other user who has been granted MANAGE or SELECT privileges on the materialized view, or the base table is dropped, the materialized view owner or user granted access is still able to query the materialized view. However, the following behavior occurs:

The materialized view owner or others who have lost access to a materialized view can no longer REFRESH that materialized view, and the materialized view will become stale.
If automated with a schedule, the next scheduled REFRESH fails or is not run.

The following example revokes the SELECT privilege from mv1:

REVOKE SELECT ON mv1 FROM user1;

Enable change data feed

Change data feed is required on the materialized views base tables, except for certain advanced use cases. To enable change data feed on a base table, set the delta.enableChangeDataFeed table property using the following syntax:

ALTER TABLE table1 SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

View the refresh history for a materialized view

To view the status of REFRESH operations on a materialized view, including current and past refreshes, query the DLT event log:

SELECT
  *
FROM
  event_log(TABLE(<fully-qualified-table-name>))
WHERE
  event_type = "update_progress"
ORDER BY
  timestamp desc;

Replace <fully-qualified-table-name> with the fully qualified name of the materialized view, including the catalog and schema.

See What is the DLT event log?.

Get detailed information about materialized views with Catalog Explorer

You can use the Catalog Explorer to view details about a materialized view.

Click Catalog in the sidebar.
In the Catalog Explorer tree at the left, open the catalog and select the schema where your materialized view is located.
Open the Tables item under the schema you selected, and click the materialized view.

From here, you can use the tabs under the materialized view name to view and edit information about the materialized view, including:

Refresh status and history
The table schema
Sample data (requires an active compute)
Permissions
Lineage, including tables and pipelines that this materialized view depends on
Insights into usage
Monitors that you have created for this materialized view

Limitations

For compute and workspace requirements, see Requirements.
Materialized views do not support identity columns or surrogate keys.
If a materialized view uses a sum aggregate over a NULL-able column and only NULL values remain in that column, the materialized views resultant aggregate value is zero instead of NULL.
You cannot read a change data feed from a materialized view.
Time travel queries are not supported on materialized views.
The underlying files supporting materialized views might include data from upstream tables (including possible personally identifiable information) that do not appear in the materialized view definition. This data is automatically added to the underlying storage to support incremental refreshing of materialized views. Because the underlying files of a materialized view might risk exposing data from upstream tables not part of the materialized view schema, Databricks recommends not sharing the underlying storage with untrusted downstream consumers. For example, suppose the definition of a materialized view includes a COUNT(DISTINCT field_a) clause. Even though the materialized view definition only includes the aggregate COUNT DISTINCT clause, the underlying files will contain a list of the actual values of field_a.

Jaa

Use materialized views in Databricks SQL

What are materialized views?

Requirements

To create or refresh materialized views:

To query materialized views:

Create a materialized view

Set the runtime channel

Load data from external systems

Refresh a materialized view

How are Databricks SQL materialized views refreshed?

Schedule materialized view refreshes

View the status of a materialized view refresh

Monitor runs using query history

View the refresh status in the DLT UI

Stop an active refresh

Update the definition of a materialized view

Permanently delete records from a materialized view with deletion vectors enabled

Drop a materialized view

Describe a materialized view

Change the owner of a materialized view

Control access to materialized views

Grant privileges to a materialized view

Revoke privileges from a materialized view

Enable change data feed

View the refresh history for a materialized view

Get detailed information about materialized views with Catalog Explorer

Limitations

Palaute

Lisäresursseja