LIVE schema (legacy)

This article provides an overview of the legacy syntax and behavior for the LIVE virtual schema.

The LIVE virtual schema is a legacy feature of Delta Live Tables pipelines and is considered deprecated. You can still use legacy publishing mode and the LIVE virtual schema for pipelines that were created with this mode.

Support for legacy LIVE virtual schema and legacy publishing mode will be removed in a future version of Azure Databricks.

Note

Legacy pipelines are indicated in the Summary field of the Delta Live Tables pipeline settings UI. You can also confirm a pipeline uses legacy publishing mode if the target field is set in the JSON specification for the pipeline.

You cannot use the pipeline configuration UI to create new pipelines with the legacy publishing mode. If you need to deploy new pipelines using legacy LIVE syntax, contact your Databricks account representative.

What is the LIVE virtual schema?

Note

The LIVE virtual schema is no longer needed to analyze dataset dependencies in the default publishing mode for Delta Live Tables.

The LIVE schema is a programming concept in Delta Live Tables that defines a virtual boundary for all datasets created or updated in a pipeline. By design, the LIVE schema is not tied directly to datasets in a published schema. Instead, the LIVE schema allows logic in a pipeline to be planned and run even if a user does not want to publish datasets to a schema.

In legacy pipeline publishing mode, you can use the LIVE keyword to reference other datasets in the current pipeline for reads, for example, SELECT * FROM LIVE.bronze_table. In the default publishing mode for new Delta Live Tables pipelines, this syntax is silently ignored, meaning that unqualified identifiers use the current schema. See Set the target catalog and schema.
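For example, in a legacy pipeline, a downstream materialized view can read an upstream dataset through the virtual schema, as in the following sketch (the bronze_table and silver_table names are illustrative):

CREATE MATERIALIZED VIEW silver_table
AS SELECT * FROM LIVE.bronze_table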

Legacy pipeline publishing mode

The LIVE virtual schema is used with the legacy Delta Live Tables pipeline publishing mode. All pipelines created before February 5, 2025, use legacy publishing mode by default.

The following table describes the behavior for all materialized views and streaming tables created or updated in a pipeline in the legacy publishing mode:

Storage option | Storage location or catalog | Target schema | Behavior
Hive metastore | None specified | None specified | Dataset metadata and data are stored to the DBFS root. No database objects are registered to the Hive metastore.
Hive metastore | A URI or file path to cloud object storage | None specified | Dataset metadata and data are stored to the specified storage location. No database objects are registered to the Hive metastore.
Hive metastore | None specified | An existing or new schema in the Hive metastore | Dataset metadata and data are stored to the DBFS root. All materialized views and streaming tables in the pipeline are published to the specified schema in the Hive metastore.
Hive metastore | A URI or file path to cloud object storage | An existing or new schema in the Hive metastore | Dataset metadata and data are stored to the specified storage location. All materialized views and streaming tables in the pipeline are published to the specified schema in the Hive metastore.
Unity Catalog | An existing Unity Catalog catalog | None specified | Dataset metadata and data are stored in the default storage location associated with the target catalog. No database objects are registered to Unity Catalog.
Unity Catalog | An existing Unity Catalog catalog | An existing or new schema in Unity Catalog | Dataset metadata and data are stored in the default storage location associated with the target schema or catalog. All materialized views and streaming tables in the pipeline are published to the specified schema in Unity Catalog.

Update source code from LIVE schema

Pipelines configured to run with the new default publishing mode silently ignore the LIVE schema syntax. By default, all table reads use the catalog and schema specified in the pipeline configuration.

For most existing pipelines, this behavior change has no impact, as the legacy LIVE virtual schema behavior also directs reads to the catalog and schema specified in the pipeline configuration.

Important

Legacy code with reads that rely on the workspace default catalog and schema requires code updates. Consider the following materialized view definition:

CREATE MATERIALIZED VIEW silver_table
AS SELECT * FROM raw_data

In legacy publishing mode, an unqualified read from the raw_data table uses the workspace default catalog and schema, for example, main.default.raw_data. In the new default pipeline mode, the catalog and schema used by default are those configured in the pipeline configuration. To ensure that this code continues to work as expected, update the reference to use the fully qualified identifier for the table, as in the following example:

CREATE MATERIALIZED VIEW silver_table
AS SELECT * FROM main.default.raw_data

Work with event log for Unity Catalog legacy pipelines

Important

The event_log TVF is available for legacy pipelines that publish tables to Unity Catalog. Default behavior for new pipelines publishes the event log to the target catalog and schema configured for the pipeline. See Query the event log.

Pipelines configured with Hive metastore have different event log support and behavior. See Work with event log for Hive metastore pipelines.

If your pipeline publishes tables to Unity Catalog with legacy publishing mode, you must use the event_log table-valued function (TVF) to fetch the event log for the pipeline. You retrieve the event log for a pipeline by passing the pipeline ID or a table name to the TVF. For example, to retrieve the event log records for the pipeline with ID 04c78631-3dd7-4856-b2a6-7d84e9b2638b:

SELECT * FROM event_log("04c78631-3dd7-4856-b2a6-7d84e9b2638b")

To retrieve the event log records for the pipeline that created or owns the table my_catalog.my_schema.table1:

SELECT * FROM event_log(TABLE(my_catalog.my_schema.table1))

To call the TVF, you must use a shared cluster or a SQL warehouse. For example, you can use a notebook attached to a shared cluster or use the SQL editor connected to a SQL warehouse.

To simplify querying events for a pipeline, the owner of the pipeline can create a view over the event_log TVF. The following example creates a view over the event log for a pipeline. This view is used in the example event log queries included in this article.

Note

  • The event_log TVF can be called only by the pipeline owner.
  • You cannot use the event_log table-valued function in a pipeline or query to access the event logs of multiple pipelines.
  • You cannot share a view created over the event_log table-valued function with other users.

CREATE VIEW event_log_raw AS SELECT * FROM event_log("<pipeline-ID>");

Replace <pipeline-ID> with the unique identifier for the Delta Live Tables pipeline. You can find the ID in the Pipeline details panel in the Delta Live Tables UI.

Each instance of a pipeline run is called an update. You often want to extract information for the most recent update. Run the following query to find the identifier for the most recent update and save it in the latest_update temporary view. This view is used in the example event log queries included in this article:

CREATE OR REPLACE TEMP VIEW latest_update AS
SELECT origin.update_id AS id
FROM event_log_raw
WHERE event_type = 'create_update'
ORDER BY timestamp DESC
LIMIT 1;
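With these views in place, you can join them to scope queries to the most recent update. The following sketch lists the input and output datasets recorded in flow_definition events for that update; it assumes the standard event log schema, where event payloads are stored as JSON in the details column:

SELECT
  details:flow_definition.output_dataset AS output_dataset,
  details:flow_definition.input_datasets AS input_datasets
FROM event_log_raw, latest_update
WHERE event_type = 'flow_definition'
  AND origin.update_id = latest_update.id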