Triggered vs. continuous pipeline mode

10/14/2024

This article describes the operational semantics of triggered and continuous pipeline modes for Delta Live Tables.

Pipeline mode is independent of the type of table being computed. Both materialized views and streaming tables can be updated in either pipeline mode.

To change between triggered and continuous, use the Pipeline mode option in the pipeline settings while creating or editing a pipeline. See Configure a Delta Live Tables pipeline.
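Outside the UI, the same choice appears in the pipeline's JSON settings. The sketch below assumes the mode is stored as a boolean `continuous` field in those settings; the helper `set_pipeline_mode` and the pipeline name are hypothetical and exist only for illustration.

```python
import copy
import json


def set_pipeline_mode(settings: dict, mode: str) -> dict:
    """Return a copy of `settings` with the requested pipeline mode applied.

    Assumption: the settings JSON represents the mode as a boolean
    `continuous` field, where False means triggered.
    """
    if mode not in ("triggered", "continuous"):
        raise ValueError(f"unknown pipeline mode: {mode!r}")
    updated = copy.deepcopy(settings)  # leave the caller's dict untouched
    updated["continuous"] = (mode == "continuous")
    return updated


# Hypothetical settings, for illustration only.
settings = {"name": "example-pipeline", "continuous": False}
continuous_settings = set_pipeline_mode(settings, "continuous")
print(json.dumps(continuous_settings, indent=2))
```

The helper returns a copy so that the original settings can still be compared against the edited version before anything is saved.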

Note

Refresh operations for materialized views and streaming tables defined in Databricks SQL always run using triggered pipeline mode.

What is triggered pipeline mode?

If the pipeline uses triggered mode, the system stops processing after it successfully refreshes all tables, or the selected tables, in the update. Each table in the update is refreshed based on the data available when the update starts.
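The "data available when the update starts" rule can be pictured with a toy model. This is not how Delta Live Tables is implemented; `Source`, `run_triggered_update`, and the `arrivals_during_update` parameter are hypothetical names used to show that rows arriving mid-update wait for the next trigger.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy append-only source standing in for any upstream data set."""
    rows: list = field(default_factory=list)


def run_triggered_update(source: Source, target: list,
                         arrivals_during_update=()) -> int:
    """Refresh `target` from the rows available when the update starts.

    Returns the number of rows processed; the update then stops.
    """
    snapshot_end = len(source.rows)   # input is fixed at update start
    already_processed = len(target)
    # Rows that arrive while the update runs are not part of this update.
    source.rows.extend(arrivals_during_update)
    target.extend(source.rows[already_processed:snapshot_end])
    return snapshot_end - already_processed


source = Source(rows=["a", "b"])
target = []
first = run_triggered_update(source, target, arrivals_during_update=["c"])
second = run_triggered_update(source, target)
print(first, second, target)
```

In this model the row `"c"` arrives during the first update and is picked up only by the second one, which mirrors the latency trade-off of triggered mode.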

What is continuous pipeline mode?

If the pipeline uses continuous execution, Delta Live Tables processes new data as it arrives in data sources to keep tables throughout the pipeline fresh.

To avoid unnecessary processing in continuous execution mode, pipelines automatically monitor dependent Delta tables and perform an update only when the contents of those dependent tables have changed.
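As a rough illustration of that change check, the sketch below uses a version counter as a stand-in for a Delta table's commit version. The classes `VersionedTable` and `DownstreamFlow` are hypothetical, and the actual monitoring mechanism inside Delta Live Tables is not described in this article.

```python
class VersionedTable:
    """Toy table whose version increases on every write."""

    def __init__(self):
        self.version = 0
        self.rows = []

    def append(self, row):
        self.rows.append(row)
        self.version += 1


class DownstreamFlow:
    """Toy flow that updates only when its upstream table has changed."""

    def __init__(self, upstream: VersionedTable):
        self.upstream = upstream
        self.last_seen_version = upstream.version
        self.update_count = 0

    def poll(self) -> bool:
        """Run an update if the upstream version moved; report whether it ran."""
        if self.upstream.version == self.last_seen_version:
            return False              # nothing changed, skip the work
        self.last_seen_version = self.upstream.version
        self.update_count += 1
        return True


upstream = VersionedTable()
flow = DownstreamFlow(upstream)
results = [flow.poll()]               # no change yet
upstream.append({"id": 1})
results.append(flow.poll())           # change detected
results.append(flow.poll())           # unchanged since the last update
print(results, flow.update_count)
```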

Choose a data pipeline mode

The following table highlights the differences between triggered and continuous pipeline modes:

Key questions	Triggered	Continuous
When does the update stop?	Automatically once complete.	Runs continuously until manually stopped.
What data is processed?	Data available when the update starts.	All data as it arrives at configured sources.
What data freshness requirements is this best for?	Data updates run every 10 minutes, hourly, or daily.	Data updates are desired between every 10 seconds and a few minutes.

Triggered pipelines can reduce resource consumption and expense because the cluster runs only long enough to update the pipeline. However, new data won’t be processed until the pipeline is triggered. Continuous pipelines require an always-running cluster, which is more expensive but reduces processing latency.
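The freshness guidance in the table can be turned into a simple rule of thumb. The helper below is hypothetical, and its 10-minute cutoff is an assumption read off the table rather than a documented threshold; requirements between "a few minutes" and 10 minutes are a judgment call that also depends on cost.

```python
def suggest_pipeline_mode(freshness_seconds: float) -> str:
    """Suggest a pipeline mode from a required data freshness in seconds.

    Assumption: 600 seconds (10 minutes) is used as the cutoff, taken
    from the comparison table. It is not an official limit.
    """
    if freshness_seconds <= 0:
        raise ValueError("freshness_seconds must be positive")
    return "triggered" if freshness_seconds >= 600 else "continuous"


print(suggest_pipeline_mode(3600))  # hourly refresh
print(suggest_pipeline_mode(30))    # data needed within 30 seconds
```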

Set trigger interval for continuous pipelines

When configuring pipelines for continuous mode, you can set trigger intervals to control how frequently the pipeline starts an update for each flow.

You can use pipelines.trigger.interval to control the trigger interval for a flow updating a table or for an entire pipeline. Because a triggered pipeline processes each table only once per update, pipelines.trigger.interval applies only to continuous pipelines.

Databricks recommends setting pipelines.trigger.interval on individual tables because streaming and batch queries have different defaults. Set the value on a pipeline only when processing requires controlling updates for the entire pipeline graph.

You set pipelines.trigger.interval on a table using spark_conf in Python or SET in SQL:

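# Python: set the trigger interval for the flow that updates this table.
import dlt

@dlt.table(
  spark_conf={"pipelines.trigger.interval": "10 seconds"}
)
def <function-name>():
    return (<query>)

-- SQL: set the trigger interval, then define the table.
SET pipelines.trigger.interval=10 seconds;

CREATE OR REFRESH MATERIALIZED VIEW TABLE_NAME
AS SELECT ...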

To set pipelines.trigger.interval on a pipeline, add it to the configuration object in the pipeline settings:

{
  "configuration": {
    "pipelines.trigger.interval": "10 seconds"
  }
}
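When settings are generated by a script, the interval can be merged into an existing configuration object without discarding other keys. This is a minimal sketch: `with_trigger_interval` is a hypothetical helper, `example.key` is a made-up entry, and the boolean `continuous` field is an assumption about how the settings JSON represents the pipeline mode.

```python
import json


def with_trigger_interval(settings: dict, interval: str) -> dict:
    """Return a copy of `settings` with a pipeline-wide trigger interval."""
    # Copy the nested configuration so the caller's dict is not mutated.
    configuration = dict(settings.get("configuration", {}))
    configuration["pipelines.trigger.interval"] = interval
    return {**settings, "configuration": configuration}


# Hypothetical settings; `continuous` is assumed to mark continuous mode.
settings = {"continuous": True, "configuration": {"example.key": "value"}}
updated = with_trigger_interval(settings, "10 seconds")
print(json.dumps(updated, indent=2))
```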
