What are all the Delta things in Azure Databricks?

Article
03/04/2025

This article is an introduction to the technologies collectively branded Delta on Azure Databricks. Delta refers to technologies related to or in the Delta Lake open source project.

This article answers:

What are the Delta technologies in Azure Databricks?
What do they do? Or what are they used for?
How are they related to and distinct from one another?

What are the Delta things used for?

Delta is a term introduced with Delta Lake, the foundation for storing data and tables in the Databricks lakehouse. Delta Lake was conceived as a unified data management system for transactional real-time and batch big data. It extends Parquet data files with a file-based transaction log that provides ACID transactions and scalable metadata handling.

Delta Lake: Open source data management for the lakehouse

Delta Lake is an open-source storage layer that brings reliability to data lakes by adding transactional guarantees on top of data stored in cloud storage (AWS S3, Azure Storage, and GCS). It provides ACID transactions, data versioning, and rollback capabilities, and it lets you handle both batch and streaming data in a unified way.
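To make the idea of a file-based transaction log more concrete, here is a minimal sketch of how numbered commit files can give atomic commits. It is a toy model, not Delta Lake's implementation: the helper `try_commit`, the temporary directory, and the file names are hypothetical, and the sketch ignores partially written files and cloud storage semantics. The assumption it illustrates is that two writers competing for the same version number cannot both succeed, because the commit file is created exclusively.

```python
import json
import os
import tempfile


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Attempt to publish `actions` as commit number `version`.

    Returns True if this writer created the commit file, False if a
    commit with that version number already exists.
    """
    # Zero-padded, numbered JSON files, one per commit (illustrative naming).
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" creates the file only if it does not exist yet,
        # so at most one writer can claim a given version.
        with open(path, "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
    except FileExistsError:
        return False
    return True


log_dir = tempfile.mkdtemp()

# Two writers race to publish version 0; only the first one wins.
first = try_commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
second = try_commit(log_dir, 0, [{"add": {"path": "part-0001.parquet"}}])
print(first, second)
```

In this sketch the losing writer would re-read the log and retry with the next version number, which is one common way to implement optimistic concurrency.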

Delta tables are built on top of this storage layer and provide a table abstraction, making it easy to work with large-scale structured data using SQL and the DataFrame API.

Delta tables: Default data table architecture

The Delta table is the default table format in Azure Databricks and is a feature of the Delta Lake open source data framework. Delta tables are typically used for data lakes, where data is ingested via streaming or in large batches.

See:

Delta Lake quickstart: Create a table
Updating and modifying Delta Lake tables
DeltaTable class: Main class for interacting programmatically with Delta tables.

DLT: Data pipelines

DLT manages the flow of data between many Delta tables, which simplifies the work of data engineers on ETL development and management. The pipeline is the main unit of execution for DLT. DLT offers declarative pipeline development, improved data reliability, and cloud-scale production operations. Users can perform both batch and streaming operations on the same table, and the data is immediately available for querying. You define the transformations to perform on your data, and DLT manages task orchestration, cluster management, monitoring, data quality, and error handling. DLT enhanced autoscaling can handle streaming workloads that are spiky and unpredictable.
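The declarative idea can be illustrated with a small, self-contained sketch: you declare each table and what it depends on, and a framework works out the execution order. This is not the DLT API. The class `ToyPipeline`, its `table` decorator, and the table names are hypothetical, and the sketch covers only dependency ordering, not cluster management, monitoring, or data quality.

```python
from graphlib import TopologicalSorter


class ToyPipeline:
    """Hypothetical stand-in for a declarative pipeline framework."""

    def __init__(self):
        self._steps = {}  # table name -> (dependency names, function)

    def table(self, *, depends_on=()):
        """Register a function as the definition of a table."""
        def register(func):
            self._steps[func.__name__] = (tuple(depends_on), func)
            return func
        return register

    def run(self):
        """Compute every table, dependencies first."""
        graph = {name: set(deps) for name, (deps, _) in self._steps.items()}
        order = list(TopologicalSorter(graph).static_order())
        results = {}
        for name in order:
            deps, func = self._steps[name]
            results[name] = func(*(results[d] for d in deps))
        return order, results


pipeline = ToyPipeline()


# Tables are declared in arbitrary order; the framework sorts them.
@pipeline.table(depends_on=["cleaned_orders"])
def order_totals(cleaned):
    return sum(row["amount"] for row in cleaned)


@pipeline.table(depends_on=["raw_orders"])
def cleaned_orders(raw):
    return [row for row in raw if row["amount"] is not None]


@pipeline.table()
def raw_orders():
    return [{"amount": 10}, {"amount": None}, {"amount": 5}]


order, results = pipeline.run()
print(order)
print(results["order_totals"])
```

The point of the design is that the author states what each table contains, while the order of execution is derived from the declared dependencies rather than written by hand.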

See the DLT tutorial.

Delta tables vs. DLT

A Delta table is a way to store data in tables, whereas DLT lets you describe declaratively how data flows between those tables. DLT is a declarative framework that manages many Delta tables by creating them and keeping them up to date. In short, Delta tables are a data table architecture, while DLT is a data pipeline framework.

Delta: Open source or proprietary?

A strength of the Azure Databricks platform is that it doesn’t lock customers into proprietary tools: much of the technology is powered by open source projects that Azure Databricks contributes to.

The Delta OSS projects are examples:

Delta Lake project: Open source storage for a lakehouse.
Delta Sharing protocol: Open protocol for secure data sharing.

DLT is a proprietary framework in Azure Databricks.

What are the other Delta things on Azure Databricks?

Below are descriptions of other features that include Delta in their name.

Delta Sharing

An open standard for secure data sharing, Delta Sharing enables data sharing between organizations regardless of their compute platform.

Delta engine

A query optimizer for big data that uses Delta Lake open source technology included in Databricks. Delta engine optimizes the performance of Spark SQL, Databricks SQL, and DataFrame operations by pushing computation to the data.

Delta Lake transaction log (AKA DeltaLogs)

A single source of truth tracking all changes that users make to the table and the mechanism through which Delta Lake guarantees atomicity. See the Delta transaction log protocol on GitHub.

The transaction log is key to understanding Delta Lake, because it is the common thread that runs through many of its most important features:

ACID transactions
Scalable metadata handling
Time travel
And more.
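As a rough illustration of why the log underpins features such as time travel, the sketch below replays an ordered list of commits to reconstruct which data files make up the table at a given version. It is a simplified model under stated assumptions: the `snapshot` helper and the file names are hypothetical, commits are held in memory rather than read from storage, and only `add` and `remove` style actions are modeled.

```python
def snapshot(commits, version=None):
    """Return the set of data files that are live at `version`.

    `commits` is an ordered list in which entry i holds the actions of
    commit i. If `version` is None, the latest commit is used.
    """
    if version is None:
        version = len(commits) - 1
    live = set()
    # Replay every commit up to and including the requested version.
    for actions in commits[: version + 1]:
        for action in actions:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


commits = [
    [{"add": {"path": "a.parquet"}}],                                   # version 0
    [{"add": {"path": "b.parquet"}}],                                   # version 1
    [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],  # version 2
]

latest = snapshot(commits)
as_of_v0 = snapshot(commits, version=0)
print(sorted(latest))    # ['b.parquet', 'c.parquet']
print(sorted(as_of_v0))  # ['a.parquet']
```

Because earlier commits are never rewritten in this model, reading an older version is simply a matter of stopping the replay earlier.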
