LakeFlow Connect
Important
LakeFlow Connect is in gated Public Preview. To participate in the preview, contact your Databricks account team.
This article provides an overview of Databricks LakeFlow Connect, which offers fully managed connectors for ingesting data from SaaS applications like Salesforce and databases like SQL Server into an Azure Databricks lakehouse. The resulting ingestion pipeline is governed by Unity Catalog and is powered by serverless compute and Delta Live Tables. LakeFlow Connect uses efficient incremental reads and writes to make data ingestion faster, more scalable, and more cost-efficient, while keeping your data fresh for downstream consumption.
SaaS connector components
A SaaS connector is modeled by the following components:
- Connection: A Unity Catalog securable object that stores authentication details for the application.
- Ingestion pipeline: Ingests data from the application into Delta tables. This component is modeled as a serverless DLT pipeline.
Database connector components
A database connector is modeled by the following components:
- Connection: A Unity Catalog securable object that stores authentication details for the database.
- Gateway: Extracts data from the source database and maintains the integrity of transactions during the transfer. For cloud-based databases, the gateway is configured as a DLT pipeline with classic compute.
- Staging storage: A Unity Catalog volume where data from the gateway is staged before being applied to a Delta table. The staging volume is created when you deploy the gateway and exists within the catalog and schema that you specify.
- Ingestion pipeline: Ingests the staged data into Delta tables. This component is modeled as a serverless DLT pipeline.
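The way these four components hand data to one another can be pictured with a small data model. This is only an illustrative sketch: the class names, fields, and example values (such as `sqlserver_conn` and `main.ingest.staging`) are hypothetical and do not correspond to any Databricks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Unity Catalog securable object that stores authentication details."""
    name: str


@dataclass(frozen=True)
class Gateway:
    """Extracts data from the source database.

    For cloud-based databases this runs as a DLT pipeline on classic compute.
    """
    connection: Connection
    compute: str = "classic"


@dataclass(frozen=True)
class StagingVolume:
    """Unity Catalog volume where gateway output is staged."""
    catalog: str
    schema: str
    name: str

    @property
    def full_name(self) -> str:
        # Three-level Unity Catalog name: catalog.schema.volume
        return f"{self.catalog}.{self.schema}.{self.name}"


@dataclass(frozen=True)
class IngestionPipeline:
    """Applies staged data to Delta tables as a serverless DLT pipeline."""
    source: StagingVolume
    compute: str = "serverless"


def describe_flow(gateway: Gateway, pipeline: IngestionPipeline) -> list[str]:
    """Return the path that data takes through the components, in order."""
    return [
        f"connection:{gateway.connection.name}",
        f"gateway:{gateway.compute}",
        f"staging:{pipeline.source.full_name}",
        f"ingestion:{pipeline.compute}",
    ]


# Hypothetical example values, chosen only for illustration.
conn = Connection(name="sqlserver_conn")
gateway = Gateway(connection=conn)
staging = StagingVolume(catalog="main", schema="ingest", name="staging")
pipeline = IngestionPipeline(source=staging)
flow = describe_flow(gateway, pipeline)
```

In this sketch, a SaaS connector would correspond to just a connection and an ingestion pipeline, with no gateway or staging volume in between.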
LakeFlow Connect vs. Lakehouse Federation vs. Delta Sharing
Lakehouse Federation allows you to query external data sources without moving your data. Delta Sharing allows you to securely share live data across platforms, clouds, and regions. Databricks recommends ingestion using LakeFlow Connect because it scales to accommodate high data volumes, low-latency querying, and third-party API limits. However, you might want to query your data without moving it.
When you have a choice between LakeFlow Connect, Lakehouse Federation, and Delta Sharing, choose Delta Sharing for the following scenarios:
- Limiting data duplication.
- Querying the freshest possible data.
Choose Lakehouse Federation for the following scenarios:
- Ad hoc reporting or proof-of-concept work on your ETL pipelines.
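The guidance above can be summarized as a small decision helper. This is a sketch under stated assumptions, not an official rule: the function and parameter names are hypothetical, it assumes all three options are available for your source, and the order in which overlapping scenarios are checked is an arbitrary choice made for the example.

```python
from enum import Enum


class Approach(Enum):
    LAKEFLOW_CONNECT = "LakeFlow Connect"
    LAKEHOUSE_FEDERATION = "Lakehouse Federation"
    DELTA_SHARING = "Delta Sharing"


def choose_approach(
    limit_duplication: bool = False,
    need_freshest_data: bool = False,
    ad_hoc_or_poc: bool = False,
) -> Approach:
    """Map the scenarios listed in this section to an approach."""
    # Delta Sharing scenarios: limiting duplication, freshest possible data.
    if limit_duplication or need_freshest_data:
        return Approach.DELTA_SHARING
    # Lakehouse Federation scenario: ad hoc reporting or proof-of-concept work.
    if ad_hoc_or_poc:
        return Approach.LAKEHOUSE_FEDERATION
    # Otherwise fall back to the general recommendation: ingest the data.
    return Approach.LAKEFLOW_CONNECT
```

For example, `choose_approach(ad_hoc_or_poc=True)` returns `Approach.LAKEHOUSE_FEDERATION`, and calling it with no arguments returns `Approach.LAKEFLOW_CONNECT`.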
LakeFlow Connect vs. Auto Loader
LakeFlow Connect provides built-in connectors that allow you to incrementally ingest data from enterprise applications and databases. Auto Loader is a connector for cloud object storage that allows you to incrementally ingest files as they arrive in S3, ADLS, and GCS. Auto Loader is compatible with Structured Streaming and Delta Live Tables but does not integrate with LakeFlow Connect.
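For comparison, a minimal Auto Loader sketch with Structured Streaming might look like the following. It assumes an existing `SparkSession` on Databricks is passed in as `spark`. The helper names, the paths, and the table name are hypothetical, and reusing the checkpoint path as the schema location is one common choice rather than a requirement.

```python
def autoloader_options(file_format: str, schema_location: str) -> dict[str, str]:
    """Build reader options for an Auto Loader (cloudFiles) source."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def start_file_ingestion(
    spark,
    source_path: str,
    target_table: str,
    checkpoint_path: str,
    file_format: str = "json",
):
    """Incrementally load newly arrived files into a table.

    `spark` is assumed to be a SparkSession on a Databricks cluster;
    the cloudFiles source is not available in open source Spark.
    """
    options = autoloader_options(file_format, checkpoint_path)
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        # Process the files available now, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Calling `start_file_ingestion(spark, "<source-path>", "<catalog.schema.table>", "<checkpoint-path>")` in a notebook would start the stream. The placeholders stand for values from your own workspace.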
Can LakeFlow Connect write back to third-party apps and databases?
No. If you’re interested in this functionality, reach out to your account team.
What is the cost for LakeFlow Connect?
For now, customers are billed only for the serverless Delta Live Tables usage that’s needed to load data from the source (if connecting to an enterprise application, like Salesforce) or from the staging volume (if connecting to a database, like SQL Server). The final pricing model for LakeFlow Connect might include additional charges and will be announced in the future.
Serverless Delta Live Tables pricing is visible on our pricing page.