Logical data warehouse with Azure Synapse serverless SQL pools

Azure Cosmos DB
Azure Data Factory
Azure Data Lake
Azure Synapse Analytics
Power BI

Solution ideas

This article describes a solution idea. Your cloud architect can use this guidance to help visualize the major components for a typical implementation of this architecture. Use this article as a starting point to design a well-architected solution that aligns with your workload's specific requirements.

The logical data warehouse (LDW) pattern lays a lightweight virtualized relational layer on top of data that's stored in a data lake or database. This virtualization layer provides data warehouse access without requiring data movement. This solution can combine online transaction processing (OLTP) data with analytical data from data lakes for a low-complexity, low-latency way to serve business intelligence (BI) and analytics workloads.

Apache Spark™ is a trademark of the Apache Software Foundation in the United States and/or other countries/regions. No endorsement by The Apache Software Foundation is implied by the use of this mark.

Architecture

Diagram showing a flow of data from left to right as the steps describe.

Download a PowerPoint file of all the diagrams in this article.

Dataflow

  1. Azure Data Factory integrates data from source systems into the enterprise data lake.

  2. Device and sensor data also streams from edge devices into the cloud through Azure IoT Hub. Azure Stream Analytics processes the data and sends it to the enterprise data lake.

  3. Azure Synapse serverless SQL pools define an LDW that has logical tables and views accessible through the Azure Synapse workspace serverless SQL pool on-demand endpoint.

  4. Azure Synapse Link for Azure Cosmos DB queries real-time transactional data through the Azure Synapse serverless SQL pools. This data joins with cold batch and hot streaming data from the enterprise data lake to create logical views.

  5. Reporting, BI, and other analytics applications access LDW data and views by using the Azure Synapse workspace serverless SQL endpoint.

    Note

    The Azure Synapse workspace serverless SQL endpoint is accessible from any tool or service that supports Tabular Data Stream (TDS) connections to SQL Server.

Components

Scenario details

By using an LDW with Azure Synapse serverless SQL pools, you can join cold batch data, hot streaming data, and live transactional data in a single T-SQL query or view definition.

This solution avoids moving data through complex, expensive, and latency-prone extract, transform, and load (ETL) pipelines. The LDW concept is similar to a data lakehouse, but LDW with Azure Synapse Analytics includes support for hybrid transaction/analytical processing (HTAP). HTAP uses Azure Synapse serverless SQL pools to query OLTP data that's stored in Azure Cosmos DB.

An Azure Synapse Analytics LDW is based on serverless SQL pools that are available with all Azure Synapse workspaces. An enhanced version of the OPENROWSET function enables serverless SQL pools to access data in Data Lake Storage.

This data access allows creation of relational database objects like tables and views over collections of data files that represent logical entities, like products, customers, and sales transactions. BI tools that connect by using a standard SQL Server endpoint can consume these logical entities as dimensions and fact tables.

Diagram that shows a side-by-side comparison of the LDW conceptual design, next to an implementation of LDW with Azure Synapse Analytics serverless SQL pool.

The ability to access transactional data stores like Azure Cosmos DB through the Azure Synapse Link for Azure Cosmos DB expands these capabilities. Accessing OLTP data by using HTAP architecture provides instant updates without interfering with live transactions.

Diagram that shows the flow of external data to the reporting layer using Azure Synapse Analytics serverless SQL pool.

Each Azure Synapse workspace includes an on-demand SQL endpoint. The endpoint lets SQL Server administrators and developers use familiar environments to work with LDWs that Azure Synapse serverless SQL pools define.

The following screenshot shows SQL Server Management Studio (SSMS) connected to an Azure Synapse serverless SQL pool.

Screenshot that shows SSMS connected to the Azure Synapse SQL Server endpoint.

Azure Synapse serverless SQL pools support the following file formats:

  • Delimited text, such as CSV, TSV, and TXT
  • JSON
  • Parquet

Azure Synapse serverless SQL pools also support the Delta Lake format. This support allows patterns like enrich in Spark, serve with SQL, where Apache Spark™ services like Azure Databricks or Apache Spark pools in Azure Synapse engineer data to create curated datasets in the data lake. Instead of having to load these datasets into a physical data warehouse, you can define an LDW over the data lake to provide the model/serve layer for reporting.

Diagram that shows the flow of external data to the reporting layer with Azure Synapse Analytics serverless SQL pool.

The LDW with Azure Synapse serverless SQL pools is an implementation of the Data Lakehouse pattern. Using Databricks SQL to implement an LDW is an alternative solution. However, Databricks SQL lacks the HTAP capability of Azure Synapse Link for Cosmos DB.

Potential use cases

This pattern is useful for the following cases:

  • Data warehouse serving layer for BI and other analytical use cases.
  • Ad-hoc exploration of raw data in a data lake.
  • Cost-effective data streaming into a data lake that doesn't require its own compute resources to write data. A logical database table, view, or ad-hoc T-SQL query can access the data instantly from the data lake.
  • Instant access to Azure Cosmos DB transactional data to build real-time aggregation pipelines or join with analytical data stored in the data lake.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

To see non-public LinkedIn profiles, sign in to LinkedIn.

Next steps