Azure Data Architecture Guide – Blog #8: Data warehousing
In our eighth blog in this series, we'll continue to explore the Azure Data Architecture Guide. The previous entries for this blog series are:
- Azure Data Architecture Guide – Blog #1: Introduction
- Azure Data Architecture Guide – Blog #2: On-demand big data analytics
- Azure Data Architecture Guide – Blog #3: Advanced analytics and deep learning
- Azure Data Architecture Guide – Blog #4: Hybrid data architecture
- Azure Data Architecture Guide – Blog #5: Clickstream analysis
- Azure Data Architecture Guide – Blog #6: Business intelligence
- Azure Data Architecture Guide – Blog #7: Intelligent applications
Like the previous post, we'll work from a technology implementation seen directly in our customer engagements. The example can help lead you to the ADAG content to make the right technology choices for your business.
Data warehousing
Here we see store data coming from multiple sources into Azure Data Lake Storage, in their native format. (Azure Data Lake Storage Gen 2 is recommended.) Azure SQL Data Warehouse directly queries against the data with a combination of external tables and schema on read capabilities through PolyBase. Use Azure Data Factory to store the data you need within your warehouse, and quickly analyze and visualize the combined data with Power BI.
Highlighted services
- Azure Data Lake Storage Gen 1 (legacy)
- Azure Data Lake Storage Gen 2 (recommended)
- Azure SQL Data Warehouse
- PolyBase
- Azure Data Factory
- Power BI
Related ADAG articles
- Traditional RDBMS workloads
- Cross-cutting concerns
- Technology choices
See Also
The Reference Architecture, Enterprise BI in Azure with SQL Data Warehouse, implements an extract, load, and transform (ELT) pipeline that moves data from an on-premises SQL Server database into SQL Data Warehouse and transforms the data for analysis.
The Azure Architecture Center also features two related example scenarios that demonstrate solutions using these technologies:
- Data warehousing and analytics for sales and marketing
- Hybrid ETL with existing on-premises SSIS and Azure Data Factory
In addition, the Modern Data Warehouse solution ingests the data sources through Azure Data Factory, combining your data to Azure Blob Storage. It uses Azure Databricks to prep and train cleansed and transformed data, to be moved to Azure SQL Data Warehouse (which acts as the data hub). Like the Reference Architecture above, this solution leverages Azure Analysis Services for data modeling (then on to Power BI for your visualizations).
For more information about the data movement, see the following articles:
- Load data from Azure Data Lake Storage Gen1 to SQL Data Warehouse
- Copy data to or from Azure Data Lake Storage Gen1 by using Azure Data Factory
- Copy data to or from Azure Data Lake Storage Gen2 Preview using Azure Data Factory (Preview)
- Copy data to or from Azure SQL Data Warehouse by using Azure Data Factory
- Tutorial: Extract, transform, and load data using Azure Databricks
- Choosing a data pipeline orchestration technology in Azure
For more information about stream processing and Azure Data Bricks, see our reference architecture, Create a stream processing pipeline with Azure Databricks, and our ADAG article, Choosing a stream processing technology in Azure.
For more information about Azure Analysis Services and advanced analytics, see our last blog post in this blog series, Azure Data Architecture Guide – Blog #6: Business intelligence, our ADAG article, Choosing an analytical data store in Azure, our two Reference Architectures, Enterprise BI in Azure with SQL Data Warehouse and Automated enterprise BI with SQL Data Warehouse and Azure Data Factory, our Example Scenario Data warehousing and analytics for sales and marketing, and our two Solutions, Advanced analytics on big data and Real-time analytics.
Please peruse ADAG to find a clear path for you to architect your data solution on Azure:
Azure CAT Guidance
"Hands-on solutions, with our heads in the Cloud!"
Comments
- Anonymous
December 29, 2018
Hi AzureCAT Team,Do you recommend DirectQuery connection to SQL Data Warehouse from Power BI (as it looks on the diagram in the Data warehousing section)? As I tested this approach it practically never works, because of the Power BI's nature as a self-service BI tool (many concurrent queries sent at the same time after a single click in the report containing many visuals). I think only hub & spoke approach works here (having additional layer between SQLDW and Power BI - either Analysis Services or SQL Database). Can you give some comment on that?Thank you in advance.- Anonymous
January 02, 2019
Pawel, you tried to connect Power BI to SQL Data Warehouse as your data source, per https://docs.microsoft.com/en-us/power-bi/desktop-data-sources? (Azure category data connections)That connection doesn't work at all?Thanks!- Anonymous
January 10, 2019
The comment has been removed- Anonymous
January 15, 2019
Thank you, Pawel.So in an ideal world, what features would you want to see added to Power BI and SQL Data Warehouse to address this concurrency issue? Thanks again!
- Anonymous
- Anonymous
- Anonymous
- Anonymous
January 02, 2019
Guys, this is conflicting with the modern DW architecture https://azure.microsoft.com/en-gb/solutions/architecture/modern-data-warehouse/ . How do you move data through the lake? Where is Databricks??? You need to clarify with most customers aware of the ADLA roadmap.- Anonymous
January 02, 2019
@Wesdev: +1!!!This blog entry is really inconsistent with the architecture you mentioned and also it references Azure Data Lake Store Gen1, while all other recommendations are based either on Blob Storage or ADLS Gen2.I'd really love to see Microsoft fellows sending a single universal message about modern DW architecture to the market, and not architecture relying on definitely NOT bulletproof approach (Power BI connection to SQLDW) and based on old services (ADLS Gen1). Another topic is I'd love to have a clear roadmap for every single Azure service. ADLA and Azure Data Catalog to mention as good examples.Consistency and clear vision are two important factors when client is about to decide whether to go into PaaS solutions or keep to the well-known IaaS implementations based on SQL Server stack.- Anonymous
January 02, 2019
Thank you for the feedback. I added the references to Gen 2.See my comment below about the newest roadmap and "Azure Updates" features.
- Anonymous
- Anonymous
January 02, 2019
The comment has been removed
- Anonymous
- Anonymous
January 02, 2019
In November, the Azure Updates site was refreshed with new features: https://techcommunity.microsoft.com/t5/Azure/Staying-up-to-date-with-the-Microsoft-Azure-roadmap/m-p/284886#M3661