Data warehouse tutorial introduction
Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric
Microsoft Fabric provides a one-stop shop for all the analytical needs for every enterprise. It covers the complete spectrum of services including data movement, data lake, data engineering, data integration and data science, real time analytics, and business intelligence. With Microsoft Fabric, there's no need to stitch together different services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single comprehensive product that is easy to understand, onboard, create and operate. No other product on the market offers the breadth, depth, and level of integration that Microsoft Fabric offers. Additionally, Microsoft Purview is included by default in every tenant to meet compliance and governance needs.
Purpose of this tutorial
While many concepts in Microsoft Fabric might be familiar to data and analytics professionals, it can be challenging to apply those concepts in a new environment. This tutorial has been designed to walk step-by-step through an end-to-end scenario from data acquisition to data consumption to build a basic understanding of the Microsoft Fabric user experience, the various experiences and their integration points, and the Microsoft Fabric professional and citizen developer experiences.
The tutorials aren't intended to be a reference architecture, an exhaustive list of features and functionality, or a recommendation of specific best practices.
Data warehouse end-to-end scenario
As prerequisites to this tutorial, complete the following steps:
- Sign into your Power BI online account, or if you don't have an account yet, sign up for a free trial.
- Enable Microsoft Fabric in your tenant.
In this tutorial, you take on the role of a Warehouse developer at the fictional Wide World Importers company and complete the following steps in the Microsoft Fabric portal to build and implement an end-to-end data warehouse solution:
- Create a Microsoft Fabric workspace.
- Create a Warehouse.
- Ingest data from source to the data warehouse dimensional model with a data pipeline.
- Create tables in your Warehouse.
- Load data with T-SQL with the SQL query editor.
- Clone a table using T-SQL with the SQL query editor.
- Transform the data to create aggregated datasets using T-SQL.
- Time travel using T-SQL to see data as it appeared.
- Use the visual query editor to query the data warehouse.
- Analyze data with a notebook.
- Create and execute cross-warehouse queries with SQL query editor.
- Create Power BI reports using DirectLake mode to analyze the data in place.
- Build a report from OneLake.
- Clean up resources by deleting the workspace and other items.
Data warehouse end-to-end architecture
Data sources - Microsoft Fabric makes it easy and quick to connect to Azure Data Services, other cloud platforms, and on-premises data sources to ingest data from.
Ingestion - With 200+ native connectors as part of the Microsoft Fabric pipeline and with drag and drop data transformation with dataflow, you can quickly build insights for your organization. Shortcut is a new feature in Microsoft Fabric that provides a way to connect to existing data without having to copy or move it. You can find more details about the Shortcut feature later in this tutorial.
Transform and store - Microsoft Fabric standardizes on Delta Lake format, which means all the engines of Microsoft Fabric can read and work on the same data stored in OneLake - no need for data duplicity. This storage allows you to build a data warehouse or data mesh based on your organizational need. For transformation, you can choose either low-code or no-code experience with pipelines/dataflows or use T-SQL for a code first experience.
Consume - Data from the warehouse can be consumed by Power BI, the industry leading business intelligence tool, for reporting and visualization. Each warehouse comes with a built-in TDS endpoint for easily connecting to and querying data from other reporting tools, when needed. When a warehouse is created, a secondary item, called a default semantic model, is generated at the same time with the same name. You can use the default semantic model to start visualizing data with just a couple of steps.
Sample data
For sample data, we use the Wide World Importers (WWI) sample database. For our data warehouse end-to-end scenario, we have generated sufficient data for a sneak peek into the scale and performance capabilities of the Microsoft Fabric platform.
Wide World Importers (WWI) is a wholesale novelty goods importer and distributor operating from the San Francisco Bay area. As a wholesaler, WWI's customers are mostly companies who resell to individuals. WWI sells to retail customers across the United States including specialty stores, supermarkets, computing stores, tourist attraction shops, and some individuals. WWI also sells to other wholesalers via a network of agents who promote the products on WWI's behalf. To earn more about their company profile and operation, see Wide World Importers sample databases for Microsoft SQL.
Typically, you would bring data from transactional systems (or line of business applications) into a data lake or data warehouse staging area. However, for this tutorial, we use the dimensional model provided by WWI as our initial data source. We use it as the source to ingest the data into a data warehouse and transform it through T-SQL.
Data model
While the WWI dimensional model contains multiple fact tables, for this tutorial we focus on the fact_sale
table and its related dimensions only, as follows, to demonstrate this end-to-end data warehouse scenario: