Whitepaper on modeling for AS tabular scalability
This is a guest post from Daniel Rubiolo of the Power BI Customer Advisory Team (PBI CAT).
To scale Tabular models to very big volumes of data, and with the objective of getting as much of the data as possible into memory so users can consume it in their analyses, we must consider data preparation and modeling best practices to help the engine do its best work.
AS (Analysis Services) Tabular at its core has an in-memory columnar database engine optimized for BI exploratory analytics. It’s highly efficient in encoding and compressing data in memory, supporting custom business logic with calculated columns & measures, enabling high concurrency, and delivering blazing fast responses.
In this whitepaper (DOCX, PDF) we cover a simplified overview of how the Tabular engine works, and several data modeling design best practices that take the most advantage of the engine’s inner workings. This article was written with an Azure Analysis Services example, but equally applies to SQL Server 2017 Analysis Services. (With a few exceptions, these guidelines also apply to Excel’s Power Pivot and Power BI Desktop & Service.)
With a real-world example, we cover recommendations such as:
- Benefits of designing your model as a “dimensional model”, a.k.a. “star schema”.
- Pre-processing your data in this manner will enable the most scale for high data volumes, and deliver the best performance at query time.
- Steps for optimizing Dimensions:
- Minimize the number of columns.
- Reduce cardinality (data type conversions).
- Filter out unused dimension values (unless a business scenario requires them).
- Integer Surrogate Keys (SK).
- Ordered by SK (to maximize Value encoding).
- Hint for VALUE encoding on numeric columns.
- Hint for disabling hierarchies on SKs.
- Steps for optimizing Facts:
- Handle early arriving facts. [Facts without corresponding dimension records.]
- Replace dimension IDs with their surrogate keys.
- Reduce cardinality (data type conversions).
- Consider moving calculations to the source (to use in compression evaluations).
- Ordered by less diverse SKs first (to maximize compression).
- Increased Tabular sample size for deciding Encoding, by considering segments and partitions.
- Hint for VALUE encoding on numeric columns.
- Hint for disabling hierarchies.
AS Tabular provides very high flexibility in what models you can build. The guidelines in this article will help you maximize the capabilities you provide with your solutions. We hope you find it useful!