Condividi tramite


Introduction to MDX Scripting in Microsoft SQL Server 2005

 

Richard Tkachuk
Microsoft Corporation

March 2005
Updated June 2005

Summary: This document describes how Multidimensional Expressions (MDX) for Microsoft SQL Server 2005 can be applied to common business problems. This document assumes some familiarity with MDX. (12 printed pages)

Contents

Introduction
MDX Concepts
MDX Scripts
How a Cube Aggregates

Introduction

Analysis Services offers a powerful server-based calculation engine. It's a spreadsheet on steroids that can do virtually anything. This document describes how Multidimensional Expressions (MDX) for Microsoft SQL Server 2005 can be applied to common business problems.

This document assumes some familiarity with MDX.

MDX Concepts

This section introduces some fundamental concepts. The following section describes scripting in more detail.

Cube Space

A significant departure from Microsoft SQL Server 2000 is the concept of attributes. An attribute is to a dimension what a column is to a table in a relational database. For example, a Customer dimension can contain attributes such as Name, Phone Number, Gender, City, State, and so on.

Attributes are exposed to users through attribute hierarchies. An attribute hierarchy in a dimension includes an optional All level and the distinct members of the attribute. For example, a Customer dimension might include a Name attribute hierarchy with two levels: the All level and a second level with a member for each name. (Parent-child hierarchies are handled differently.)

It is attribute hierarchies that define the space of a cube. You can think of a cube as the multidimensional space created by the product of its attribute hierarchies. Dimensions are containers for attribute hierarchies. A dimension can also contain user hierarchies as a navigational convenience, but these do not affect the space in the cube.

An attribute need not have an attribute hierarchy. If an attribute hierarchy is not created, the space of the cube is independent of the attribute. For example, an attribute hierarchy is typically not created for the attribute Phone Number because there is generally little demand to be able to navigate a dimension by phone numbers. If an attribute hierarchy is not created for the attribute, the attribute can be used as a member property, but not as a level in a user hierarchy.

Cell values are completely defined by their attribute hierarchy coordinates and pass. The value of a cell does not change depending on navigation path or the presence of user hierarchies.

For example, the cell defined by a user hierarchy Geography, Customer.Geography.USA.WA.Redmond.Richard, is identical to the cell defined by attribute hierarchies (Customer.Country.USA, Customer.State.WA, Customer.City.Redmond, Customer.Name.Richard).

The cell at coordinate (Customer.Name.Richard) is not the same coordinate as (Customer.Geography.USA.WA.Redmond.Richard). This is because a member of one attribute hierarchy does not imply members of attribute hierarchy-in other words: (Customer.Name.Richard) is the same as (Customer.Name.Richard, Customer.Country.All, Customer.State.All, Customer.City.All). Even though the customer Richard lives in Redmond, WA, USA, the coordinate Customer.Name.Richard does not imply a coordinate in City, State, or Country attribute hierarchies.

Dimension Leaves

A dimension has leaves. The meaning of a leaf is based on the granularity attribute-the attribute that binds the dimension to the measure group. Often this is the dimension key, but it doesn't have to be. For example, if a Time dimension is tied to the measure group by Date, Date is the granularity attribute even if the Time dimension has a finer grain of detail (for example, to the minute).

Leaves are the dimension coordinates that fact table data is directly associated with. More precisely, a leaf is defined by the combination of members from attribute hierarchies (whose attributes are directly or indirectly related to the granularity attribute) where none of the members are the All member. For example in a Customer dimension with attributes Name (the granularity attribute), Email, City, State, and Country, the following dimensional coordinate is a leaf:

(Customer.Name.Name.Richard, 
Customer.Email.Email.richard@Adventureworks.com, 
Customer.City.City.Sammamish, Customer.State.State.WA, 
Customer.Country.Country.USA)

However, the following dimensional coordinate is not a leaf because the member from the Email attribute hierarchy is at the All level:

(Customer.Name.Name.Richard, Customer.Email.[(All)].[All], 
Customer.City.City.Sammamish, Customer.State.State.Wa, 
Customer.Country.Country.USA)

If the granularity attribute is not the key attribute of the dimension, leaves contain the All member for attributes that are not related to the granularity attribute. For example, if the granularity attribute for the Customer dimension is the City attribute, the following dimensional coordinate is a leaf:

(Customer.Name.[(All)].[All], Customer.Email.[(All)].[All], 
Customer.City.City.Sammamish, Customer.State.State.Wa, 
Customer.Country.Country.USA)

Note   There may be no data in cells that intersect with members of attributes that are not related to the granularity attribute. For example, if the granularity attribute is City, then slicing on a member of the Name attribute yields empty cells. The exact behavior is dictated by the IgnoreUnrelatedDimensions property on the measure group.

By definition, there is a single leaf for each granularity attribute member.

Autoexists

The true space of the cube is more restricted than the product of its attribute hierarchies. There are cells that do not exist, because Autoexists Attribute members from the same dimension, which do not exist with one another, do not exist in this cube space. For example, (Beijing, Canada) does not exist. The concept of Autoexists runs throughout this document.

Note   This has nothing to do with data in the fact table. It is a dimensional concept only-Autoexists only pertains to the attributes in the same dimension.

Queries can request cells that do not exist in the cube space. For example, the statement select customer.gender.members on 0, {Customer.Name.Fred, Customer.Name.Jane} on 1 from sales includes cells that do not exist in this space. These cells always return empty-they cannot contain calculations and they cannot be written to.

Autoexists plays a large role in query results:

  • When sets with common dimensionality are projected along the same axis, members are retained that exist with each other.

    For example:

    Select crossjoin({Customer.Country.Country.USA}, Customer.States.States.members) on 0 from Sales
    

    Result: only those states in USA are retained from Customer.States.States.members.

  • The All member of every hierarchy automatically exists with all other members of all hierarchies in the same dimension.

Subcubes

A subcube is a collection of crossjoined sets that restricts the space for subsequent statements to the subspace. Subcubes are frequently referenced, and there are restrictions on what forms a valid subcube. Each set in the subcube definition can be:

  • An arbitrary collection of members from an attribute hierarchy excluding the All member.
  • A single member of a hierarchy or an MDX expression that resolves to a single member.
  • An arbitrary collection of members at a specified level in a natural hierarchy.

Note   A natural hierarchy is composed of attributes where each attribute is a member property of the attribute below. For example, the Geography hierarchy Country, State, City and Name is a natural hierarchy if City is a member property of Name; State is for City; and Country is for State. The hierarchy Gender-Age is not a natural hierarchy because Gender is not a member property of Age.)

  • Descendants of a member at multiple levels (for natural hierarchies only).
  • All members of a hierarchy.
  • The leaf members of a hierarchy

The argument "*" is accepted by subcube definitions. This represents the entire contents of the cube-every member from every hierarchy from every dimension.

Axes Hierarchality

There are rules how members from the same hierarchy can be projected on different axes in a query:

  • There are no limitations on projecting sets with common dimensionality but different hierarchality along any axis.
  • Sets with the same hierarchality cannot be included in more than a single axis.

Note   The hierarchality of an object represents the hierarchies present in the object.

Which Way Is Up?

In SQL Server 2000 Analysis Services, data aggregated from level to level. For example, in a Customer dimension Country, State, City, and Name data aggregated from Name to City, then to State, and then to Country. If a calculation changed the value for, say, Redmond (but not for individual customers), values above Redmond reflected the change; that is, values for Washington, USA, and the All Customer member were affected.

The way data aggregates in SQL Server 2005 is similar but differs a bit because a cube might not contain any hierarchies other than the attribute hierarchies. In other words, how does data aggregate "up" if no hierarchy defines what "up" means? If the customer dimension contains dozens of attributes and no user defined hierarchies, what aggregates from what?

This question can be answered if one re-examines the cube space using attribute granularity. Each cell in a cube has a grain with respect to an attribute hierarchy; the cell is either at the attribute all level or at the attribute level (things get somewhat more complicated with parent child hierarchies, but the principal is the same). The All level of an attribute hierarchy has a higher granularity that the attribute hierarchy level. For example, the member Customer.City.[All] is at a higher grain than Customer.City.Seattle.

Once we accept the fact that each cell has a grain, we can define what "up" means using granularity:

  • One cell has the same grain as another cell if it has the same granularity on all attributes.
  • A cell has a higher grain than another if at least one attribute has a higher granularity and others are the same or higher granularity.

Advanced Note   Cells that are above or below another can be described as comparable. Cells neither above nor below are termed incomparable; cells are incomparable if some attributes are at a higher grain and other attributes are at a lower grain. For example, the cell defined by the tuple (Customer.City.[All], Customer.Customer.Richard) is not comparable with the cell (Customer.City.Sammamish, Customer.Customer.[All]) because one attribute is higher and the other is lower.

Now that we have a sense of direction, we can define how data aggregates: data aggregates from lower to higher granularity.

This is helpful-but in no way complete. How does poking a calculation in the middle of a cube affect how the cube aggregates? Unary operators? Pass? This is defined in the upcoming sections.

MDX Scripts

An MDX script is a collection of commands, each of which is separated by a semicolon. Scripts are used to populate a cube with calculations.

This section describes the key concepts behind MDX scripts and some new functions that help with common calculations.

Execution vs. Declaration

The script appears to execute like a program. In fact, the same concepts of Analysis Services 2000 are applied (such as pass and solve order) but are pushed further into the background. The MDX script does not really "run." It is a declared set of commands that are always in effect. The contents of the cube are always consistent with the script.

Pass

At pass 0, the cube contains fact and writeback data only. Scripts populate post-pass 0 data.

Each assignment, freeze and calculate statement in a script creates a new pass.

Calculated members exist in all passes regardless of the pass in which they are created.

No Script

If no script is defined, the cube assumes a default implicit script with the single command Calculate.

If the script exists but is empty, an empty script is used and nothing is calculated.

Creating Calculations

A simple way to define a calculated cell is:

<subcube definition> = <expression>;

Examples:

(Sales,Budget) = (Sales,Actual) * 1.2;
Sales = Sales *1.2 ;

Each assignment creates its own pass, which prevents infinite recursion.

Where a single expression causes infinite recursion, the value from the prior pass is taken. For example,

Sales = Sales * 1.2 

is equivalent to

Sales = CalculationPassValue(Sales, -1, RELATIVE) * 1.2

Explicit references to pass are permitted but discouraged because the pass number changes as the script changes.

Calculations change results at higher granularity-exactly how is defined later.

Scope

A scope limits statements to a subcube. If no scope is specified, the default scope is for the entire cube.

BNF

Scope(<subcube definition>);
<statement>
...
End Scope;
<subcube definition> :== refer to the definition of a subcube

Scope definitions are static. For example, consider a cube with a Customer dimension that plays two roles: SoldTo and PurchasedFrom. To define a subcube of customers who sold to themselves, the following approach does not work as intended:

Scope (SoldTo.Name.Name.members);
   Scope(LinkMember(SoldTo.Name.CurrentMember, PurchasedFrom.Name));
      ...
   End Scope;
End Scope;

This does not work because the second scope statement is not reevaluated in each cell; the expression SoldTo.Name.CurrentMember is not dynamic over the original scope.

Examples

The next several examples are based on a cube with 2 dimensions; Customer and Measures. The measures dimension has a single measure, Sales. The Customer dimension has 2 attributes: Country and City (the granularity attribute); and 3 hierarchies: the two attribute hierarchies and a user hierarchy Geography. Country has the members USA and England; City: Seattle, Redmond, London and Leeds. The tables below show the impact of a scope and assignment.

Note that there is no calculate statement in the following examples. The data in the cube is not aggregated; all changes are because of the assignment only.

For brevity, the dimension name is omitted from member names.

Example 1

Scope(Customer.Country.Country.USA, *);
   Sales = 2;
End Scope;

The query:

If one looks at the data by the attribute hierarchies, one sees:

  Country.

[(All)].[All]

Country. Country.USA Country. Country.England
City.[(All)].[All] 2
City.City.Seattle 2
City.City.Redmond 2
City.City.London
City.City.Leeds

And if one looks at data from the user hierarchy Geography:

  Sales
Geography.[(All)].[All]
Geography.USA 2
Geography.USA.Seattle 2
Geography.USA.Redmond 2
Geography.England
Geography.England.London
Geography.England.Leeds

In this example, there is no value in the cell defined by (City.London, Country.USA) because it does not exist.

Example 2

Scope(Customer.City.City.Redmond, *);
   Sales = 4;
End Scope;

Sales in attribute hierarchies:

  Country.

[(All)].[All]

Country. Country.USA Country. Country.England
City.[(All)].[All]
City.City.Seattle
City.City.Redmond 4 4
City.City.London
City.City.Leeds

Note that the cell (City.City.Redmond, Country. Country.England) remains null. This cell does not exist and calculations can not affect it.

Looking at Sales by the user hierarchy Geography:

  Sales
Geography.[(All)].[All]  
Geography.USA  
Geography.USA.Seattle  
Geography.USA.Redmond 4
Geography.England  
Geography.England.London  
Geography.England.Leeds  

Example 3

Scope(Geography.USA.Redmond, *)
   Sales = 32;
End Scope;
  Country.

[(All)].[All]

Country. Country.USA Country. Country.England
City.[(All)].[All]
City.City.Seattle
City.City.Redmond 32
City.City.London
City.City.Leeds
  Sales
Customer.Geography.[(All)].[All]
Customer.Geography.USA
Customer.Geography.USA.Seattle
Customer.Geography.USA.Redmond 32
Customer.Geography.England
Customer.Geography.England.London
Customer.Geography.England.Leeds

Nested Scope

Nested scope statements inherit parent scope (unless an attribute is rescoped). For example:

Scope(Customers.Country.USA);
   Scope(Customers.State.Or);
      Scope(Customers.Gender.Male);
         <scope is USA, Or, Male>
      End Scope;
   End Scope;
End Scope;

A scope statement can rescope the hierarchy included in a parent scope. For example:

Scope(Customers.Country.USA);
   Scope(Customers.State.Or);
      Scope(Customers.State.Wa);
         <scope is Wa, USA>
      End Scope;
      <scope is Or, USA>
   End Scope;
End Scope;

This

The key word This represents the current subcube. For example, the following code sets Sales in USA and Canada to 2:

Scope(Sales, {USA, Canada});
   This = 2;
End Scope;

Note that This may not be used on the right-hand side of assignments.

Freeze Statement

In scripts, users may want to change the cell value that was earlier used in an expression to determine the result of another cell, but want to make this change without changing results of the earlier calculation.

BNF

Freeze [<subcube>]

The Freeze statement pins cells to their current value. Changes to other cells have no effect.

For example,

B = 2;
A = B;
B = 3;

At this point A and B both equal 3.

B = 2;
A = B;
Freeze(A);
B = 3;

At this point A = 2 and B =3.

In other words, Freeze(<scope>) on pass 'p' is logically equivalent to assigning the expression to the cell CalculationPassValue(<scope>, p, ABSOLUTE).

Calculate Statement

Cell calculations for nonleaf cells are implied by the structure of the cube's dimensions. Most often, it is a simple aggregation, but a dimension can define unary operators and custom member and level calculations.

Further, each dimension can imply a different formula and the sequence of their application can change the aggregation results. The Calculate command provides the means for the cube to be aggregated according to its default behavior or can be used to finely control the sequence of dimension aggregation.

The contents of a nonleaf cell are null before the Calculate command is applied; that is, no cell has a formula before the Calculate command is used to populate the cell with a formula.

BNF

Calculate;

The Calculate command has no affect on calculated members.

Root

Returns the tuple at the intersection of the All members from each attribute. If an attribute does not have an All member, Root returns the default member.

BNF

Root()

Returns the tuple at the intersection of the All member from each attribute in the cube. If there is no All member, a member from the top level is returned.

Root(<dimension>)

Returns the tuple at the intersection of the All member from each attribute in the dimension.

Root(<tuple>)

Returns the tuple at the intersection of all members from each attribute included in the tuple.

For example, a calculated member that calculates the percentage of profit of a particular product SKU versus the percentage of profit for all products would be the following:

Create Member [Profit%] AS
([Profit %], [Product].[SKU].[SKU].currentmember) /
([Profit %], Root(Product))

Root works in the context of the current member. Essentially, it is taking the root ancestor of the current member from every hierarchy. If there is an All member in all hierarchies, then it is indeed the grand All. However in the case of non-All members, it will go to the root ancestor of the current member. For example,

Root(Time) will yield [2002] if time.currentmember is [25-Feb-2002].

How a Cube Aggregates

Most of the time, the way a cube aggregates "just makes sense". But if you're trying to do something sophisticated, knowing the rules helps.

Last Pass Wins

Whenever a multiple assignments apply to the same cell, the last pass takes precedence.

This applies to cells at and above the scope of the assignment. For example, consider the following scriptlet:

Calculate;
(Customer.Name.[All], Customer.City.[All], Customer.State.[All],
  Customer.Country.[All]) = 100;
(Customer.Name.[Richard], Customer.City.[Sammamish], Customer.State.[Wa],
  Customer.Country.[USA]) = 1;

The value at (Customer.Name.[All], Customer.City.[All], Customer.State.[All], Customer.Country.[All]) will be 1 plus the data from the fact table for customers other than Richard. The aggregation from the assignment at the lower grain takes precedence over the assignment at an earlier pass at the higher granularity.

Closest Wins

The last pass wins rule applies to assignments, but not to dimension calculations such as unary operators and custom member formulas. The closest wins rule applies instead.

If a cell is above a dimension calculation and an assignment, the cell is evaluated from the closet calculation.

For example, consider the contrived cube with an account hierarchy with unary operators and a product hierarchy with two members. The calculate statement aggregates the fact data as follows:

All Products Baked Goods Drinks
All Accounts 5 1 4
+Revenue 12 4 8
-Expense 7 3 4

Now consider the impact of the following assignment:

(Account.[All Accounts], Product.[Drinks] ) = 100;

All Products Baked Goods Drinks
All Accounts 5 1 100
+Revenue 12 4 8
-Expense 7 3 4

The value at cell (Account.[All Accounts], Products.[All Products]) remains 5 because the Unary operator is closest (right on top!).

Consider instead the assignment:

(Account.[Expense], Product.[Drinks] ) = 1000;

All Products Baked Goods Drinks
All Accounts -991 1 -992
+Revenue 12 4 8
-Expense 1003 3 1000

The value at (Account.Expense, Products.[All Products]) aggregates from the assignment because it is at a higher granularity and no other calculation affects it (Unary operators affect how a member aggregates into its parent and not a calculation on the member itself).

Because of the difference in how so-called dimension calculations (unary operators and custom member formulas), dimension calculations cannot be purely emulated with assignments.