What makes a Data Source a Data Source?
It should be obvious, and it is, at least at the Tabular 1200 compatibility level: a data source definition in a Tabular model holds the connection information that Analysis Services needs to connect to a source of data such as a database, an OData feed, or a file. That’s straightforward. At the Tabular 1400 compatibility level, however, this is no longer so trivial. Here a data source definition can include a native query and even a contextual M expression on top of the connection information, which opens up capabilities that didn’t exist previously and, to some degree, redefines the nature of a data source definition.
Let’s take a closer look at a data source definition in a Tabular 1400 model, such as the following definition for a SQL Server-based data source:
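A definition of this kind can be sketched roughly as follows. The server and database names are placeholders, the credential block is trimmed to a single illustrative setting, and the description text is there only to flag the example as hypothetical. The exact set of properties that SSDT Tabular writes may differ.

```json
{
  "type": "structured",
  "name": "SQL/localhost;AdventureWorksDW",
  "description": "Illustrative sketch only; server, database, and credential values are placeholders.",
  "connectionDetails": {
    "protocol": "tds",
    "address": {
      "server": "localhost",
      "database": "AdventureWorksDW"
    },
    "authentication": null,
    "query": null
  },
  "credential": {
    "AuthenticationKind": "ServiceAccount"
  },
  "contextExpression": "..."
}
```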
The two important properties are the query parameter in the connectionDetails, which can hold a native source query, and the contextExpression parameter, which can take an M expression. The default "..." simply stands for an expression that takes the data source definition as is without wrapping it into a further M context. You can find a more elaborate example at the end of this article. For now, just note that you won’t see the contextExpression in your data source definitions yet. A forthcoming release of SSAS and SSDT Tabular will enable this feature.
The query parameter, on the other hand, already exists in the metadata. It’s just that SSDT Tabular does not let you enter a source query through the user interface (UI) when defining a data source. This is intentional to maintain the familiar separation of connection information on data sources and source queries on table partitions. Equally, there are currently no plans to expose a contextExpression designer in the UI.
The following screenshot shows the Power BI Desktop UI in the background for a SQL Server data source with a textbox to enter a SQL query in comparison to SSDT Tabular in the foreground, which doesn’t offer this textbox.
For most data modelling scenarios, a clear separation of connection information and source queries is advantageous. After all, multiple tables and partitions can refer to a single data source definition in SSDT. It doesn’t seem very useful to restrict a data source to a single result set by means of a source query, such as “SELECT * FROM dimCustomer”, defined through the data source’s query parameter. Instead, it would be more useful to specify the query when importing a table by using the Value.NativeQuery function, as the following screenshot illustrates.
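In the model metadata, this approach places the query in the partition's M expression while the data source stays generic. The following is a minimal sketch, assuming a data source named SQL/localhost;AdventureWorksDW and a hypothetical partition name; neither name comes from an actual model.

```json
{
  "name": "DimCustomer Partition",
  "description": "Illustrative sketch only; partition and data source names are placeholders.",
  "source": {
    "type": "m",
    "expression": [
      "let",
      "    Source = #\"SQL/localhost;AdventureWorksDW\",",
      "    Customers = Value.NativeQuery(Source, \"SELECT * FROM dimCustomer\")",
      "in",
      "    Customers"
    ]
  }
}
```

Because the query lives on the partition, other tables can reference the same data source with their own queries or navigation steps.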
This way, the data source remains available for importing further tables from the same source. On the other hand, if you do need a data source with a very narrow scope, you can set the query parameter manually by using the Tabular Model Scripting Language (TMSL).
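Set by hand in TMSL, such a narrowly scoped data source might look like the sketch below. The data source name and connection values are placeholders chosen for illustration, and the query is the example mentioned earlier.

```json
{
  "type": "structured",
  "name": "SQL/localhost;AdventureWorksDW;dimCustomer",
  "description": "Illustrative sketch only; this data source is restricted to a single result set.",
  "connectionDetails": {
    "protocol": "tds",
    "address": {
      "server": "localhost",
      "database": "AdventureWorksDW"
    },
    "authentication": null,
    "query": "SELECT * FROM dimCustomer"
  },
  "credential": {
    "AuthenticationKind": "ServiceAccount"
  }
}
```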
If it’s clearly not recommended to use the query parameter in a data source definition, then why did we come up with yet another such parameter called contextExpression? Well, this brings us back to the starting point: What makes a Data Source a Data Source?
A data source can be defined with varying degrees of detail, as shown above. At one extreme, you could define a data source so narrow that it returns a single character, for example by using the following source query: “SELECT TOP 1 Left(firstName, 1) FROM dimCustomer”. Not very useful, but still a source of data. At the other extreme, a data source could be so broad that the tables you import on top of it require redundant statements that a more precise data source definition would avoid. For example, by using the Tabular Object Model (TOM) or TMSL, you could define a SQL Server data source that specifies only the server name but no database name. Any table importing from this data source would then require an M expression that includes a line to navigate to the desired database before importing a source table, such as “AdventureWorksDW = Source{[Name="AdventureWorksDW"]}[Data]”.
Perhaps even more extreme, some data sources can be defined so broadly that they don’t even include information about the data source type. For example, any file-based data source can be considered to be of type File, when a more precise definition would be a Microsoft Access database, a Microsoft Excel workbook, a comma-separated values file, and so forth. This is where the contextExpression comes in. It adds context information that narrows down a very broad data source definition and makes it more meaningful.
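The server-only SQL Server case described above can be sketched as a model fragment like the following. All names are placeholders, and the fragment shows only the parts relevant here: a data source without a database name, and a partition whose M expression has to navigate to the database before it can reach the table.

```json
{
  "dataSources": [
    {
      "type": "structured",
      "name": "SQL/localhost",
      "description": "Illustrative sketch only; deliberately broad, with no database name.",
      "connectionDetails": {
        "protocol": "tds",
        "address": {
          "server": "localhost"
        },
        "authentication": null,
        "query": null
      }
    }
  ],
  "tables": [
    {
      "name": "DimCustomer",
      "partitions": [
        {
          "name": "DimCustomer Partition",
          "source": {
            "type": "m",
            "expression": [
              "let",
              "    Source = #\"SQL/localhost\",",
              "    AdventureWorksDW = Source{[Name=\"AdventureWorksDW\"]}[Data],",
              "    dbo_DimCustomer = AdventureWorksDW{[Schema=\"dbo\",Item=\"DimCustomer\"]}[Data]",
              "in",
              "    dbo_DimCustomer"
            ]
          }
        }
      ]
    }
  ]
}
```

Every table imported from this data source would repeat the database navigation line, which is the redundancy a more precise data source definition avoids.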
The following abbreviated data source definition for an Access database shows the contextExpression in action. The connectionDetails merely define a File data source, which is too broad. What we want to define is an Access data source, so the contextExpression takes the File data source and wraps it into an Access.Database() function. As mentioned earlier, the placeholder expression "..." stands for the data source definition without the additional context.
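As a rough sketch of such a definition, consider the fragment below. The file path and data source name are hypothetical, the credential block is omitted for brevity, and the way the "..." placeholder is embedded in the M expression is an assumption, since the feature had not been released at the time of writing.

```json
{
  "type": "structured",
  "name": "Access/C:\\Data\\Northwind.accdb",
  "description": "Illustrative sketch only; path, name, and placeholder usage are assumptions.",
  "connectionDetails": {
    "protocol": "file",
    "address": {
      "path": "C:\\Data\\Northwind.accdb"
    },
    "authentication": null,
    "query": null
  },
  "contextExpression": [
    "let",
    "    AccessDb = Access.Database(...)",
    "in",
    "    AccessDb"
  ]
}
```

The connectionDetails alone describe nothing more than a file. The contextExpression is what turns that file into an Access database.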
By using a context expression, SSDT Tabular can define data sources that build on other data sources. Through TOM or TMSL, you can also edit the context expression to build more sophisticated definitions, yet this is generally not recommended. Also, unfortunately, TOM and TMSL do not provide an API for editing an M expression. This may come at some point in the future, but for now it’s not a priority.
And this is it for a quick glance at the upcoming contextExpression feature. As always, please send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. You can influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers.
Comments
- Anonymous
March 15, 2017
This is great -- you guys are revolutionizing the BI industry. Thanks for all your hard work and for giving us a better look at what's going on underneath the covers. Very much looking forward to future updates.
- Anonymous
March 17, 2017
this is certainly an interesting enhancement... can you share a bit of background on the use-case(s) targeted by this feature? dynamic disparate sources... maybe "data lake"-style architectures?? doesn't seem to change much for the traditional BI implementations.
- Anonymous
March 22, 2017
You’ll see the contextExpression in action for the first time when we enable File-based data sources. The SSDT UI will set the expression and will be able to parse it again (general rule: what the UI creates, the UI can parse). You can modify the expression, of course, to apply whatever additional detail you find useful. It’s an M expression. The only caveat is that the SSDT UI might not be able to parse the expression anymore, so you might not be able to edit such a customized data source in the UI. And yes, it opens the door somewhat for a dynamic definition of data source parameters, but this is stretching the contextExpression too far for now. Dynamic or parameterized data sources are not yet a design goal. Let’s see how the File-based data sources turn out.
- Anonymous