Collation and language settings in tabular models

Collation and language settings are tricky; so tricky that my two planned blog posts on this topic can only scratch the surface of what goes on under the covers here. This post is a simplified overview of collation and language settings for Denali. In a future post I will show you an example of how to work with collations.

There are two read-only properties on tabular models, Collation and Language, that give you a hint as to what the engine is thinking when it formats and sorts your data. For tabular models, we show you what the database settings are, and you can observe them by clicking on the bim file and viewing its properties:

image

The language is used for some data formatting. For example, the language hard codes the currency symbol format, so the currency symbol doesn’t change by the report viewer’s locale (very important!). The collation is used for sorting data, comparing data, and figuring out what to do with pasted data. The MSDN overview of languages and collations (which applies to both multidimensional and tabular models) provides some additional information on this subject.

When you create a new tabular project in SQL Server Data Tools (BIDS), the language is always set for you and serialized as part of the model. Here is how the language of the model gets chosen:

Project Template Default Language
Analysis Services Tabular Project Language of BIDS
Import from PowerPivot Language of the PowerPivot Workbook
Import from Server (Tabular) Language of the source database

This language is not editable after it has been set. It is also persistent across modeling environments. If your model is 1033, it stays 1033 even if you open it in German BIDS. The language is serialized all the way down from Database to Cube to Dimension. The tools keep the language the same across all of these objects, and never mixes up the languages.

Now to the collation. The collation may or may not be serialized as part of the model, as the table below describes:

Project Template Collation
Analysis Services Tabular Project Not serialized
Import from PowerPivot Not serialized
Import from Server (Tabular) Serialized if the user explicitly set the collation on the source database, otherwise not serialized

When the collation is not serialized, the Collation property in the property grid appears blank. This means that the collation is blank 95%+ of the time. How are you supposed to know what the collation is? Well, the engine figures out what collation to use based on the language that was set for the tabular model. It uses the same routines as SQL engine does to pick an appropriate default collation. This collation reference from SQL setup gives a pretty accurate mapping of the default collation for each LCID. Once the engine picks the collation, it uses it consistently when sorting, processing, and comparing data.

Thank you Brandon Unger for tolerating my incessant pestering while writing this post. I’m glad you still answer with my emails and IMs with helpful information.

Comments

  • Anonymous
    October 23, 2011
    The comment has been removed
  • Anonymous
    October 24, 2011
    Chris - thanks for the feedback. I can totally dig why this is not your favourite feature.A couple of things. The language is not a server side setting in this case. The language used is determined by the client tools when the model is created. The currency symbol can actually be changed in the property grid. The default though is the language of the model.The display formatting of numbers is something that I'll get to in another post. There are some nuances to that behaviour.Thanks