Behind the scenes of PowerPivot’s automatic relationship detection
At times you will see a notification for the automatic relationship detection appear at the top of the PowerPivot Field List, yet no relationships can be detected. Why?
This can be caused by multiple reasons, which are described below.
The design goal for the automatic relationship detection was to help explain the value and impact of relationships to the end user that is not necessarily all that familiar with concepts in a context where it makes sense. So rather than aiming at automatically detecting all possible relationships, the aim was user education. As a result the detection algorithm is kept very conservative in order to make sure incorrect relationships are never automatically created.
In this article I’ll try to shed some light on the particular restrictions that apply, as well as outline when relationship detection is triggered. Hopefully, this can help explain why not all relationships are detected, as well as how you can apply changes to data and metadata to potentially improve the results of the automatic relationship detection.
Triggering the Automatic Relationship Detection notification
Automatic relationship detection is triggered for measures added to the Values area only. Not for fields added to the row and columns labels (dimensions). So as you start to build the PivotTable, no automatic detection of relationships will take place until a field is dropped in the Values drop zone.
The notification bar informing you that relationships are needed is activated by the fact that you drag a field to the Value drop zone of the existing PivotTable that is currently not related to one or more of the fields in the PivotTable. This however does not necessarily mean that relationships can actually be detected. There can be various reasons why no relationships are detected even if notification bar was displayed:
1.1 Name Similarity
The relationship detection algorithm depends on the foreign key column having a name with some similarity to either the primary key column table name or the primary key column name. In other words the key column fact table name (FK) must have some extent of similarity with the name of the look-up (PK) table or the name of the key column in the look-up table.
If not, the algorithm may not be able to detect any relationships. In this case, it’s recommended that you go to the PowerPivot window in order to manually create the required relationships between the active tables in the PivotTable OR that your take a look at the metadata for the table and column names before you try to automatically create the relationships again.
1.2 Data Type support
Only detection of whole numbers and text data types are supported by the relationship detection algorithm. This mean that depending on the structure of the data set you may end up in a situation where no relationships can be detected due to the fact that one or more of the tables used in the PivotTable contains columns of non supported data types only. Non supported data types include Boolean, Date time, Currency and Decimals. In this case – as above, it’s recommended that you go to the PowerPivot window in order to manually create the required relationships between the active tables in the PivotTable OR that your take a look at the metadata for the table and column names before you try to automatically create the relationships again.
1.3 Duplicate Keys
In cases where the key column of the look-up table (Primary key column) contain duplicate values no relationships will be detected either. The reason being, that the column cannot be marked as unique and is therefore not a valid look-up column. In this case, the only workaround is to edit the data of the look-up table to ensure the column contains unique values only. This, either by modifying the data back in the original source or by pasting the data to Excel, identify the duplicated values, update the table and paste it back into the PowerPivot window (using paste replace to update existing data if table originally was a manually created table, pasting to new table if data was originally imported from external data source)
1.4 Distinct unique keys
The primary key column (the lookup table column) must always contain more distinct unique keys than the foreign key column (the column on the many side of the relationship). For example, if you have a sales table and a product table for which you are trying to detect a relationship based on product name, the number of unique names in your product table must be equal to or exceed the number of distinct unique names in your sales table. If not, no relationships will automatically be detected.
1.5 Unknown members
You may encounter scenarios where you have some keys in the fact table, which have no matching keys in the lookup (Primary key) table. Such unmatched keys – also known as unknown members – are permitted in the PowerPivot data set yet still allowing you to be able to automatically detect relationships. The number of unknown members are limited to a certain degree however to make sure meaningless relationships are not created. The thresholds is lowered exponentially with the growth of data volumes, lowering the allowed threshold for greater the data volumes. As an example you will be able to automatically detect relationships as long as you have less than 5% unknown members for a data set of 100 rows whereas 1% is allowed for a data set of more than a 1000 rows.
The unknown members are represented with a blank row label in the PivotTable as shown below
1.5.1 No Valid Relation between Tables
As a final note it has to be said that in scenarios you simply may be adding fields to the PivotTable that has no valid relationships to the other table(s) already in use. In some situations adding an unrelated field can be intentionally and you probably do not even expect the detection to return anything. In other situations you may simply realize that that the data is not really related. In this case there is not much you can do accept from (hopefully) realizing that the fields simply are not related J
Comments
- Anonymous
September 16, 2010
Thanks for the information Julie. I believe there is a small typo in the 1.2 section on data types where you listed 'Data time' instead of 'Date time'. I blogged about this exact issue here PowerPivot Relationship Detection Experience (denglishbi.spaces.live.com/.../cns!CD3E77E793DF6178!3439.entry) and posted an item on Connect in regards to DateTime relationship detection - connect.microsoft.com/.../powerpivot-datetime-relationship-detection. - Anonymous
September 22, 2011
Is there anyway we can get at this logic from AMO? ie Can we structure our source data sources to expose the right primary and foreign key relationships and have it all fit together in code. Then output a cube with all the relationships.Thanks,Matt - Anonymous
May 02, 2012
When facing duplicate keys with tables coming from an SQL database, it is often possible to make the SQL query than defines the table in Powerpivot more specific and by that remove the duplicate values. This must make sense from a functional point of view, since there is usually a functional if you encounter duplicate values in a column that you intend to use as keys (proper key columns cannot contain duplicate values).