SSAS Troubleshooting: less data in cube than in fact table
Issue
Customers have less data in their cube than they have in the fact table.
Essentially, when they view the cube Measures.NetSales does not equal Sum(NetSales) from the fact table.
Causes
So here are some possible causes, that you should investigate in such circumstances.
The measure datatype cannot hold the aggregated number.
For example, the fact table might have an integer data type for the SalesQuantity column, which might be fine for the SQL table. However, by default, SSAS will give the SalesQuantity measure the same datatype, but now it must be able to contain the sum(SalesQuantity). If sum(SalesQuantity) is greater than 2^31 SSAS will not display any errors and the Measures.SalesQuantity will show numbers that appear to wrap around. So the numbers are much smaller than expected (almost random) at an aggregated level. The solution is simple, just change the datatype in the cube to a much larger number, for example, bigint, which can be up to 2^63. One of the good things about using bigint over int, is that the cube is almost exactly the same size, as SSAS will compress the leading zeroes in binary numbers very effectively. If you need numbers larger than 2^63, you might need to use a double datatype, which, although potentially much larger is an imprecise number. Also, a double datatype measure will typically make the cube significantly larger than a binary measure containing the same information.
Missing members
There are missing members on one or more dimensions and missing members are not enabled. The missing member facts will not be included in the cube. The cube should contain all the facts in the fact table. Even if there is a missing member, the data should still be available. For example, if there are sales for a ProductId that does not exist, typically, you will still want to know about the sales even though you cannot say which product it was for. I prefer to enable missing members, and just hide them if you must. It is also a good practice to regularly check for missing members, either in the DW or in the cube (through missing members).
Arithmetic error
The third situation where the cube numbers are light is an unusual one. It is where the fact table (actually a view) returns an arithmetic error such as divide by zero. Most times SSAS will sympathetically return the SQL error, and you will know there is a problem during cube processing. However, sometimes SSAS does not sympathetically return the SQL error, rather, it returns a successful cube process with the rows up until the divide by zero. There are two ways of locating this error. Firstly, you could simply execute the same SQL query in SQLEM window and wait for the full resultset or the arithmetic error to return. This is not practical in many circumstances due to the number of rows in the resultset. I guess you could get SSIS to help you, but an easier way is to use SQL Profiler and trace Error events.
Inner join
You have a referenced dimension and the INNER JOIN clause that SSAS is generating for SQL Server is removing fact records that don't join with the intermediate dimension. To identify this issue, you need to execute the SQL generated by SSAS, and compare its counts with the raw fact table.
Incremental processing
Is your cube (measure group) processed incrementally? If so, there might be a logic issue with the way you are incrementally processing. To verify, you could reprocess the entire measure group and see if the numbers now equal the fact table.
Data changing
Is the data in your fact table changing? Of course, if the fact table is changing, the MOLAP cube won't know about it. This can be related to incremental processing, where the incremental processing logic is good, but historic data records are changing. Strictly speaking, you cannot incrementally process updates, only new records. If the data has changed, the cube (partition) will require processing.
Partitions do not cover all possible values
For instance, partitions may be set up for date ranges. If you have a partition for "less than 2012", "2012", "2013", and "greater than 2013" then you may not be covering NULL date values. Those records with null dates will not appear in the cube but obviously will show up in the fact table. Add one more partition for "date is null" and reprocess the cube to resolve this particular scenario.
Default member
Have you a default member that is not the [All] member, or do you have a dimension without an [All] member. In these cases, you would need to take this into account when comparing Measures.NetSales with sum(NetSales).
Calculate statement commented out
The Calculate statement in the cube has been commented out.
Suggestions
A couple of suggestions when reconciling such issues
- Use physical measures in the cube. Even if they are hidden, you can still query the physical measures.
- Ideally, start off by reconciling record counts. If you can, put a record count in the cube.
- When you are running SQL queries against the fact table, use the SQL statement that is generated by SSAS, since it might be referencing a different table/view than you expected. Also, as in point 4 above, it might be performing an INNER JOIN with an intermediate dimension that has missing members.