Bucketing Values in DAX
Let’s say I wanted to analyze query performance on my SQL database. I have a log that has all the queries, the time it took them to run and the total records that were returned. I want to look at how the size of the results impacts the performance of the queries. I pull my data into Power Pivot and create a new pivot table where I put “Rows Returned” on rows and “Average Duration” in values. I get the following long results:
These results show me every unique number of rows that was returned and their average duration. The values returned for the “Rows Returned” field range between 0 and 172,028,053. I have added a measure with the query counts to the results showing the vast distribution of queries.
Since there are so many values of “Rows Returned” each individual average is not useful. What I really want to do is bucket the “Rows Returned” into groups and analyze the averages of each group. To do this. I take the following steps:
Add the table to my data model and call it “Buckets”
Add a new calculated column to my QueryLog table called “Row Range” with the following query:
“=CALCULATE(VALUES('Buckets'[Bucket Name]),filter('Buckets', 'Buckets'[start]<=[Rows Returned] && 'Buckets'[end]>=[Rows Returned]))”
Let’s break down this query a little for a single row:
VALUES('Buckets'[Bucket Name]) |
This will return all of the buckets names |
|
CALCULATE(…) |
Since we are only looking for one value for Bucket Name and don’t want the whole list, so we use CALCULATE to change the filter context. |
|
filter('Buckets',… && …) |
We want to filter the list by two different fields from the Bucket table, so we use the filter function inside of calculate as well as “&&” to specify an and condition. |
|
'Buckets'[start]<=[Rows Returned] |
Looks for all rows where the Start column is less than or equal to 515 (Rows Returned) |
|
&& 'Buckets'[end]>=[Rows Returned] |
And where End greater than or equal to 515 (Rows Returned) |
Now, when I go back to my pivot table I can use “Row Range” on rows and I get:
Comments
- Anonymous
June 06, 2014
Hi Josh, I've written two longer articles about this technique in DAX Patterns web site.You used the Static Segmentation pattern: www.daxpatterns.com/static-segmentationIt is also available the Dynamic Segmentation Pattern: www.daxpatterns.com/dynamic-segmentationThe Dynamic pattern is certainly slower, but it allows the user to define buckets based on current selection, using a measure (calculated field in Excel 2013) instead of a calculated column.I hope this will help other readers! - Anonymous
June 09, 2014
Thanks Marco. You certainly go into some great detail here that I am sure people will want to read. - Anonymous
August 27, 2017
Hi Josh!Thanks for an excellent article, this is just what I have been looking for. I was just wondering how I should proceed to incorporate this into a cube with multiple tables joined by a common key? With this soloution, I am able to get the correct results for calculations made in the same table as where I added the Calculated Columns, but not for the measures in other tables.This is my setup:1 Stock table with all data related to stock levels, with department number as key1 Sales table with all data related to sales, with department number as keyA number of lookup tables that translate the department keys into aggregations on higher lvl (product type, material, etc..)PriceGroups table which is the Bucket in your case - not joined on anythingThe formula i use for the grouping is =CALCULATE(VALUES(PriceGroups[GroupName]);FILTER(PriceGroups;PriceGroups[start]=[Avg Sold Price TY]))Appriciate any input! I am very new to DAX and am working my way through the first books to get a deeper understanding.Sincerely,Olof