Understanding Trends in the Time Series Model (Intermediate Data Mining Tutorial)
When you look at the various models created based on aggregated data, you see that the trend and prediction lines look very different depending on whether you summed amount and quantity across the different models and regions, or whether you averaged the figures.
Before you decide which model to apply as the general model for making worldwide projections, you will review the underlying data together with the prediction chart, in order to better understand the predictions.
You noticed that the trend lines overlapped until June 2006, when the lines for quantity and amount diverge. Then, at July 2008, the lines diverged again.
In this task, you will create a named calculation based on the original data source view to help you track the relationship between quantity and pricing. Then, you will create a pivot chart that includes this ratio to help understand the splits in the trend lines.
Reviewing the Underlying Data
To create a named calculation
In Solution Explorer, expand Data Source Views, and double-click SalesByRegion.dsv.
Right-click the table vTimeSeries and select New Named Calculation.
In the Create Named Calculation box, for Name, type UnitAmt.
In the Expression text box, type Amount/Quantity. Click OK.
The vTimeSeries table now contains an additional calculated column, UnitAmt. This calculated column is stored only in the data source view definition for the data mining project and does not affect the underlying relational database view.
To create a pivot chart using the named calculation
Right-click the table vTimeSeries and select Explore Data.
In the Explore vTimeSeries Table tab, click the tab, Pivot Table.
Drag the TimeIndex field from the Pivot Table Field List box and drop it in to the pivot table design surface where it says Drop Column Fields Here.
Drag the ModelRegion field from the Pivot Table Field List box and drop it in to the pivot table design surface where it says Drop Row Fields Here.
Drag the UnitAmt field from the Pivot Table Field List box and drop it in to the pivot table design surface where it says Drop Totals or Detail Fields Here.
Reviewing the Aggregated Models
From the pivot table, you can tell that at several points in time there were substantial changes in the amount of sales per unit, possibly because of changes in product pricing. In July 2007, there is another significant change when a new version, the T1000, was added to all regions. All of these changes affect the trends that are calculated in the model. A generalized model is useful because it minimizes the impact of any one change. However, in some scenarios, you might decide to create separate models for new stores so that the data from these stores does not affect trends.
For this tutorial, you will pick one of the aggregate models to apply to sales projections. Remember that you created four different mining models, each based on a different aggregate measure. You will use the tools provided in the time series viewer together with the pivot table that you created earlier to guide this decision. The following diagram shows the time series chart created for the aggregate models. The two series lines in gray show averages and the two series lines in green shows sums.
Before you decide which mining model to use for sales projections, you decide to investigate the following points:
Mining models based on Amount trend upward whereas models based on Quantity have a cyclical decline.
The projections based on average amount (AvgAmt) and the projections based on sum quantity (SumQty) are very far apart.
Although trend lines in three of the models level out after 5 predictions, the trend line in the model based on sum quantity continues sharply upward.
There are three things that you can check to get more information. First, select the Show Deviations check box to show the standard deviations for each prediction. A longer error bar indicates that there is greater variance in the predicted value.
Second, notice that the unit of the y-axis is percentages; moreover, note that the scale of the graph changes depending on the data in the chart. By default, the Microsoft Time Series Viewer automatically adjusts the units on the percentage axis to make the chart easiest to view. Therefore, if you want to use a specific or fixed scale, you should use a prediction query to create and export the values, and create a graph in another application, such as Microsoft Excel.
Finally, you can use the decision tree view of the time series model to understand splits in the model. In a time series model, a split, or a branch in the decision tree, can mean that the slope of the trend line changed significantly at some point, or it can mean that the tree branched based on some other conditions. The view that shows these splits as nodes in a tree view lets you drill down into the details of what caused the split.
To view the decision tree for each series
In Solution Explorer, expand Data Source Views, and right click AllRegions.dsv.
Right-click the table vTimeSeries and select Explore Data.
In the Explore vTimeSeries Table tab, click the tab, Pivot Table.
Drag the ReportingDate field from the Pivot Table Field List box and drop it in to the pivot table design surface where it says Drop Column Fields Here.
Drag the Region field from the Pivot Table Field List box and drop it in to the pivot table design surface where it says Drop Row Fields Here.
Drag the UnitAmt field from the Pivot Table Field List box and drop it in to the pivot table design surface where it says Drop Totals or Detail Fields Here.
It is also important to understand that, because a time series model is calculated by using moving averages, data values at the end of the data series can have a stronger impact on prediction than data values at the start of the data series. Moreover, the periodicity hint that you supplied when building the model also has an effect on how the averages are used over time.
Conclusion
Based on this analysis, you have learned several things:
Quantity is inherently volatile because the absolute values might be very low for some stores; moreover, the chart view tends to exaggerate the differences by presenting the figures as percentages.
Amount appears even more volatile, because Amount is it is dependent on Quantity but is also affected by changes in pricing.
The strong upwards trend in predictions for the M200 North America series is caused by the fact that sales volume was especially high for this combination of product and region towards the end of the historical data.
As a result of exploring the data and the various models in depth, you are satisfied that you have found a reliable model. If your business users require justification for the analysis, you can obtain formulas and supporting data by using the statistics available in the InfoTips, or by browsing or querying the model content.
In the next lesson, you will use the model that is based on average quantity to make predictions for quantity in all regions.