Dela via


Microsoft Time Series Algorithm Technical Reference

The Microsoft Time Series algorithm includes two separate algorithms:

  • The ARTXP algorithm, which was introduced in SQL Server 2005, is optimized for predicting the next likely value in a series.

  • The ARIMA algorithm was added in SQL Server 2008 to improve accuracy in long-term prediction.

By default, Analysis Services uses each algorithm separately to train the model and then blends the results to yield the best prediction for a variable number of predictions. You can also choose to use just one of the algorithms, based on your data and prediction requirements. In SQL Server 2008 Enterprise, you can also customize the cut-off point that controls the blend of algorithms during prediction.

This topic provides additional information about how each algorithm is implemented, and how you can customize the algorithm by setting parameters to fine-tune the analysis and prediction results.

Implementation of the Time Series Algorithms

Microsoft Research developed the ARTXP algorithm and based it on the Microsoft Decision Trees algorithm, which is an autoregressive tree model for representing periodic time series data. The ARTXP algorithm relates a variable number of past items to each current item that is being predicted. The name, ARTXP, derives from the fact that the ART algorithm, or autoregressive tree method, is applied to multiple, unknown prior states. For a detailed explanation of the ARTXP algorithm, see Autoregressive Tree Models for Time-Series Analysis.

The ARIMA algorithm is used for long-term prediction. It is an implementation of the ARIMA algorithm developed by Box and Jenkins. ARIMA, or autoregressive integrated moving average, is a well-known method for determining dependencies in observations taken sequentially in time. It also supports multiplicative seasonality. The ARIMA algorithm uses both autoregressive and moving average terms.

Although the ARTXP algorithm supports cross-prediction, the ARIMA algorithm does not. Therefore, cross-prediction is available only when you use a blend of algorithms, or when you configure the model to use only ARTXP.

Customizing the Microsoft Time Series Algorithm

The Microsoft Time Series algorithm supports several parameters that affect the behavior, performance, and accuracy of the resulting mining model.

Detection of Seasonality

Both algorithms support detection of seasonality or periodicity. Analysis Services uses Fast Fourier transformation to detect seasonality before training.

To achieve the best results, you can control seasonality detection by setting algorithm parameters. By changing the value of AUTODETECT_SEASONALITY, you can influence the number of possible time segments that are generated. By setting a value or multiple values for PERIODICITY_HINT, you can provide the algorithm with information about known repeating periods and increase the accuracy of detection.

Note

Both the ARTXP and ARIMA algorithms are sensitive to seasonality hints. Therefore, providing the wrong hint can adversely affect results.

Blending of Algorithms

By default, Analysis Services combines and equally weights the algorithms. However, in SQL Server 2008 Enterprise, you can customize the blending of the results by selecting the MIXED option and setting a parameter that weights the results towards either the short or the long-term prediction.

If you want to use cross-prediction, you must use either the ARTXP or the MIXED option because ARIMA does not support cross-prediction.

To control the choice of algorithm, you set the FORECAST_METHOD parameter. By default, the FORECAST_METHOD parameter is set to MIXED, and Analysis Services uses both algorithms and then weights their values to maximize the strengths of each algorithm. However, you can set the FORECAST_METHOD to ARTXP if you want to use only the ARTXP algorithm or to ARIMA if you want to use only the ARIMA algorithm.

In SQL Server 2008 Enterprise, you can also customize how Analysis Services mixes the combination of the ARIMA and ARTXP algorithms. You can control both the starting point for the mixture, and the rate of change by setting the PREDICTION_SMOOTHING parameter:

  • If you set PREDICTION_SMOOTHING to 0, the model becomes pure ARTXP.

  • If you set PREDICTION_SMOOTHING to 1, the model becomes pure ARIMA.

  • If you set PREDICTION_SMOOTHING to a value between 0 and 1, the model weights the ARTXP algorithm as an exponentially decreasing function of the prediction steps. At the same time, the model also weights the ARIMA algorithm as the 1-complement of the ARTXP weight. The model uses normalization and a stabilization constant to smooth the curves.

In general, if you predict up to 5 time slices, ARTXP is almost always the better choice. However, as you increase the number of time slices to predict, ARIMA typically performs better.

The following diagram illustrates how the model blends the algorithms when PREDICTION_SMOOTHING is set to the default value, 0.5. ARIMA and ARTXP are weighted equally at first, but as the number of prediction steps increases, ARIMA is weighed more heavily.

default curve for mix of time series algorithms

In contrast, the following diagram illustrates the blending of the algorithms when PREDICTION_SMOOTHING is set to 0.2. For step 0, the model weights ARIMA as 0.2 and ARTXP as 0.8. Thereafter, the weight of ARIMA exponentially increases and the weight of ARTXP exponentially decreases.

decay curve for time series model mixing

Setting Algorithm Parameters

The following table describes the parameters that can be used with the Microsoft Time Series algorithm.

Parameter

Description

AUTO_DETECT_PERIODICITY

Specifies a numeric value between 0 and 1 that detects periodicity. The default is 0.6.

If the value is closer to 0, periodicity is detected only for strongly periodic data.

Setting this value closer to 1 favors the discovery of many patterns that are almost periodic and the automatic generation of periodicity hints.

NoteNote
Dealing with many periodicity hints will likely lead to significantly longer model training times, but more accurate models.

COMPLEXITY_PENALTY

Controls the growth of the decision tree. The default is 0.1.

Decreasing this value increases the chance of a split. Increasing this value decreases the chance of a split.

NoteNote
This parameter is only available in SQL Server Enterprise.

FORECAST_METHOD

Specifies which algorithm to use for analysis and prediction. Possible values are ARTXP, ARIMA, or MIXED. The default is MIXED.

HISTORIC_MODEL_COUNT

Specifies the number of historic models that will be built. The default is 1.

NoteNote
This parameter is only available in SQL Server Enterprise.

HISTORICAL_MODEL_GAP

Specifies the time lag between two consecutive historic models. The default is 10. The value represents a number of time units, where the unit is defined by the model.

For example, setting this value to g causes historic models to be built for data that is truncated by time slices at intervals of g, 2*g, 3*g, and so on.

NoteNote
This parameter is only available in SQL Server Enterprise.

INSTABILITY_SENSITIVITY

Controls the point at which prediction variance exceeds a certain threshold and the ARTXP algorithm suppresses predictions. The default value is 1.

NoteNote
This parameter applies only to ARTXP, and therefore does not apply to models that are created using only the ARIMA algorithm. If applied to a MIXED model, the parameter applies only to the ARTXP portion of the model.

The default value of 1 provides the same behavior for ARTXP models as in SQL Server 2005. Analysis Services monitors the normalized standard deviation for each prediction. As soon as the standardize deviations for any prediction exceeds the threshold, the time series algorithm returns a NULL and stops the prediction process.

A value of 0 stops instability detection. This means that you can create an infinite number of predictions, regardless of the variance.

NoteNote
This parameter can only be modified in SQL Server Enterprise. In SQL Server Standard, Analysis Services uses only the default value of 1.

MAXIMUM_SERIES_VALUE

Specifies the maximum value to use for predictions. This parameter is used, together with MINIMUM_SERIES_VALUE, to constrain the predictions to some expected range. For example, you can specify that the predicted sales quantity for any day should never exceed the number of products in inventory.

NoteNote
This parameter is only available in SQL Server Enterprise.

MINIMUM_SERIES_VALUE

Specifies the minimum value that can be predicted. This parameter is used, together with MAXIMUM_SERIES_VALUE, to constrain the predictions to some expected range. For example, you can specify that the predicted sales quantity should never be a negative number.

NoteNote
This parameter is only available in SQL Server Enterprise.

MINIMUM_SUPPORT

Specifies the minimum number of time slices that are required to generate a split in each time series tree. The default is 10.

MISSING_VALUE_SUBSTITUTION

Specifies how gaps in historical data are filled. By default, gaps in data are not allowed.

The following table lists the possible values for this parameter:

ValueDescription
PreviousRepeats the value from the previous time slice.
MeanUses a moving average of time slices used in training.
Numeric constantUses the specified number to replace all missing values.
NoneReplaces missing values with values plotted along the curve of the trained model. This is the default value.

If your data contains multiple series, the series also cannot have ragged edges. That is, all series should have the same start and end points.

Analysis Services also uses the value of this parameter to fill gaps in new data when you perform a PREDICTION JOIN on time series model.

PERIODICITY_HINT

Provides a hint to the algorithm as to the periodicity of the data. For example, if sales vary by year, and the unit of measurement in the series is months, the periodicity is 12. This parameter takes the format of {n [, n]}, where n is any positive number.

The n in the brackets [] is optional and can be repeated as frequently as needed. For example, to provide multiple periodicity hints for data supplied monthly, you might enter {12, 3, 1} to detect patterns for the year, quarter, and month. However, periodicity has a strong effect on model quality. If the hint that you give differs from the actual periodicity, your results can be adversely affected.

The default is {1}.

NoteNote
The braces are required. Also, this parameter has a string data type. Therefore, if you type this parameter as part of a Data Mining Extensions (DMX) statement, you must enclose the number and braces in quotation marks.

PREDICTION_SMOOTHING

Specifies how the model should be mixed to optimize forecasting. You can type any value between 0 and 1 or use one of the following values:

NoteNote
Use the FORECAST_METHOD parameter to control training.
ValueDescription
0Specifies that prediction uses ARTXP only. Forecasting is optimized for fewer predictions.
1Specifies that prediction uses ARIMA only. Forecasting is optimized for many predictions.
0.5(Default) Specifies that for prediction both algorithms should be used and the results blended.
NoteNote
This parameter is only available in SQL Server Enterprise.

Modeling Flags

The Microsoft Time Series algorithm supports the following modeling flags. When you create the mining structure or mining model, you define modeling flags to specify how values in each column are handled during analysis. For more information, see Modeling Flags (Data Mining).

Modeling Flag

Description

NOT NULL

Indicates that the column cannot contain a null. An error will result if Analysis Services encounters a null during model training.

Applies to mining structure columns.

MODEL_EXISTENCE_ONLY

Means that the column will be treated as having two possible states: Missing and Existing. A null is a missing value.

Applies to mining model columns.

Requirements

A time series model must contain a key time column that contains unique values, input columns, and at least one predictable column.

Input and Predictable Columns

The Microsoft Time Series algorithm supports the specific input column content types, predictable column content types, and modeling flags that are listed in the following table.

Column

Content types

Input attribute

Continuous ,Key, Key Time, and Table

Predictable attribute

Continuous, Table

Note

Cyclical and Ordered content types are supported, but the algorithm treats them as discrete values and does not perform special processing.