Microsoft Linear Regression Algorithm
The Microsoft Linear Regression algorithm is a variation of the Microsoft Decision Trees algorithm, where the MINIMUM_LEAF_CASES parameter is set to be greater than or equal to the total number of cases in the dataset that the algorithm uses to train the mining model. With the parameter set in this way, the algorithm will never create a split, and therefore performs a linear regression.
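The effect of the parameter can be illustrated with a minimal sketch. It assumes that MINIMUM_LEAF_CASES acts as a lower bound on the number of cases on each side of a split. The function candidate_splits and the sample data are hypothetical and are not part of the Microsoft Decision Trees implementation.

```python
def candidate_splits(values, minimum_leaf_cases):
    """Return thresholds that leave at least minimum_leaf_cases cases on each side.

    Hypothetical illustration only; assumes the parameter is a per-leaf minimum.
    """
    ordered = sorted(values)
    n = len(ordered)
    splits = []
    for i in range(1, n):
        if ordered[i - 1] == ordered[i]:
            continue  # no threshold can separate two equal values
        left, right = i, n - i  # case counts on each side of the threshold
        if left >= minimum_leaf_cases and right >= minimum_leaf_cases:
            splits.append((ordered[i - 1] + ordered[i]) / 2)
    return splits


ages = [23, 31, 35, 42, 58, 61]

# With a small minimum, several thresholds qualify.
print(candidate_splits(ages, 2))

# With the minimum equal to the total number of cases, no split can qualify,
# because the two sides together contain only len(ages) cases.
print(candidate_splits(ages, len(ages)))
```

With no valid split, the tree consists of a single node that holds all of the cases, and the regression formula in that node is fitted to the whole dataset.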
You can use linear regression to determine a relationship between two continuous columns. The relationship takes the form of an equation for a line that best represents a series of data. For example, the line in the following diagram is the best possible linear representation of the data.
The equation that represents the line in the diagram takes the general form y = ax + b and is known as the regression equation. The variable y represents the output variable, x represents the input variable, and a and b are adjustable coefficients: a sets the slope (angle) of the regression line and b sets its intercept (location). Each data point in the diagram has an error associated with its distance from the regression line. You obtain the regression equation by adjusting a and b until the sum of the errors associated with the points reaches its minimum.
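The fitting step can be sketched in a few lines. This example assumes that the error of a point is its squared vertical distance from the line (ordinary least squares), which has a closed-form solution. The text above says only that the sum of the errors is minimized, so this is not necessarily how the Microsoft algorithm computes the coefficients. The names fit_line and sum_squared_error and the sample data are illustrative.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize the sum of squared errors for y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Sample data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)
print(sum_squared_error(xs, ys, a, b))
```

Any other choice of a and b gives a larger value of sum_squared_error for the same points, which is the sense in which the fitted line is the best linear representation under this assumption.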
Using the Algorithm
Use the Microsoft Tree Viewer to explore a linear regression mining model.
A linear regression model must contain a key column, input columns, and at least one predictable column.
The Microsoft Linear Regression algorithm supports specific input column content types, predictable column content types, and modeling flags, which are listed in the following table.
Input column content types | Continuous, Cyclical, Key, Table, and Ordered |
Predictable column content types | Continuous, Cyclical, and Ordered |
Modeling flags | NOT NULL and REGRESSOR |
All Microsoft algorithms support a common set of functions. However, the Microsoft Linear Regression algorithm supports additional functions, listed in the following table.
For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.
The Microsoft Linear Regression algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.
Parameter | Description |
---|---|
MAXIMUM_INPUT_ATTRIBUTES | Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255. |
MAXIMUM_OUTPUT_ATTRIBUTES | Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255. |
FORCED_REGRESSOR | Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm. |
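The rule described for MAXIMUM_INPUT_ATTRIBUTES and MAXIMUM_OUTPUT_ATTRIBUTES can be read as the following check. This is a sketch of the documented behavior only. The function name is hypothetical, and it assumes that feature selection starts when the attribute count exceeds the limit, not when it equals it.

```python
def invokes_feature_selection(attribute_count, maximum_attributes=255):
    """Return True if feature selection would run for this many attributes.

    Hypothetical helper: a limit of 0 turns feature selection off, and
    255 is the documented default for both parameters.
    """
    if maximum_attributes == 0:
        return False  # feature selection is turned off
    # Assumption: the limit itself is still handled without feature selection.
    return attribute_count > maximum_attributes


print(invokes_feature_selection(100))     # under the default limit
print(invokes_feature_selection(300))     # over the default limit
print(invokes_feature_selection(300, 0))  # feature selection turned off
```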
See Also
Concepts
Data Mining Algorithms
Data Mining Wizard
Feature Selection in Data Mining
Viewing a Mining Model with the Microsoft Tree Viewer