Customizing a Data Mining Model (Analysis Services - Data Mining)

After you have selected an algorithm that meets your business needs, you can customize the mining model in the following ways to potentially improve results.

  • Use different columns of data in the model, or change the usage or content types of the columns.

  • Create filters on the mining model to restrict the data used in training the model.

  • Set algorithm parameters to control thresholds, tree splits, and other conditions.

  • Change the default algorithm that is used to analyze data or make predictions.

Changing Data Used by the Model

The decisions that you make about which columns of data to use in the model, and how to use and process that data, can greatly affect the results of analysis. The following topics provide information to help you understand these choices.

  • Mining Models (Analysis Services - Data Mining)

    Provides an overview of the architecture of a mining model, including the underlying mining structure and the choice of mining columns.

  • Creating Filters for Mining Models (Analysis Services - Data Mining)

    Explains how you can create filters that apply to a mining model, in order to create models based on a subset of the mining structure data.

  • Feature Selection in Data Mining

    Explains how Analysis Services uses a process called feature selection to select only the most useful attributes for addition to a model. Reducing the number of columns and attributes can improve performance and the quality of the model. The feature selection methods that are available differ depending on the algorithm that you choose.

If you use the Data Mining wizard, you can also have Analysis Services automatically select the data that is most useful for building a particular model.
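As an illustration of the filter option described above, the following Python sketch assembles a DMX statement that adds a filtered model to an existing mining structure. It assumes the ALTER MINING STRUCTURE ... ADD MINING MODEL ... WITH FILTER form of DMX. The structure, model, and column names are hypothetical, and the helper function is not part of any Analysis Services API.

```python
def add_filtered_model(structure, model, columns, algorithm, filter_expr):
    """Build a DMX statement that adds a filtered model to a mining structure.

    All names passed in are treated as plain text; this helper only
    formats the statement and does not connect to a server.
    """
    column_list = ",\n    ".join(columns)
    return (
        f"ALTER MINING STRUCTURE [{structure}]\n"
        f"ADD MINING MODEL [{model}]\n"
        f"(\n    {column_list}\n)\n"
        f"USING {algorithm}\n"
        f"WITH FILTER({filter_expr})"
    )


# Hypothetical structure, model, and column names, for illustration only.
statement = add_filtered_model(
    structure="Customer Structure",
    model="Bike Buyers Over 50",
    columns=[
        "[Customer Key]",
        "[Gender]",
        "[Number Cars Owned]",
        "[Bike Buyer] PREDICT",
    ],
    algorithm="Microsoft_Naive_Bayes",
    filter_expr="[Age] > 50",
)
print(statement)
```

The filter restricts the cases used to train this one model; other models based on the same structure can use a different filter or none.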

Customizing Algorithm Settings

The choice of algorithm determines what kind of results you will get. For general information about how a specific algorithm works, or the business scenarios where you would benefit from using a particular algorithm, see Data Mining Algorithms (Analysis Services - Data Mining).

The data mining algorithms provided in Analysis Services are also extensively customizable. You can control the behavior of the algorithm and how it processes data by setting algorithm parameters. The following topics provide detailed information about the parameters that each algorithm supports.

  • Microsoft Decision Trees Algorithm Technical Reference

  • Microsoft Clustering Algorithm Technical Reference

  • Microsoft Naive Bayes Algorithm Technical Reference

  • Microsoft Association Algorithm Technical Reference

  • Microsoft Sequence Clustering Algorithm Technical Reference

  • Microsoft Neural Network Algorithm Technical Reference

  • Microsoft Logistic Regression Algorithm Technical Reference

  • Microsoft Linear Regression Algorithm Technical Reference

  • Microsoft Time Series Algorithm Technical Reference

The topic for each algorithm type also lists the prediction functions that can be used with models based on that algorithm.
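To make the idea of setting algorithm parameters more concrete, here is a minimal Python sketch that renders the USING clause of a DMX model definition. It assumes that parameters are passed in parentheses after the algorithm name. The parameter values are illustrative only, not recommended settings, and the helper function is hypothetical.

```python
def using_clause(algorithm, parameters):
    """Render a DMX USING clause, with a parameter list when one is given."""
    if not parameters:
        return f"USING {algorithm}"
    rendered = ", ".join(f"{name} = {value}" for name, value in parameters.items())
    return f"USING {algorithm} ({rendered})"


# Illustrative values only; see the technical reference topic for each
# algorithm for the valid range and default of every parameter.
clause = using_clause(
    "Microsoft_Decision_Trees",
    {"COMPLEXITY_PENALTY": 0.5, "MINIMUM_SUPPORT": 10},
)
print(clause)
```

Parameters that are omitted keep their default values.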

List of Algorithm Parameters

Each algorithm supports parameters that you can use to customize the behavior of the algorithm and fine-tune the results of your model. For a description of how to use each parameter, see the following topics:

Property name

Applies to

AUTO_DETECT_PERIODICITY

Microsoft Time Series Algorithm Technical Reference

CLUSTER_COUNT

Microsoft Clustering Algorithm Technical Reference

Microsoft Sequence Clustering Algorithm Technical Reference

CLUSTER_SEED

Microsoft Clustering Algorithm Technical Reference

CLUSTERING_METHOD

Microsoft Clustering Algorithm Technical Reference

COMPLEXITY_PENALTY

Microsoft Decision Trees Algorithm Technical Reference

Microsoft Time Series Algorithm Technical Reference

FORCED_REGRESSOR

Microsoft Decision Trees Algorithm Technical Reference

Microsoft Linear Regression Algorithm Technical Reference

FORECAST_METHOD

Microsoft Time Series Algorithm Technical Reference

HIDDEN_NODE_RATIO

Microsoft Neural Network Algorithm Technical Reference

HISTORIC_MODEL_COUNT

Microsoft Time Series Algorithm Technical Reference

HISTORICAL_MODEL_GAP

Microsoft Time Series Algorithm Technical Reference

HOLDOUT_PERCENTAGE

Microsoft Logistic Regression Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

Note
This parameter is different from the holdout percentage value that applies to a mining structure.

HOLDOUT_SEED

Microsoft Logistic Regression Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

Note
This parameter is different from the holdout seed value that applies to a mining structure.

INSTABILITY_SENSITIVITY

Microsoft Time Series Algorithm Technical Reference

MAXIMUM_INPUT_ATTRIBUTES

Microsoft Clustering Algorithm Technical Reference

Microsoft Decision Trees Algorithm Technical Reference

Microsoft Linear Regression Algorithm Technical Reference

Microsoft Naive Bayes Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

Microsoft Logistic Regression Algorithm Technical Reference

MAXIMUM_ITEMSET_COUNT

Microsoft Association Algorithm Technical Reference

MAXIMUM_ITEMSET_SIZE

Microsoft Association Algorithm Technical Reference

MAXIMUM_OUTPUT_ATTRIBUTES

Microsoft Decision Trees Algorithm Technical Reference

Microsoft Linear Regression Algorithm Technical Reference

Microsoft Logistic Regression Algorithm Technical Reference

Microsoft Naive Bayes Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

MAXIMUM_SEQUENCE_STATES

Microsoft Sequence Clustering Algorithm Technical Reference

MAXIMUM_SERIES_VALUE

Microsoft Time Series Algorithm Technical Reference

MAXIMUM_STATES

Microsoft Clustering Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

Microsoft Sequence Clustering Algorithm Technical Reference

MAXIMUM_SUPPORT

Microsoft Association Algorithm Technical Reference

MINIMUM_IMPORTANCE

Microsoft Association Algorithm Technical Reference

MINIMUM_ITEMSET_SIZE

Microsoft Association Algorithm Technical Reference

MINIMUM_DEPENDENCY_PROBABILITY

Microsoft Naive Bayes Algorithm Technical Reference

MINIMUM_PROBABILITY

Microsoft Association Algorithm Technical Reference

MINIMUM_SERIES_VALUE

Microsoft Time Series Algorithm Technical Reference

MINIMUM_SUPPORT

Microsoft Association Algorithm Technical Reference

Microsoft Clustering Algorithm Technical Reference

Microsoft Decision Trees Algorithm Technical Reference

Microsoft Sequence Clustering Algorithm Technical Reference

Microsoft Time Series Algorithm Technical Reference

MISSING_VALUE_SUBSTITUTION

Microsoft Time Series Algorithm Technical Reference

MODELLING_CARDINALITY

Microsoft Clustering Algorithm Technical Reference

PERIODICITY_HINT

Microsoft Time Series Algorithm Technical Reference

PREDICTION_SMOOTHING

Microsoft Time Series Algorithm Technical Reference

SAMPLE_SIZE

Microsoft Clustering Algorithm Technical Reference

Microsoft Logistic Regression Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

SCORE_METHOD

Microsoft Decision Trees Algorithm Technical Reference

SPLIT_METHOD

Microsoft Decision Trees Algorithm Technical Reference

STOPPING_TOLERANCE

Microsoft Clustering Algorithm Technical Reference
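The table above can also be treated as a lookup from parameter name to the algorithms that accept it. The Python sketch below encodes a few rows copied from the table and reports parameters that the table does not list for a given algorithm. The short algorithm labels and the function name are assumptions made for this example.

```python
# A few rows copied from the table above; the remaining rows would
# follow the same pattern.
PARAMETER_ALGORITHMS = {
    "CLUSTER_COUNT": {"Clustering", "Sequence Clustering"},
    "COMPLEXITY_PENALTY": {"Decision Trees", "Time Series"},
    "HOLDOUT_PERCENTAGE": {"Logistic Regression", "Neural Network"},
    "MINIMUM_SUPPORT": {
        "Association",
        "Clustering",
        "Decision Trees",
        "Sequence Clustering",
        "Time Series",
    },
}


def unsupported_parameters(algorithm, parameters):
    """Return, sorted, the parameters not listed for the algorithm.

    Parameters missing from PARAMETER_ALGORITHMS are also reported,
    because this excerpt cannot confirm them.
    """
    return sorted(
        name
        for name in parameters
        if algorithm not in PARAMETER_ALGORITHMS.get(name, set())
    )


problems = unsupported_parameters(
    "Decision Trees", ["COMPLEXITY_PENALTY", "CLUSTER_COUNT"]
)
print(problems)
```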

Additional Requirements

Choosing and preparing data is an important part of the data mining process. For example, the algorithms that Microsoft provides do not allow duplicate keys. The type of data that is required for each model differs depending on the algorithm. For more information, see the Requirements section of the topic for each algorithm.
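Because duplicate keys are not allowed, it can help to check the source data before processing a model. The following Python sketch shows one way to find repeated key values in rows that have already been loaded into memory. The column name and sample rows are made up for illustration.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return, sorted, the key values that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample rows; "CustomerKey" is a hypothetical key column.
rows = [
    {"CustomerKey": 1, "Age": 34},
    {"CustomerKey": 2, "Age": 51},
    {"CustomerKey": 2, "Age": 52},
    {"CustomerKey": 3, "Age": 47},
]
repeated = duplicate_keys(rows, "CustomerKey")
print(repeated)
```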

Customizing Results by using Queries and Prediction Functions

After the model has been built and processed, you can view the information by using one of the viewers specific to each model type. Alternatively, you can write custom queries by using Data Mining Extensions (DMX) to obtain more advanced or detailed information about the patterns found in the data.

For information about how to create queries that return model content, see Querying Data Mining Models (Analysis Services - Data Mining).

You can use functions to extend the results that a mining model returns. Some functions return statistics that represent the probability of an outcome, or other scores. Individual algorithms also support additional functions. For example, if a mining model uses clustering, you can use special functions to find information about the clusters. If your model is based on the Time Series algorithm, a different set of functions is available for making predictions and querying model content. For more information, see the technical reference topic for each algorithm.

For examples of how to query a mining model and how to use the prediction functions that are designed for specific model types, see Querying Data Mining Models (Analysis Services - Data Mining).

For a list of prediction functions that are supported for all algorithm types, see Mapping Functions to Query Types (DMX).
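As a sketch of the kind of query described in this section, the following Python helper assembles a DMX singleton prediction query that returns a predicted value and its probability by using the Predict and PredictProbability functions. The model, column, and input names are hypothetical, and the helper only formats text; it does not run the query.

```python
def singleton_prediction_query(model, target, inputs):
    """Build a DMX singleton query that predicts `target` for one case.

    `inputs` maps column names to numbers or strings; strings are
    quoted, with embedded single quotes doubled.
    """
    def literal(value):
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    case = ", ".join(f"{literal(v)} AS [{name}]" for name, v in inputs.items())
    return (
        f"SELECT Predict([{target}]), PredictProbability([{target}])\n"
        f"FROM [{model}]\n"
        f"NATURAL PREDICTION JOIN\n"
        f"(SELECT {case}) AS t"
    )


# Hypothetical model and column names, for illustration only.
query = singleton_prediction_query(
    model="Bike Buyers Over 50",
    target="Bike Buyer",
    inputs={"Number Cars Owned": 2, "Gender": "F"},
)
print(query)
```

Functions that are specific to one model type, such as those for clustering or time series models, would replace the functions in the SELECT list.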

Assessing Changes in a Model

When you experiment with different models to solve a business problem, or build variations on a model, you need to measure the accuracy of each model and also evaluate how well each model answers the business problem. For general information about evaluating data mining models, see Validating Data Mining Models (Analysis Services - Data Mining). For more information about how to chart the accuracy of different mining models, see Tools for Charting Model Accuracy (Analysis Services - Data Mining).
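One simple way to compare model variations is to score each one against the same set of held-out cases. The Python sketch below computes plain classification accuracy for two models. The model names, predictions, and actual values are made-up illustration data, and accuracy is only one of several measures you might use.

```python
def accuracy(predicted, actual):
    """Return the fraction of cases where the prediction matches the actual value."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up test data: actual outcomes and the predictions of two
# hypothetical model variations for the same five cases.
actual = [1, 0, 1, 1, 0]
predictions = {
    "Decision Tree": [1, 0, 1, 0, 0],
    "Naive Bayes": [1, 1, 0, 1, 0],
}

scores = {name: accuracy(pred, actual) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A higher score on test data does not by itself show that a model answers the business problem better, so this kind of measurement complements the validation tools rather than replacing them.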