Partilhar via


CREATE MINING MODEL (DMX)

Creates both a new mining model and a mining structure in the database. You can create a model either by defining the new model in the statement, or by using the Predictive Model Markup Language (PMML). This second option is for advanced users only.

The mining structure is named by appending "_structure" to the model name, which insures that the structure name is unique from the model name.

To create a mining model for an existing mining structure, use the ALTER MINING STRUCTURE (DMX) statement.

Syntax

CREATE [SESSION] MINING MODEL <model>
(
    [(<column definition list>)]
)
USING <algorithm> [(<parameter list>)] [WITH DRILLTHROUGH]
CREATE MINING MODEL <model> FROM PMML <xml string>

Arguments

  • model
    A unique name for the model.
  • column definition list
    A comma-separated list of column definitions.
  • algorithm
    The provider-defined name of a data mining provider.
  • parameter list
    Optional. A comma-separated list of provider-defined parameters for the algorithm.
  • XML string
    (For advanced use only.) An XML-encoded model (PMML). The string must be enclosed in single quotation marks (').

Remarks

The SESSION clause lets you create a mining model that is automatically removed from the server when the connection closes or the session times out. SESSION mining models are useful because they do not require the user to be a database administrator, and they only use disk space for as long as the connection is open.

The WITH DRILLTHROUGH clause enables drill through on the new mining model. Drill through can only be enabled when you create the model.

The CREATE MINING MODEL statement creates a new mining model that is based on the column definition list, the algorithm, and the algorithm parameter list.

Column Definition List

You define the structure of a model that uses the column definition list by including the following information for each column:

  • Name (mandatory)
  • Data type (mandatory)
  • Distribution
  • List of modeling flags
  • Content type (mandatory)
  • Prediction request, which indicates to the algorithm to predict this column, indicated by the PREDICT or PREDICT_ONLY clause
  • Relationship to an attribute column (mandatory only if it applies), indicated by the RELATED TO clause

Use the following syntax for the column definition list, to define a single column:

<column name>    <data type>    [<Distribution>]    [<Modeling Flags>]    <Content Type>    [<prediction>]    [<column relationship>] 

Use the following syntax for the column definition list, to define a nested table column:

<column name>    TABLE    [<prediction>] ( <non-table column definition list> )

Except for modeling flags, you can use no more than one clause from a particular group to define a column. You can define multiple modeling flags for a column.

For a list of the data types, content types, column distributions, and modeling flags that you can use to define a column, see the following topics:

You can add a clause to the statement to describe the relationship between two columns. Analysis Services supports using the following <Column relationship> clause.

  • RELATED TO
    This form indicates a value hierarchy. The target of a RELATED TO column can be a key column in a nested table, a discretely-valued column in the case row, or another column with a RELATED TO clause, which indicates a deeper hierarchy.

Use a prediction clause to describe how the prediction column is used. The following table describes the two possible clauses.

<prediction> clause Description

PREDICT

This column can be predicted by the model, and it can be supplied in input cases to predict the value of other predictable columns.

PREDICT_ONLY

This column can be predicted by the model, but its values cannot be used in input cases to predict the value of other predictable columns.

Parameter Definition List

You can use the parameter list to adjust the performance and functionality of a mining model. The syntax of the parameter list is as follows:

[<parameter> = <value>, <parameter> = <value>,…]

For a list of the parameters that are associated with each algorithm, see Data Mining Algorithms.

Naive Bayes Example

The following example uses the Microsoft Naive Bayes algorithm to create a new mining model. The Bike Buyer column is defined as the predictable attribute.

CREATE MINING MODEL [NBSample]
(
    CustomerKey LONG KEY, 
    Gender TEXT DISCRETE,
    [Number Cars Owned] LONG DISCRETE,
    [Bike Buyer] LONG DISCRETE PREDICT
)
Using Microsoft_Naive_Bayes

Association Model Example

The following example uses the Microsoft Association algorithm to create a new mining model. The statement takes advantage of the ability to nest a table inside the model definition by using a table column. The model is modified by using the Minimum_Probability and Minimum_Support parameters.

CREATE MINING MODEL MyAssociationModel (
    OrderNumber TEXT KEY,
    [Products] TABLE PREDICT (
        [Model] TEXT KEY
    )
)
Using Microsoft_Association_Rules (Minimum_Probability = 0.1, Minimum_Support = 0.01)

Sequence Clustering Example

The following example uses the Microsoft Sequence Clustering algorithm to create a new mining model. Two keys are used to define the model. Order Number specifies individual orders. Line Number specifies the sequence in which items were added to an order.

CREATE MINING MODEL BuyingSequence (
    [Order Number] TEXT KEY,
    [Products] TABLE (
        [Line Number] LONG KEY SEQUENCE,
        [Model] TEXT DISCRETE PREDICT
)
)
Using Microsoft_Sequence_Clustering

Time Series Example

The following example uses the Microsoft Times Series algorithm to create a new mining model. Because a time series model can contain multiple keys, both TimeIndex and ModelRegion are designated to be key columns. In this example, it is assumed that the periodicity of the data is every 12 months, so the PERIODICITY_HINT parameter is set to 12.

Note

You must specify the PERIODICITY_HINT parameter by using brace characters, "{}".

CREATE MINING MODEL SalesForcast (
        TimeIndex LONG KEY TIME,
        ModelRegion TEXT KEY,
        Amount LONG CONTINUOUS PREDICT,
        Quantity LONG CONTINUOUS PREDICT
)
Using Microsoft_Time_Series (PERIODICITY_HINT = '{12}')

See Also

Reference

Data Mining Extensions (DMX) Data Definition Statements
Data Mining Extensions (DMX) Data Manipulation Statements
Data Mining Extensions (DMX) Statement Reference

Help and Information

Getting SQL Server 2005 Assistance