Share via


Data Mining Queries

Data mining queries are useful for many purposes. You can:

  • Apply the model to new data, to make single or multiple predictions. You can provide input values as parameters, or in a batch.

  • Get a statistical summary of the data used for training.

  • Extract patterns and rules, or generate a profile of the typical case representing a pattern in the model.

  • Extract regression formulas and other calculations that explain patterns.

  • Get the cases that fit a particular pattern.

  • Retrieve details about individual cases used in the model, including data not used in analysis.

  • Retrain a model by adding new data, or perform cross-prediction.

This section provides an overview of the information you need to get started with data mining queries. It describes the types of queries you can create against data mining objects, introduces the query tools and query languages, and provides links to examples of queries that you can create against models that were built using the algorithms provided in SQL Server Data Mining.

Understanding Data Mining Queries

Query Tools and Interfaces

Queries for Different Model Types

Requirements

Understanding Data Mining Queries

Analysis Services Data Mining supports the following types of queries:

Before you create queries, we recommend that you familiarize yourself with the differences between models created with each of the data mining algorithms provided by SQL Server.

Query Tools and Interfaces

You can build data mining queries interactively by using one of the query tools provided by SQL Server. The graphical Prediction Query Builder is provided in both SQL Server Data Tools (SSDT) and SQL Server Management Studio. If you have not used the Prediction Query Builder before, we recommend that you follow the steps in the Basic Data Mining Tutorial to familiarize yourself with the interface. For q quick overview of the steps, see Create a Query using the Create a Prediction Query Using the Prediction Query Builder.

The Prediction Query Builder is helpful for starting queries that you will customize later. You can easily add data sources and map them to columns, and then switch to DMX view and customize the query by adding a WHERE clause or other functions.

Once you are familiar with data mining models and how to build queries, you can also write queries directly by using Data Mining Extensions (DMX). DMX is a query language that is similar to Transact-SQL, and that you can use from many different clients. DMX is the tool of choice for creating both custom predictions and complex queries. For an introduction to DMX, see Creating and Querying Data Mining Models with DMX: Tutorials (Analysis Services - Data Mining).

DMX editors are provided in both SQL Server Data Tools (SSDT) and SQL Server Management Studio. You can also use the Prediction Query Builder to start your queries, then change the view to the text editor and copy the DMX statement to another client. For more information, see Data Mining Query Interfaces.

You can compose DMX statements programmatically and send them from your client to the Analysis Services server by using AMO or XMLA. However, DMX is the language that you must use to create queries against a mining model.

You can also query the metadata, statistics, and some content of the model by using Dynamic Management Views (DMVs) that are based on the data mining schema rowsets. These DMVs make it easy to retrieve information about the model by typing SELECT statements; however, you cannot create predictions. For more information about DMVs supported by Analysis Services, see Use Dynamic Management Views (DMVs) to Monitor Analysis Services.

Finally, you can create data mining queries for use in Integration Services packages, by using the Data Mining Query Task, or the Data Mining Query Transformation. The control flow task supports multiple types of DMX queries, whereas the data flow transformation supports only queries that work with data in the data flow, meaning queries that use the PREDICTION JOIN syntax.

Queries for Different Model Types

The algorithm that was used when the model was created greatly influences the type of information that you can get from a data mining query. The reason for the differences is that each algorithm processes the data in a different way, and stores different kinds of patterns. For example, some algorithms create clusters; others create trees. Therefore, you might need to use specialized prediction and query functions, depending on the type of model that you are working with.

The following list provides a summary of the functions that you can use in queries:

  • General prediction functions: The Predict function is polymorphic, meaning it works with all model types. This function will automatically detect the type of model you are working with and prompt you for additional parameters. For more information, see Predict (DMX).

    Warning

    Not all models are used to make predictions. For example, you can create a clustering model that does not have a predictable attribute. However, even if a model does not have a predictable attribute, you can create prediction queries that return other types of useful information from the model.

  • Custom prediction functions: Each model type provides a set of prediction functions designed for working with the patterns created by that algorithm.

    For example, the Lag function is provided for time series models, to let you view the historical data used for the model. For clustering models, functions such as ClusterDistance are more meaningful.

    For more information about the functions that are supported for each model type, see the following links:

    Association Model Query Examples Microsoft Naive Bayes Algorithm
    Clustering Model Query Examples Neural Network Model Query Examples
    Decision Trees Model Query Examples Sequence Clustering Model Query Examples
    Linear Regression Model Query Examples Time Series Model Query Examples
    Logistic Regression Model Query Examples

    You can also call VBA functions, or create your own functions. For more information, see Functions (DMX).

  • General statistics: There are a number of functions that can be used with almost any model type, which return a standard set of descriptive statistics, such as standard deviation.

    For example, the PredictHistogram function returns a table that lists all the states of the specified column.

    For more information, see General Prediction Functions (DMX).

  • Custom statistics: Additional supporting functions are provided for each model type, to generate statistics that are relevant to the specific analytical task.

    For example, when you are working with a clustering model, you can use the function, PredictCaseLikelihood, to return the likelihood score associated with a certain case and cluster. However, if you created a linear regression model, you would be more interested in retrieving the coefficient and intercept, which you can do using a content query.

  • Model content functions: The content of all models is represented in a standardized format that lets you retrieve information with a simple query. You create queries on the model content by using DMX. You can also get some type of model content by using the data mining schema rowsets.

    In the model content, the meaning of each row or node of the table that is returned differs depending on the type of algorithm that was used to build the model, as well as the data type of the column. For more information, see Content Queries (Data Mining).

Requirements

Before you can create a query against a model, the data mining model must have been processed. Processing of Analysis Services objects requires special permissions. For more information on processing mining models, see Processing Requirements and Considerations (Data Mining).

To execute queries against a data mining model requires different levels of permissions, depending on the type of query that you run. For example, drillthrough to case or structure data typically requires additional permissions which can be set on the mining structure object or mining model object.

However, if your query uses external data, and includes statements such as OPENROWSET or OPENQUERY, the database that you are querying must enable these statements, and you must have permission on the underlying database objects.

For more information on the security contexts required to run data mining queries, see Security Overview (Data Mining)

In This Section

The topics in this section introduce each type of data mining query in more detail, and provide links to detailed examples of how to create queries against data mingin models.

Prediction Queries (Data Mining)

Content Queries (Data Mining)

Drillthrough Queries (Data Mining)

Data Definition Queries (Data Mining)

Data Mining Query Interfaces

Use these links to learn how to create and work with data mining queries.

Tasks Links
View tutorials and walkthroughs on data mining queries Lesson 6: Creating and Working with Predictions (Basic Data Mining Tutorial)

Time Series Prediction DMX Tutorial
Use data mining query tools in SQL Server Management Studio and SQL Server Data Tools (SSDT) Create a DMX Query in SQL Server Management Studio

Create a Prediction Query Using the Prediction Query Builder

Apply Prediction Functions to a Model

Manually Edit a Prediction Query
Work with external data used in prediction queries Choose and Map Input Data for a Prediction Query

Choose and Map Input Data for a Prediction Query
Work with the results of queries View and Save the Results of a Prediction Query
Use DMX and XMLA query templates provided in Management Studio Create a Singleton Prediction Query from a Template

Create a Data Mining Query by Using XMLA

Use Analysis Services Templates in SQL Server Management Studio
Learn more about content queries and see examples Create a Content Query on a Mining Model

Query the Parameters Used to Create a Mining Model

Content Queries (Data Mining)
Set query options and troubleshoot query permissions and problems Change the Time-out Value for Data Mining Queries
Use the data mining components in Integration Services Data Mining Query Task

Data Mining Query Transformation

See Also

Data Mining Algorithms (Analysis Services - Data Mining)
Mining Model Content (Analysis Services - Data Mining)