Creating a Targeted Mailing Mining Model Structure (Data Mining Tutorial)
The first step in creating a targeted mailing scenario is to use the Data Mining Wizard in Business Intelligence Development Studio to create a new mining structure and decision tree mining model.
For More Information
Data Mining Wizard, Data Mining Designer, Microsoft Decision Trees Algorithm
To create a mining structure for a targeted mailing scenario
In Solution Explorer, right-click Mining Structures and select New Mining Structure.
The Data Mining Wizard opens.
On the Welcome to the Data Mining Wizard page, click Next.
On the Select the Definition Method page, verify that From existing relational database or data warehouse is selected, and then click Next.
On the Select the Data Mining Technique page, under Which data mining technique do you want to use?, select Microsoft Decision Trees.
In this tutorial you will create several models based on this initial mining structure. The first model will be created together with the structure when you complete the wizard, and will be based on the Microsoft Decision Trees algorithm.
Click Next.
On the Select Data Source View page, notice that Adventure Works DW is selected by default. Click Browse to view the tables in the data source view, and then click Close to return to the wizard.
Click Next.
On the Specify Table Types page, select the check box in the Case column next to the vTargetMail table, and then click Next.
On the Specify the Training Data page, verify that the check box in the Key column is selected next to the CustomerKey column.
If the source table from the data source view indicates a key, the Data Mining Wizard automatically chooses that column as a key for the model.
Select Input and Predictable next to the BikeBuyer column.
When you indicate that a column is predictable, the Suggest button is enabled. Clicking Suggest opens the Suggest Related Columns dialog box, which lists the columns that are most closely related to the predictable column.
The Suggest Related Columns dialog box orders the attributes by their correlation with the predictable attribute. Columns with a score greater than 0.05 are automatically selected for inclusion in the model. If you agree with the suggestions, click OK, which marks the selected columns as input columns in the wizard. For this tutorial, ignore the suggestions by clicking Cancel.
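The ranking idea behind Suggest can be illustrated with a short Python sketch. This text does not describe how the wizard computes its score, so the sketch substitutes information gain as a stand-in measure. The function names and the four sample rows are hypothetical. Only the 0.05 cutoff comes from the description above.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(column, target):
    """Reduction in target entropy after grouping rows by the column's value."""
    groups = defaultdict(list)
    for value, label in zip(column, target):
        groups[value].append(label)
    remainder = sum(len(g) / len(target) * entropy(g) for g in groups.values())
    return entropy(target) - remainder


def suggest_related_columns(rows, target_name, threshold=0.05):
    """Rank candidate columns by score; flag those above the threshold.

    Assumption: information gain stands in for the wizard's own score.
    """
    target = [row[target_name] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], target)
        for name in rows[0]
        if name != target_name
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score > threshold) for name, score in ranked]


# Hypothetical sample rows, not taken from the vTargetMail view.
rows = [
    {"CommuteDistance": "0-1 Miles", "Gender": "M", "BikeBuyer": 1},
    {"CommuteDistance": "0-1 Miles", "Gender": "F", "BikeBuyer": 1},
    {"CommuteDistance": "10+ Miles", "Gender": "M", "BikeBuyer": 0},
    {"CommuteDistance": "10+ Miles", "Gender": "F", "BikeBuyer": 0},
]
suggestions = suggest_related_columns(rows, "BikeBuyer")
for name, score, selected in suggestions:
    print(f"{name}: score={score:.2f} selected={selected}")
```

With this stand-in measure, columns whose values are nearly unique per row, such as names, score deceptively high. That is one reason to review any automatic suggestion before accepting it.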
Select the Input check boxes next to the following columns:
- Age
- CommuteDistance
- EnglishEducation
- EnglishOccupation
- FirstName
- Gender
- GeographyKey
- HouseOwnerFlag
- LastName
- MaritalStatus
- NumberCarsOwned
- NumberChildrenAtHome
- Region
- TotalChildren
- YearlyIncome
You can select multiple columns by using the SHIFT key.
Click Next.
On the Specify Columns' Content and Data Type page, click Detect.
The wizard runs an algorithm that samples the numeric data and determines whether each numeric column contains continuous or discrete values. For example, a column can contain salary information as actual salary values, which are continuous, or it can contain integers that represent encoded salary ranges (such as 1 = less than $25,000; 2 = $25,000 to $50,000), which are discrete.
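The following Python sketch shows one way such a detection step could work. It is not the wizard's actual logic, which is not described here. The function name, the sample size, and the `max_states` threshold are all assumptions chosen for illustration.

```python
import random


def detect_content_type(values, sample_size=1000, max_states=10, seed=0):
    """Guess "Continuous" or "Discrete" for a numeric column from a sample.

    Assumed heuristic: fractional values, or more than max_states distinct
    values in the sample, suggest a continuous measure.
    """
    values = list(values)
    if len(values) > sample_size:
        # Sample without replacement so large columns stay cheap to inspect.
        values = random.Random(seed).sample(values, sample_size)
    distinct = set(values)
    has_fraction = any(float(v) != int(v) for v in distinct)
    if has_fraction or len(distinct) > max_states:
        return "Continuous"
    return "Discrete"


# Hypothetical columns: raw salaries versus encoded salary ranges.
salaries = [23500.0, 41250.5, 38000.0, 72100.25, 55000.0]
salary_bands = [1, 2, 2, 3, 1, 4, 2]

print(detect_content_type(salaries))
print(detect_content_type(salary_bands))
```

A heuristic like this also labels integer identifiers with many distinct values as continuous, so the detected settings still need a manual check.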
After clicking Detect, make sure that the entries in the Content Type and Data Type columns have the settings listed in the following table.
| Column | Content Type | Data Type |
| --- | --- | --- |
| Age | Continuous | Long |
| BikeBuyer | Discrete | Long |
| CommuteDistance | Discrete | Text |
| CustomerKey | Key | Long |
| EnglishEducation | Discrete | Text |
| EnglishOccupation | Discrete | Text |
| FirstName | Discrete | Text |
| Gender | Discrete | Text |
| GeographyKey | Discrete | Text |
| HouseOwnerFlag | Discrete | Text |
| LastName | Discrete | Text |
| MaritalStatus | Discrete | Text |
| NumberCarsOwned | Discrete | Long |
| NumberChildrenAtHome | Discrete | Long |
| Region | Discrete | Text |
| TotalChildren | Discrete | Long |
| YearlyIncome | Continuous | Double |
Note
Based solely on the numeric values, the Data Mining algorithm suggests that the GeographyKey column contains continuous numbers. However, numbers such as postal codes typically should be treated as discrete, rather than continuous numeric values, because mathematical operations using these numbers are meaningless.
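As a checklist aid, the expected settings from the table above can be recorded in a small Python structure and compared against whatever the wizard detected. This is only a sketch: the `find_mismatches` helper is hypothetical, and the "detected" values are assumed for illustration rather than read from the wizard.

```python
# Expected (content type, data type) per column, copied from the table above.
EXPECTED = {
    "Age": ("Continuous", "Long"),
    "BikeBuyer": ("Discrete", "Long"),
    "CommuteDistance": ("Discrete", "Text"),
    "CustomerKey": ("Key", "Long"),
    "EnglishEducation": ("Discrete", "Text"),
    "EnglishOccupation": ("Discrete", "Text"),
    "FirstName": ("Discrete", "Text"),
    "Gender": ("Discrete", "Text"),
    "GeographyKey": ("Discrete", "Text"),
    "HouseOwnerFlag": ("Discrete", "Text"),
    "LastName": ("Discrete", "Text"),
    "MaritalStatus": ("Discrete", "Text"),
    "NumberCarsOwned": ("Discrete", "Long"),
    "NumberChildrenAtHome": ("Discrete", "Long"),
    "Region": ("Discrete", "Text"),
    "TotalChildren": ("Discrete", "Long"),
    "YearlyIncome": ("Continuous", "Double"),
}


def find_mismatches(actual, expected=EXPECTED):
    """Return {column: (actual, expected)} for every setting that differs."""
    return {
        name: (actual.get(name), wanted)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Hypothetical detection result: everything matches except GeographyKey,
# which is assumed here to have been detected as a continuous number.
detected = dict(EXPECTED)
detected["GeographyKey"] = ("Continuous", "Long")  # assumed for illustration

mismatches = find_mismatches(detected)
for name, (got, wanted) in mismatches.items():
    print(f"{name}: detected {got}, change to {wanted}")
```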
Click Next.
On the Completing the Wizard page, in Mining structure name, type Targeted Mailing.
In Mining model name, type TM_Decision_Tree.
Select the Allow drill through check box.
Click Finish.