Mining Model Content for Logistic Regression Models (Analysis Services - Data Mining)
This topic describes mining model content that is specific to models that use the Microsoft Logistic Regression algorithm. For an explanation of how to interpret statistics and structure shared by all model types, and general definitions of terms related to mining model content, see Mining Model Content (Analysis Services - Data Mining).
Understanding the Structure of a Logistic Regression Model
A logistic regression model is created by using the Microsoft Neural Network algorithm with parameters that constrain the model to eliminate the hidden node. Therefore, the overall structure of a logistic regression model is almost identical to that of a neural network: each model has a single parent node that represents the model and its metadata, and a special marginal statistics node (NODE_TYPE = 24) that provides descriptive statistics about the inputs used in the model.
Additionally, the model contains a subnetwork (NODE_TYPE = 17) for each predictable attribute. Just like in a neural network model, each subnetwork always contains two branches: one for the input layer, and another branch that contains the hidden layer (NODE_TYPE = 19) and the output layer (NODE_TYPE = 20) for the network. The same subnetwork may be used for multiple attributes if they are specified as predict-only. Predictable attributes that are also inputs may not appear in the same subnetwork.
However, in a logistic regression model, the node that represents the hidden layer is empty, and has no children. Therefore the model contains nodes that represent individual outputs (NODE_TYPE = 23) and individual inputs (NODE_TYPE = 21) but no individual hidden nodes.
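The empty hidden layer can be checked programmatically once the model content has been retrieved. The following is a minimal sketch, assuming the content rows have already been loaded into Python dictionaries keyed by column name. The `node` helper and the sample rows are hypothetical, and the parent assignments are illustrative only.

```python
HIDDEN_LAYER = 19  # NODE_TYPE of the hidden layer organizer node


def hidden_layers_are_empty(rows):
    """Return True if no row names a hidden layer node as its parent."""
    hidden_ids = {
        row["NODE_UNIQUE_NAME"] for row in rows if row["NODE_TYPE"] == HIDDEN_LAYER
    }
    return all(row["PARENT_UNIQUE_NAME"] not in hidden_ids for row in rows)


def node(digit, node_type, parent):
    """Build a hypothetical content row; the ID is one digit padded with zeros."""
    return {
        "NODE_UNIQUE_NAME": digit + "0" * 16,
        "NODE_TYPE": node_type,
        "PARENT_UNIQUE_NAME": parent,
    }


ROOT = "0" * 17
# Illustrative rows for a model with one input and one output attribute.
rows = [
    node("0", 1, None),             # model root
    node("1", 24, ROOT),            # marginal statistics
    node("3", 18, ROOT),            # input layer
    node("6", 21, "3" + "0" * 16),  # input node
    node("2", 17, ROOT),            # subnetwork
    node("4", 19, "2" + "0" * 16),  # hidden layer (no children)
    node("5", 20, "2" + "0" * 16),  # output layer
    node("8", 23, "5" + "0" * 16),  # output node
]

print(hidden_layers_are_empty(rows))
```

A neural network model with hidden nodes would fail this check, because those nodes would name the hidden layer as their parent.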
By default, a logistic regression model is displayed in the Microsoft Neural Network Viewer. With this custom viewer, you can filter on input attributes and their values, and see graphically how they affect the outputs. The tooltips in the viewer show the probability and lift associated with each pair of input and output values. For more information, see Viewing a Mining Model with the Microsoft Neural Network Viewer.
To explore the structure of the inputs and subnetworks, and to see detailed statistics, you can use the Microsoft Generic Content Tree viewer. You can click on any node to expand it and see the child nodes, or view the weights and other statistics contained in the node.
Model Content for a Logistic Regression Model
This section provides detail and examples only for those columns in the mining model content that have particular relevance for logistic regression. The model content is almost identical to that of a neural network model; descriptions that also apply to neural network models are repeated here for convenience.
For information about general-purpose columns in the schema rowset, such as MODEL_CATALOG and MODEL_NAME, that are not described here, or for explanations of mining model terminology, see Mining Model Content (Analysis Services - Data Mining).
MODEL_CATALOG
Name of the database where the model is stored.
MODEL_NAME
Name of the model.
ATTRIBUTE_NAME
The name of the attribute that corresponds to this node.
Node | Content |
---|---|
Model root | Blank |
Marginal statistics | Blank |
Input layer | Blank |
Input node | Input attribute name |
Hidden layer | Blank |
Output layer | Blank |
Output node | Output attribute name |
NODE_NAME
The name of the node. Currently, this column contains the same value as NODE_UNIQUE_NAME, though this may change in future releases.
NODE_UNIQUE_NAME
The unique name of the node. For more information about how the names and IDs provide structural information about the model, see the section, Using Node Names and IDs.
NODE_TYPE
A logistic regression model outputs the following node types:
Node Type ID | Description |
---|---|
1 | Model. |
17 | Organizer node for the subnetwork. |
18 | Organizer node for the input layer. |
19 | Organizer node for the hidden layer. The hidden layer is empty. |
20 | Organizer node for the output layer. |
21 | Input attribute node. |
23 | Output attribute node. |
24 | Marginal statistics node. |
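When working with content rows outside the viewer, the node type IDs listed above can be kept in a lookup table. The following is a small sketch. The names `NODE_TYPES` and `count_by_type` are hypothetical, and the list of type IDs passed in is assumed to have been read from the NODE_TYPE column beforehand.

```python
from collections import Counter

# Node type IDs and descriptions as listed for logistic regression models.
NODE_TYPES = {
    1: "Model",
    17: "Organizer node for the subnetwork",
    18: "Organizer node for the input layer",
    19: "Organizer node for the hidden layer",
    20: "Organizer node for the output layer",
    21: "Input attribute node",
    23: "Output attribute node",
    24: "Marginal statistics node",
}


def count_by_type(node_type_ids):
    """Tally nodes per description; IDs not in the table are reported as unknown."""
    return Counter(NODE_TYPES.get(t, f"Unknown ({t})") for t in node_type_ids)


# Illustrative input: a model with three input nodes and two output nodes.
summary = count_by_type([1, 24, 18, 21, 21, 21, 17, 19, 20, 23, 23])
for description, count in summary.items():
    print(f"{description}: {count}")
```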
NODE_CAPTION
A label or a caption associated with the node. In logistic regression models, always blank.
CHILDREN_CARDINALITY
An estimate of the number of children that the node has.
Node | Content |
---|---|
Model root | Indicates the count of child nodes, which includes at least one subnetwork, the required marginal statistics node, and the required input layer. For example, if the value is 5, there are 3 subnetworks. |
Marginal statistics | Always 0. |
Input layer | Indicates the number of input attribute-value pairs that were used by the model. |
Input node | Always 0. |
Hidden layer | In a logistic regression model, always 0. |
Output layer | Indicates the number of output values. |
Output node | Always 0. |
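The arithmetic behind the model root example can be written out as a short sketch. The function name `subnetwork_count` is hypothetical. It assumes the root's children are exactly the subnetworks plus one marginal statistics node and one input layer, as described for the model root.

```python
def subnetwork_count(root_children_cardinality: int) -> int:
    """Derive the number of subnetworks from the root's CHILDREN_CARDINALITY.

    Two of the root's children are always the marginal statistics node
    and the input layer; the remainder are subnetworks.
    """
    if root_children_cardinality < 3:
        raise ValueError("expected at least one subnetwork plus two required nodes")
    return root_children_cardinality - 2


print(subnetwork_count(5))  # 3, matching the example for the model root
```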
PARENT_UNIQUE_NAME
The unique name of the node's parent. NULL is returned for any nodes at the root level. For more information about how the names and IDs provide structural information about the model, see the section, Using Node Names and IDs.
NODE_DESCRIPTION
A user-friendly description of the node.
Node | Content |
---|---|
Model root | Blank |
Marginal statistics | Blank |
Input layer | Blank |
Input node | Input attribute name |
Hidden layer | Blank |
Output layer | Blank |
Output node | If the output attribute is continuous, contains the name of the output attribute. If the output attribute is discrete or discretized, contains the name of the attribute and the value. |
NODE_RULE
An XML description of the rule that is embedded in the node.
Node | Content |
---|---|
Model root | Blank |
Marginal statistics | Blank |
Input layer | Blank |
Input node | An XML fragment containing the same information as the NODE_DESCRIPTION column. |
Hidden layer | Blank |
Output layer | Blank |
Output node | An XML fragment containing the same information as the NODE_DESCRIPTION column. |
MARGINAL_RULE
For logistic regression models, always blank.
NODE_PROBABILITY
The probability associated with this node. For logistic regression models, always 0.
MARGINAL_PROBABILITY
The probability of reaching the node from the parent node. For logistic regression models, always 0.
NODE_DISTRIBUTION
A nested table that contains statistical information for the node. For detailed information about the contents of this table for each node type, see the section, Understanding the NODE_DISTRIBUTION Table, in Mining Model Content for Neural Network Models (Analysis Services - Data Mining).
NODE_SUPPORT
For logistic regression models, always 0.
Note
Support and probability values are always 0 because the output of this model type is not probabilistic. Only the weights are meaningful to the algorithm; therefore, the algorithm does not compute probability, support, or variance.
To get information about the support in the training cases for specific values, see the marginal statistics node.
MSOLAP_MODEL_COLUMN
Node | Content |
---|---|
Model root | Blank |
Marginal statistics | Blank |
Input layer | Blank |
Input node | Input attribute name. |
Hidden layer | Blank |
Output layer | Blank |
Output node | Input attribute name. |
MSOLAP_NODE_SCORE
In logistic regression models, always 0.
MSOLAP_NODE_SHORT_CAPTION
In logistic regression models, always blank.
Using Node Names and IDs
The naming of the nodes in a logistic regression model provides additional information about the relationships between nodes in the model. The following table shows the conventions for the IDs that are assigned to nodes in each layer.
Node Type | Convention for node ID |
---|---|
Model root (1) | 00000000000000000 |
Marginal statistics node (24) | 10000000000000000 |
Input layer (18) | 30000000000000000 |
Input node (21) | Starts at 60000000000000000 |
Subnetwork (17) | 20000000000000000 |
Hidden layer (19) | 40000000000000000 |
Output layer (20) | 50000000000000000 |
Output node (23) | Starts at 80000000000000000 |
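The ID conventions in the table above can be used to recognize a node's layer from its ID alone. The following is a rough sketch, assuming that the leading digit of the ID stays fixed for every node in a layer. That assumption is inferred from the table and is not a documented guarantee. The names `LAYER_BY_LEADING_DIGIT` and `layer_of` are hypothetical.

```python
# Leading digit of the node ID mapped to the layer, per the conventions table.
LAYER_BY_LEADING_DIGIT = {
    "0": "Model root",
    "1": "Marginal statistics node",
    "2": "Subnetwork",
    "3": "Input layer",
    "4": "Hidden layer",
    "5": "Output layer",
    "6": "Input node",
    "8": "Output node",
}


def layer_of(node_id: str) -> str:
    """Classify a node by the first digit of its ID (assumed convention)."""
    if not node_id or not node_id.isdigit():
        raise ValueError(f"not a numeric node ID: {node_id!r}")
    return LAYER_BY_LEADING_DIGIT.get(node_id[0], "Unknown")


# Illustrative IDs: the first input node, and an output node two positions
# past the starting ID for output nodes.
first_input = "6" + "0" * 16
third_output = "8" + "0" * 15 + "2"
print(layer_of(first_input), "|", layer_of(third_output))
```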
You can use these IDs to determine how output attributes are related to specific input layer attributes, by viewing the NODE_DISTRIBUTION table of the output node. Each row in that table contains an ID that points back to a specific input attribute node. The NODE_DISTRIBUTION table also contains the coefficient for that input-output pair.
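The lookup described above can be sketched as a simple join. This is an illustration only: it assumes the NODE_DISTRIBUTION rows of one output node have already been flattened into (input node ID, coefficient) pairs, and that the input nodes are available as a mapping from ID to attribute description. The function name, the row layout, the attribute names, and the coefficient values are all hypothetical.

```python
def coefficients_by_input(output_distribution, input_nodes):
    """Map each input description to its coefficient for one output node.

    output_distribution: iterable of (input_node_id, coefficient) pairs.
    input_nodes: dict of input node ID -> attribute description.
    Rows that do not reference a known input node are skipped.
    """
    result = {}
    for input_id, coefficient in output_distribution:
        description = input_nodes.get(input_id)
        if description is not None:
            result[description] = coefficient
    return result


# Hypothetical input nodes, with IDs starting at 6 followed by zeros.
input_nodes = {
    "6" + "0" * 16: "Age",
    "6" + "0" * 15 + "1": "Commute Distance",
}

# Hypothetical flattened NODE_DISTRIBUTION rows for a single output node.
output_distribution = [
    ("6" + "0" * 16, 0.42),
    ("6" + "0" * 15 + "1", -1.3),
    ("", 0.05),  # a row that does not point at an input node
]

for name, coefficient in coefficients_by_input(output_distribution, input_nodes).items():
    print(f"{name}: {coefficient}")
```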