

Understanding Classification Algorithms in Azure ML

Introduction

Machine learning is the study of how to automatically learn to make accurate predictions from past observations and historical data, whether structured or unstructured. A classification problem is the task of assigning examples to one of a given set of categories.

Classification

Examples of classification problems include text categorization, fraud detection, optical character recognition, machine vision, face detection, and bioinformatics.

Classification Example: Good vs Evil

            Sex     Mask  Cape  Tie  Ears  Smokes  Class

Training Data
Batman      Male    Yes   Yes   No   Yes   No      Good
Robin       Male    Yes   Yes   No   No    No      Good
Alfred      Male    No    No    Yes  No    No      Good
Penguin     Male    No    No    Yes  No    Yes     Bad
Catwoman    Female  Yes   No    No   Yes   No      Bad
Joker       Male    No    No    No   No    No      Bad

Test Data
Batgirl     Female  Yes   Yes   No   Yes   No      ???
Riddler     Male    Yes   No    No   No    No      ???

[Figure: A decision tree classifier]
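To make the Good vs Evil example concrete, here is a minimal sketch of a decision tree learner applied to the training rows from the table. It assumes an ID3-style tree that splits on the feature with the highest information gain. That choice is an assumption made for illustration: it is not necessarily the tree shown in the original figure, and it is not the algorithm behind the Azure ML decision forest or decision jungle modules. The names FEATURES, TRAINING, TEST, build_tree and predict are hypothetical.

```python
from collections import Counter
from math import log2

FEATURES = ["sex", "mask", "cape", "tie", "ears", "smokes"]

# Training rows from the table: six feature values followed by the class label.
TRAINING = {
    "Batman":   ("male",   "yes", "yes", "no",  "yes", "no",  "good"),
    "Robin":    ("male",   "yes", "yes", "no",  "no",  "no",  "good"),
    "Alfred":   ("male",   "no",  "no",  "yes", "no",  "no",  "good"),
    "Penguin":  ("male",   "no",  "no",  "yes", "no",  "yes", "bad"),
    "Catwoman": ("female", "yes", "no",  "no",  "yes", "no",  "bad"),
    "Joker":    ("male",   "no",  "no",  "no",  "no",  "no",  "bad"),
}
# Test rows from the table: the class is unknown.
TEST = {
    "Batgirl": ("female", "yes", "yes", "no", "yes", "no"),
    "Riddler": ("male",   "yes", "no",  "no", "no",  "no"),
}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Return a class label (leaf) or a dict describing a split (inner node)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        picked = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in picked], [lab for _, lab in picked], remaining
        )
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


rows = [dict(zip(FEATURES, values[:-1])) for values in TRAINING.values()]
labels = [values[-1] for values in TRAINING.values()]
tree = build_tree(rows, labels, FEATURES)

predictions = {
    name: predict(tree, dict(zip(FEATURES, values))) for name, values in TEST.items()
}
print(predictions)
```

On this data the sketch splits first on cape, then on tie, then on smokes, and it labels Batgirl as good and Riddler as bad. With only six training rows, the result says more about the toy data than about the characters.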

Classification algorithms

When data are used to predict a category, supervised learning is called classification. Assigning an image to either 'Tiger' or 'Elephant' is one such case. When there are only two possible categories, the task is called two-class or binomial classification. When there are more than two, as when predicting the winner of an IPL tournament, it is called multiclass classification. Azure Machine Learning Studio offers several classification algorithms provided by the Microsoft Research team, including Multiclass Decision Jungle, Multiclass Decision Forest, Multiclass Logistic Regression, Two-Class Decision Jungle, Two-Class Decision Forest, Two-Class Averaged Perceptron, and One-vs-All Multiclass.

When you visualize the evaluation results of a classification model, many values are shown: true positives, false positives, true negatives, false negatives, accuracy, precision, recall, F1 score, lift, the ROC curve, AUC (area under the curve), and threshold. If you do not know in advance which algorithm suits your data, compare these values across algorithms and choose the best-performing one to train your classification model.

For example:

  • True positive: Sick people correctly diagnosed as sick
  • False positive: Healthy people incorrectly identified as sick
  • True negative: Healthy people correctly identified as healthy
  • False negative: Sick people incorrectly identified as healthy

In general, positive = identified and negative = rejected. Therefore:

  • True positive = correctly identified
  • False positive = incorrectly identified
  • True negative = correctly rejected
  • False negative = incorrectly rejected
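The four outcomes above can be counted directly from paired actual and predicted labels, and accuracy, precision, recall and F1 score follow from those counts. The sketch below uses the standard textbook definitions and made-up sick/healthy labels. The function names and the sample data are hypothetical, and this is not the Evaluate Model implementation.

```python
def confusion_counts(actual, predicted, positive="sick"):
    """Count true/false positives and negatives for one positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def classification_metrics(counts):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    # Guard against empty denominators (for example, no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative data only: eight patients, actual condition vs. model output.
actual = ["sick", "sick", "sick", "healthy", "healthy", "healthy", "healthy", "sick"]
predicted = ["sick", "sick", "healthy", "healthy", "sick", "healthy", "healthy", "sick"]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(counts)
print(counts)
print(metrics)
```

Here three sick patients are caught, one is missed, and one healthy patient is flagged by mistake, so every metric comes out at 0.75. Precision and recall diverge as soon as false positives and false negatives are unequal.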

You can use Evaluate Model to measure the accuracy of a trained classification or regression model. Given a dataset that contains the input columns and the model's scores, the Evaluate Model module computes a set of industry-standard evaluation metrics. The metrics used depend on the type of model you are evaluating:

  • For classification models, Evaluate Model generates accuracy, precision, recall, F-score, AUC, and so on.
  • For regression models, Evaluate Model generates mean absolute error, root mean squared error, relative absolute error, and relative squared error metrics.
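To show what the metrics named above measure, the sketch below computes the four regression metrics and AUC from their standard definitions. It assumes relative absolute error and relative squared error are normalized by the deviation of the actual values from their mean, and that AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function names and sample numbers are hypothetical, and this is not the Evaluate Model implementation.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Error metrics for numeric predictions (actual must not be constant)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Baseline: the error of always predicting the mean of the actual values.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": sqrt(sq_err / n),
        "relative_absolute_error": abs_err / abs_dev,
        "relative_squared_error": sq_err / sq_dev,
    }


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only.
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 11.0])
area = auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
print(reg)
print(area)
```

A relative error below 1 means the model beats the trivial baseline of always predicting the mean. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one.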

See Also

Another important place to find an extensive amount of Cortana Intelligence Suite related articles is the TechNet Wiki itself. The best entry point is Cortana Intelligence Suite Resources on the TechNet Wiki.