What is MicrosoftML?

09/22/2017

Important

This content is being retired and may not be updated in the future. Support for Machine Learning Server will end on July 1, 2022. For more information, see What's happening to Machine Learning Server?

MicrosoftML adds state-of-the-art data transforms, machine learning algorithms, and pre-trained models to R and Python. Its data transforms let you compose a custom pipeline of transforms that is applied to your data before training or testing. The primary purpose of these transforms is to format your data.
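The following plain-Python sketch is a rough illustration of the pipeline idea: it composes small transform functions and applies them, in order, to every row. It is not the MicrosoftML API. `fill_missing`, `lowercase_text`, and `run_pipeline` are hypothetical helpers invented for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[Row], Row]

def fill_missing(col: str, default: object) -> Transform:
    """Hypothetical transform: replace a missing (None) value with a default."""
    def apply(row: Row) -> Row:
        out = dict(row)  # copy so the input row is not mutated
        if out.get(col) is None:
            out[col] = default
        return out
    return apply

def lowercase_text(col: str) -> Transform:
    """Hypothetical transform: lower-case a text column."""
    def apply(row: Row) -> Row:
        out = dict(row)
        out[col] = str(out[col]).lower()
        return out
    return apply

def run_pipeline(rows: List[Row], transforms: List[Transform]) -> List[Row]:
    """Apply every transform, in list order, to each row."""
    result = []
    for row in rows:
        for transform in transforms:
            row = transform(row)
        result.append(row)
    return result

# One pipeline definition, reused for both training and test data.
pipeline = [fill_missing("review", ""), lowercase_text("review")]
train = [{"review": "Great Product", "label": 1}, {"review": None, "label": 0}]
prepared = run_pipeline(train, pipeline)
```

The pipeline is an ordinary list. You can pass the same list to `run_pipeline` for the test set, so training data and test data are formatted identically.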

The MicrosoftML functions are provided through the MicrosoftML package installed with Machine Learning Server, Microsoft R Client, and SQL Server Machine Learning Services.

These functions provide fast, scalable machine learning algorithms for common tasks such as classification, regression, and anomaly detection. The algorithms are multi-threaded, and some of them execute off disk, so they can scale to hundreds of gigabytes of data on a single node. They are especially well suited to large corpora of text data and to high-dimensional categorical data. You can run these functions locally on Windows or Linux machines, or on Azure HDInsight (Hadoop/Spark) clusters.

Pre-trained models for sentiment analysis and image featurization can also be installed and deployed with MicrosoftML. For more information on the pre-trained models and samples, see R samples for MicrosoftML and Python samples for MicrosoftML.

Match algorithms to machine learning tasks

Matching data transforms and machine learning algorithms to appropriate data science tasks is key to designing successful intelligent applications.

Machine learning tasks

The MicrosoftML package implements algorithms that can perform a variety of machine learning tasks:

binary classification: algorithms that learn to predict which of two classes an instance of data belongs to. These provide supervised learning, in which the input of a classification algorithm is a set of labeled examples. Each example is represented as a feature vector, and each label is an integer with a value of 0 or 1. The output of a binary classification algorithm is a classifier, which can be used to predict the label of new, unlabeled instances.
multi-class classification: algorithms that learn to predict the category of an instance of data. These provide supervised learning, in which the input of a classification algorithm is a set of labeled examples. Each example is represented as a feature vector, and each label is an integer between 0 and k-1, where k is the number of classes. The output of a classification algorithm is a classifier, which can be used to predict the label of a new, unlabeled instance.
regression: algorithms that learn to predict the value of a dependent variable from a set of related independent variables. Regression algorithms model this relationship to determine how the typical value of the dependent variable changes as the independent variables vary. These provide supervised learning, in which the input of a regression algorithm is a set of examples whose dependent-variable values are known. The output of a regression algorithm is a function, which can be used to predict the value of the dependent variable for a new data instance where it is not known.
anomaly detection: algorithms that identify outliers that do not belong to some target class or do not conform to an expected pattern. One-class anomaly detection is a type of unsupervised learning, because the input data contains only instances of the target class and no anomalies to learn from.
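The classification tasks above expect integer labels: 0 or 1 for binary classification, and 0 through k-1 for k classes. The plain-Python sketch below shows one way to turn class names into labels of that form and pair them with feature vectors. It does not call MicrosoftML, and the helper names are hypothetical. Numbering classes in sorted order is an arbitrary choice made here so that the result is reproducible.

```python
from typing import Dict, List, Sequence, Tuple

def encode_labels(names: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct class name to an integer in 0..k-1 (sorted order)."""
    classes = sorted(set(names))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in names], index

def is_binary_labels(labels: Sequence[int]) -> bool:
    """True if every label is 0 or 1, as binary classification expects."""
    return all(y in (0, 1) for y in labels)

def is_multiclass_labels(labels: Sequence[int], k: int) -> bool:
    """True if every label lies in 0..k-1, as k-class classification expects."""
    return all(0 <= y < k for y in labels)

# Each labeled example pairs a feature vector with an integer label.
features = [[0.2, 1.0], [0.9, 0.1], [0.5, 0.5], [0.8, 0.3]]
labels, index = encode_labels(["cat", "dog", "bird", "dog"])
examples = list(zip(features, labels))
```

Here `labels` is `[1, 2, 0, 2]`, which is valid for three-class classification but not for binary classification. Regression examples would instead carry a known real-valued target. One-class anomaly detection data carries no label at all.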

Machine learning algorithms

The following table summarizes the MicrosoftML algorithms, the tasks they support, their scalability, and some example applications.

Algorithm (R/Python)	ML task supported	Scalability	Application examples
`rxFastLinear()` / `rx_fast_linear()` Fast Linear model (SDCA)	binary classification, linear regression	#cols: ~1B; #rows: ~1B; CPU: multi-proc	Mortgage default prediction, email spam filtering
`rxOneClassSvm()` / `rx_oneclass_svm()` OneClass SVM	anomaly detection	#cols: ~1K; #rows: RAM-bound; CPU: single-proc	Credit card fraud detection
`rxFastTrees()` / `rx_fast_trees()` Fast Tree	binary classification, regression	#cols: ~50K; #rows: RAM-bound; CPU: multi-proc	Bankruptcy prediction
`rxFastForest()` / `rx_fast_forest()` Fast Forest	binary classification, regression	#cols: ~50K; #rows: RAM-bound; CPU: multi-proc	Churn prediction
`rxNeuralNet()` / `rx_neural_network()` Neural Network	binary and multi-class classification, regression	#cols: ~10M; #rows: Inf; CPU: multi-proc; GPU: CUDA	Check signature recognition, OCR, click prediction
`rxLogisticRegression()` / `rx_logistic_regression()` Logistic Regression	binary and multi-class classification	#cols: ~100M; #rows: Inf for single-proc CPU, RAM-bound for multi-proc CPU	Classifying sentiment from feedback
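To make the task column of the table easy to query, the sketch below transcribes it into a Python dictionary keyed by the R and Python function names and filters by task. It only restates the table. It does not import or call MicrosoftML, the `candidates` helper is hypothetical, and the scalability limits and application examples are not encoded.

```python
from typing import Dict, List, Tuple

# (R name, Python name) -> supported tasks, transcribed from the table.
ALGORITHMS: Dict[Tuple[str, str], List[str]] = {
    ("rxFastLinear", "rx_fast_linear"): ["binary classification", "regression"],
    ("rxOneClassSvm", "rx_oneclass_svm"): ["anomaly detection"],
    ("rxFastTrees", "rx_fast_trees"): ["binary classification", "regression"],
    ("rxFastForest", "rx_fast_forest"): ["binary classification", "regression"],
    ("rxNeuralNet", "rx_neural_network"): [
        "binary classification", "multi-class classification", "regression"],
    ("rxLogisticRegression", "rx_logistic_regression"): [
        "binary classification", "multi-class classification"],
}

def candidates(task: str, language: str = "python") -> List[str]:
    """Return the function names (R or Python) whose table row lists `task`."""
    if language not in ("r", "python"):
        raise ValueError("language must be 'r' or 'python'")
    pos = 0 if language == "r" else 1
    return [names[pos] for names, tasks in ALGORITHMS.items() if task in tasks]
```

For example, `candidates("anomaly detection")` returns only `rx_oneclass_svm`. The scalability column should still inform the final choice between the remaining candidates.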

Data transforms

MicrosoftML also provides transforms to help tailor your data for machine learning. Use them to clean and wrangle your data before training and scoring. For a description of the transforms, see the Machine learning R transforms and Machine learning Python transforms reference documentation.

Next steps

For reference documentation on the individual R transforms and functions in the product help, see MicrosoftML: machine learning algorithms.

For reference documentation on the individual Python transforms and functions in the product help, see MicrosoftML: machine learning algorithms.

For guidance on choosing the appropriate machine learning algorithm from the MicrosoftML package, see Cheat Sheet: How to choose a MicrosoftML algorithm.
