Statistical Functions
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
This article describes the modules in Machine Learning Studio (classic) that support mathematical and statistical operations critical for machine learning. If you need to perform tasks such as the following in your experiment, look in the Statistical Functions category:
- Perform ad hoc computations on column values, such as rounding or taking an absolute value.
- Compute means, logarithms, and other statistics commonly used in machine learning.
- Calculate correlation and probability scores.
- Compute z-scores.
- Compute widely used statistical distributions, such as Weibull, gamma, and beta.
- Generate statistical reports over a set of columns or a dataset.
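To make the first few tasks concrete, the following is a minimal Python sketch of the same kinds of column computations done outside Studio (classic). It is not how the modules are implemented. The `column` values and the `z_scores` helper are hypothetical, and using the population standard deviation for the z-score is an assumption made for illustration.

```python
import math
import statistics

def z_scores(values):
    """Standardize values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n).
    # Use statistics.stdev for the sample version (divide by n - 1).
    stdev = statistics.pstdev(values)
    return [(x - mean) / stdev for x in values]

# Hypothetical column of numeric values.
column = [-3.7, 1.2, 4.6, 8.0]

rounded = [round(x) for x in column]            # round to the nearest integer
absolute = [abs(x) for x in column]             # absolute value
logs = [math.log(x) for x in column if x > 0]   # natural log, positive values only
standardized = z_scores(column)
```

The logarithm step skips non-positive values because the natural log is undefined for them. A real pipeline would need an explicit policy for such values.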
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
For example, if you have a new dataset, you might use the Summarize Data module first. It generates a report for an entire dataset that includes standard statistical measures, such as mean and standard deviation.
If you need more advanced statistics, such as sample skewness or interquartile distance, use the Compute Elementary Statistics module to generate additional descriptive statistics.
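As a rough illustration of what these descriptive statistics measure, the sketch below computes the mean, sample standard deviation, interquartile distance (the interquartile range, Q3 minus Q1), and sample skewness with the Python standard library. The `describe` function and its input are hypothetical. The quantile method and the adjusted Fisher-Pearson skewness formula are assumptions, so the values may differ slightly from those the modules report.

```python
import statistics

def describe(values):
    """Return a few descriptive statistics for a list of numbers.

    Requires at least three values that are not all equal.
    """
    n = len(values)
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Quartiles; the "inclusive" method treats the data as the full population range.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Adjusted Fisher-Pearson sample skewness.
    m3 = sum((x - mean) ** 3 for x in values)
    skewness = n * m3 / ((n - 1) * (n - 2) * stdev ** 3)
    return {"mean": mean, "stdev": stdev, "iqr": q3 - q1, "skewness": skewness}

# Hypothetical data with one large value, which produces positive skewness.
report = describe([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 25.0])
```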
Because the modules regenerate their results each time you run the experiment, the results reflect any changes to your data.
List of modules
The Statistical Functions category includes the following modules:
- Apply Math Operation: Applies a mathematical operation to column values.
- Compute Elementary Statistics: Calculates specified summary statistics for selected dataset columns.
- Compute Linear Correlation: Calculates the linear correlation between column values in a dataset.
- Evaluate Probability Function: Fits a specified probability distribution function to a dataset.
- Replace Discrete Values: Replaces discrete values from one column with numeric values based on another column.
- Summarize Data: Generates a basic descriptive statistics report for the columns in a dataset.
- Test Hypothesis Using t-Test: Compares means from two datasets by using a t-test.
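For readers who want to see the arithmetic behind two of these modules, the following sketch computes a Pearson linear correlation coefficient and a two-sample t statistic in plain Python. The function names `pearson_r` and `welch_t` are hypothetical. The unequal-variance (Welch) form of the t statistic is an assumption chosen for illustration, and the sketch does not compute a p-value, which would require the t distribution.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(sxx * syy)

def welch_t(a, b):
    """t statistic for the difference in means, without assuming equal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear, so r is 1
t = welch_t([2, 4, 6], [1, 2, 3])
```

A value of `r` near 1 or -1 indicates a strong linear relationship, and a larger absolute `t` indicates a larger difference in means relative to the sampling variability.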