Data Transformation - Manipulation
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
This article describes the modules in Machine Learning Studio (classic) that you can use for basic data manipulation.
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
Machine Learning Studio (classic) supports tasks that are specific to machine learning, such as normalization or feature selection. The modules in this category are intended for more general tasks.
Data manipulation tasks
The modules in this category support core data management tasks that you might need to perform in Machine Learning Studio (classic). Examples include:
- Combine two datasets, either by using joins, or by merging columns or rows.
- Create new categories to use in grouping data.
- Modify column headings, change column data types, or flag columns as features or labels.
- Check for missing values, and then replace them with appropriate values.
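To make two of these tasks concrete, the following is a minimal sketch in plain Python, not the Studio (classic) implementation. It joins two small datasets on a shared key and then replaces a missing value with the column mean. The helper names `join_on_key` and `fill_missing_with_mean`, the sample rows, and the choice of the mean as the replacement value are all assumptions made for illustration.

```python
from statistics import mean


def join_on_key(left, right, key):
    """Inner-join two lists of dict rows on a shared key column (hypothetical helper)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row for every matching pair of left and right rows.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def fill_missing_with_mean(rows, column):
    """Replace None in a numeric column with the mean of the observed values (hypothetical helper)."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Illustrative data only: the second order has a missing amount.
customers = [
    {"id": 1, "region": "east"},
    {"id": 2, "region": "west"},
    {"id": 3, "region": "east"},
]
orders = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 30.0},
]

joined = join_on_key(customers, orders, "id")
cleaned = fill_missing_with_mean(joined, "amount")
print(cleaned)
```

In Studio (classic), the corresponding steps would be configured in the Join Data and Clean Missing Data modules rather than written as code.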
Related tasks
- Perform sampling or divide a dataset into training and testing sets: Use the Data Transformation - Sample and Split modules.
- Scale numbers, normalize data, or put numerical values into bins: Use the Data Transformation - Scale and Reduce modules.
- Perform calculations on numeric data fields or generate commonly used statistics: Use the tools in Statistical Functions.
Examples
For examples of how to work with complex data in machine learning experiments, see these samples in the Azure AI Gallery:
- Data Processing and Analysis: Demonstrates key tools and processes.
- Breast cancer detection: Illustrates how to partition datasets, and then apply special processing to each partition.
Modules in this category
The Data Transformation - Manipulation category includes the following modules:
- Add Columns: Adds a set of columns from one dataset to another.
- Add Rows: Appends a set of rows from an input dataset to the end of another dataset.
- Apply SQL Transformation: Runs a SQLite query on input datasets to transform the data.
- Clean Missing Data: Specifies how to handle values that are missing from a dataset. This module replaces Missing Values Scrubber, which has been deprecated.
- Convert to Indicator Values: Converts categorical values in columns to indicator values.
- Edit Metadata: Edits metadata that's associated with columns in a dataset.
- Group Categorical Values: Groups data from multiple categories into a new category.
- Join Data: Joins two datasets.
- Remove Duplicate Rows: Removes duplicate rows from a dataset.
- Select Columns in Dataset: Selects the columns to include in, or exclude from, a dataset in an operation.
- Select Columns Transform: Creates a transformation that selects the same subset of columns as in a specified dataset.
- SMOTE: Increases the number of low-incidence examples in a dataset by using synthetic minority oversampling.
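Because Apply SQL Transformation runs a SQLite query, the kind of query it accepts can be tried locally with Python's built-in `sqlite3` module. The sketch below is an illustration under stated assumptions, not the module itself. The table names `t1` and `t2`, the column names, and the sample rows are hypothetical. The query combines a join with duplicate removal, two operations that the Join Data and Remove Duplicate Rows modules otherwise handle separately.

```python
import sqlite3

# An in-memory SQLite database stands in for the module's input datasets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, label TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, score REAL)")

# Illustrative rows only: t1 contains a duplicate row, and id 3 has no match in t2.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b"), (3, "c")],
)
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, 0.5), (2, 0.9)])

# A left join keeps every row of t1; DISTINCT drops exact duplicate result rows.
query = """
SELECT DISTINCT t1.id, t1.label, t2.score
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

Unmatched rows carry a NULL score (`None` in Python), which a later cleaning step such as Clean Missing Data would typically handle.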