Text Analytics
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
This article describes the text analytics modules included in Machine Learning Studio (classic). These modules provide specialized computational tools for working with both structured and unstructured text, including:
- Multiple options for preprocessing text.
- Language detection.
- Creation of features from text using customizable n-gram dictionaries.
- Feature hashing, to efficiently analyze text without preprocessing or advanced linguistic analysis.
- Vowpal Wabbit, for very fast machine learning on text. Vowpal Wabbit supports feature hashing, topic modeling (LDA), and classification.
- Named entity recognition, to extract the names of people, places, and organizations from unstructured text.
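To make the feature hashing option above more concrete, here is a minimal sketch of the hashing trick in plain Python. It is not the module's implementation. The function name `hash_features` and its parameters `num_bits` and `ngram_size` are hypothetical, and MD5 from the standard library stands in for whatever hash function Vowpal Wabbit applies internally, only so that the output is reproducible across runs.

```python
import hashlib
from collections import Counter


def hash_features(text, num_bits=10, ngram_size=2):
    """Map text to a sparse {bucket_index: count} vector via the hashing trick.

    Illustrative only: names and defaults are assumptions, and MD5 is a
    stand-in hash chosen for determinism, not the hash used by the module.
    """
    tokens = text.lower().split()
    num_buckets = 1 << num_bits  # 2**num_bits possible feature indices
    counts = Counter()
    for n in range(1, ngram_size + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            digest = hashlib.md5(ngram.encode("utf-8")).digest()
            # Fold the first 4 bytes of the digest into the bucket range.
            bucket = int.from_bytes(digest[:4], "little") % num_buckets
            counts[bucket] += 1
    return dict(counts)


if __name__ == "__main__":
    features = hash_features("the quick brown fox")
    print(features)
```

No dictionary of terms is stored: each n-gram is mapped straight to an index, which is why the approach needs no preprocessing pass over the corpus. The trade-off is that distinct n-grams can collide in the same bucket, and a larger `num_bits` makes collisions less likely at the cost of a wider feature vector.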
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
Examples
For examples of text analytics using Machine Learning, see the Azure AI Gallery:
- News categorization: Uses feature hashing to classify articles into a predefined list of categories.
- Find similar companies: Uses the text of Wikipedia articles to categorize companies.
- Text classification: Demonstrates the end-to-end process of using text from Twitter messages in sentiment analysis (five-part sample).
List of modules
The Text Analytics category in Machine Learning Studio (classic) includes these modules:
- Detect Languages: Detects the language of each line in the input file.
- Extract Key Phrases from Text: Extracts key phrases from given text.
- Extract N-Gram Features from Text: Creates n-gram dictionary features and performs feature selection on them.
- Feature Hashing: Converts text data to integer-encoded features by using the Vowpal Wabbit library.
- Latent Dirichlet Allocation: Performs topic modeling by using the Vowpal Wabbit library for LDA.
- Named Entity Recognition: Recognizes named entities in a text column.
- Preprocess Text: Performs cleaning operations on text.
- Score Vowpal Wabbit 7-4 Model: Scores input from Azure by using version 7-4 of the Vowpal Wabbit machine learning system.
- Score Vowpal Wabbit 7-10 Model: Scores input from Azure by using version 7-10 of the Vowpal Wabbit machine learning system.
- Score Vowpal Wabbit 8 Model: Scores input from Azure by using version 8 of the Vowpal Wabbit machine learning system.
- Train Vowpal Wabbit 7-4 Model: Trains a model by using version 7-4 of the Vowpal Wabbit machine learning system.
- Train Vowpal Wabbit 7-10 Model: Trains a model by using version 7-10 of the Vowpal Wabbit machine learning system.
- Train Vowpal Wabbit 8 Model: Trains a model by using version 8 of the Vowpal Wabbit machine learning system.
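As a rough illustration of the idea behind the Extract N-Gram Features from Text entry in the list above, the following sketch builds an n-gram dictionary and applies a simple selection step. It is not the module's implementation. The function names, the `min_doc_freq` parameter, and the use of a minimum document frequency as the selection rule are all assumptions made for this example.

```python
from collections import Counter


def extract_ngrams(text, n=2):
    """Return all 1..n-grams of a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + size])
        for size in range(1, n + 1)
        for i in range(len(tokens) - size + 1)
    ]


def build_ngram_dictionary(documents, n=2, min_doc_freq=2):
    """Build {ngram: column_index}, keeping n-grams found in enough documents.

    Hypothetical selection rule: drop any n-gram that appears in fewer
    than min_doc_freq documents.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each n-gram once per document.
        doc_freq.update(set(extract_ngrams(doc, n)))
    kept = sorted(g for g, df in doc_freq.items() if df >= min_doc_freq)
    return {ngram: index for index, ngram in enumerate(kept)}


def vectorize(text, dictionary, n=2):
    """Count occurrences of each dictionary n-gram in the text."""
    vector = [0] * len(dictionary)
    for ngram in extract_ngrams(text, n):
        if ngram in dictionary:
            vector[dictionary[ngram]] += 1
    return vector


if __name__ == "__main__":
    docs = ["the cat sat", "the cat ran", "a dog ran"]
    dictionary = build_ngram_dictionary(docs)
    print(dictionary)
    print(vectorize("the cat and the cat ran", dictionary))
```

Unlike feature hashing, this approach keeps an explicit dictionary, so each feature column can be traced back to the n-gram it represents, and the same dictionary can be reused to vectorize new text.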