Convert to SVMLight
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
Converts data input to the format used by the SVM-Light framework
Category: Data Format Conversions
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
Module overview
This article describes how to use the Convert to SVMLight module in Machine Learning Studio (classic), to convert your datasets to the format that is used by SVMLight.
The SVM-Light framework was developed by researchers at Cornell University. The SVM-Light library implements Vapnik's Support Vector Machine, but the format has been adopted elsewhere and can be used for many machine learning tasks, including classification and regression.
For more information, see SVMLight Support Vector Machine.
How to configure Convert to SVMLight
The conversion to SVMLight format entails converting each case into a row of data that begins with the label, followed by feature-value pairs expressed as colon-separated numbers. The conversion process does not automatically identify the correct columns, so it is important that you prepare the columns in your dataset before attempting conversion. For more information, see Preparing Data for Conversion.
Add the Convert to SVMLight module to your experiment. You can find this module in the Data Format Conversions category in Machine Learning Studio (classic).
Connect the dataset or output that you want to convert to SVMLight format.
Run the experiment.
Right-click the output of the module, select Download, and save the data to a local file for modification or for reuse with a program that supports SVMLight.
Preparing data for conversion
To illustrate the conversion process, this example uses the Blood Donor dataset in Studio (classic).
This sample dataset has the following format, in tabular form.
Recency | Frequency | Monetary | Time | Class |
---|---|---|---|---|
2 | 50 | 12500 | 98 | 1 |
0 | 13 | 3250 | 28 | 1 |
1 | 1 | 4000 | 35 | 1 |
2 | 20 | 5000 | 45 | 1 |
1 | 24 | 6000 | 77 | 0 |
Note that the label column, named [Class] in this dataset, is the last column in the table. However, if you convert the dataset to SVMLight without first indicating that which column contains the label, the first column, [Recency], is used as the label, and the [Class] column is treated as a feature:
2 1:50 2:12500 3:98 4:1
0 1:13 2:3250 3:28 4:1
1 1:16 2:4000 3:35 4:1
To make sure the labels are generated correctly at the beginning of the row for each case, you must add two instances of the Edit Metadata module.
In the first instance of Edit Metadata, select the label column ([Class]) and for Fields, select Label.
In the second instance of Edit Metadata, select all feature columns that you need in the converted file ([Recency], [Frequency], [Monetary], [Time]) and for Fields, select Features.
After the columns have been identified correctly, you can run the Convert to SVMLight module. After conversion, the first few rows of the Blood Donor dataset now have this format:
The label value precedes each entry, followed by the values for [Recency], [Frequency], [Monetary], and [Time], identified as features 1, 2, 3 and 4 respectively.
The label value of 0 in the fifth row has been converted to -1. This is because SVMLight supports only binary classification labels.
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
1 1:1 2:16 3:4000 4:35
1 1:2 2:20 3:5000 4:45
-1 1:1 2:24 3:6000 4:77
You can't directly use this text data for models in Azure ML, or visualize it. However, you can download it to a local share.
While you have the file open, we recommend that you add a comment line, prefaced by #
, so that you can add notes about the source or the original feature column names.
To use an SVMLight file in Vowpal Wabbit, and make additional modifications as described here: Conversion to Vowpal Wabbit Format. When the file is ready, upload it to Azure blob storage, and call it directly from one of the Vowpal Wabbit modules.
Examples
There are no examples in the Azure AI Gallery: that are specific to this format.
Technical notes
This section contains implementation details, tips, and answers to frequently asked questions.
Usage tips
The executables provided in the SVM-Light framework require both an example file and a model file. However, this module creates only the example file. You must create the model file separately by using the SVMLight libraries.
The example file is the file that contains the training examples.
Optional header
The first lines can contain comments. Comments must be prefixed with the number sign (#).
The file format output by Convert to SVMLight does not create headers. You can edit the file to add comments, a list of column names, and so forth.
Training data
Each case is on its own row. A case consists of a target value followed by a series of indices and the associated feature values.
The response value must be 1 or -1 for classification, or a number for regression.
The target value and each of the index-value pairs are separated by a space.
Training data example
The following table shows how the values in the columns of the Two-Class Iris dataset are converted to a representation in which each column is represented by an index, followed by a colon, and then the value in that column:
Iris Dataset | Iris Dataset Converted to SVMLight |
---|---|
1 6.3 2.9 5.6 1.8 | 1 1:6.3 2:2.9 3:5.6 4:1.8 |
0 4.8 3.4 1.6 0.2 | -1 1:4.8 2:3.4 3:1.6 4:0.2 |
1 7.2 3.2 6 1.8 | 1 1:7.2 2:3.2 3:6 4:1.8 |
Note that the names of the feature columns are lost in the conversion.
Using SVMLight to prepare a Vowpal Wabbit file
The SVMLight format is similar to the format used by Vowpal Wabbit. To change the SVMLight output file to a format usable for training a Vowpal Wabbit model, just add a pipe symbol between the label and the list of features.
For example, compare these lines of input:
Vowpal Wabbit format, including optional comment
# features are [Recency], [Frequency], [Monetary], [Time]
1 | 1:2 2:50 3:12500 4:98
1 | 1:0 2:13 3:3250 4:28
SVMLight format, including optional comment
# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
Expected inputs
Name | Type | Description |
---|---|---|
Dataset | Data Table | Input dataset |
Output
Name | Type | Description |
---|---|---|
Results dataset | SvmLight | Output dataset |