automl Package

Reference

Contains automated machine learning classes for Azure Machine Learning SDKv2.

Main areas include managing AutoML tasks.

Classes

ClassificationJob	Configuration for AutoML Classification Job. Initialize a new AutoML Classification task.
ColumnTransformer	Column transformer settings.
ForecastingJob	Configuration for AutoML Forecasting Task. Initialize a new AutoML Forecasting task.
ForecastingSettings	Forecasting settings for an AutoML Job.
ImageClassificationJob	Configuration for AutoML multi-class Image Classification job.
ImageClassificationMultilabelJob	Configuration for AutoML multi-label Image Classification job.
ImageClassificationSearchSpace	Search space for AutoML Image Classification and Image Classification Multilabel tasks.
ImageInstanceSegmentationJob	Configuration for AutoML Image Instance Segmentation job.
ImageLimitSettings	Limit settings for AutoML Image Verticals. ImageLimitSettings contains the following parameters: max_concurrent_trials, max_trials, and timeout_minutes. Use it to optionally configure limits such as timeouts. Note: The number of concurrent runs is gated on the resources available in the specified compute target. Ensure that the compute target has the available resources for the desired concurrency. Tip: It's a good practice to match max_concurrent_trials with the number of nodes in the cluster. For example, if you have a cluster with 4 nodes, set max_concurrent_trials to 4.
ImageModelSettingsClassification	Model settings for AutoML Image Classification tasks.
ImageModelSettingsObjectDetection	Model settings for AutoML Image Object Detection tasks. Defines the AutoML image object detection or instance segmentation model settings. Example: `from azure.ai.ml import automl; object_detection_model_settings = automl.ImageModelSettingsObjectDetection(min_size=600, max_size=1333)`
ImageObjectDetectionJob	Configuration for AutoML Image Object Detection job.
ImageObjectDetectionSearchSpace	Search space for AutoML Image Object Detection and Image Instance Segmentation tasks.
ImageSweepSettings	Sweep settings for all AutoML Image Verticals. The early_termination keyword sets the type of early termination policy: one of BanditPolicy, MedianStoppingPolicy, or TruncationSelectionPolicy.
NlpFeaturizationSettings	Featurization settings for all AutoML NLP Verticals.
NlpFixedParameters	Configuration of fixed parameters for all candidates of an AutoML NLP Job.
NlpLimitSettings	Limit settings for all AutoML NLP Verticals.
NlpSearchSpace	Search space for AutoML NLP tasks.
NlpSweepSettings	Sweep settings for all AutoML NLP tasks.
RegressionJob	Configuration for AutoML Regression Job. Initialize a new AutoML Regression task.
SearchSpace	SearchSpace class for AutoML verticals.
StackEnsembleSettings	Advanced settings to customize a StackEnsemble run.
TabularFeaturizationSettings	Featurization settings for an AutoML Job.
TabularLimitSettings	Limit settings for AutoML Table Verticals.
TextClassificationJob	Configuration for AutoML Text Classification Job.
TextClassificationMultilabelJob	Configuration for AutoML Text Classification Multilabel Job.
TextNerJob	Configuration for AutoML Text NER Job.
TrainingSettings	TrainingSettings class for Azure Machine Learning.
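
The tip in the ImageLimitSettings entry above (match max_concurrent_trials to the number of cluster nodes) might be applied along these lines. This is a minimal sketch, not the SDK's own logic: the helper names `limits_for_cluster` and `build_image_limit_settings` and the default values are hypothetical, and it assumes `ImageLimitSettings` accepts the three parameters named in its description as keyword arguments. The azure-ai-ml import is deferred so the helper itself runs without the package installed.

```python
def limits_for_cluster(node_count: int, max_trials: int = 20, timeout_minutes: int = 60) -> dict:
    """Build keyword arguments for ImageLimitSettings with one concurrent trial per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        # Running more trials at once than the job will ever start is pointless,
        # so this sketch caps concurrency at max_trials (a choice of the sketch).
        "max_concurrent_trials": min(node_count, max_trials),
        "max_trials": max_trials,
        "timeout_minutes": timeout_minutes,
    }


def build_image_limit_settings(limit_kwargs: dict):
    """Create the settings object; requires the azure-ai-ml package."""
    from azure.ai.ml import automl

    # Assumption: the constructor takes these three names as keyword arguments.
    return automl.ImageLimitSettings(**limit_kwargs)


# A 4-node cluster gets 4 concurrent trials, as in the tip above.
limit_kwargs = limits_for_cluster(node_count=4)
print(limit_kwargs)
```

Keeping the arithmetic separate from the SDK call makes the concurrency choice easy to check before a job is configured.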

Enums

BlockedTransformers	Enum for all featurization transformers that AutoML can be blocked from using.
ClassificationModels	Enum for all classification models supported by AutoML.
ClassificationMultilabelPrimaryMetrics	Primary metrics for classification multilabel tasks.
ClassificationPrimaryMetrics	Primary metrics for classification tasks.
FeaturizationMode	Featurization mode - determines data featurization mode.
ForecastHorizonMode	Enum to determine forecast horizon selection mode.
ForecastingModels	Enum for all forecasting models supported by AutoML.
ForecastingPrimaryMetrics	Primary metrics for Forecasting task.
InstanceSegmentationPrimaryMetrics	Primary metrics for InstanceSegmentation tasks.
LearningRateScheduler	Learning rate scheduler enum.
LogTrainingMetrics
LogValidationLoss
NCrossValidationsMode	Determines how N-Cross validations value is determined.
ObjectDetectionPrimaryMetrics	Primary metrics for Image ObjectDetection task.
RegressionModels	Enum for all Regression models supported by AutoML.
RegressionPrimaryMetrics	Primary metrics for Regression task.
SamplingAlgorithmType
ShortSeriesHandlingConfiguration	The parameter defining how AutoML should handle short time series.
StochasticOptimizer	Stochastic optimizer for image models.
TargetAggregationFunction	Target aggregate function.
TargetLagsMode	Target lags selection modes.
TargetRollingWindowSizeMode	Target rolling windows size mode.
UseStl	Configure STL Decomposition of the time-series target column.
ValidationMetricType	Metric computation method to use for validation metrics in image tasks.

Functions

classification

Function to create a ClassificationJob.

A classification job is used to train a model that best predicts the class of a data sample. Various models are trained using the training data. The model with the best performance on the validation data, based on the primary metric, is selected as the final model.

classification(*, training_data: Input, target_column_name: str, primary_metric: str | None = None, enable_model_explainability: bool | None = None, weight_column_name: str | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, n_cross_validations: str | int | None = None, cv_split_column_names: List[str] | None = None, test_data: Input | None = None, test_data_size: float | None = None, **kwargs) -> ClassificationJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).
target_column_name	str The name of the label column. This parameter is applicable to `training_data`, `validation_data` and `test_data` parameters
primary_metric	str The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: accuracy, AUC_weighted, norm_macro_recall, average_precision_score_weighted, and precision_score_weighted. Defaults to accuracy.
enable_model_explainability	bool Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is None. For more information, see Interpretability: model explanations in automated machine learning.
weight_column_name	str The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn't have column names, column indices can be used instead, expressed as integers. This parameter is applicable to `training_data` and `validation_data` parameters
validation_data	Input The validation data to be used within the experiment. It should contain both training features and label column (optionally a sample weights column). Defaults to None
validation_data_size	float What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive. Specify `validation_data` to provide validation data, otherwise set `n_cross_validations` or `validation_data_size` to extract validation data out of the specified training data. For custom cross validation fold, use `cv_split_column_names`. For more information, see Configure data splits and cross-validation in automated machine learning. Defaults to None
n_cross_validations	Union[str, int] How many cross validations to perform when user validation data is not specified. Specify `validation_data` to provide validation data, otherwise set `n_cross_validations` or `validation_data_size` to extract validation data out of the specified training data. For custom cross validation fold, use `cv_split_column_names`. For more information, see Configure data splits and cross-validation in automated machine learning. Defaults to None
cv_split_column_names	List[str] List of names of the columns that contain custom cross validation splits. Each CV split column represents one CV split, in which each row is marked either 1 for training or 0 for validation. Defaults to None.
test_data	Input The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The test data to be used for a test run that starts automatically after model training is complete. The test run gets predictions from the best model and computes metrics from those predictions. If neither this parameter nor `test_data_size` is specified, no test run is executed automatically after model training is completed. Test data should contain both features and a label column. If `test_data` is specified, the `target_column_name` parameter must also be specified. Defaults to None.
test_data_size	float The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The fraction of the training data to hold out as test data for a test run that starts automatically after model training is complete. The test run gets predictions from the best model and computes metrics from those predictions. This should be between 0.0 and 1.0, non-inclusive. If `test_data_size` is specified at the same time as `validation_data_size`, the test data is split from `training_data` before the validation data is split. For example, if `validation_data_size=0.1`, `test_data_size=0.1`, and the original training data has 1000 rows, the test data will have 100 rows, the validation data 90 rows, and the training data 810 rows. For regression tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split. If neither this parameter nor `test_data` is specified, no test run is executed automatically after model training is completed. Defaults to None.
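
The split order described for `test_data_size` (test rows are held out first, then validation rows are taken from what remains) can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function name `split_row_counts` is hypothetical, and rounding to the nearest whole row is an assumption, since the text does not say how the service rounds fractional row counts.

```python
def split_row_counts(total_rows: int, test_data_size: float, validation_data_size: float) -> dict:
    """Return row counts per split, taking the test split before the validation split."""
    # Test rows come out of the full training data first.
    test_rows = round(total_rows * test_data_size)
    remaining = total_rows - test_rows
    # Validation rows are a fraction of what is left, not of the original total.
    validation_rows = round(remaining * validation_data_size)
    training_rows = remaining - validation_rows
    return {"test": test_rows, "validation": validation_rows, "training": training_rows}


counts = split_row_counts(1000, test_data_size=0.1, validation_data_size=0.1)
print(counts)  # {'test': 100, 'validation': 90, 'training': 810}
```

The validation split is 90 rows rather than 100 because it is 10% of the 900 rows left after the test split.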

Returns

Type	Description
ClassificationJob	A job object that can be submitted to an Azure ML compute for execution.
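
A call to `classification` might look like the following, modeled on the image task examples elsewhere on this page. It is a sketch, not an official sample: the experiment, compute, path, and column names are placeholders, the helper name `create_classification_job` is hypothetical, and passing `experiment_name`, `compute`, and `tags` through `**kwargs` is assumed to work as it does in those examples. The imports sit inside the function so the argument dictionary can be inspected without the azure-ai-ml package installed.

```python
# Arguments taken from the classification() signature; the values are placeholders.
classification_kwargs = {
    "target_column_name": "label",
    "primary_metric": "accuracy",
    # No validation_data is passed, so validation comes from cross validation.
    "n_cross_validations": 5,
    "enable_model_explainability": True,
}


def create_classification_job():
    """Build a ClassificationJob; requires the azure-ai-ml package."""
    from azure.ai.ml import automl, Input
    from azure.ai.ml.constants import AssetTypes

    return automl.classification(
        experiment_name="my_experiment",
        compute="my_compute",
        training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
        tags={"my_custom_tag": "My custom value"},
        **classification_kwargs,
    )
```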

forecasting

Function to create a Forecasting job.

A forecasting task is used to predict target values for a future time period based on historical data. Various models are trained using the training data. The model with the best performance on the validation data, based on the primary metric, is selected as the final model.

forecasting(*, training_data: Input, target_column_name: str, primary_metric: str | None = None, enable_model_explainability: bool | None = None, weight_column_name: str | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, n_cross_validations: str | int | None = None, cv_split_column_names: List[str] | None = None, test_data: Input | None = None, test_data_size: float | None = None, forecasting_settings: ForecastingSettings | None = None, **kwargs) -> ForecastingJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).
target_column_name	str The name of the label column. This parameter is applicable to `training_data`, `validation_data` and `test_data` parameters
primary_metric	str The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: r2_score, normalized_mean_absolute_error, and normalized_root_mean_squared_error. Defaults to normalized_root_mean_squared_error.
enable_model_explainability	bool Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is None. For more information, see Interpretability: model explanations in automated machine learning.
weight_column_name	str The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn't have column names, column indices can be used instead, expressed as integers. This parameter is applicable to `training_data` and `validation_data` parameters
validation_data	Input The validation data to be used within the experiment. It should contain both training features and label column (optionally a sample weights column). Defaults to None
validation_data_size	float What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive. Specify `validation_data` to provide validation data, otherwise set `n_cross_validations` or `validation_data_size` to extract validation data out of the specified training data. For custom cross validation fold, use `cv_split_column_names`. For more information, see Configure data splits and cross-validation in automated machine learning. Defaults to None
n_cross_validations	Union[str, int] How many cross validations to perform when user validation data is not specified. Specify `validation_data` to provide validation data, otherwise set `n_cross_validations` or `validation_data_size` to extract validation data out of the specified training data. For custom cross validation fold, use `cv_split_column_names`. For more information, see Configure data splits and cross-validation in automated machine learning. Defaults to None
cv_split_column_names	List[str] List of names of the columns that contain custom cross validation splits. Each CV split column represents one CV split, in which each row is marked either 1 for training or 0 for validation. Defaults to None.
test_data	Input The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The test data to be used for a test run that starts automatically after model training is complete. The test run gets predictions from the best model and computes metrics from those predictions. If neither this parameter nor `test_data_size` is specified, no test run is executed automatically after model training is completed. Test data should contain both features and a label column. If `test_data` is specified, the `target_column_name` parameter must also be specified. Defaults to None.
test_data_size	float The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The fraction of the training data to hold out as test data for a test run that starts automatically after model training is complete. The test run gets predictions from the best model and computes metrics from those predictions. This should be between 0.0 and 1.0, non-inclusive. If `test_data_size` is specified at the same time as `validation_data_size`, the test data is split from `training_data` before the validation data is split. For example, if `validation_data_size=0.1`, `test_data_size=0.1`, and the original training data has 1000 rows, the test data will have 100 rows, the validation data 90 rows, and the training data 810 rows. For regression tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split. If neither this parameter nor `test_data` is specified, no test run is executed automatically after model training is completed. Defaults to None.
forecasting_settings	ForecastingSettings The settings for the forecasting task.

Returns

Type	Description
ForecastingJob	A job object that can be submitted to an Azure ML compute for execution.
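
A call to `forecasting` might look like the following. It is a sketch under assumptions rather than an official sample: the experiment, compute, path, and column names are placeholders, the helper name `create_forecasting_job` is hypothetical, and the `time_column_name` and `forecast_horizon` keyword arguments of `ForecastingSettings` are not listed on this page, so treat them as assumptions to verify against the class reference. The imports are deferred so the settings dictionary can be inspected without the azure-ai-ml package installed.

```python
# Assumed ForecastingSettings keyword arguments; verify against the class reference.
forecasting_settings_kwargs = {
    "time_column_name": "timestamp",  # placeholder column holding the time axis
    "forecast_horizon": 14,           # placeholder: predict 14 periods ahead
}


def create_forecasting_job():
    """Build a ForecastingJob; requires the azure-ai-ml package."""
    from azure.ai.ml import automl, Input
    from azure.ai.ml.automl import ForecastingSettings
    from azure.ai.ml.constants import AssetTypes

    return automl.forecasting(
        experiment_name="my_experiment",
        compute="my_compute",
        training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
        target_column_name="demand",
        primary_metric="normalized_root_mean_squared_error",
        n_cross_validations=5,
        forecasting_settings=ForecastingSettings(**forecasting_settings_kwargs),
    )
```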

image_classification

Creates an object for AutoML Image multi-class Classification job.

image_classification(*, training_data: Input, target_column_name: str, primary_metric: str | ClassificationPrimaryMetrics | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, **kwargs) -> ImageClassificationJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment.
target_column_name	str The name of the label column. This parameter is applicable to `training_data` and `validation_data` parameters.
primary_metric	Union[str, ClassificationPrimaryMetrics] The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: accuracy, AUC_weighted, norm_macro_recall, average_precision_score_weighted, and precision_score_weighted. Defaults to accuracy.
validation_data	Optional[Input] The validation data to be used within the experiment.
validation_data_size	float The fraction of the training data to hold out for validation when the user does not provide validation data. This should be between 0.0 and 1.0, non-inclusive. Specify `validation_data` to provide validation data; otherwise set `validation_data_size` to extract validation data out of the specified training data. Defaults to 0.2.

Returns

Type	Description
ImageClassificationJob	Image classification job object that can be submitted to an Azure ML compute for execution.

Examples

Creating an AutoML image classification job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes
   from azure.ai.ml.automl import ClassificationPrimaryMetrics

   image_classification_job = automl.image_classification(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       target_column_name="label",
       # Multi-class image classification takes ClassificationPrimaryMetrics,
       # as shown in the function signature.
       primary_metric=ClassificationPrimaryMetrics.ACCURACY,
       tags={"my_custom_tag": "My custom value"},
   )

image_classification_multilabel

Creates an object for AutoML Image multi-label Classification job.

image_classification_multilabel(*, training_data: Input, target_column_name: str, primary_metric: str | ClassificationMultilabelPrimaryMetrics | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, **kwargs) -> ImageClassificationMultilabelJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment.
target_column_name	str The name of the label column. This parameter is applicable to `training_data` and `validation_data` parameters.
primary_metric	Union[str, ClassificationMultilabelPrimaryMetrics] The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: accuracy, AUC_weighted, norm_macro_recall, average_precision_score_weighted, precision_score_weighted, and Iou. Defaults to Iou.
validation_data	Optional[Input] The validation data to be used within the experiment.
validation_data_size	float The fraction of the training data to hold out for validation when the user does not provide validation data. This should be between 0.0 and 1.0, non-inclusive. Specify `validation_data` to provide validation data; otherwise set `validation_data_size` to extract validation data out of the specified training data. Defaults to 0.2.

Returns

Type	Description
ImageClassificationMultilabelJob	Image multi-label classification job object that can be submitted to an Azure ML compute for execution.

Examples

Creating an AutoML image multilabel classification job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes
   from azure.ai.ml.automl import ClassificationMultilabelPrimaryMetrics

   image_classification_multilabel_job = automl.image_classification_multilabel(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       target_column_name="label",
       primary_metric=ClassificationMultilabelPrimaryMetrics.IOU,
       tags={"my_custom_tag": "My custom value"},
   )

image_instance_segmentation

Creates an object for AutoML Image Instance Segmentation job.

image_instance_segmentation(*, training_data: Input, target_column_name: str, primary_metric: str | InstanceSegmentationPrimaryMetrics | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, **kwargs) -> ImageInstanceSegmentationJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment.
target_column_name	str The name of the label column. This parameter is applicable to `training_data` and `validation_data` parameters.
primary_metric	Union[str, InstanceSegmentationPrimaryMetrics] The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: MeanAveragePrecision. Defaults to MeanAveragePrecision.
validation_data	Optional[Input] The validation data to be used within the experiment.
validation_data_size	float The fraction of the training data to hold out for validation when the user does not provide validation data. This should be between 0.0 and 1.0, non-inclusive. Specify `validation_data` to provide validation data; otherwise set `validation_data_size` to extract validation data out of the specified training data. Defaults to 0.2.

Returns

Type	Description
ImageInstanceSegmentationJob	Image instance segmentation job object that can be submitted to an Azure ML compute for execution.

Examples

Creating an AutoML image instance segmentation job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes
   from azure.ai.ml.automl import InstanceSegmentationPrimaryMetrics

   image_instance_segmentation_job = automl.image_instance_segmentation(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       target_column_name="label",
       primary_metric=InstanceSegmentationPrimaryMetrics.MEAN_AVERAGE_PRECISION,
       tags={"my_custom_tag": "My custom value"},
   )

image_object_detection

Creates an object for AutoML Image Object Detection job.

image_object_detection(*, training_data: Input, target_column_name: str, primary_metric: str | ObjectDetectionPrimaryMetrics | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, **kwargs) -> ImageObjectDetectionJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment.
target_column_name	str The name of the label column. This parameter is applicable to `training_data` and `validation_data` parameters.
primary_metric	Union[str, ObjectDetectionPrimaryMetrics] The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: MeanAveragePrecision. Defaults to MeanAveragePrecision.
validation_data	Optional[Input] The validation data to be used within the experiment.
validation_data_size	float The fraction of the training data to hold out for validation when the user does not provide validation data. This should be between 0.0 and 1.0, non-inclusive. Specify `validation_data` to provide validation data; otherwise set `validation_data_size` to extract validation data out of the specified training data. Defaults to 0.2.

Returns

Type	Description
ImageObjectDetectionJob	Image object detection job object that can be submitted to an Azure ML compute for execution.

Examples

Creating an AutoML image object detection job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes
   from azure.ai.ml.automl import ObjectDetectionPrimaryMetrics

   image_object_detection_job = automl.image_object_detection(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       target_column_name="label",
       primary_metric=ObjectDetectionPrimaryMetrics.MEAN_AVERAGE_PRECISION,
       tags={"my_custom_tag": "My custom value"},
   )

regression

Function to create a Regression Job.

A regression job is used to train a model to predict continuous values of a target variable from a dataset. Various models are trained using the training data. The model with the best performance on the validation data, based on the primary metric, is selected as the final model.

regression(*, training_data: Input, target_column_name: str, primary_metric: str | None = None, enable_model_explainability: bool | None = None, weight_column_name: str | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, n_cross_validations: str | int | None = None, cv_split_column_names: List[str] | None = None, test_data: Input | None = None, test_data_size: float | None = None, **kwargs) -> RegressionJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).
target_column_name	str The name of the label column. This parameter is applicable to `training_data`, `validation_data` and `test_data` parameters
primary_metric	str The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. Acceptable values: spearman_correlation, r2_score, normalized_mean_absolute_error, and normalized_root_mean_squared_error. Defaults to normalized_root_mean_squared_error.
enable_model_explainability	bool Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is None. For more information, see Interpretability: model explanations in automated machine learning.
weight_column_name	str The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn't have column names, column indices can be used instead, expressed as integers. This parameter is applicable to `training_data` and `validation_data` parameters
validation_data	Input The validation data to be used within the experiment. It should contain both training features and label column (optionally a sample weights column). Defaults to None
validation_data_size	float What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive. Specify `validation_data` to provide validation data, otherwise set `n_cross_validations` or `validation_data_size` to extract validation data out of the specified training data. For custom cross validation fold, use `cv_split_column_names`. For more information, see Configure data splits and cross-validation in automated machine learning. Defaults to None
n_cross_validations	Union[str, int] How many cross validations to perform when user validation data is not specified. Specify `validation_data` to provide validation data, otherwise set `n_cross_validations` or `validation_data_size` to extract validation data out of the specified training data. For custom cross validation fold, use `cv_split_column_names`. For more information, see Configure data splits and cross-validation in automated machine learning. Defaults to None
cv_split_column_names	List[str] List of names of the columns that contain custom cross validation splits. Each CV split column represents one CV split, in which each row is marked either 1 for training or 0 for validation. Defaults to None.
test_data	Input The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The test data to be used for a test run that starts automatically after model training is complete. The test run gets predictions from the best model and computes metrics from those predictions. If neither this parameter nor `test_data_size` is specified, no test run is executed automatically after model training is completed. Test data should contain both features and a label column. If `test_data` is specified, the `target_column_name` parameter must also be specified. Defaults to None.
test_data_size	float The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The fraction of the training data to hold out as test data for a test run that starts automatically after model training is complete. The test run gets predictions from the best model and computes metrics from those predictions. This should be between 0.0 and 1.0, non-inclusive. If `test_data_size` is specified at the same time as `validation_data_size`, the test data is split from `training_data` before the validation data is split. For example, if `validation_data_size=0.1`, `test_data_size=0.1`, and the original training data has 1000 rows, the test data will have 100 rows, the validation data 90 rows, and the training data 810 rows. For regression tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split. If neither this parameter nor `test_data` is specified, no test run is executed automatically after model training is completed. Defaults to None.
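
The layout that `cv_split_column_names` expects (one column per split, with each row marked 1 for training or 0 for validation) can be illustrated with plain Python. This is only a sketch: the function name `make_cv_split_columns`, the `cv_split_<n>` column names, and the round-robin fold assignment are all hypothetical choices, not something AutoML prescribes.

```python
def make_cv_split_columns(n_rows: int, n_folds: int) -> dict:
    """Return {column_name: markers}, where 1 means training and 0 means validation."""
    columns = {}
    for fold in range(n_folds):
        # Row i is held out for validation in exactly one fold (i % n_folds).
        columns[f"cv_split_{fold}"] = [
            0 if row % n_folds == fold else 1 for row in range(n_rows)
        ]
    return columns


columns = make_cv_split_columns(n_rows=6, n_folds=3)
# These names are what would be passed as cv_split_column_names.
cv_split_column_names = list(columns)

for name in cv_split_column_names:
    print(name, columns[name])
```

These columns would be added to the training data before it is registered, and their names passed to the job function.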

Returns

Type	Description
RegressionJob	A job object that can be submitted to an Azure ML compute for execution.
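
A call to `regression` might look like the following, modeled on the image task examples elsewhere on this page. It is a sketch, not an official sample: the experiment, compute, path, and column names are placeholders, the helper name `create_regression_job` is hypothetical, and passing `experiment_name`, `compute`, and `tags` through `**kwargs` is assumed to work as it does in those examples. The imports are deferred so the argument dictionary can be inspected without the azure-ai-ml package installed.

```python
# Arguments taken from the regression() signature; the values are placeholders.
regression_kwargs = {
    "target_column_name": "price",
    "primary_metric": "r2_score",
    # Hold out 20% of the training data for validation instead of cross validation.
    "validation_data_size": 0.2,
}


def create_regression_job():
    """Build a RegressionJob; requires the azure-ai-ml package."""
    from azure.ai.ml import automl, Input
    from azure.ai.ml.constants import AssetTypes

    return automl.regression(
        experiment_name="my_experiment",
        compute="my_compute",
        training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
        tags={"my_custom_tag": "My custom value"},
        **regression_kwargs,
    )
```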

text_classification

Function to create a TextClassificationJob.

A text classification job is used to train a model that can predict the class or category of text data. Input training data should include a target column that classifies the text into exactly one class.

text_classification(*, training_data: Input, target_column_name: str, validation_data: Input, primary_metric: str | None = None, log_verbosity: str | None = None, **kwargs) -> TextClassificationJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment. It should contain both training features and a target column.
target_column_name	str Name of the target column.
validation_data	Input The validation data to be used within the experiment. It should contain both training features and a target column.
primary_metric	Union[str, ClassificationPrimaryMetrics] Primary metric for the task. Acceptable values: accuracy, AUC_weighted, precision_score_weighted
log_verbosity	str Log verbosity level.

Returns

Type	Description
TextClassificationJob	The TextClassificationJob object.

Examples

Creating an AutoML text classification job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes

   text_classification_job = automl.text_classification(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       target_column_name="Sentiment",
       primary_metric="accuracy",
       tags={"my_custom_tag": "My custom value"},
   )
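
Because `primary_metric` is passed as a plain string, a misspelled metric name may only surface once the job is created or submitted. The sketch below checks the value against the acceptable values listed in the parameter table above before calling `automl.text_classification`. `check_primary_metric` and `TEXT_CLASSIFICATION_METRICS` are hypothetical names introduced for illustration and are not part of the SDK. The exact, case-sensitive match is also an assumption.

```python
# Acceptable values copied from the primary_metric row of the parameter table.
TEXT_CLASSIFICATION_METRICS = ("accuracy", "AUC_weighted", "precision_score_weighted")


def check_primary_metric(metric, allowed=TEXT_CLASSIFICATION_METRICS):
    """Return `metric` unchanged if it is acceptable, else raise ValueError.

    Hypothetical pre-flight check, not an SDK function. `None` is passed
    through because the parameter is optional in the signature.
    """
    if metric is None:
        return None
    if metric not in allowed:
        raise ValueError(
            f"Unsupported primary_metric {metric!r}; expected one of {allowed}"
        )
    return metric


# Validate the value before passing it as primary_metric=...
primary_metric = check_primary_metric("accuracy")
```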

text_classification_multilabel

Function to create a TextClassificationMultilabelJob.

A text classification multilabel job is used to train a model that can predict the classes or categories of text data. Input training data should include a target column that classifies the text into one or more classes. For more information on the format of multilabel data, refer to: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-nlp-models#multi-label

text_classification_multilabel(*, training_data: Input, target_column_name: str, validation_data: Input, primary_metric: str | None = None, log_verbosity: str | None = None, **kwargs) -> TextClassificationMultilabelJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment. It should contain both training features and a target column.
target_column_name	str Name of the target column.
validation_data	Input The validation data to be used within the experiment. It should contain both training features and a target column.
primary_metric	str Primary metric for the task. Acceptable values: accuracy
log_verbosity	str Log verbosity level.

Returns

Type	Description
TextClassificationMultilabelJob	The TextClassificationMultilabelJob object.

Examples

Creating an AutoML text multilabel classification job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes

   text_classification_multilabel_job = automl.text_classification_multilabel(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       target_column_name="terms",
       primary_metric="accuracy",
       tags={"my_custom_tag": "My custom value"},
   )
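
When preparing multilabel data, it can help to check that every value in the target column parses into a list of labels. The sketch below assumes each target cell holds a string-encoded Python list such as `"['label1','label2']"`. That format is an assumption made for illustration; the page linked above is the authoritative description of the accepted formats. `parse_label_cell` is a hypothetical helper, not an SDK function.

```python
import ast


def parse_label_cell(cell):
    """Parse one target-column cell into a list of label strings.

    Assumes the cell is a string-encoded Python list, for example
    "['label1','label2']". Raises ValueError for anything else.
    """
    try:
        labels = ast.literal_eval(cell)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Cannot parse label cell: {cell!r}") from exc
    if not isinstance(labels, (list, tuple)):
        raise ValueError(f"Expected a list of labels, got: {cell!r}")
    return [str(label) for label in labels]


print(parse_label_cell("['label1','label2']"))
```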

text_ner

Function to create a TextNerJob.

A text named entity recognition (NER) job is used to train a model that can predict the named entities in text. Input training data should be a text file in CoNLL format. For more information on the format of text NER data, refer to: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-nlp-models#named-entity-recognition-ner

text_ner(*, training_data: Input, validation_data: Input, primary_metric: str | None = None, log_verbosity: str | None = None, **kwargs) -> TextNerJob

Keyword-Only Parameters

Name	Description
training_data	Input The training data to be used within the experiment. For this task it should be a text file in CoNLL format, as described above.
validation_data	Input The validation data to be used within the experiment. For this task it should be a text file in CoNLL format, as described above.
primary_metric	str Primary metric for the task. Acceptable values: accuracy
log_verbosity	str Log verbosity level.

Returns

Type	Description
TextNerJob	The TextNerJob object.

Examples

Creating an AutoML text NER job


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes

   text_ner_job = automl.text_ner(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       tags={"my_custom_tag": "My custom value"},
   )
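
Since the NER training data must be in CoNLL format, a quick local check of the file can catch malformed lines before the job is created. The sketch below assumes the common two-column layout: one token and its label per line, separated by a single space, with a blank line between sentences. That layout is an assumption here, and the page linked in the description is authoritative. `parse_conll` is a hypothetical helper, not an SDK function.

```python
def parse_conll(text):
    """Split CoNLL-style text into sentences of (token, label) pairs.

    Assumes two space-separated columns per line and blank lines
    between sentences. Raises ValueError on a line without a label.
    """
    sentences = []
    current = []
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.rsplit(" ", 1)
        if len(parts) != 2:
            raise ValueError(f"Line {line_number} has no label: {raw_line!r}")
        token, label = parts
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


sample = "Hello O\nSeattle B-LOC\n\nBye O\n"
print(parse_conll(sample))
```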
