AutoMLConfig Class
Represents configuration for submitting an automated ML experiment in Azure Machine Learning.
This configuration object contains and persists the parameters for configuring the experiment run, as well as the training data to be used at run time. For guidance on selecting your settings, see https://aka.ms/AutoMLConfig.
Create an AutoMLConfig.
Inheritance: builtins.object → AutoMLConfig
Constructor
AutoMLConfig(task: str, path: str | None = None, iterations: int | None = None, primary_metric: str | None = None, positive_label: Any | None = None, compute_target: Any | None = None, spark_context: Any | None = None, X: Any | None = None, y: Any | None = None, sample_weight: Any | None = None, X_valid: Any | None = None, y_valid: Any | None = None, sample_weight_valid: Any | None = None, cv_splits_indices: List[List[Any]] | None = None, validation_size: float | None = None, n_cross_validations: int | str | None = None, y_min: float | None = None, y_max: float | None = None, num_classes: int | None = None, featurization: str | FeaturizationConfig = 'auto', max_cores_per_iteration: int = 1, max_concurrent_iterations: int = 1, iteration_timeout_minutes: int | None = None, mem_in_mb: int | None = None, enforce_time_on_windows: bool = True, experiment_timeout_hours: float | None = None, experiment_exit_score: float | None = None, enable_early_stopping: bool = True, blocked_models: List[str] | None = None, blacklist_models: List[str] | None = None, exclude_nan_labels: bool = True, verbosity: int = 20, enable_tf: bool = False, model_explainability: bool = True, allowed_models: List[str] | None = None, whitelist_models: List[str] | None = None, enable_onnx_compatible_models: bool = False, enable_voting_ensemble: bool = True, enable_stack_ensemble: bool | None = None, debug_log: str = 'automl.log', training_data: Any | None = None, validation_data: Any | None = None, test_data: Any | None = None, test_size: float | None = None, label_column_name: str | None = None, weight_column_name: str | None = None, cv_split_column_names: List[str] | None = None, enable_local_managed: bool = False, enable_dnn: bool | None = None, forecasting_parameters: ForecastingParameters | None = None, **kwargs: Any)
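The following is a minimal sketch of constructing a classification configuration and submitting it. It assumes a workspace config file is present and that train_data is a placeholder for a tabular dataset whose label column is named 'label'; the experiment name and primary metric are illustrative choices, not defaults.

from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
# train_data is a placeholder for a TabularDataset containing features and the label column
automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,
                             label_column_name='label',
                             primary_metric='AUC_weighted',
                             n_cross_validations=5)
experiment = Experiment(ws, 'automl-classification-example')
run = experiment.submit(automl_config, show_output=True)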
Parameters
task (Required)
The type of task to run. Values can be 'classification', 'regression', or 'forecasting', depending on the type of automated ML problem to solve.

path (Required)
The full path to the Azure Machine Learning project folder. If not specified, the default is to use the current directory, ".".

iterations (Required)
The total number of different algorithm and parameter combinations to test during an automated ML experiment. If not specified, the default is 1000 iterations.

primary_metric (Required)
The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. You can use get_primary_metrics to get a list of valid metrics for your given task. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric. If not specified, accuracy is used for classification tasks, normalized root mean squared error is used for forecasting and regression tasks, accuracy is used for image classification and image multi-label classification, and mean average precision is used for image object detection.

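To see which primary metrics are valid for a given task before configuring, you can query get_primary_metrics; a short sketch:

from azureml.train.automl.utilities import get_primary_metrics

# Returns the list of metric names that can be passed as primary_metric for this task type
print(get_primary_metrics('classification'))
print(get_primary_metrics('regression'))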
positive_label (Required)
The positive class label that Automated Machine Learning will use to calculate binary metrics. Binary metrics are calculated in two conditions for classification tasks: when the label column consists of only two classes, or when positive_label is specified. For more information on classification, check out metrics for classification scenarios.

compute_target (Required)
The Azure Machine Learning compute target to run the Automated Machine Learning experiment on. See https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml#local-remote for more information on compute targets.

spark_context (Required, type: SparkContext)
The Spark context. Only applicable when used inside an Azure Databricks/Spark environment.

X (Required)
The training features to use when fitting pipelines during an experiment. This setting is being deprecated. Please use training_data and label_column_name instead.

y (Required)
The training labels to use when fitting pipelines during an experiment. This is the value your model will predict. This setting is being deprecated. Please use training_data and label_column_name instead.

sample_weight (Required)
The weight to give to each training sample when running fitting pipelines; each row should correspond to a row in the X and y data. Specify this parameter when specifying X. This setting is being deprecated. Please use training_data and weight_column_name instead.

X_valid (Required)
Validation features to use when fitting pipelines during an experiment. If specified, then y_valid or sample_weight_valid must also be specified. This setting is being deprecated. Please use validation_data and label_column_name instead.

y_valid (Required)
Validation labels to use when fitting pipelines during an experiment. Both X_valid and y_valid must be specified together. This setting is being deprecated. Please use validation_data and label_column_name instead.

sample_weight_valid (Required)
The weight to give to each validation sample when running scoring pipelines; each row should correspond to a row in the X and y data. Specify this parameter when specifying X_valid. This setting is being deprecated. Please use validation_data and weight_column_name instead.

cv_splits_indices (Required)
Indices where to split training data for cross validation. Each row is a separate cross fold, and within each cross fold, provide 2 numpy arrays: the first with the indices for samples to use for training data and the second with the indices to use for validation data, i.e., [[t1, v1], [t2, v2], ...] where t1 is the training indices for the first cross fold and v1 is the validation indices for the first cross fold. This option is supported when data is passed as a separate features dataset and label column. To specify existing data as validation data, use validation_data.

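As a sketch of the expected shape, each fold is a pair of index arrays; the sample below over 9 hypothetical rows is purely illustrative:

import numpy as np

# Two folds: each inner pair is [training indices, validation indices]
cv_splits_indices = [
    [np.array([0, 1, 2, 3, 4, 5]), np.array([6, 7, 8])],
    [np.array([3, 4, 5, 6, 7, 8]), np.array([0, 1, 2])],
]
# Pass cv_splits_indices together with the (deprecated) X and y parameters.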
validation_size (Required)
What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0, non-inclusive. Specify validation_size only when validation data is not provided separately. For more information, see Configure data splits and cross-validation in automated machine learning.

n_cross_validations (Required)
How many cross validations to perform when user validation data is not specified. Specify n_cross_validations only when validation data is not provided separately. For more information, see Configure data splits and cross-validation in automated machine learning.

y_min (Required)
Minimum value of y for a regression experiment. The combination of y_min and y_max is used to normalize test set metrics based on the range of the input data. This setting is being deprecated. Instead, this value will be computed from the data.

y_max (Required)
Maximum value of y for a regression experiment. The combination of y_min and y_max is used to normalize test set metrics based on the range of the input data. This setting is being deprecated. Instead, this value will be computed from the data.

num_classes (Required)
The number of classes in the label data for a classification experiment. This setting is being deprecated. Instead, this value will be computed from the data.

featurization (Required)
'auto' / 'off' / FeaturizationConfig. Indicator for whether the featurization step should be done automatically or not, or whether customized featurization should be used. Note: if the input data is sparse, featurization cannot be turned on. Column type is automatically detected, and preprocessing/featurization is applied based on the detected column type. More details can be found in the article Configure automated ML experiments in Python. To customize the featurization step, provide a FeaturizationConfig object. Customized featurization currently supports blocking a set of transformers, updating column purpose, editing transformer parameters, and dropping columns. For more information, see Customize feature engineering. Note: time-series features are handled separately when the task type is set to forecasting, independent of this parameter.

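A sketch of providing a customized FeaturizationConfig; the column names below are placeholders, not part of any real schema:

from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
featurization_config.blocked_transformers = ['LabelEncoder']        # block a transformer
featurization_config.drop_columns = ['row_id']                      # drop a column
featurization_config.add_column_purpose('zip_code', 'Categorical')  # override detected column purpose
featurization_config.add_transformer_params('Imputer', ['quantity'], {"strategy": "median"})
# Then pass featurization=featurization_config to AutoMLConfig.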
max_cores_per_iteration (Required)
The maximum number of threads to use for a given training iteration. Acceptable values: greater than 1 and less than or equal to the maximum number of cores on the compute target; equal to -1, which means to use all possible cores per iteration per child run; or equal to 1, the default.

max_concurrent_iterations (Required)
Represents the maximum number of iterations that would be executed in parallel. The default value is 1.

iteration_timeout_minutes (Required)
Maximum time in minutes that each iteration can run for before it terminates. If not specified, a value of 1 month (43200 minutes) is used.

mem_in_mb (Required)
Maximum memory usage that each iteration can consume before it terminates. If not specified, a value of 1 PB (1073741824 MB) is used.

enforce_time_on_windows (Required)
Whether to enforce a time limit on model training at each iteration on Windows. The default is True. If running from a Python script file (.py), see the documentation for allowing resource limits on Windows.

experiment_timeout_hours (Required)
Maximum amount of time in hours that all iterations combined can take before the experiment terminates. Can be a decimal value like 0.25, representing 15 minutes. If not specified, the default experiment timeout is 6 days. To specify a timeout less than or equal to 1 hour, make sure your dataset's size is not greater than 10,000,000 (rows times columns) or an error results.

experiment_exit_score (Required)
Target score for the experiment. The experiment terminates after this score is reached. If not specified (no criteria), the experiment runs until no further progress is made on the primary metric. For more information on exit criteria, see this article.

enable_early_stopping (Required)
Whether to enable early termination if the score is not improving in the short term. The default is True. Early stopping logic: there is no early stopping for the first 20 iterations; the early stopping window starts after that and looks at the most recent iterations, so stopping is triggered only when the best score has stopped improving over that window; ensemble iterations are still scheduled after early stopping, which might result in higher scores.

blocked_models (Required)
Type: list(str), or list of Classification values (for classification tasks), Regression values (for regression tasks), or Forecasting values (for forecasting tasks).
A list of algorithms to ignore for an experiment. If enable_tf is False, TensorFlow models are included in blocked_models.

blacklist_models (Required)
Type: list(str), or list of Classification values (for classification tasks), Regression values (for regression tasks), or Forecasting values (for forecasting tasks).
Deprecated parameter; use blocked_models instead.

exclude_nan_labels (Required)
Whether to exclude rows with NaN values in the label. The default is True.

verbosity (Required)
The verbosity level for writing to the log file. The default is INFO or 20. Acceptable values are defined in the Python logging library.

enable_tf (Required)
Deprecated parameter to enable/disable TensorFlow algorithms. The default is False.

model_explainability (Required)
Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is True. For more information, see Interpretability: model explanations in automated machine learning.

allowed_models (Required)
Type: list(str), or list of Classification values (for classification tasks), Regression values (for regression tasks), or Forecasting values (for forecasting tasks).
A list of model names to search for an experiment. If not specified, then all models supported for the task are used, minus any specified in blocked_models or deprecated TensorFlow models.

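A sketch of restricting the model search space; the model names below are examples of supported model identifiers, and train_data is a placeholder:

automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,
                             label_column_name='label',
                             allowed_models=['LogisticRegression', 'LightGBM'],
                             blocked_models=['XGBoostClassifier'])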
whitelist_models (Required)
Type: list(str), or list of Classification values (for classification tasks), Regression values (for regression tasks), or Forecasting values (for forecasting tasks).
Deprecated parameter; use allowed_models instead.

enable_onnx_compatible_models (Required)
Whether to enable or disable enforcing ONNX-compatible models. The default is False. For more information about Open Neural Network Exchange (ONNX) and Azure Machine Learning, see this article.

forecasting_parameters (Required)
A ForecastingParameters object to hold all the forecasting-specific parameters.

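A sketch of building a ForecastingParameters object and passing it through; the column names and horizon values are placeholders:

from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(time_column_name='date',
                                               forecast_horizon=14,
                                               time_series_id_column_names=['store_id'],
                                               target_lags='auto',
                                               target_rolling_window_size=10)
# Then pass forecasting_parameters=forecasting_parameters to AutoMLConfig(task='forecasting', ...).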
time_column_name (Required)
The name of the time column. This parameter is required when forecasting to specify the datetime column in the input data used for building the time series and inferring its frequency. This setting is being deprecated. Please use forecasting_parameters instead.

max_horizon (Required)
The desired maximum forecast horizon in units of time-series frequency. The default value is 1. Units are based on the time interval of your training data (for example, monthly or weekly) that the forecaster should predict out. When the task type is forecasting, this parameter is required. For more information on setting forecasting parameters, see Auto-train a time-series forecast model. This setting is being deprecated. Please use forecasting_parameters instead.

grain_column_names (Required)
The names of columns used to group a time series. They can be used to create multiple series. If grain is not defined, the data set is assumed to be one time series. This parameter is used with task type forecasting. This setting is being deprecated. Please use forecasting_parameters instead.

target_lags (Required)
The number of past periods to lag from the target column. The default is 1. This setting is being deprecated. Please use forecasting_parameters instead. When forecasting, this parameter represents the number of rows to lag the target values based on the frequency of the data. This is represented as a list or a single integer. Lag should be used when the relationship between the independent variables and dependent variable does not match up or correlate by default. For example, when trying to forecast demand for a product, the demand in any month may depend on the price of specific commodities 3 months prior. In this example, you may want to lag the target (demand) negatively by 3 months so that the model is trained on the correct relationship. For more information, see Auto-train a time-series forecast model.

feature_lags (Required)
Flag for generating lags for the numeric features. This setting is being deprecated. Please use forecasting_parameters instead.

target_rolling_window_size (Required)
The number of past periods used to create a rolling window average of the target column. This setting is being deprecated. Please use forecasting_parameters instead. When forecasting, this parameter represents n historical periods to use to generate forecasted values, where n is less than or equal to the training set size. If omitted, n is the full training set size. Specify this parameter when you only want to consider a certain amount of history when training the model.

country_or_region (Required)
The country/region used to generate holiday features. This should be an ISO 3166 two-letter country/region code, for example 'US' or 'GB'. This setting is being deprecated. Please use forecasting_parameters instead.

use_stl (Required)
Configure STL decomposition of the time-series target column. use_stl can take three values: None (default) - no STL decomposition, 'season' - only generate the season component, and 'season_trend' - generate both season and trend components. This setting is being deprecated. Please use forecasting_parameters instead.

seasonality (Required)
Set time series seasonality. If seasonality is set to 'auto', it will be inferred. If use_stl is not set, this parameter will not be used. This setting is being deprecated. Please use forecasting_parameters instead.

short_series_handling_configuration (Required)
The parameter defining how AutoML should handle short time series. Possible values: 'auto' (default), 'pad', 'drop', and None.

For example, given the input series below, with the minimal number of values required being four:

Date | numeric_value | string | target
---|---|---|---
2020-01-01 | 23 | green | 55

the padded output is:

Date | numeric_value | string | target
---|---|---|---
2019-12-29 | 0 | NA | 55.1
2019-12-30 | 0 | NA | 55.6
2019-12-31 | 0 | NA | 54.5
2020-01-01 | 23 | green | 55

Note: there are two parameters, short_series_handling_configuration and the legacy short_series_handling. When both parameters are set, they are synchronized as shown in the table below (short_series_handling_configuration and short_series_handling are abbreviated as handling_configuration and handling, respectively):

handling | handling_configuration | resulting handling | resulting handling_configuration
---|---|---|---
True | auto | True | auto
True | pad | True | auto
True | drop | True | auto
True | None | False | None
False | auto | False | None
False | pad | False | None
False | drop | False | None
False | None | False | None

freq (Required)
Forecast frequency. When forecasting, this parameter represents the period with which the forecast is desired, for example daily, weekly, yearly, etc. The forecast frequency is the dataset frequency by default. You can optionally set it to a frequency coarser (but not finer) than the dataset frequency; the data will be aggregated and the results generated at the forecast frequency. For example, for daily data, you can set the frequency to be daily, weekly, or monthly, but not hourly. The frequency needs to be a pandas offset alias. Please refer to the pandas documentation for more information: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects

target_aggregation_function (Required)
The function to be used to aggregate the time series target column to conform to a user-specified frequency. If target_aggregation_function is set but the freq parameter is not set, an error is raised. The possible target aggregation functions are: "sum", "max", "min" and "mean".

freq | target_aggregation_function | Data regularity fixing mechanism
---|---|---
None (default) | None (default) | The aggregation is not applied. If the valid frequency cannot be determined, an error is raised.
Some value | None (default) | The aggregation is not applied. If the number of data points compliant with the given frequency grid is less than 90%, these points are removed; otherwise, an error is raised.
None (default) | Aggregation function | An error about the missing frequency parameter is raised.
Some value | Aggregation function | Aggregate to the frequency using the provided aggregation function.

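For example, daily data can be aggregated up to a weekly forecast grid by setting both parameters together; a sketch via ForecastingParameters, with placeholder column names and horizon:

from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(time_column_name='date',
                                               forecast_horizon=4,
                                               freq='W',                           # pandas offset alias for weekly
                                               target_aggregation_function='sum')  # sum daily targets up to weekly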
enable_voting_ensemble (Required)
Whether to enable/disable the VotingEnsemble iteration. The default is True. For more information about ensembles, see Ensemble configuration.

enable_stack_ensemble (Required)
Whether to enable/disable the StackEnsemble iteration. The default is None. If the enable_onnx_compatible_models flag is set, then the StackEnsemble iteration will be disabled. Similarly, for time-series tasks, the StackEnsemble iteration is disabled by default, to avoid risks of overfitting due to the small training set used in fitting the meta learner. For more information about ensembles, see Ensemble configuration.

debug_log (Required)
The log file to write debug information to. If not specified, 'automl.log' is used.

training_data (Required)
The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column). If training_data is specified, then the label_column_name parameter must also be specified.

validation_data (Required)
The validation data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column). If validation_data is specified, then the training_data and label_column_name parameters must also be specified.

test_data (Required)
The Model Test feature using test datasets or test data splits is in Preview and might change at any time. The test data to be used for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions. If neither this parameter nor the test_size parameter is specified, no test run will be executed automatically after model training is complete.

test_size (Required)
The Model Test feature using test datasets or test data splits is in Preview and might change at any time. What fraction of the training data to hold out as test data for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions. This should be between 0.0 and 1.0, non-inclusive. If test_size is specified at the same time as validation_size, then the test data is split from training_data before the validation data is split. For regression-based tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split. If neither this parameter nor the test_data parameter is specified, no test run will be executed automatically after model training is complete.

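A sketch of requesting an automatic test run by holding out 20% of the training data; train_data and 'label' are placeholders, and the feature is in Preview:

automl_config = AutoMLConfig(task='regression',
                             training_data=train_data,
                             label_column_name='label',
                             test_size=0.2)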
label_column_name (Required)
The name of the label column. If the input data is from a pandas.DataFrame which doesn't have column names, column indices can be used instead, expressed as integers. This parameter is applicable to the training_data, validation_data, and test_data parameters.

weight_column_name (Required)
The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn't have column names, column indices can be used instead, expressed as integers. This parameter is applicable to the training_data and validation_data parameters.

cv_split_column_names (Required)
List of names of the columns that contain custom cross-validation splits. Each of the CV split columns represents one CV split, where each row is marked either 1 for training or 0 for validation. This parameter is applicable to the training_data parameter. Use either cv_split_column_names or cv_splits_indices. For more information, see Configure data splits and cross-validation in automated machine learning.

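A sketch of what the split columns look like and how they are referenced; all column names are placeholders:

# training_data layout (1 = training row, 0 = validation row for that split):
#   feature_1 | label | cv_split_1 | cv_split_2
#      ...    |  ...  |     1      |     0
automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,
                             label_column_name='label',
                             cv_split_column_names=['cv_split_1', 'cv_split_2'])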
enable_local_managed (Required)
Disabled parameter. Local managed runs cannot be enabled at this time.

enable_dnn (Required)
Whether to include DNN-based models during model selection. The default in the init is None. However, the default is True for DNN NLP tasks, and it's False for all other AutoML tasks.

Remarks
The following code shows a basic example of creating an AutoMLConfig object and submitting an experiment for regression:
import logging

from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig

# compute_target, train_data, and label are assumed to be defined earlier
automl_settings = {
"n_cross_validations": 3,
"primary_metric": 'r2_score',
"enable_early_stopping": True,
"experiment_timeout_hours": 1.0,
"max_concurrent_iterations": 4,
"max_cores_per_iteration": -1,
"verbosity": logging.INFO,
}
automl_config = AutoMLConfig(task = 'regression',
compute_target = compute_target,
training_data = train_data,
label_column_name = label,
**automl_settings
)
ws = Workspace.from_config()
experiment = Experiment(ws, "your-experiment-name")
run = experiment.submit(automl_config, show_output=True)
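Once the submitted run completes, the best child run and its fitted model can be retrieved from the returned run object:

best_run, fitted_model = run.get_output()
print(best_run)
print(fitted_model)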
A full sample is available at Regression. Examples of using AutoMLConfig for forecasting are in these notebooks, and examples for all task types can be found in these automated ML notebooks.
For background on automated ML, see the articles:

Configure automated ML experiments in Python. In this article, there is information about the different algorithms and primary metrics used for each task type.

Auto-train a time-series forecast model. In this article, there is information about which constructor parameters and **kwargs are used in forecasting.

For more information about different options for configuring training/validation data splits and cross-validation for your automated machine learning (AutoML) experiments, see Configure data splits and cross-validation in automated machine learning.
Methods

as_serializable_dict
Convert the object into a dictionary.

get_supported_dataset_languages
Get supported languages and their corresponding language codes in ISO 639-3.
as_serializable_dict
Convert the object into a dictionary.
as_serializable_dict() -> Dict[str, Any]
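A sketch of usage, assuming automl_config was created as in the Remarks example above:

config_dict = automl_config.as_serializable_dict()
print(type(config_dict))  # <class 'dict'>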
get_supported_dataset_languages
Get supported languages and their corresponding language codes in ISO 639-3.
get_supported_dataset_languages(use_gpu: bool) -> Dict[Any, Any]
Parameters

cls (Required)
Class object of AutoMLConfig.

use_gpu (Required)
Boolean indicating whether GPU compute is being used or not.
Returns

A dictionary of format {<language code>: <language name>}. The language code adheres to the ISO 639-3 standard; please refer to https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes
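A sketch of usage; since the method takes the class object, it can be called on AutoMLConfig directly (the exact dictionary contents depend on the SDK version):

from azureml.train.automl import AutoMLConfig

languages = AutoMLConfig.get_supported_dataset_languages(use_gpu=False)
# Each entry maps an ISO 639-3 language code to a language name
for code, name in list(languages.items())[:5]:
    print(code, name)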