RegressionJob Class

Configuration for AutoML Regression Job.

Initialize a new AutoML Regression task.

Inheritance
azure.ai.ml.entities._job.automl.tabular.automl_tabular.AutoMLTabular
RegressionJob

Constructor

RegressionJob(*, primary_metric: str | None = None, **kwargs: Any)

Parameters

Name Description
primary_metric
str

The primary metric to use for optimization. Keyword-only and optional, defaults to None.

kwargs

Job-specific arguments.

Methods

dump

Dumps the job content into a file in YAML format.

set_data

Define data configuration.

set_featurization

Define feature engineering configuration.

set_limits

Set limits for the job.

set_training

The method to configure training related settings.

dump

Dumps the job content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name Description
dest
Required
Union[<xref:PathLike>, str, IO[AnyStr]]

The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.

Exceptions

Type Description

Raised if dest is a file path and the file already exists.

Raised if dest is an open file and the file is not writable.
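The overwrite behavior described above can be handled explicitly before calling dump. The following is a minimal sketch, not part of the SDK: the helper name dump_job and its overwrite flag are hypothetical, and job is assumed to be any object that exposes dump(dest) as documented here.

```python
from pathlib import Path
from typing import Any


def dump_job(job: Any, dest: str, overwrite: bool = False) -> Path:
    """Write the job's YAML to dest, handling a pre-existing file explicitly.

    `job` is assumed to expose dump(dest) as documented above.
    """
    path = Path(dest)
    if path.exists():
        if not overwrite:
            # Fail early with a clear message instead of relying on dump().
            raise FileExistsError(f"{path} already exists")
        # Remove the old file so dump() can create a new one.
        path.unlink()
    job.dump(path)
    return path
```

Passing overwrite=True deletes the existing file first, so use it only when the old YAML is no longer needed.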

set_data

Define data configuration.

set_data(*, training_data: Input, target_column_name: str, weight_column_name: str | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, n_cross_validations: str | int | None = None, cv_split_column_names: List[str] | None = None, test_data: Input | None = None, test_data_size: float | None = None) -> None

Keyword-Only Parameters

Name Description
training_data

Training data.

target_column_name
str

Column name of the target column.

weight_column_name

Weight column name, defaults to None

validation_data

Validation data, defaults to None

validation_data_size

Validation data size, defaults to None

n_cross_validations

Number of cross-validation folds to use, defaults to None

cv_split_column_names

Names of the columns that contain custom cross-validation splits, defaults to None

test_data

Test data, defaults to None

test_data_size

Test data size, defaults to None
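A call to set_data might look like the sketch below. The wrapper configure_data and its argument names are hypothetical. job is assumed to be a RegressionJob and training_data an Input object, although the function only relies on the keyword arguments listed above.

```python
from typing import Any


def configure_data(job: Any, training_data: Any, target: str, folds: int = 5) -> None:
    """Point the job at its training data and request k-fold cross-validation."""
    job.set_data(
        training_data=training_data,  # assumed to be an Input describing the table
        target_column_name=target,    # column the regression model should predict
        n_cross_validations=folds,    # used here instead of separate validation data
    )
```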

set_featurization

Define feature engineering configuration.

set_featurization(*, blocked_transformers: List[BlockedTransformers | str] | None = None, column_name_and_types: Dict[str, str] | None = None, dataset_language: str | None = None, transformer_params: Dict[str, List[ColumnTransformer]] | None = None, mode: str | None = None, enable_dnn_featurization: bool | None = None) -> None

Keyword-Only Parameters

Name Description
blocked_transformers

A list of transformer names to be blocked during featurization, defaults to None

column_name_and_types

A dictionary of column names and feature types used to update column purpose, defaults to None

dataset_language

Three-character ISO 639-3 code for the language(s) contained in the dataset. Languages other than English are only supported if you use GPU-enabled compute. The language code 'mul' should be used if the dataset contains multiple languages. To find ISO 639-3 codes for different languages, refer to https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes. Defaults to None

transformer_params

A dictionary of transformers and their corresponding customization parameters, defaults to None

mode

Featurization mode, either "off" or "auto". The parameter defaults to None, in which case "auto" is used.

enable_dnn_featurization

Whether to include DNN based feature engineering methods, defaults to None
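As an illustration of the parameters above, this sketch checks that dataset_language looks like a three-letter code before passing it to set_featurization. The helper configure_featurization and its format check are assumptions for illustration, not SDK behavior. job is assumed to be a RegressionJob.

```python
from typing import Any


def configure_featurization(job: Any, language: str = "eng") -> None:
    """Enable automatic featurization for a dataset in the given language."""
    # ISO 639-3 codes are three letters; this only checks the shape of the code,
    # not whether it is a registered language.
    if len(language) != 3 or not language.isalpha():
        raise ValueError(f"expected a three-letter ISO 639-3 code, got {language!r}")
    job.set_featurization(
        mode="auto",
        dataset_language=language.lower(),
        enable_dnn_featurization=False,
    )
```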

set_limits

Set limits for the job.

set_limits(*, enable_early_termination: bool | None = None, exit_score: float | None = None, max_concurrent_trials: int | None = None, max_cores_per_trial: int | None = None, max_nodes: int | None = None, max_trials: int | None = None, timeout_minutes: int | None = None, trial_timeout_minutes: int | None = None) -> None

Keyword-Only Parameters

Name Description
enable_early_termination

Whether to enable early termination if the score is not improving in the short term, defaults to None.

Early stopping logic:

  • No early stopping for first 20 iterations (landmarks).

  • Early stopping window starts on the 21st iteration and looks for early_stopping_n_iters iterations (currently set to 10). This means that the first iteration where stopping can occur is the 31st.

  • AutoML still schedules 2 ensemble iterations AFTER early stopping, which might result in higher scores.

  • Early stopping is triggered if the absolute value of best score calculated is the same for past early_stopping_n_iters iterations, that is, if there is no improvement in score for early_stopping_n_iters iterations.
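The early stopping rule above can be sketched in plain Python. This is one reading of the description, not the service's implementation. It assumes that a higher score is better, that raw best scores are compared, and that stopping at iteration i means the best score did not improve over the window iterations before i. The function name first_stop_iteration is hypothetical.

```python
from typing import Optional, Sequence


def first_stop_iteration(
    scores: Sequence[float], landmarks: int = 20, window: int = 10
) -> Optional[int]:
    """Return the 1-based iteration at which early stopping would trigger, or None."""
    # best_after[k] is the best score seen after k + 1 completed iterations.
    best_after = []
    best = float("-inf")
    for score in scores:
        best = max(best, score)
        best_after.append(best)

    # The earliest candidate is iteration landmarks + window + 1 (the 31st by default).
    for i in range(landmarks + window + 1, len(scores) + 2):
        # Compare the best score after iteration i - 1 with the best score
        # after iteration i - 1 - window.
        if best_after[i - 2] == best_after[i - 2 - window]:
            return i
    return None
```

With 30 identical scores the function returns 31, which matches the statement that the 31st iteration is the first one where stopping can occur.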

exit_score

Target score for the experiment. The experiment terminates after this score is reached. If not specified (no criteria), the experiment runs until no further progress is made on the primary metric. Defaults to None

max_concurrent_trials

This is the maximum number of iterations that would be executed in parallel. The default value is 1.

  • AmlCompute clusters support one iteration running per node. For multiple AutoML experiment parent runs executed in parallel on a single AmlCompute cluster, the sum of the max_concurrent_trials values for all experiments should be less than or equal to the maximum number of nodes. Otherwise, runs will be queued until nodes are available.

  • DSVM supports multiple iterations per node. max_concurrent_trials should be less than or equal to the number of cores on the DSVM. For multiple experiments run in parallel on a single DSVM, the sum of the max_concurrent_trials values for all experiments should be less than or equal to the maximum number of nodes.

  • Databricks - max_concurrent_trials should be less than or equal to the number of worker nodes on Databricks.

max_concurrent_trials does not apply to local runs. Formerly, this parameter was named concurrent_iterations.
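The AmlCompute rule above is simple arithmetic: the max_concurrent_trials values of all experiments sharing a cluster are summed and compared with the node count. A minimal sketch follows. The function name queued_trials is hypothetical.

```python
from typing import Sequence


def queued_trials(max_concurrent_trials: Sequence[int], cluster_nodes: int) -> int:
    """Return how many requested concurrent trials exceed the cluster's nodes.

    A result of 0 means every experiment can run at its requested concurrency;
    a positive result is the number of trials that would wait in the queue.
    """
    return max(0, sum(max_concurrent_trials) - cluster_nodes)
```

For example, two experiments requesting 2 and 3 concurrent trials on a 4-node cluster leave one trial queued.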

max_cores_per_trial

The maximum number of threads to use for a given training iteration. Acceptable values:

  • Greater than 1 and less than or equal to the maximum number of cores on the compute target.

  • Equal to -1, which means to use all the possible cores per iteration per child-run.

  • Equal to 1, the default.

max_nodes

[Experimental] The maximum number of nodes to use for distributed training.

  • For forecasting, each model is trained using max(2, int(max_nodes / max_concurrent_trials)) nodes.

  • For classification/regression, each model is trained using max_nodes nodes.

Note: This parameter is in public preview and might change in the future.
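The node allocation described for max_nodes can be written out directly. The sketch below reproduces the stated formula. The function name nodes_per_model is hypothetical.

```python
def nodes_per_model(task: str, max_nodes: int, max_concurrent_trials: int = 1) -> int:
    """Return the number of nodes used to train each model for the given task."""
    if task == "forecasting":
        # Nodes are split across concurrent trials, with a floor of 2 per model.
        return max(2, int(max_nodes / max_concurrent_trials))
    if task in ("classification", "regression"):
        # Every model is trained on all max_nodes nodes.
        return max_nodes
    raise ValueError(f"unsupported task: {task!r}")
```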

max_trials

The total number of different algorithm and parameter combinations to test during an automated ML experiment. If not specified, the default is 1000 iterations.

timeout_minutes

Maximum amount of time in minutes that all iterations combined can take before the experiment terminates. If not specified, the default experiment timeout is 6 days. To specify a timeout less than or equal to 1 hour, make sure your dataset's size is not greater than 10,000,000 (rows times columns), or an error results. Defaults to None

trial_timeout_minutes

Maximum time in minutes that each iteration can run before it terminates. If not specified, a value of 1 month (43,200 minutes) is used. Defaults to None
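The dataset size constraint on short timeouts can be checked before calling set_limits. This is a sketch under stated assumptions: the helpers short_timeout_ok and apply_limits are hypothetical, the trial count and per-trial timeout are arbitrary example values, and job is assumed to be a RegressionJob.

```python
from typing import Any

# Size limit stated above for timeouts of one hour or less (rows times columns).
MAX_CELLS_FOR_SHORT_TIMEOUT = 10_000_000


def short_timeout_ok(rows: int, columns: int, timeout_minutes: int) -> bool:
    """Return True if the timeout is compatible with the dataset size."""
    if timeout_minutes > 60:
        return True  # the size limit only applies to timeouts of at most 1 hour
    return rows * columns <= MAX_CELLS_FOR_SHORT_TIMEOUT


def apply_limits(job: Any, rows: int, columns: int, timeout_minutes: int = 60) -> None:
    """Validate the timeout against the dataset size, then set the job limits."""
    if not short_timeout_ok(rows, columns, timeout_minutes):
        raise ValueError("dataset is too large for a timeout of 1 hour or less")
    job.set_limits(
        timeout_minutes=timeout_minutes,
        trial_timeout_minutes=20,        # example value
        max_trials=50,                   # example value
        enable_early_termination=True,
    )
```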

set_training

The method to configure training related settings.

set_training(*, enable_onnx_compatible_models: bool | None = None, enable_dnn_training: bool | None = None, enable_model_explainability: bool | None = None, enable_stack_ensemble: bool | None = None, enable_vote_ensemble: bool | None = None, stack_ensemble_settings: StackEnsembleSettings | None = None, ensemble_model_download_timeout: int | None = None, allowed_training_algorithms: List[str] | None = None, blocked_training_algorithms: List[str] | None = None, training_mode: str | TrainingMode | None = None) -> None

Keyword-Only Parameters

Name Description
enable_onnx_compatible_models

Whether to enforce ONNX-compatible models. The default is False. For more information about Open Neural Network Exchange (ONNX) and Azure Machine Learning, see this article.

enable_dnn_training

Whether to include DNN-based models during model selection. The default is True for DNN NLP tasks and False for all other AutoML tasks.

enable_model_explainability

Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. For more information, see Interpretability: model explanations in automated machine learning. Defaults to None

enable_stack_ensemble

Whether to enable/disable StackEnsemble iteration. If the enable_onnx_compatible_models flag is set, then StackEnsemble iteration will be disabled. Similarly, for Timeseries tasks, StackEnsemble iteration will be disabled by default, to avoid risks of overfitting due to the small training set used in fitting the meta learner. For more information about ensembles, see Ensemble configuration. Defaults to None

enable_vote_ensemble

Whether to enable/disable VotingEnsemble iteration. For more information about ensembles, see Ensemble configuration. Defaults to None

stack_ensemble_settings

Settings for StackEnsemble iteration, defaults to None

ensemble_model_download_timeout

During VotingEnsemble and StackEnsemble model generation, multiple fitted models from the previous child runs are downloaded. Set this parameter to a value higher than 300 seconds if more time is needed. Defaults to None

allowed_training_algorithms

A list of model names to search for an experiment. If not specified, all models supported for the task are used, minus any specified in blocked_training_algorithms or deprecated TensorFlow models. Defaults to None

blocked_training_algorithms

A list of algorithms to ignore for an experiment, defaults to None

training_mode

[Experimental] The training mode to use. The possible values are:

  • distributed: enables distributed training for supported algorithms.

  • non_distributed: disables distributed training.

  • auto: currently the same as non_distributed. This might change in the future.

Note: This parameter is in public preview and may change in the future.
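A set_training call that respects the interaction between ONNX compatibility and StackEnsemble might look like the sketch below. The wrapper configure_training is hypothetical, the 600-second download timeout is an arbitrary example value, and job is assumed to be a RegressionJob. Algorithm names are left to the caller because valid names depend on the task.

```python
from typing import Any, List, Optional


def configure_training(
    job: Any, onnx: bool = False, blocked: Optional[List[str]] = None
) -> None:
    """Configure ensembles and explainability, keeping the ONNX rule consistent."""
    job.set_training(
        enable_onnx_compatible_models=onnx,
        # StackEnsemble is disabled when ONNX-compatible models are enforced,
        # so do not request it in that case.
        enable_stack_ensemble=not onnx,
        enable_vote_ensemble=True,
        enable_model_explainability=True,
        ensemble_model_download_timeout=600,  # example value above 300 seconds
        blocked_training_algorithms=blocked,
    )
```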

Attributes

base_path

The base path of the resource.

Returns

Type Description
str

The base path of the resource.

creation_context

The creation context of the resource.

Returns

Type Description

The creation metadata for the resource.

featurization

Get the tabular featurization settings for the AutoML job.

Returns

Type Description

Tabular featurization settings for the AutoML job

id

The resource ID.

Returns

Type Description

The global ID of the resource, an Azure Resource Manager (ARM) ID.

inputs

limits

Get the tabular limits for the AutoML job.

Returns

Type Description

Tabular limits for the AutoML job

log_files

Job output files.

Returns

Type Description

The dictionary of log names and URLs.

log_verbosity

Get the log verbosity for the AutoML job.

Returns

Type Description
<xref:LogVerbosity>

Log verbosity for the AutoML job

outputs

primary_metric

status

The status of the job.

Common values returned include "Running", "Completed", and "Failed". All possible values are:

  • NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.

  • Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.

  • Provisioning - On-demand compute is being created for a given job submission.

  • Preparing - The run environment is being prepared and is in one of two stages:

    • Docker image build

    • conda environment setup

  • Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.

  • Running - The job has started to run on the compute target.

  • Finalizing - User code execution has completed, and the run is in post-processing stages.

  • CancelRequested - Cancellation has been requested for the job.

  • Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.

  • Failed - The run failed. Usually the Error property on a run will provide details as to why.

  • Canceled - Follows a cancellation request and indicates that the run is now successfully canceled.

  • NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
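When polling a job, the status values above can be grouped into terminal and non-terminal states. The sketch below treats Completed, Failed, and Canceled as terminal. That grouping is an assumption drawn from the descriptions, not something the SDK exposes. The names is_terminal and wait_for_terminal are hypothetical, and get_status stands in for whatever call returns the current status string.

```python
import time
from typing import Callable

# Assumed terminal states, based on the status descriptions above.
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def is_terminal(status: str) -> bool:
    """Return True if the job will not change state again."""
    return status in TERMINAL_STATUSES


def wait_for_terminal(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until a terminal status is seen, then return it."""
    while True:
        status = get_status()
        if is_terminal(status):
            return status
        sleep(poll_seconds)
```

The sleep argument is injectable so the loop can be exercised without real waiting.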

Returns

Type Description

Status of the job.

studio_url

Azure ML studio endpoint.

Returns

Type Description

The URL to the job details page.

task_type

Get task type.

Returns

Type Description
str

The type of task to run. Possible values include: "classification", "regression", "forecasting".

test_data

Get test data.

Returns

Type Description

Test data input

training

training_data

Get training data.

Returns

Type Description

Training data input

type

The type of the job.

Returns

Type Description

The type of the job.

validation_data

Get validation data.

Returns

Type Description

Validation data input