ClassificationJob Class
Configuration for AutoML Classification Job.
Initialize a new AutoML Classification task.
Constructor
ClassificationJob(*, primary_metric: str | None = None, positive_label: str | None = None, **kwargs: Any)
Keyword-Only Parameters

Name | Description |
---|---|
primary_metric | The primary metric to use for optimization. Default value: None |
positive_label | Positive label for binary metrics calculation. Default value: None |
featurization | Featurization settings. Default value: None |
limits | Limits settings. Default value: None |
training | Training settings. Default value: None |
Methods

Name | Description |
---|---|
dump | Dumps the job content into a file in YAML format. |
set_data | Define data configuration. |
set_featurization | Define feature engineering configuration. |
set_limits | Set limits for the job. |
set_training | Configure training-related settings. |
dump

Dumps the job content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name | Description |
---|---|
dest (Required) | The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly. |

Exceptions

Type | Description |
---|---|
 | Raised if dest is a file path and the file already exists. |
 | Raised if dest is an open file and the file is not writable. |
set_data

Define data configuration.

set_data(*, training_data: Input, target_column_name: str, weight_column_name: str | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, n_cross_validations: str | int | None = None, cv_split_column_names: List[str] | None = None, test_data: Input | None = None, test_data_size: float | None = None) -> None

Keyword-Only Parameters

Name | Description |
---|---|
training_data | Training data. |
target_column_name | Column name of the target column. |
weight_column_name | Weight column name. Default value: None |
validation_data | Validation data. Default value: None |
validation_data_size | Validation data size. Default value: None |
n_cross_validations | Number of cross-validation folds. Default value: None |
cv_split_column_names | Names of the columns containing the cross-validation split. Default value: None |
test_data | Test data. Default value: None |
test_data_size | Test data size. Default value: None |
set_featurization

Define feature engineering configuration.

set_featurization(*, blocked_transformers: List[BlockedTransformers | str] | None = None, column_name_and_types: Dict[str, str] | None = None, dataset_language: str | None = None, transformer_params: Dict[str, List[ColumnTransformer]] | None = None, mode: str | None = None, enable_dnn_featurization: bool | None = None) -> None

Keyword-Only Parameters

Name | Description |
---|---|
blocked_transformers | A list of transformer names to be blocked during featurization. Default value: None |
column_name_and_types | A dictionary of column names and feature types used to update column purpose. Default value: None |
dataset_language | Three-character ISO 639-3 code for the language(s) contained in the dataset. Languages other than English are supported only if you use GPU-enabled compute. Use the language code 'mul' if the dataset contains multiple languages. To find ISO 639-3 codes for different languages, see https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes. Default value: None |
transformer_params | A dictionary of transformers and their corresponding customization parameters. Default value: None |
mode | Featurization mode, either "off" or "auto"; defaults to "auto". Default value: None |
enable_dnn_featurization | Whether to include DNN-based feature engineering methods. Default value: None |
set_limits

Set limits for the job.

set_limits(*, enable_early_termination: bool | None = None, exit_score: float | None = None, max_concurrent_trials: int | None = None, max_cores_per_trial: int | None = None, max_nodes: int | None = None, max_trials: int | None = None, timeout_minutes: int | None = None, trial_timeout_minutes: int | None = None) -> None

Keyword-Only Parameters

Name | Description |
---|---|
enable_early_termination | Whether to enable early termination if the score is not improving in the short term. Default value: None |
exit_score | Target score for the experiment. The experiment terminates after this score is reached. If not specified (no criteria), the experiment runs until no further progress is made on the primary metric. Default value: None |
max_concurrent_trials | The maximum number of iterations that can execute in parallel. The default value is 1. Default value: None |
max_cores_per_trial | The maximum number of threads to use for a given training iteration. Default value: None |
max_nodes | [Experimental] The maximum number of nodes to use for distributed training. Note: this parameter is in public preview and might change in the future. Default value: None |
max_trials | The total number of different algorithm and parameter combinations to test during an automated ML experiment. If not specified, the default is 1000 iterations. Default value: None |
timeout_minutes | Maximum amount of time in minutes that all iterations combined can take before the experiment terminates. If not specified, the default experiment timeout is 6 days. To specify a timeout of 1 hour or less, make sure your dataset's size is not greater than 10,000,000 (rows times columns), or an error results. Default value: None |
trial_timeout_minutes | Maximum time in minutes that each iteration can run before it terminates. If not specified, a value of 1 month, or 43200 minutes, is used. Default value: None |
set_training

Configure training-related settings.

set_training(*, enable_onnx_compatible_models: bool | None = None, enable_dnn_training: bool | None = None, enable_model_explainability: bool | None = None, enable_stack_ensemble: bool | None = None, enable_vote_ensemble: bool | None = None, stack_ensemble_settings: StackEnsembleSettings | None = None, ensemble_model_download_timeout: int | None = None, allowed_training_algorithms: List[str] | None = None, blocked_training_algorithms: List[str] | None = None, training_mode: str | TrainingMode | None = None) -> None

Keyword-Only Parameters

Name | Description |
---|---|
enable_onnx_compatible_models | Whether to enable or disable enforcing ONNX-compatible models. The default is False. Default value: None |
enable_dnn_training | Whether to include DNN-based models during model selection. The default is True for DNN NLP tasks and False for all other AutoML tasks. Default value: None |
enable_model_explainability | Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. For more information, see Interpretability: model explanations in automated machine learning. Default value: None |
enable_stack_ensemble | Whether to enable or disable the StackEnsemble iteration. If the enable_onnx_compatible_models flag is set, the StackEnsemble iteration is disabled. Similarly, for time-series tasks, the StackEnsemble iteration is disabled by default to avoid the risk of overfitting due to the small training set used in fitting the meta learner. For more information about ensembles, see Ensemble configuration. Default value: None |
enable_vote_ensemble | Whether to enable or disable the VotingEnsemble iteration. For more information about ensembles, see Ensemble configuration. Default value: None |
stack_ensemble_settings | Settings for the StackEnsemble iteration. Default value: None |
ensemble_model_download_timeout | During VotingEnsemble and StackEnsemble model generation, multiple fitted models from the previous child runs are downloaded. Configure this parameter with a value higher than 300 seconds if more time is needed. Default value: None |
allowed_training_algorithms | A list of model names to search for an experiment. If not specified, all models supported for the task are used, minus any specified in blocked_training_algorithms. Default value: None |
blocked_training_algorithms | A list of algorithms to ignore for an experiment. Default value: None |
training_mode | [Experimental] The training mode to use. Note: this parameter is in public preview and may change in the future. Default value: None |
Attributes

base_path

creation_context

The creation context of the resource.

Returns

Type | Description |
---|---|
 | The creation metadata for the resource. |

featurization

Get the tabular featurization settings for the AutoML job.

Returns

Type | Description |
---|---|
 | Tabular featurization settings for the AutoML job. |

id

inputs

limits

Get the tabular limits for the AutoML job.

Returns

Type | Description |
---|---|
 | Tabular limits for the AutoML job. |

log_files

log_verbosity

Get the log verbosity for the AutoML job.

Returns

Type | Description |
---|---|
LogVerbosity | Log verbosity for the AutoML job. |

outputs

primary_metric

The primary metric to use for optimization.

Returns

Type | Description |
---|---|
 | The primary metric to use for optimization. |
status

The status of the job.

Common values returned include "Running", "Completed", and "Failed". All possible values are:

- NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages: Docker image build, or conda environment setup.
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run provides details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.

Returns

Type | Description |
---|---|
 | Status of the job. |
studio_url

task_type

Get task type.

Returns

Type | Description |
---|---|
 | The type of task to run. Possible values include: "classification", "regression", "forecasting". |

test_data

training

Training settings for the AutoML Classification job.

Returns

Type | Description |
---|---|
ClassificationTrainingSettings | Training settings used for the AutoML Classification job. |