TextClassificationJob Class
Configuration for AutoML Text Classification Job.
Inheritance
azure.ai.ml.entities._job.automl.nlp.automl_nlp_job.AutoMLNLPJob -> TextClassificationJob
Constructor
TextClassificationJob(*, target_column_name: str | None = None, training_data: Input | None = None, validation_data: Input | None = None, primary_metric: ClassificationPrimaryMetrics | None = None, log_verbosity: str | None = None, **kwargs: Any)
Keyword-Only Parameters
Name | Description |
---|---|
target_column_name | The name of the target column. Defaults to None. |
training_data | Training data to be used for training. Defaults to None. |
validation_data | Validation data to be used for evaluating the trained model. Defaults to None. |
primary_metric | The primary metric to be displayed. Defaults to None. |
log_verbosity | Log verbosity level. Defaults to None. |
Examples
Creating an AutoML text classification job
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

text_classification_job = automl.TextClassificationJob(
    experiment_name="my_experiment",
    compute="my_compute",
    training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
    validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
    target_column_name="terms",
    tags={"my_custom_tag": "My custom value"},
)
Methods
Name | Description |
---|---|
dump | Dumps the job content into a file in YAML format. |
extend_search_space | Add one or more search spaces to an AutoML NLP job. |
set_data | Define the data configuration for an NLP job. |
set_featurization | Define the featurization configuration for an AutoML NLP job. |
set_limits | Define the limit configuration for an AutoML NLP job. |
set_sweep | Define the sweep configuration for an AutoML NLP job. |
set_training_parameters | Fix certain training parameters throughout the training procedure for all candidates. |
dump
Dumps the job content into a file in YAML format.
dump(dest: str | PathLike | IO, **kwargs: Any) -> None
Parameters
Name | Description |
---|---|
dest | Required. The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly. |
Exceptions
Type | Description |
---|---|
FileExistsError | Raised if dest is a file path and the file already exists. |
IOError | Raised if dest is an open file and the file is not writable. |
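The dest rules above mean that dumping to a path that already exists fails rather than overwriting the file. The following is a minimal sketch of a wrapper that makes overwriting an explicit choice. `dump_job` and its `overwrite` flag are hypothetical helpers, not part of the SDK, and `job` is assumed to be any object that exposes the `dump(dest)` method described in this section.

```python
import os
from pathlib import Path


def dump_job(job, dest, overwrite=False):
    """Write a job's YAML to `dest`, optionally replacing an existing file.

    Hypothetical helper: `job` is assumed to expose dump(dest), which is
    documented to fail when `dest` is a path that already exists.
    """
    path = Path(dest)
    if path.exists():
        if not overwrite:
            raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
        os.remove(path)  # clear the way so dump() can create a new file
    job.dump(dest=path)
    return path
```

With a real job object, a call such as `dump_job(text_classification_job, "./job.yml", overwrite=True)` would replace a previous dump instead of raising.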
extend_search_space
Add one or more search spaces to an AutoML NLP job.
extend_search_space(value: SearchSpace | List[SearchSpace]) -> None
Parameters
Name | Description |
---|---|
value | Required. Either a SearchSpace object or a list of SearchSpace objects with NLP-specific parameters. |
set_data
Define the data configuration for an NLP job.
set_data(*, training_data: Input, target_column_name: str, validation_data: Input) -> None
Keyword-Only Parameters
Name | Description |
---|---|
training_data | Training data. |
target_column_name | Name of the target column. |
validation_data | Validation data. |
set_featurization
Define featurization configuration for AutoML NLP job.
set_featurization(*, dataset_language: str | None = None) -> None
Keyword-Only Parameters
Name | Description |
---|---|
dataset_language | Language of the dataset. Defaults to None. |
set_limits
Define the limit configuration for an AutoML NLP job.
set_limits(*, max_trials: int = 1, max_concurrent_trials: int = 1, max_nodes: int = 1, timeout_minutes: int | None = None, trial_timeout_minutes: int | None = None) -> None
Keyword-Only Parameters
Name | Description |
---|---|
max_trials | Maximum number of AutoML iterations. Default value: 1 |
max_concurrent_trials | Maximum number of concurrent AutoML iterations. Default value: 1 |
max_nodes | Maximum number of nodes used for sweep. Default value: 1 |
timeout_minutes | Timeout for the AutoML job. Defaults to None. |
trial_timeout_minutes | Timeout for each AutoML trial. Defaults to None. |
set_sweep
Define the sweep configuration for an AutoML NLP job.
set_sweep(*, sampling_algorithm: str | SamplingAlgorithmType, early_termination: EarlyTerminationPolicy | None = None) -> None
Keyword-Only Parameters
Name | Description |
---|---|
sampling_algorithm | Required. Specifies the type of hyperparameter sampling algorithm. Possible values include: "Grid", "Random", and "Bayesian". |
early_termination | Optional. Early termination policy used to end poorly performing training candidates. Defaults to None. |
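Limits and sweep settings are usually configured together, because a sweep only explores more than one candidate when max_trials is raised above its default of 1. The sketch below applies both through the keyword-only signatures documented above. `configure_sweep` is a hypothetical helper, the numeric values are illustrative, the concurrency check is an assumption of this sketch rather than a documented SDK rule, and `job` is assumed to be any object exposing `set_limits` and `set_sweep`.

```python
def configure_sweep(job, max_trials=4, max_concurrent_trials=2, max_nodes=2, timeout_minutes=120):
    """Apply limit and sweep settings to an AutoML NLP job.

    Hypothetical helper: `job` is assumed to expose the keyword-only
    set_limits and set_sweep methods; the defaults here are illustrative.
    """
    if max_concurrent_trials > max_trials:
        # Assumption of this sketch: more concurrent trials than total trials is a mistake.
        raise ValueError("max_concurrent_trials cannot exceed max_trials")
    job.set_limits(
        max_trials=max_trials,
        max_concurrent_trials=max_concurrent_trials,
        max_nodes=max_nodes,
        timeout_minutes=timeout_minutes,
    )
    # "Random" is one of the documented sampling algorithm values.
    job.set_sweep(sampling_algorithm="Random")
    return job
```

An early termination policy could be passed to `set_sweep` through its `early_termination` parameter; it is left at its default of None here to keep the sketch free of extra imports.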
set_training_parameters
Fix certain training parameters throughout the training procedure for all candidates.
set_training_parameters(*, gradient_accumulation_steps: int | None = None, learning_rate: float | None = None, learning_rate_scheduler: str | NlpLearningRateScheduler | None = None, model_name: str | None = None, number_of_epochs: int | None = None, training_batch_size: int | None = None, validation_batch_size: int | None = None, warmup_ratio: float | None = None, weight_decay: float | None = None) -> None
Keyword-Only Parameters
Name | Description |
---|---|
gradient_accumulation_steps | Number of steps over which to accumulate gradients before a backward pass. Must be a positive integer. Defaults to None. |
learning_rate | Initial learning rate. Must be a float in (0, 1). Defaults to None. |
learning_rate_scheduler | The type of learning rate scheduler. Must be one of 'linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', and 'constant_with_warmup'. Defaults to None. |
model_name | The model name to use during training. Must be one of 'bert-base-cased', 'bert-base-uncased', 'bert-base-multilingual-cased', 'bert-base-german-cased', 'bert-large-cased', 'bert-large-uncased', 'distilbert-base-cased', 'distilbert-base-uncased', 'roberta-base', 'roberta-large', 'distilroberta-base', 'xlm-roberta-base', 'xlm-roberta-large', 'xlnet-base-cased', and 'xlnet-large-cased'. Defaults to None. |
number_of_epochs | The number of epochs to train with. Must be a positive integer. Defaults to None. |
training_batch_size | The batch size during training. Must be a positive integer. Defaults to None. |
validation_batch_size | The batch size during validation. Must be a positive integer. Defaults to None. |
warmup_ratio | Ratio of total training steps used for a linear warmup from 0 to learning_rate. Must be a float in [0, 1]. Defaults to None. |
weight_decay | Value of weight decay when the optimizer is sgd, adam, or adamw. Must be a float in the range [0, 1]. Defaults to None. |
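The ranges listed for the training parameters can be checked locally before a job is submitted. The following is a rough sketch of such a check, covering only the numeric ranges and the scheduler names given above. `check_training_parameters` and its constants are hypothetical and are not part of the SDK, which may validate these values differently.

```python
SCHEDULERS = {
    "linear", "cosine", "cosine_with_restarts",
    "polynomial", "constant", "constant_with_warmup",
}
POSITIVE_INT_KEYS = (
    "gradient_accumulation_steps", "number_of_epochs",
    "training_batch_size", "validation_batch_size",
)


def check_training_parameters(**params):
    """Validate kwargs against the documented ranges and drop None values.

    Hypothetical helper: returns a dict that could be passed on to
    set_training_parameters(**result).
    """
    given = {key: value for key, value in params.items() if value is not None}
    for key in POSITIVE_INT_KEYS:
        if key in given:
            value = given[key]
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{key} must be a positive integer")
    if "learning_rate" in given and not 0 < given["learning_rate"] < 1:
        raise ValueError("learning_rate must be a float in (0, 1)")
    for key in ("warmup_ratio", "weight_decay"):
        if key in given and not 0 <= given[key] <= 1:
            raise ValueError(f"{key} must be a float in [0, 1]")
    scheduler = given.get("learning_rate_scheduler")
    if isinstance(scheduler, str) and scheduler not in SCHEDULERS:
        raise ValueError(f"unknown learning_rate_scheduler: {scheduler!r}")
    return given
```

With a real job object, the validated values could then be applied as `job.set_training_parameters(**check_training_parameters(learning_rate=5e-5, number_of_epochs=3))`.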
Attributes
base_path
creation_context
The creation context of the resource.
Returns
The creation metadata for the resource.
featurization
Featurization settings used for the NLP job.
Returns
Featurization settings.
id
The resource ID.
Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
inputs
limits
Limit settings for NLP jobs.
Returns
Limit configuration for the NLP job.
log_files
Job output files.
Returns
The dictionary of log names and URLs.
log_verbosity
Log verbosity configuration.
Returns
The degree of verbosity used in logging.
outputs
primary_metric
search_space
Search spaces to sweep over for NLP sweep jobs.
Returns
List of search spaces to sweep over for NLP jobs.
status
The status of the job.
Common values returned include "Running", "Completed", and "Failed". All possible values are:
NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
Provisioning - On-demand compute is being created for a given job submission.
Preparing - The run environment is being prepared and is in one of two stages:
    Docker image build
    conda environment setup
Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
Running - The job has started to run on the compute target.
Finalizing - User code execution has completed, and the run is in post-processing stages.
CancelRequested - Cancellation has been requested for the job.
Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
Failed - The run failed. Usually the Error property on a run will provide details as to why.
Canceled - Follows a cancellation request and indicates that the run is now successfully canceled.
NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
Returns
Status of the job.
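Code that watches a job usually needs to know whether a status value can still change. The sketch below groups the values listed for status on that basis. Treating "Completed", "Failed", and "Canceled" as final is an assumption drawn from their descriptions, not something the SDK exposes under these names, and `TERMINAL_STATUSES`, `is_terminal`, and `succeeded` are hypothetical.

```python
# Assumption: these three statuses are final; all other listed values may still change.
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def is_terminal(status):
    """Return True if the status string is assumed to be a final state."""
    return status in TERMINAL_STATUSES


def succeeded(status):
    """Return True only for a run that completed successfully."""
    return status == "Completed"
```

A polling loop could call `is_terminal(job.status)` on a freshly fetched job object and stop once it returns True, then use `succeeded` to decide whether to read the outputs.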
studio_url
sweep
task_type
Get task type.
Returns
The type of task to run. Possible values include: "classification", "regression", "forecasting".
test_data
training_data
training_parameters
Parameters that are used for all submitted jobs.
Returns
Fixed training parameters for NLP jobs.
type
validation_data