TextNerJob Class

Configuration for an AutoML text named entity recognition (NER) job.

Inheritance
azure.ai.ml.entities._job.automl.nlp.automl_nlp_job.AutoMLNLPJob
TextNerJob

Constructor

TextNerJob(*, training_data: Input | None = None, validation_data: Input | None = None, primary_metric: str | None = None, log_verbosity: str | None = None, **kwargs: Any)

Keyword-Only Parameters

Name Description
training_data

Training data to be used for training, defaults to None

validation_data

Validation data to be used for evaluating the trained model, defaults to None

primary_metric

The primary metric to be displayed, defaults to None

log_verbosity

Log verbosity level, defaults to None

Examples

Creating an AutoML text NER job:


   from azure.ai.ml import automl, Input
   from azure.ai.ml.constants import AssetTypes

   text_ner_job = automl.TextNerJob(
       experiment_name="my_experiment",
       compute="my_compute",
       training_data=Input(type=AssetTypes.MLTABLE, path="./training-mltable-folder"),
       validation_data=Input(type=AssetTypes.MLTABLE, path="./validation-mltable-folder"),
       tags={"my_custom_tag": "My custom value"},
   )

Methods

dump

Dumps the job content into a file in YAML format.

extend_search_space

Add one or more search spaces to an AutoML NLP job.

set_data

Define data configuration for an AutoML NLP job.

set_featurization

Define featurization configuration for an AutoML NLP job.

set_limits

Define limit configuration for an AutoML NLP job.

set_sweep

Define sweep configuration for an AutoML NLP job.

set_training_parameters

Fix certain training parameters throughout the training procedure for all candidates.

dump

Dumps the job content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name Description
dest
Required
Union[PathLike, str, IO[AnyStr]]

The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.

Exceptions

Type Description

Raised if dest is a file path and the file already exists.

Raised if dest is an open file and the file is not writable.
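To make the two documented `dest` cases concrete, here is a minimal stdlib-only sketch. `dump_sketch` is a hypothetical helper, not the SDK's implementation: it renders a flat dictionary as `key: value` lines instead of full YAML, creates a new file when given a path (mode "x", which raises FileExistsError if the file already exists), and writes directly to an already-open stream, where a read-only stream raises `io.UnsupportedOperation`, a subclass of OSError. The exception types the SDK itself raises may differ.

```python
import io
import os
from typing import IO, Any, Dict, Union


def dump_sketch(content: Dict[str, Any], dest: Union[str, os.PathLike, IO[str]]) -> None:
    """Hypothetical helper mirroring the documented dest semantics of dump()."""
    # Minimal "key: value" rendering; the real method emits the full job YAML.
    text = "".join(f"{key}: {value}\n" for key, value in content.items())
    if isinstance(dest, (str, os.PathLike)):
        # Mode "x" creates a new file and raises FileExistsError if it exists.
        with open(dest, "x", encoding="utf-8") as f:
            f.write(text)
    else:
        # An open stream is written to directly; a stream opened read-only
        # raises io.UnsupportedOperation (a subclass of OSError).
        dest.write(text)


# Usage: write to an in-memory stream.
buf = io.StringIO()
dump_sketch({"name": "demo", "compute": "my_compute"}, buf)
```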

extend_search_space

Add one or more search spaces to an AutoML NLP job.

extend_search_space(value: SearchSpace | List[SearchSpace]) -> None

Parameters

Name Description
value
Required

Either a SearchSpace object or a list of SearchSpace objects with NLP-specific parameters.
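The method accepts either a single search space or a list and adds them to any search spaces already on the job. The sketch below models that normalization with hypothetical stand-in classes (`SearchSpaceStub`, `NlpJobStub`) rather than the real SDK types; it assumes repeated calls accumulate instead of replacing earlier search spaces, as the method name suggests.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SearchSpaceStub:
    # Hypothetical stand-in for the SDK's SearchSpace class.
    model_name: str
    learning_rate: float


@dataclass
class NlpJobStub:
    # Hypothetical stand-in for an AutoML NLP job's search space storage.
    search_space: List[SearchSpaceStub] = field(default_factory=list)

    def extend_search_space(self, value: Union[SearchSpaceStub, List[SearchSpaceStub]]) -> None:
        # Normalize a single search space to a one-element list, then append.
        items = value if isinstance(value, list) else [value]
        self.search_space.extend(items)


# Usage: one call with a single space, one call with a list.
job = NlpJobStub()
job.extend_search_space(SearchSpaceStub("bert-base-cased", 2e-5))
job.extend_search_space([
    SearchSpaceStub("roberta-base", 3e-5),
    SearchSpaceStub("xlm-roberta-base", 1e-5),
])
```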

set_data

Define data configuration for an AutoML NLP job.

set_data(*, training_data: Input, target_column_name: str, validation_data: Input) -> None

Keyword-Only Parameters

Name Description
training_data

Training data.

target_column_name

Column name of the target column.

validation_data

Validation data.

set_featurization

Define featurization configuration for AutoML NLP job.

set_featurization(*, dataset_language: str | None = None) -> None

Keyword-Only Parameters

Name Description
dataset_language

Language of the dataset, defaults to None

set_limits

Define limit configuration for an AutoML NLP job.

set_limits(*, max_trials: int = 1, max_concurrent_trials: int = 1, max_nodes: int = 1, timeout_minutes: int | None = None, trial_timeout_minutes: int | None = None) -> None

Keyword-Only Parameters

Name Description
max_trials

Maximum number of AutoML iterations, defaults to 1

Default value: 1
max_concurrent_trials

Maximum number of concurrent AutoML iterations, defaults to 1

Default value: 1
max_nodes

Maximum number of nodes used for sweep, defaults to 1

Default value: 1
timeout_minutes

Timeout for the AutoML job, defaults to None

trial_timeout_minutes

Timeout for each AutoML trial, defaults to None
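As a rough planning aid, the sketch below mirrors the `set_limits` keyword arguments and their documented defaults in a hypothetical dataclass and derives two back-of-envelope numbers: the minimum number of sequential rounds needed to run `max_trials` when at most `max_concurrent_trials` run at once, and an upper bound on sweep execution time when every trial is capped by `trial_timeout_minutes`. It ignores provisioning and queueing overhead as well as `max_nodes`, so treat the results as estimates, not as service behavior.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class NlpLimitsSketch:
    # Hypothetical mirror of set_limits() keyword arguments and documented defaults.
    max_trials: int = 1
    max_concurrent_trials: int = 1
    max_nodes: int = 1
    timeout_minutes: Optional[int] = None
    trial_timeout_minutes: Optional[int] = None

    def min_rounds(self) -> int:
        # With at most max_concurrent_trials running at once, max_trials
        # need at least this many sequential rounds.
        return math.ceil(self.max_trials / self.max_concurrent_trials)

    def rough_upper_bound_minutes(self) -> Optional[int]:
        # If each trial stops by trial_timeout_minutes, all trials finish within
        # min_rounds() * trial_timeout_minutes (ignoring overhead), and the
        # job-level timeout caps that further when it is set.
        if self.trial_timeout_minutes is None:
            return self.timeout_minutes
        estimate = self.min_rounds() * self.trial_timeout_minutes
        if self.timeout_minutes is None:
            return estimate
        return min(estimate, self.timeout_minutes)


# Usage: 10 trials, 4 at a time, each capped at 30 minutes.
limits = NlpLimitsSketch(max_trials=10, max_concurrent_trials=4, trial_timeout_minutes=30)
```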

set_sweep

Define sweep configuration for an AutoML NLP job.

set_sweep(*, sampling_algorithm: str | SamplingAlgorithmType, early_termination: EarlyTerminationPolicy | None = None) -> None

Keyword-Only Parameters

Name Description
sampling_algorithm

Required. Specifies type of hyperparameter sampling algorithm. Possible values include: "Grid", "Random", and "Bayesian".

early_termination

Optional. Early termination policy used to end poorly performing training candidates, defaults to None.
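The `sampling_algorithm` values differ in how trial configurations are chosen. The sketch below illustrates the general idea behind "Grid" and "Random" sampling over a small discrete search space using only the standard library. The function names and the dictionary-of-lists search space are assumptions for illustration, not the Azure ML sampler; Bayesian sampling, which picks new points based on the results of earlier trials, is not shown.

```python
import itertools
import random
from typing import Any, Dict, Iterator, List


def grid_sample(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    # "Grid": enumerate every combination of the discrete choices.
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_sample(space: Dict[str, List[Any]], n: int, seed: int = 0) -> List[Dict[str, Any]]:
    # "Random": draw each hyperparameter independently for every trial.
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]


# Usage: 2 models x 2 learning rates x 1 epoch setting = 4 grid points.
space = {
    "model_name": ["bert-base-cased", "roberta-base"],
    "learning_rate": [1e-5, 3e-5],
    "number_of_epochs": [2],
}
grid = list(grid_sample(space))
samples = random_sample(space, n=5)
```

Grid sampling grows multiplicatively with each added choice, which is why random sampling is often preferred for larger spaces.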

set_training_parameters

Fix certain training parameters throughout the training procedure for all candidates.

set_training_parameters(*, gradient_accumulation_steps: int | None = None, learning_rate: float | None = None, learning_rate_scheduler: str | NlpLearningRateScheduler | None = None, model_name: str | None = None, number_of_epochs: int | None = None, training_batch_size: int | None = None, validation_batch_size: int | None = None, warmup_ratio: float | None = None, weight_decay: float | None = None) -> None

Keyword-Only Parameters

Name Description
gradient_accumulation_steps

Number of steps over which to accumulate gradients before performing an optimizer step (weight update). Must be a positive integer. Defaults to None.

learning_rate

Initial learning rate. Must be a float in (0, 1). Defaults to None.

learning_rate_scheduler

The type of learning rate scheduler. Must be one of 'linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', or 'constant_with_warmup'. Defaults to None.

model_name

The model name to use during training. Must be one of 'bert-base-cased', 'bert-base-uncased', 'bert-base-multilingual-cased', 'bert-base-german-cased', 'bert-large-cased', 'bert-large-uncased', 'distilbert-base-cased', 'distilbert-base-uncased', 'roberta-base', 'roberta-large', 'distilroberta-base', 'xlm-roberta-base', 'xlm-roberta-large', 'xlnet-base-cased', or 'xlnet-large-cased'. Defaults to None.

number_of_epochs

The number of epochs to train for. Must be a positive integer. Defaults to None.

training_batch_size

The batch size during training. Must be a positive integer. Defaults to None.

validation_batch_size

The batch size during validation. Must be a positive integer. Defaults to None.

warmup_ratio

Ratio of total training steps used for a linear warmup from 0 to learning_rate. Must be a float in [0, 1]. Defaults to None.

weight_decay

Value of weight decay when the optimizer is SGD, Adam, or AdamW. Must be a float in [0, 1]. Defaults to None.
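The constraints listed above can be checked on the client before submission. The following is a minimal sketch of such a check; `check_training_parameters`, `effective_batch_size`, and `warmup_steps` are hypothetical helpers, the `model_name` allow-list is omitted for brevity, and the service performs its own validation regardless. The two arithmetic helpers show how `gradient_accumulation_steps` scales the number of samples per optimizer step on a single device and how `warmup_ratio` translates into a step count.

```python
from typing import Optional

# Allowed values copied from the learning_rate_scheduler description above.
SCHEDULERS = {
    "linear", "cosine", "cosine_with_restarts",
    "polynomial", "constant", "constant_with_warmup",
}


def check_training_parameters(
    *,
    gradient_accumulation_steps: Optional[int] = None,
    learning_rate: Optional[float] = None,
    learning_rate_scheduler: Optional[str] = None,
    number_of_epochs: Optional[int] = None,
    training_batch_size: Optional[int] = None,
    validation_batch_size: Optional[int] = None,
    warmup_ratio: Optional[float] = None,
    weight_decay: Optional[float] = None,
) -> None:
    """Hypothetical client-side check; None means "not fixed" and is skipped."""
    positive_ints = {
        "gradient_accumulation_steps": gradient_accumulation_steps,
        "number_of_epochs": number_of_epochs,
        "training_batch_size": training_batch_size,
        "validation_batch_size": validation_batch_size,
    }
    for name, value in positive_ints.items():
        # bool is a subclass of int, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
            raise ValueError(f"{name} must be a positive integer")
    if learning_rate is not None and not 0 < learning_rate < 1:
        raise ValueError("learning_rate must be in (0, 1)")
    if learning_rate_scheduler is not None and learning_rate_scheduler not in SCHEDULERS:
        raise ValueError(f"unknown learning_rate_scheduler: {learning_rate_scheduler}")
    for name, value in {"warmup_ratio": warmup_ratio, "weight_decay": weight_decay}.items():
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1]")


def effective_batch_size(training_batch_size: int, gradient_accumulation_steps: int = 1) -> int:
    # Samples contributing to each optimizer step on a single device.
    return training_batch_size * gradient_accumulation_steps


def warmup_steps(total_training_steps: int, warmup_ratio: float) -> int:
    # Number of steps over which the learning rate ramps linearly from 0.
    return int(total_training_steps * warmup_ratio)


# Usage: a valid combination passes silently.
check_training_parameters(
    learning_rate=2e-5, learning_rate_scheduler="linear",
    warmup_ratio=0.1, training_batch_size=16,
)
```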

Attributes

base_path

The base path of the resource.

Returns

Type Description
str

The base path of the resource.

creation_context

The creation context of the resource.

Returns

Type Description

The creation metadata for the resource.

featurization

Featurization settings used for the NLP job.

Returns

Type Description

Featurization settings.

id

The resource ID.

Returns

Type Description

The global ID of the resource, an Azure Resource Manager (ARM) ID.

inputs

limits

Limit settings for NLP jobs.

Returns

Type Description

Limit configuration for the NLP job.

log_files

Job output files.

Returns

Type Description

The dictionary of log names and URLs.

log_verbosity

Log verbosity configuration.

Returns

Type Description

The degree of verbosity used in logging.

outputs

primary_metric

search_space

Search spaces to sweep over for NLP sweep jobs.

Returns

Type Description

List of search spaces to sweep over for NLP jobs.

status

The status of the job.

Common values returned include "Running", "Completed", and "Failed". All possible values are:

  • NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.

  • Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.

  • Provisioning - On-demand compute is being created for a given job submission.

  • Preparing - The run environment is being prepared and is in one of two stages:

    • Docker image build

    • conda environment setup

  • Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.

  • Running - The job has started to run on the compute target.

  • Finalizing - User code execution has completed, and the run is in post-processing stages.

  • CancelRequested - Cancellation has been requested for the job.

  • Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.

  • Failed - The run failed. Usually the Error property on a run will provide details as to why.

  • Canceled - Follows a cancellation request and indicates that the run is now successfully canceled.

  • NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.

Returns

Type Description

Status of the job.
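Because "Completed", "Failed", and "Canceled" are end states, a client typically polls until one of them appears. The sketch below assumes a hypothetical `get_status` callable, for example one that calls `ml_client.jobs.get(job_name)` and reads its `status` attribute; it treats "NotResponding" as non-terminal, which is an assumption. The `sleep` parameter is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# End states taken from the status list above; "NotResponding" is treated as
# non-terminal here (an assumption), since it only signals missing heartbeats.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a hypothetical status callable until the job reaches an end state."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status")


# Usage: simulate a job moving through its lifecycle without sleeping.
_states = iter(["NotStarted", "Starting", "Queued", "Running", "Finalizing", "Completed"])
final = wait_for_terminal_status(lambda: next(_states), sleep=lambda s: None)
```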

studio_url

Azure ML studio endpoint.

Returns

Type Description

The URL to the job details page.

sweep

Sweep settings used for the NLP job.

Returns

Type Description

Sweep settings.

task_type

Get task type.

Returns

Type Description
str

The type of task to run. Possible values include: "classification", "regression", "forecasting".

test_data

Get test data.

Returns

Type Description

Test data input

training_data

Get training data.

Returns

Type Description

Training data input

training_parameters

Parameters that are used for all submitted jobs.

Returns

Type Description

Fixed training parameters for NLP jobs.

type

The type of the job.

Returns

Type Description

The type of the job.

validation_data

Get validation data.

Returns

Type Description

Validation data input