Guidance on ML Assisted Labeling, AutoML Training, Deployment, and Automation Using Azure ML for a Custom Named Entity Recognition Model

Dinnemidi Ananda Kumar 100 Reputation points
2025-02-17T13:54:03.4266667+00:00

Azure Community Question: Guidance on Azure ML NER Pipeline, Costs, and Automation

Hello Azure Community,

I’m working on an Azure ML pipeline for custom NER (Named Entity Recognition), covering labeling, training, deployment, and automation. I’d appreciate your expert advice on GPU usage, cost estimation, and automation approaches.

Workflow Overview:

  • Data Labeling: Azure ML Assisted Labeling for 500 text files (with 20% pre-labeled).
  • Model Training: Azure AutoML for an NER model (NLP task).
  • Deployment: Azure ML Managed CPU-based Endpoint for inference.
  • Automation: Weekly auto-labeling and retraining.

Questions:

  1. ML Assisted Labeling:
  • For assisted labeling on 500 text files (20% labeled), does Azure ML require GPU or CPU?
  • How long might labeling take, and what cost should I expect?
  2. AutoML Training for NER:
  • For NER model training in AutoML, do I need a GPU, or can I use CPU?
  • If GPU is required, which type is recommended?
  • Estimated time and cost for training on a GPU?
  3. Inference with CPU Endpoint:
  • Can I use a CPU cluster for deploying the NER model?
  • Is CPU-based inference suitable for real-time predictions?
  4. Handling New Data and Retraining:
  • How can I auto-label new files from Blob Storage and append them to the existing labeled dataset?
  • Should retraining the updated dataset use a GPU, and will Azure automatically replace the current endpoint with the latest model?
  5. Automation Best Practices:
  • What is the best approach to automate auto-labeling, dataset updating, retraining, and endpoint deployment?
  • Should I use Azure Functions, Event Grid, Logic Apps, or a combination?
  6. GPU Usage and Cost Model:
  • With weekly labeling and retraining, will GPU charges apply only during use, or continuously if the resource exists?
  • If I receive 100 new files daily but only label and retrain weekly, will charges apply only weekly?

Scenario Overview:

  • New Data: 100 files uploaded daily.
  • Frequency: Auto-labeling and retraining weekly.
  • Deployment: CPU endpoint with 1,000 daily inference requests.

Could you please guide me on cost expectations, resource selection, and automation best practices for this pipeline?

Thank you for your support!

Accepted answer
  1. Amira Bedhiafi 28,766 Reputation points
    2025-02-17T18:22:58.5266667+00:00

    Azure ML Assisted Labeling typically uses CPU resources for the labeling process, as it involves human-in-the-loop tasks like data annotation and review. The time required for labeling 500 text files (with 20% pre-labeled) depends on the complexity of the text and the number of annotators, but it could take several hours to a few days.

    Costs are primarily driven by the compute instance used for labeling and the storage of labeled data. Azure ML pricing for compute instances varies based on the VM size, but you can expect costs to be relatively low for CPU-based labeling.

    I am not an expert in this subject, but from what I have read about NER model training in Azure AutoML, a GPU is generally recommended due to the computational intensity of NLP tasks.

    Azure offers GPU options such as the NC-series (e.g., NC6 or NC12) for training. Training time depends on the dataset size and model complexity but could range from a few hours to a day. Costs will depend on the GPU type and training duration, with hourly rates for GPUs being higher than for CPUs. You can always estimate costs using the Azure pricing calculator.
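
    A minimal sketch of what submitting such a job can look like with the Azure ML Python SDK v2 is below; the workspace details, compute cluster name, and MLTable asset names are placeholders to replace with your own, and AutoML NER expects the training and validation data as MLTable assets over CoNLL-format files:

```python
from azure.ai.ml import MLClient, Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholders for your own IDs)
ml_client = MLClient(
    DefaultAzureCredential(),
    "<subscription-id>",
    "<resource-group>",
    "<workspace-name>",
)

# Configure an AutoML text NER job on an existing GPU compute cluster
text_ner_job = automl.text_ner(
    experiment_name="ner-automl",
    compute="gpu-cluster",  # assumed name of a GPU compute cluster
    training_data=Input(type=AssetTypes.MLTABLE, path="azureml:ner-train:1"),
    validation_data=Input(type=AssetTypes.MLTABLE, path="azureml:ner-valid:1"),
)
text_ner_job.set_limits(timeout_minutes=120)  # cap run time to keep GPU cost bounded

returned_job = ml_client.jobs.create_or_update(text_ner_job)
print(returned_job.studio_url)  # follow progress in Azure ML studio
```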

    You can deploy your NER model to a CPU-based managed endpoint for inference; CPU is suitable for real-time predictions if the model is optimized and the request volume is moderate (for example, 1,000 daily requests). For higher volumes or latency-sensitive applications, a GPU endpoint might be more appropriate.
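
    As a rough illustration, deploying the registered model to a CPU-based managed online endpoint with the SDK v2 could look like the sketch below (the endpoint, deployment, and model names and the CPU SKU are assumptions):

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# ml_client is an authenticated MLClient for the workspace, as in the snippet above.

# Create the endpoint
endpoint = ManagedOnlineEndpoint(name="ner-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the registered NER model onto a CPU SKU
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="ner-endpoint",
    model="azureml:ner-model:1",      # assumed registered model name and version
    instance_type="Standard_DS3_v2",  # example CPU SKU; pick one you have quota for
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```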

    To auto-label new files from Blob Storage, you can use Azure ML data labeling capabilities or a pre-trained model to generate labels. These new labeled files can be appended to your existing dataset. Retraining on the updated dataset should ideally use a GPU for efficiency, especially if the dataset grows significantly. Azure ML can automate model replacement in the endpoint, but you’ll need to configure the pipeline to update the endpoint with the latest model.
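
    One way to implement the "append and retrain" step is to register the folder holding the combined (old plus newly auto-labeled) files as a new version of a data asset, so each weekly retraining run simply consumes the latest version. A sketch, with the asset name and datastore path as assumptions:

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# ml_client is an authenticated MLClient for the workspace, as in the snippet above.
updated_labels = Data(
    name="ner-labeled-data",  # assumed asset name
    version="2",              # bump the version on each weekly refresh
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/workspaceblobstore/paths/ner-labels/",  # assumed path
    description="Previous labels plus the newly auto-labeled files",
)
ml_client.data.create_or_update(updated_labels)
```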

    For automation, a combination of Azure Functions, Event Grid, and Logic Apps is effective. Azure Functions can trigger labeling and retraining workflows, Event Grid can monitor Blob Storage for new files, and Logic Apps can orchestrate the pipeline. Azure ML Pipelines can also be used to automate the entire workflow, including dataset updates, retraining, and endpoint deployment.
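
    For the weekly cadence specifically, Azure ML job schedules can trigger a training pipeline on a recurrence without any external orchestrator. A minimal sketch, assuming you already have a pipeline_job object defined for the retraining workflow:

```python
from azure.ai.ml.entities import JobSchedule, RecurrenceTrigger, RecurrencePattern

# ml_client is an authenticated MLClient for the workspace, as in the snippet above;
# pipeline_job is the retraining pipeline job you have defined elsewhere (assumption).
weekly_trigger = RecurrenceTrigger(
    frequency="week",
    interval=1,
    schedule=RecurrencePattern(week_days=["monday"], hours=2, minutes=0),
)

schedule = JobSchedule(
    name="weekly-ner-retrain",
    trigger=weekly_trigger,
    create_job=pipeline_job,
)
ml_client.schedules.begin_create_or_update(schedule).result()
```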

    GPU charges apply only when the resources are actively used. If you provision a GPU for weekly labeling and retraining, you’ll be charged only for the hours the GPU is in use. For your scenario, with 100 new files daily and weekly retraining, GPU costs will be incurred weekly during the retraining process.
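
    The usual way to guarantee that is to train on a compute cluster with min_instances set to 0, so GPU nodes are released (and stop billing) shortly after each weekly job finishes. A sketch, with the cluster name and GPU SKU as assumptions:

```python
from azure.ai.ml.entities import AmlCompute

# ml_client is an authenticated MLClient for the workspace, as in the snippet above.
gpu_cluster = AmlCompute(
    name="gpu-cluster",
    size="Standard_NC6s_v3",          # example GPU SKU; check regional availability and quota
    min_instances=0,                   # scale to zero between weekly runs
    max_instances=1,
    idle_time_before_scale_down=900,   # seconds of idle time before nodes are released
)
ml_client.compute.begin_create_or_update(gpu_cluster).result()
```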


2 additional answers

  1. Pavankumar Purilla 3,410 Reputation points Microsoft Vendor
    2025-02-17T20:44:46.91+00:00

    Hi Dinnemidi Ananda Kumar,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    ML Assisted Labeling:
    Azure ML Assisted Labeling can be performed using either CPU or GPU. However, for text labeling tasks, a CPU is generally sufficient.

    The time required for labeling depends on the complexity of the text and the efficiency of the labeling team. Costs can vary.

    AutoML Training for NER:

    For training NER models, using a GPU is recommended due to the significant speed advantage over CPUs.

    Azure offers various GPU options like the NC series (e.g., NC6, NC12) which are suitable for deep learning tasks.

    Training time and cost depend on the dataset size and model complexity. Using a GPU can reduce training time significantly, but it will be more expensive than using a CPU.

    Inference with CPU Endpoint:
    Yes, you can use a CPU cluster for deploying the NER model. Azure ML Managed Endpoints support both CPU and GPU deployments.

    CPU-based inference is generally suitable for real-time predictions, especially if the model is not too large.

    Handling New Data and Retraining:

    You can use Azure Data Factory or Logic Apps to automate the process of fetching new files from Blob Storage and appending them to the existing dataset.

    Retraining should ideally use a GPU for efficiency. Azure ML can automate the deployment of the latest model to replace the current endpoint.
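
    To automate that last step, the retraining pipeline can finish by pointing the existing deployment at the newest registered model version. A rough sketch, where the workspace details and the model, endpoint, and deployment names are assumptions:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)

# Fetch the most recently registered version of the retrained model
latest = ml_client.models.get(name="ner-model", label="latest")

# Point the existing deployment at the new model version, keeping the CPU SKU
deployment = ManagedOnlineDeployment(
    name="blue",                   # assumed existing deployment name
    endpoint_name="ner-endpoint",  # assumed existing endpoint name
    model=f"azureml:{latest.name}:{latest.version}",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```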

    Automation Best Practices:
    Combining Azure Functions, Event Grid, and Logic Apps can provide a robust automation framework. Azure Functions can handle the processing logic, Event Grid can manage event-driven workflows, and Logic Apps can orchestrate the entire process.
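
    As a small illustration of the event-driven piece, an Azure Function (Python v2 programming model) can subscribe to Event Grid's "blob created" events from the storage account and decide whether to record the file for the weekly run or start a job right away; the names below are placeholders:

```python
import logging
import azure.functions as func

app = func.FunctionApp()

@app.event_grid_trigger(arg_name="event")
def on_blob_created(event: func.EventGridEvent):
    """Handles Event Grid notifications for new blobs landing in the raw-data container."""
    payload = event.get_json()
    blob_url = payload.get("url", "")
    logging.info("New file arrived in Blob Storage: %s", blob_url)

    # Option 1: just log/record the file; the weekly Azure ML schedule picks it up later.
    # Option 2: call the Azure ML SDK or REST API here to submit a labeling or
    #           retraining pipeline job immediately (e.g., once a backlog threshold is hit).
```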

    GPU Usage and Cost Model:
    GPU charges apply only during usage. If you configure your compute clusters to scale down to zero when not in use, you can avoid continuous charges.

    Charges will apply only during the periods when labeling and retraining occur. If you label and retrain weekly, you will be charged only for those specific times.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".


  2. Dinnemidi Ananda Kumar 100 Reputation points
    2025-02-18T04:40:04.8733333+00:00

    Thank you for your valuable time and detailed response. I encountered some issues while implementing the suggestions.

    1. For ML Assisted Labeling, when I tried to create a CPU-based compute, Azure ML did not provide an option for CPU instances, indicating that CPU may not be supported for automatic data labeling.

    2. Additionally, during AutoML training for the NER model, Azure only allowed GPU-based compute, without any option for CPU, suggesting that GPU is required for NER training in AutoML due to the high computational demand.

    3. However, I currently do not have a GPU quota, so I would like to know if it is possible to achieve these tasks using CPU-only compute, or if this behavior is expected and GPU is mandatory. I'd appreciate your guidance on potential workarounds or solutions.


