Complete Process for Custom NER in Azure ML: Data Import, Labeling, AutoML vs Designer, Deployment

Dinnemidi Ananda Kumar 60 Reputation points
2025-02-07T04:13:22.2066667+00:00

Hello Azure ML Community,

I am working on a Resume Entity Extraction Project and need guidance on using Azure Machine Learning (AML) to build a Custom Named Entity Recognition (NER) Model. I would like to understand the end-to-end process in AML, including:


🔹 1. Importing Data from Azure Blob Storage

  • I have a dataset stored in Azure Blob Storage.
  • The files include TXT, PDF, and DOCX resumes.
  • Questions:
    1. Which file formats does AML support for NER labeling? Can I use PDF and DOCX, or do I need to convert them to TXT?
    2. How do I import these files into Azure ML Data Labeling for annotation?
    3. Can I automate data import using Azure ML Pipelines?

🔹 2. Labeling Data in Azure ML

  • I need to label custom entities such as name, email, skills, experience, designation, and companies_worked.
  • I have used the Azure ML Data Labeling Tool, but my exported data looks like this:
  • { "image_url": "AmlDatastore://azure_blob/UI/Resume.txt", "label": [{"label": "name", "offsetStart": 0, "offsetEnd": 18}] }
  • However, I need it in JSONL format for NER training:
  • { "text": "John Doe is a Data Scientist with 5 years of experience in Python.", "entities": [ {"category": "NAME", "start": 0, "end": 8}, {"category": "EXPERIENCE", "start": 34, "end": 55}, {"category": "SKILLS", "start": 59, "end": 65} ] }
  • Questions:
    1. How can I correctly format my labeled data to JSONL?
    2. Is there an official way in Azure ML to export labeled data directly in JSONL format?
    3. Can I use the Azure ML SDK to programmatically label data instead of the UI?

🔹 3. Choosing Between AutoML and Azure ML Designer for Training

  • I want to train a Custom Named Entity Recognition (NER) Model.
  • Options:
    1. Azure AutoML – Automatically trains models with hyperparameter tuning.
    2. Azure ML Designer – Drag-and-drop interface to create ML pipelines.
  • Questions:
    1. Which approach is better for NER: AutoML or Designer?
    2. How do I train an AutoML NER model on my labeled dataset?
    3. What preprocessing is needed for resumes before training?
    4. If I use Azure ML Designer, are there pre-built NLP components for NER?

I would really appreciate any guidance on the best approach to set up this pipeline in Azure ML.

Thanks in advance for your help! 😊🚀

Azure Machine Learning

Accepted answer
  Vikram Singh 1,070 Reputation points Microsoft Employee
    2025-02-07T06:17:41.0533333+00:00

    Hi Dinnemidi Ananda Kumar,

    Welcome to the Microsoft Q&A Forum! Thank you for posting your query here.

    I understand you're working on a Resume Entity Extraction Project and need guidance on building a Custom Named Entity Recognition (NER) Model using Azure Machine Learning (AML). Here's a comprehensive guide to help you with the end-to-end process:

    1. Importing Data from Azure Blob Storage

    Which file formats does AML support for NER labeling? Can I use PDF and DOCX, or do I need to convert them to TXT?

    For NER labeling projects, Azure ML data labeling works on plain-text (.txt) files, with each file treated as one item to label. Convert your PDF and DOCX resumes to TXT before importing them.
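    One way to do the DOCX conversion without extra packages is sketched below, assuming you run it locally or in a preprocessing step before uploading. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. The function names are hypothetical; headers, footers, and tabs or line breaks inside a paragraph are ignored, and PDFs are not handled because they need a third-party text extractor.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(data: bytes) -> str:
    """Extract plain text from .docx bytes, one output line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{W_NS}p"):
        # Concatenate every text run (<w:t>) in this paragraph
        lines.append("".join(node.text or "" for node in para.iter(f"{W_NS}t")))
    return "\n".join(lines)


def convert_to_txt(src: Path, dst_dir: Path) -> Path:
    """Write src as a UTF-8 .txt file in dst_dir (one resume per file)."""
    suffix = src.suffix.lower()
    if suffix == ".txt":
        text = src.read_text(encoding="utf-8", errors="replace")
    elif suffix == ".docx":
        text = docx_to_text(src.read_bytes())
    else:
        # PDFs need a third-party extractor; not covered by this sketch
        raise ValueError(f"unsupported file type: {src.suffix}")
    dst = dst_dir / (src.stem + ".txt")
    dst.write_text(text, encoding="utf-8")
    return dst
```

    For example, `convert_to_txt(Path("resumes/jane.docx"), Path("txt"))` writes `txt/jane.txt`, which can then be uploaded to the container your labeling project reads from.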

    How do I import these files into Azure ML Data Labeling for annotation?

    Upload the .txt files to a container in your Azure Blob Storage account, register that container as a datastore in your Azure ML workspace, and then, when you create a Text Named Entity Recognition labeling project, add the folder as a data asset from that datastore.

    Can I automate data import using Azure ML Pipelines?

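    Partially. An Azure ML pipeline can automate ingestion and preprocessing, for example converting new resumes to TXT and writing them to the folder your labeling project reads from. The annotation itself remains a manual step in the labeling project; enabling the project's incremental refresh option lets it pick up newly added files.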

    2. Labeling Data in Azure ML

    How can I correctly format my labeled data to JSONL?

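    Write a short conversion script: for each exported record, read the .txt file referenced by `image_url`, put its contents in `text`, and map every entry in `label` to an entity with `category` (the label name), `start` (`offsetStart`), and `end` (`offsetEnd`). Check a few converted spans by slicing the text to confirm how the offsets line up.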

    Is there an official way in Azure ML to export labeled data directly in JSONL format?

    Not in this schema. The export stores labels as character offsets against a file reference (`image_url`) rather than embedding the document text, so converting it with a script is the practical route.
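    A minimal sketch of such a conversion script follows. It assumes each export record has the `image_url` and `label` fields shown in the question, that `offsetEnd` is exclusive (Python slice semantics; verify this against a few labeled items), and that you supply a `read_text` function returning a document's contents for a given `image_url`, for example from a local download of the container. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def split_datastore_url(url: str) -> tuple[str, str]:
    """Split 'AmlDatastore://<datastore>/<path>' into (datastore, path)."""
    prefix = "AmlDatastore://"
    if not url.startswith(prefix):
        raise ValueError(f"unexpected URL format: {url}")
    datastore, _, path = url[len(prefix):].partition("/")
    return datastore, path


def convert_export_line(line: str, read_text: Callable[[str], str]) -> dict:
    """Turn one exported labeling record into a {"text", "entities"} record."""
    record = json.loads(line)
    text = read_text(record["image_url"])
    entities = []
    for label in record.get("label", []):
        start, end = label["offsetStart"], label["offsetEnd"]
        # Assumes offsetEnd is exclusive; reject spans outside the text
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad span {start}-{end} in {record['image_url']}")
        entities.append({"category": label["label"].upper(), "start": start, "end": end})
    return {"text": text, "entities": entities}


def convert_export(lines: Iterable[str], read_text: Callable[[str], str]) -> str:
    """Convert every non-empty export line and return the result as JSONL text."""
    records = [convert_export_line(line, read_text) for line in lines if line.strip()]
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def local_reader(root: Path) -> Callable[[str], str]:
    """Hypothetical reader for files downloaded under `root` with datastore paths kept."""
    def read_text(url: str) -> str:
        _, path = split_datastore_url(url)
        return (root / path).read_text(encoding="utf-8")
    return read_text
```

    With the export saved as `export.jsonl` and the resumes downloaded under `resumes/`, passing the export's lines and `local_reader(Path("resumes"))` to `convert_export` returns the JSONL text to write to your training file.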

    Can I use the Azure ML SDK to programmatically label data instead of the UI?

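    Not through the labeling tool's UI, but you can generate labels programmatically with your own script (for example, rule-based pre-labeling of emails or known skills), write them directly in your training format, and run that script as a step in your Azure ML pipeline. Review automatically generated labels before training on them.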
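    As an illustration of rule-based pre-labeling, the sketch below tags email addresses and a hypothetical list of skill keywords with regular expressions and emits records in the same `text` + `entities` layout. The patterns and the `SKILL_TERMS` list are assumptions for illustration; overlapping matches are not resolved, and entities such as names, companies, or designations generally need a trained model rather than rules.

```python
import re

# Illustrative pattern; it does not cover every valid email address
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical skill vocabulary; replace with your own list
SKILL_TERMS = ["Python", "SQL", "Azure", "Machine Learning"]
SKILL_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, SKILL_TERMS)) + r")\b")


def pre_label(text: str) -> dict:
    """Return {"text", "entities"} with EMAIL and SKILLS spans found by regex."""
    entities = [
        {"category": "EMAIL", "start": m.start(), "end": m.end()}
        for m in EMAIL_RE.finditer(text)
    ]
    entities += [
        {"category": "SKILLS", "start": m.start(), "end": m.end()}
        for m in SKILL_RE.finditer(text)
    ]
    # Sort by position; overlapping spans are left as-is
    entities.sort(key=lambda e: e["start"])
    return {"text": text, "entities": entities}
```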

    3. Choosing Between AutoML and Azure ML Designer for Training

    Which approach is better for NER: AutoML or Designer?

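    For custom NER, AutoML is the more direct option: it has a dedicated NLP task for named entity recognition that fine-tunes pretrained transformer models on GPU compute, with hyperparameter tuning. Designer offers a drag-and-drop interface for building ML pipelines, but it does not include a built-in component for training a custom NER model.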

    How do I train an AutoML NER model on my labeled dataset?

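    AutoML NER expects training data in CoNLL format rather than the character-offset JSONL above: a .txt file with one token and its IOB label per line (separated by a single space) and an empty line between sentences. Convert your labeled data to this format and use it as the training data for an AutoML NLP NER job, which runs on GPU compute.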
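    A minimal sketch of converting one `text` + `entities` record into CoNLL-style lines with IOB tags follows. It assumes `end` offsets are exclusive; the tokenizer (words and single punctuation marks via a regular expression) and the choice to start a new CoNLL sentence at each line break in the resume are also assumptions, not AutoML requirements.

```python
import re

# Words and single punctuation marks become separate tokens (assumed tokenizer)
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_conll(record: dict) -> str:
    """Convert {"text", "entities"} (end-exclusive offsets) to CoNLL lines with IOB tags."""
    text = record["text"]
    entities = sorted(record["entities"], key=lambda e: e["start"])
    lines: list[str] = []
    prev_end = 0
    for m in TOKEN_RE.finditer(text):
        # A line break in the resume starts a new CoNLL "sentence" (blank line)
        if lines and "\n" in text[prev_end:m.start()]:
            lines.append("")
        tag = "O"
        for ent in entities:
            if m.start() < ent["end"] and m.end() > ent["start"]:
                # First token of an entity gets B-, later tokens get I-
                prefix = "B" if m.start() <= ent["start"] else "I"
                tag = f"{prefix}-{ent['category']}"
                break
        lines.append(f"{m.group()} {tag}")
        prev_end = m.end()
    # Terminate the last sentence with an empty line
    return "\n".join(lines) + "\n\n"
```

    Applied to the example record from the question, this yields lines such as `John B-NAME`, `Doe I-NAME`, `5 B-EXPERIENCE`, and `Python B-SKILLS`; concatenating the output for all records gives one CoNLL training file.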

    What preprocessing is needed for resumes before training?

    Convert the documents to TXT, normalize the text (encoding, whitespace, line breaks) before labeling so that label offsets stay valid, and then tokenize consistently when converting the labels to CoNLL.
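    The normalization step could look like the sketch below, which uses only the standard library. Because it changes character positions, run it on the raw text before labeling or pre-labeling, never on text whose labels are already stored as offsets. The function name and the exact rules are assumptions to adapt to your resumes.

```python
import re
import unicodedata


def normalize_resume_text(text: str) -> str:
    """Clean raw resume text before labeling (changes character offsets)."""
    # NFKC folds compatibility characters, e.g. ligatures and non-breaking spaces
    text = unicodedata.normalize("NFKC", text)
    # Unify Windows and old Mac line endings to "\n"
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Strip leading and trailing whitespace on every line
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse three or more newlines into one blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```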

    If I use Azure ML Designer, are there pre-built NLP components for NER?

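    No. Designer's built-in text components (such as Preprocess Text, Extract N-Gram Features from Text, and Feature Hashing) are aimed at text featurization and classification; there is no built-in component for training a custom NER model. With Designer, you would need to package your own training script as a custom component.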

    For more detailed guidance, you can refer to the following resources:

    If the reply was helpful, please don't forget to upvote and/or accept it as the answer; this can benefit other community members.

    Thanks

