How to Train a Custom Model in Document Intelligence Studio for Accurate Handwriting Recognition?

YKMESS 20 Reputation points
2024-08-14T13:03:05.2233333+00:00

I am currently working on creating a custom model in Document Intelligence Studio to read PDF forms. However, I am facing issues with incorrect recognition of handwritten text. The model often misinterprets certain characters or words, leading to inaccuracies in the extracted data.

I would like to train the model to better recognize my handwriting, but I'm not sure how to go about it. Specifically, I am looking for guidance on how to:

  • Provide training data that includes my handwritten samples.
  • Annotate the data correctly to improve recognition accuracy.
  • Retrain the model to recognize specific handwriting styles.

Any advice on best practices for training a custom model for handwriting recognition in Document Intelligence Studio would be greatly appreciated. If there are specific tools or steps within the platform that I should be aware of, please let me know.

Thank you in advance for your help!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 13,600 Reputation points Microsoft External Staff
    2024-08-14T14:39:07.37+00:00

    Hi @YKMESS,

    Thank you for reaching out to Microsoft Q&A forum!

    To improve the accuracy of handwriting recognition in Azure Document Intelligence Studio, follow these steps:

    Provide Training Data: Start by collecting a diverse set of PDF samples containing your handwritten text. Ensure these samples represent various handwriting styles, sizes, and contexts. Store these samples in Azure Blob Storage or another supported service for easy access.
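
    Since the retraining step relies on separate validation samples, it can help to set some files aside before uploading anything. The following is a minimal sketch using only the Python standard library; the function name `split_samples`, the 20% validation fraction, and the fixed seed are illustrative assumptions, not part of Document Intelligence Studio.

```python
import random


def split_samples(paths, validation_fraction=0.2, seed=0):
    """Shuffle sample paths reproducibly and return (training, validation) lists."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one file when there is more than one sample.
    n_val = max(1, round(len(shuffled) * validation_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]
```

    For example, calling `split_samples` on ten file names returns eight for training and two for validation; only the training files would then be labeled and used to train the model.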

    Annotate the Data: Use Document Intelligence Studio’s annotation tool to label the handwritten text in your PDFs. Consistent and accurate annotations are crucial, so double-check that each label matches the handwritten content. Annotation accuracy directly affects the model's ability to learn and recognize your handwriting.
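
    One way to catch missed annotations is to check a local copy of the training folder for documents that have no label file. The sketch below assumes that each labeled document has a sibling file named `<document>.labels.json`; that naming convention and the helper `find_unlabeled` are assumptions here, so verify them against the files in your own storage container.

```python
from pathlib import Path


def find_unlabeled(folder, label_suffix=".labels.json"):
    """Return the names of PDFs in `folder` that have no sibling label file."""
    folder = Path(folder)
    missing = []
    for pdf in sorted(folder.glob("*.pdf")):
        # Assumed convention: "form1.pdf" is labeled by "form1.pdf.labels.json".
        label_file = pdf.with_name(pdf.name + label_suffix)
        if not label_file.exists():
            missing.append(pdf.name)
    return missing
```

    An empty list means every PDF in the folder has a label file; it does not say whether the labels themselves are correct, which still needs a manual check.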

    Retrain the Model: Create a new training job in Document Intelligence Studio using your annotated dataset. Monitor the training process and evaluate the model’s performance using separate validation samples. If necessary, fine-tune the model by adjusting the data or parameters based on the evaluation results to improve accuracy. Regularly updating your dataset and retraining the model will help maintain and enhance recognition performance over time.
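
    To make the evaluation on validation samples concrete, you could compare the values the model extracts with values you typed in by hand. The following is a minimal sketch, not a Document Intelligence feature: `edit_distance` and `evaluate` are hypothetical helpers, and the inputs are assumed to be plain dictionaries mapping field names to text.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character into a
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def evaluate(expected, extracted):
    """Return (exact-match field accuracy, character error rate)."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    exact = errors = chars = 0
    for field, truth in expected.items():
        guess = extracted.get(field, "")  # a missing field counts as empty text
        exact += guess == truth
        errors += edit_distance(truth, guess)
        chars += len(truth)
    return exact / len(expected), errors / max(chars, 1)
```

    Tracking both numbers across retraining rounds shows whether added samples actually help: field accuracy tells you how often a value is fully correct, while the character error rate shows smaller improvements in individual characters.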

    As a best practice, make sure your training data is high quality and representative of the handwriting styles you need to recognize. Include a diverse range of samples to make the model more robust, and retrain whenever you add new handwriting samples.

    I hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".


1 additional answer

  1. Pankaj Singla 0 Reputation points
    2025-01-31T03:39:32.2433333+00:00

    In addition to what is covered in this post, I would like to know:

    1. What is the minimum number of images/data needed if my data contains handwritten text?
    2. If the layouts differ and each layout has a different type of handwritten text, do I need to take 5-10 samples for each category, or is one sample enough for each category?