How to Train Custom Model in Document Intelligence Studio for Accurate Handwriting Recognition?

Question


YKMESS

I am currently working on creating a custom model in Document Intelligence Studio to read PDF forms. However, I am facing issues with incorrect recognition of handwritten text. The model often misinterprets certain characters or words, leading to inaccuracies in the extracted data.

I would like to train the model to better recognize my handwriting, but I'm not sure how to go about it. Specifically, I am looking for guidance on how to:

Provide training data that includes my handwritten samples.
Annotate the data correctly to improve recognition accuracy.
Retrain the model to recognize specific handwriting styles.

Any advice on best practices for training a custom model for handwriting recognition in Document Intelligence Studio would be greatly appreciated. If there are specific tools or steps within the platform that I should be aware of, please let me know.

Thank you in advance for your help!

Accepted answer


Answer 1

santoshkc, Microsoft External Staff

Hi @YKMESS,

Thank you for reaching out to Microsoft Q&A forum!

To improve the accuracy of handwriting recognition in Azure Document Intelligence Studio, follow these key steps:

Provide Training Data: Start by collecting a diverse set of PDF samples containing your handwritten text. Make sure these samples cover the range of handwriting styles, sizes, and contexts you expect to process. Upload them to an Azure Blob Storage container, which Document Intelligence Studio uses as the data source for custom model projects.
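
Before uploading, it can help to check that the collection really covers the styles and layouts you need. The following is a minimal sketch, assuming you keep a hypothetical local manifest in which each sample is tagged with a writer and a form layout; these tags are illustrative bookkeeping, not something Document Intelligence Studio requires, and the threshold is a placeholder you would choose yourself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One training document. The writer/layout tags are hypothetical bookkeeping."""
    path: str
    writer: str
    layout: str


def coverage_report(samples: list[Sample], min_per_group: int) -> dict[tuple[str, str], int]:
    """Return every (writer, layout) combination with fewer than min_per_group samples.

    Combinations with zero samples are included, so gaps in coverage show up too.
    """
    counts = Counter((s.writer, s.layout) for s in samples)
    writers = sorted({s.writer for s in samples})
    layouts = sorted({s.layout for s in samples})
    return {
        (w, lay): counts[(w, lay)]
        for w in writers
        for lay in layouts
        if counts[(w, lay)] < min_per_group
    }


if __name__ == "__main__":
    samples = [
        Sample("form_a_01.pdf", "me", "form_a"),
        Sample("form_a_02.pdf", "me", "form_a"),
        Sample("form_b_01.pdf", "me", "form_b"),
    ]
    # min_per_group is a placeholder; pick it based on the service documentation.
    print(coverage_report(samples, min_per_group=2))  # {('me', 'form_b'): 1}
```

Reporting empty combinations as well as thin ones is deliberate: a layout you never wrote on is the gap most likely to cause misreads later.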

Annotate the Data: Use Document Intelligence Studio's annotation tool to label the handwritten text in your PDFs. Consistent and accurate annotations are crucial, so double-check that each labeled value matches the handwritten content exactly. The accuracy of your annotations directly affects how well the model learns to recognize your handwriting.
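
Corrections themselves are made in the Studio's labeling tool, but one way to double-check labels is to compare them against a transcription you typed by hand. A minimal sketch, assuming you have copied both the labeled values and your transcription into plain Python dictionaries keyed by document and field name; this structure is hypothetical and is not the Studio's own file format.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences are not reported as errors."""
    return " ".join(text.split())


def find_label_mismatches(
    labels: dict[str, dict[str, str]],
    transcripts: dict[str, dict[str, str]],
) -> list[tuple[str, str, str, str]]:
    """Compare labeled values with a manual transcription.

    Both arguments map document name -> {field name -> text}. Returns
    (document, field, labeled, expected) for every field whose labeled text
    differs from the transcription or is missing entirely.
    """
    mismatches = []
    for doc, expected_fields in transcripts.items():
        labeled_fields = labels.get(doc, {})
        for field, expected in expected_fields.items():
            labeled = labeled_fields.get(field, "")  # missing label -> empty string
            if normalize(labeled) != normalize(expected):
                mismatches.append((doc, field, labeled, expected))
    return mismatches


if __name__ == "__main__":
    labels = {"form_a_01.pdf": {"Name": "Jahn Smith", "Date": "2024-08-14"}}
    transcripts = {"form_a_01.pdf": {"Name": "John Smith", "Date": "2024-08-14", "City": "Osaka"}}
    for row in find_label_mismatches(labels, transcripts):
        print(row)
```

Each reported row points at a label to revisit in the annotation tool; the missing `City` field is reported with an empty labeled value.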

Retrain the Model: In Document Intelligence Studio, start a new training run on your annotated dataset. Monitor the training process, then evaluate the model's performance on separate validation samples that were not used for training. If the accuracy is not good enough, refine the training data (for example, add more samples or correct labels) and train again.
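
One way to put a number on performance over the validation samples is the character error rate (CER): the edit distance between the extracted text and the correct text, divided by the length of the correct text. Using CER here is our choice, not something the answer prescribes. In practice the predicted strings would come from analyzing held-out documents with the trained model; in this sketch they are typed in by hand.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(pairs: list[tuple[str, str]]) -> float:
    """Total edit distance over total reference length, for (predicted, expected) pairs."""
    errors = sum(edit_distance(pred, ref) for pred, ref in pairs)
    total = sum(len(ref) for _, ref in pairs)
    return errors / total if total else 0.0


if __name__ == "__main__":
    validation = [
        ("Jahn Smith", "John Smith"),   # one wrong character
        ("2024-08-14", "2024-08-14"),   # exact match
    ]
    print(character_error_rate(validation))  # 0.05
```

Tracking this figure across training runs makes it easier to tell whether added samples or corrected labels actually helped.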

As a best practice for accurate handwriting recognition, make sure your training data is high quality and representative of the handwriting styles you need, and include a diverse range of samples to make the model more robust. Regularly update your dataset and retrain the model so it adapts to new handwriting samples and evolving styles.

I hope this helps. Do let us know if you have any further queries.


YKMESS

2024-08-14 15:55 UTC
Hi @santoshkc,

Thank you for the detailed explanation on improving handwriting recognition accuracy. I'm using Document Intelligence Studio's annotation tool, but I am facing difficulties with the correct process for labeling handwritten text. Here's what I've done so far:

I ran "Run Layout" on the PDF containing the handwritten text to get the initial recognition results.

I selected the incorrectly recognized handwritten text and attempted to link it with a custom tag.

At this point, I would like to input the correct text for training, but I can't find the feature to do so.

I noticed that a JSON file is generated in Azure Storage after running these steps. Should I directly edit this JSON file to correct the handwritten text? If so, how can I use the edited JSON file to continue training the custom model in Document Intelligence Studio?

Any guidance on the correct process for labeling and correcting handwriting annotations would be greatly appreciated.

Thank you for your support!
santoshkc, Microsoft External Staff

2024-08-16 04:37 UTC

Hi @YKMESS,

Thank you for your follow-up query.

Since Document Intelligence Studio only accepts file formats like PDF, images, and Microsoft Office documents, you cannot directly use the JSON file for training. Instead, manually correct the handwritten text annotations using the platform’s annotation tool. Import your PDFs or images into Document Intelligence Studio, adjust the text labels as needed, and save the updated annotations. Then, create a new training job with this corrected data to integrate the changes into the model's training process. Monitor the training and evaluate the model’s performance to ensure accuracy.
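
Before uploading a folder, a quick local check can separate the documents the Studio can ingest from other files, such as the JSON file mentioned in the question. A minimal sketch: the extension set below is an assumption paraphrasing the formats named above (PDF, images, Office documents), so verify it against the Input file requirements page, where some formats may be limited to particular models.

```python
from pathlib import Path

# Assumed extension list; confirm against the "Input file requirements" documentation.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".heif",
    ".docx", ".xlsx", ".pptx",
}


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split file names into (documents to upload for labeling, everything else)."""
    supported: list[str] = []
    other: list[str] = []
    for p in paths:
        # Compare the final suffix case-insensitively, so "scan.PNG" counts as an image.
        if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(p)
        else:
            other.append(p)
    return supported, other


if __name__ == "__main__":
    docs, skipped = split_by_support(["form_a_01.pdf", "form_a_01.json", "scan.PNG"])
    print(docs)     # ['form_a_01.pdf', 'scan.PNG']
    print(skipped)  # ['form_a_01.json']
```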

See: Input file requirements.

I hope this helps. Thank you.

Answer 2

Pankaj Singla

Following up on this post, I would like to know:

What is the minimum number of training images or documents needed if my data contains handwritten text?
If the layouts differ and each layout has a different type of handwritten text, do I need 5-10 samples for each layout, or is one sample per layout enough?
