Document intelligence custom model layout strange reading
Hello,
I am adding new files to my existing model and have run into a problem with one vendor's invoices. The text is not read correctly or completely, and it is not split where the space is, so I can't select only the amount. Could you please take a look? I have attached a screenshot and three files that show the same issue.
Azure AI Document Intelligence
-
santoshkc 11,370 Reputation points • Microsoft Vendor
2024-12-10T14:27:24.0666667+00:00 Hi @Zīle, Raimonds,
Thank you for reaching out to Microsoft Q&A forum!
I tried to reproduce the scenario with your given documents, and the model was unable to extract the text correctly. However, when tested with other documents, the extraction worked as expected. This indicates that the issue with incorrect text extraction is likely due to the specific format of this vendor's invoice. To resolve this, ensure that bounding boxes are precisely drawn during labeling to capture the relevant fields accurately. Adding more examples of the problematic format to the training dataset should help the model handle such variations effectively.
I trained the model using my own document, and it was able to extract the information successfully. Here is the successfully extracted format with the correct sample document, demonstrating that the model performs accurately when trained with properly labeled and formatted data:
I hope you understand! Thank you.
-
santoshkc 11,370 Reputation points • Microsoft Vendor
2024-12-11T09:52:07.5866667+00:00 Hi @Zīle, Raimonds,
Following up to see if the given response was helpful. Thank you.
-
santoshkc 11,370 Reputation points • Microsoft Vendor
2024-12-12T09:10:44.0266667+00:00 Hi @Zīle, Raimonds,
We haven't heard from you since the last response and are just checking back to see whether it was helpful. If you have found another resolution, please share it with the community, as it may help others. Thank you.
-
Zīle, Raimonds 0 Reputation points
2024-12-17T08:27:30.39+00:00 Hi,
Sorry, I didn't expect such a fast reply. So what you are suggesting is to use "Draw region" instead of just clicking on the extracted text?
-
Zīle, Raimonds 0 Reputation points
2024-12-17T08:31:29.2966667+00:00 Hi,
I didn't expect such a fast reply. So what you suggest is to try "Draw region" instead of just clicking on the extracted value?
-
santoshkc 11,370 Reputation points • Microsoft Vendor
2024-12-17T10:16:12.9366667+00:00 Hi @Zīle, Raimonds,
Thank you for your follow-up query.
You can proceed with either 'Draw Region' or just clicking on the text. However, in your case, the model performs accurately when trained with properly labeled and formatted data.
I hope you understand! Thank you.
-
Zīle, Raimonds 0 Reputation points
2024-12-18T09:32:05.7633333+00:00 Hi,
I tried using "Draw region" and it doesn't read the text correctly at all. The numbers are missing, and I see this as an OCR issue.
During labeling I used "Draw region" like this for the documents:
Before, when I wasn't using "Draw region", even the labeling model couldn't see the text correctly, which is why I asked this question.
The Studio doesn't even recognize that there is a space in between and doesn't let me select just the number. This is definitely an OCR problem.
This issue happens only with this type of invoice; I have 100 more that are read without issue.
-
santoshkc 11,370 Reputation points • Microsoft Vendor
2024-12-18T12:47:04.7166667+00:00 Hi @Zīle, Raimonds,
I tried to reproduce the issue using a custom extraction model and was able to extract the text by clicking on it.
Labelling data before training:
Extracted text after training:
I hope you understand. Thank you.
-
Zīle, Raimonds 0 Reputation points
2024-12-19T08:48:59.26+00:00 OK, I tried the same in 2024-11-30 (4.0 General Availability) and it works; the problem occurs in 2023-07-31 (3.1 General Availability). I guess there is no easy way to switch an existing project to a newer API version?
-
santoshkc 11,370 Reputation points • Microsoft Vendor
2024-12-19T14:36:13.8166667+00:00 Hi @Zīle, Raimonds,
It's good to hear that the issue is resolved in the 4.0 General Availability version. Unfortunately, there's no direct way to switch an existing project to a newer API version within Azure Document Intelligence. You'll need to create a new project and retrain the custom model using the newer version.
To make the transition smoother:
- Export the training data (including labeled files) from your existing project.
- Import the data into a new project using the 4.0 API.
- Retrain the model using the updated API version.
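For anyone who prefers to script the retraining step rather than use the Studio, here is a minimal sketch of what the build request against the 2024-11-30 REST API might look like. It only constructs the URL and JSON body and sends nothing. The endpoint, model ID, and container URL are placeholders (assumptions, not values from this thread), and the helper name build_model_request is hypothetical. It assumes the labeled training files have already been copied to the blob container.

```python
import json

# 4.0 GA, the API version reported to work earlier in this thread.
API_VERSION = "2024-11-30"


def build_model_request(endpoint, model_id, container_url, build_mode="template"):
    """Return (url, json_body) for a custom model build call. Nothing is sent.

    Hypothetical helper for illustration: endpoint, model_id and
    container_url are placeholders supplied by the caller.
    """
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels:build"
        f"?api-version={API_VERSION}"
    )
    body = {
        "modelId": model_id,
        "buildMode": build_mode,  # "template" or "neural"
        # Container holding the documents plus their label files.
        "azureBlobSource": {"containerUrl": container_url},
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_model_request(
        "https://<your-resource>.cognitiveservices.azure.com",
        "vendor-invoices-v4",
        "https://<storage-account>.blob.core.windows.net/<container>?<sas>",
    )
    print(url)
    print(body)
```

The resulting URL and body would be sent as a POST with the resource key in the Ocp-Apim-Subscription-Key header. Note that the 3.1 version (2023-07-31) uses a different path prefix (formrecognizer instead of documentintelligence), which is part of why a project cannot simply be repointed to the newer version.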
Thank you.