How to get multiple occurrences of a field from a custom model, and how to avoid irrelevant data being mapped at the position where the field was trained

Question

Hi Team, we are planning to move from the general document model to a custom model. We have two questions:
1) We trained the model, but if the same field occurs twice in a document, Document Intelligence returns only the first occurrence from the entire document. Is there a way to get multiple occurrences from a document?
2) We trained the model with multiple templates. When we test it, the value present at the trained position is sometimes unrelated to the field, yet the model still maps that value to the field because it sits in the same position. Is there a way to avoid this kind of scenario?

Answer

Hi S Yogesh,
Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!
I understand that your custom model returns only the first occurrence of a field when the same field appears multiple times in a document, and that it maps unrelated values to a field when they sit at the trained position.

With Azure Document Intelligence custom models, both problems can be reduced through a mix of labeling choices and post-processing. For a repeated field, the custom model typically maps only the first occurrence by default. To capture the other instances, you can add post-processing logic that iterates through all key-value pairs or table rows returned in the analysis result and collects every match programmatically. If the repeated values appear in tables, running the prebuilt layout model (which extracts tables) alongside the custom model can help retrieve them more reliably. Additionally, during training you can label each occurrence explicitly as its own field, using a consistent naming convention (e.g., field1, field2), so the model learns to return every instance.
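The post-processing idea above could be sketched roughly as follows. This is a minimal illustration, not the service's API: it assumes the key-value pairs from the analysis result have already been flattened into plain dictionaries with "key" and "value" entries, and the function name collect_occurrences and the sample data are hypothetical.

```python
from collections import defaultdict


def collect_occurrences(pairs):
    """Group every extracted value under a normalized key, in document order.

    `pairs` is assumed to be a list of dicts shaped like
    {"key": "...", "value": "..."}; this shape is an assumption for the sketch.
    """
    grouped = defaultdict(list)
    for pair in pairs:
        # Normalize so "Invoice No:" and "invoice no" land in the same group.
        key = pair["key"].strip().rstrip(":").lower()
        value = pair.get("value")
        if value:  # skip keys for which no value was detected
            grouped[key].append(value)
    return dict(grouped)


# Hypothetical sample: the same field appears twice in one document.
sample_pairs = [
    {"key": "Invoice No:", "value": "INV-001"},
    {"key": "Date", "value": "2024-01-05"},
    {"key": "invoice no", "value": "INV-002"},
    {"key": "Notes", "value": None},
]

occurrences = collect_occurrences(sample_pairs)
print(occurrences["invoice no"])
```

Grouping by a normalized key keeps every instance instead of only the first one; how aggressively to normalize the key text depends on your documents.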

To avoid irrelevant data being mapped to a field, make sure your training data includes diverse templates with varying layouts, including samples where the field is blank. Filtering predictions by confidence score can exclude low-confidence results. You can also preprocess documents with the Read API to validate the text, and postprocess the results to check that each value matches the expected format, such as a date or a number. Inspecting the bounding regions of each prediction is a further check that the value was taken from the expected context.
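As a rough sketch of the confidence and format checks described above: the code below assumes the extracted fields have been copied into plain dictionaries with "content" and "confidence" entries. The field names, the 0.8 threshold, the date format, and the helper names are all hypothetical and would need tuning for your documents.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """Return True if text parses as YYYY-MM-DD (an assumed date format)."""
    try:
        datetime.strptime(text.strip(), "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_valid_amount(text):
    """Return True for plain amounts such as 125 or 125.50."""
    return re.fullmatch(r"\d+(\.\d{2})?", text.strip()) is not None


# Hypothetical mapping from field name to its format check.
VALIDATORS = {
    "invoice_date": is_valid_date,
    "total": is_valid_amount,
}


def filter_fields(fields, min_confidence=0.8):
    """Keep fields that pass both the confidence and the format check."""
    accepted = {}
    for name, field in fields.items():
        if field["confidence"] < min_confidence:
            continue  # low-confidence prediction
        validator = VALIDATORS.get(name)
        if validator and not validator(field["content"]):
            continue  # value at the trained position has the wrong format
        accepted[name] = field["content"]
    return accepted


# Hypothetical sample: unrelated text sits where the date was trained.
sample_fields = {
    "invoice_date": {"content": "Thank you", "confidence": 0.93},
    "total": {"content": "125.50", "confidence": 0.97},
    "vendor": {"content": "Contoso", "confidence": 0.41},
}

clean = filter_fields(sample_fields)
print(clean)
```

The format check matters because a value picked up purely by position can still come back with a high confidence score, so a confidence threshold alone may not reject it.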

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".
