Poor results with custom classification

Maher 21 Reputation points
2025-01-15T18:48:58.99+00:00

I was trying to use the Azure Document Intelligence custom classification model to classify invoice layouts and the service provided very poor results. All my documents are invoices and not different types (e.g. receipts).

Ideally I wanted to build a library of invoice layouts so when a new one is received it would recognize the class and apply rules to the class (e.g. customer, extraction, regex, etc).

Since all my documents are invoices, many of the layouts are similar, and since classification doesn't recognize things like colors, logos, etc., I assume it has a very hard time classifying properly.

When I tested with invoices that looked visually very different from the training samples of a class, the model still assigned them to that class with high confidence.

My datasets were about 10 samples.

Not sure if I'm the only one, but I had to abandon document classification with Azure altogether and rely instead on custom routing rules within my application.
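
For readers wondering what application-side routing of this kind might look like, here is a minimal sketch. It assumes the invoice text has already been extracted (by OCR or a layout model) and that each layout can be identified by a distinctive string. The class names and regex patterns are hypothetical placeholders, not rules from the original post.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoutingRule:
    layout_class: str    # hypothetical layout name, e.g. "contoso-invoice"
    pattern: re.Pattern  # regex that must match somewhere in the invoice text


# Hypothetical rules; real ones would key on vendor names, tax IDs, etc.
RULES = [
    RoutingRule("contoso-invoice", re.compile(r"\bContoso\s+Ltd\b", re.IGNORECASE)),
    RoutingRule("fabrikam-invoice", re.compile(r"\bVAT\s+ID:\s*FB\d{6}\b")),
]


def route_invoice(text: str, rules: Sequence[RoutingRule] = RULES) -> Optional[str]:
    """Return the layout class of the first matching rule, or None if none match."""
    for rule in rules:
        if rule.pattern.search(text):
            return rule.layout_class
    return None
```

Rules are checked in order, so more specific patterns should come first. A `None` result can be sent to manual review instead of being forced into a class.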

Not sure if I'm misunderstanding something with the service but wanted to share the feedback of my results.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. SriLakshmi C 2,490 Reputation points Microsoft Vendor
    2025-01-15T20:30:28.86+00:00

    Hello Maher,

    Greetings and Welcome to Microsoft Q&A! Thanks for posting the question.

    I understand that you are facing some challenges with the Azure Document Intelligence custom classification model, especially given the similarity in your invoice layouts. Consider these steps:

    Increasing the number of samples per class can significantly enhance the model's accuracy. While 10 samples per class can be a starting point, providing more diverse examples helps the model learn better and generalize across similar layouts. Please refer to Custom classification model - Document Intelligence - Azure AI services | Microsoft Learn.
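
Before retraining, it can help to check how many samples each class actually has. The sketch below assumes a local copy of the training set with one sub-folder per class; that layout, the file-extension list, and the function names are assumptions made for illustration, not requirements of the service.

```python
from collections import Counter
from pathlib import Path
from typing import List, Mapping

# Assumed set of document file types; adjust to match your training data.
DOCUMENT_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def count_samples_per_class(root: Path) -> Counter:
    """Count document files in each immediate sub-folder (one folder per class)."""
    counts: Counter = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in DOCUMENT_SUFFIXES
        )
    return counts


def classes_below(counts: Mapping[str, int], minimum: int) -> List[str]:
    """Return the names of classes that have fewer than `minimum` samples."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For example, `classes_below(count_samples_per_class(Path("training")), 10)` lists the classes that still need more samples under a chosen threshold of 10.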

    The custom classification model primarily relies on layout and text features, which can be challenging when dealing with similar invoice layouts. Since the model doesn't consider visual elements like colors and logos, it might struggle with accurate classification in such cases.
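
One way to see which similar layouts are being confused is to run a held-out set through the classifier and tabulate the mistakes. The sketch below works on already-collected results, so it makes no service calls. The `Prediction` record and the 0.8 threshold are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Prediction(NamedTuple):
    expected: str      # the class you know the document belongs to
    predicted: str     # the class the classifier returned
    confidence: float  # the classifier's reported confidence, 0.0 to 1.0


def high_confidence_errors(
    preds: Iterable[Prediction], threshold: float = 0.8
) -> List[Prediction]:
    """Return wrong predictions whose confidence is at or above the threshold."""
    return [p for p in preds if p.predicted != p.expected and p.confidence >= threshold]


def confusion_pairs(preds: Iterable[Prediction]) -> Counter:
    """Count how often each (expected, predicted) pair of classes is confused."""
    pairs: Counter = Counter()
    for p in preds:
        if p.expected != p.predicted:
            pairs[(p.expected, p.predicted)] += 1
    return pairs
```

Pairs with high counts point to classes whose layout and text are too alike to separate. Those are candidates for merging into one class or for separating with rules outside the classifier.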

    Utilize the incremental training feature of the latest custom classification model. This allows you to add new samples to existing classes or introduce new classes over time, continuously improving the model's performance.
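
As a rough sketch of what an incremental build request might contain, the helper below assembles a JSON body that names an existing classifier as the base. The field names (`classifierId`, `baseClassifierId`, `docTypes`, `azureBlobSource`, `containerUrl`, `prefix`) reflect my understanding of the v4.0 REST API and should be checked against the current reference before use. The helper name and all values are hypothetical.

```python
import json
from typing import Dict


def build_incremental_request(
    classifier_id: str,
    base_classifier_id: str,
    container_url: str,
    class_prefixes: Dict[str, str],
) -> dict:
    """Assemble a build-classifier body that extends an existing classifier.

    Field names are assumed from the v4.0 REST API; verify before relying on them.
    """
    return {
        "classifierId": classifier_id,
        "baseClassifierId": base_classifier_id,  # the classifier being extended
        "docTypes": {
            name: {"azureBlobSource": {"containerUrl": container_url, "prefix": prefix}}
            for name, prefix in class_prefixes.items()
        },
    }


# Hypothetical usage: add samples for one class on top of "invoices-v1".
body = build_incremental_request(
    "invoices-v2",
    "invoices-v1",
    "https://example.invalid/container",  # placeholder, not a real container URL
    {"vendor-a": "vendor-a/"},
)
payload = json.dumps(body, indent=2)
```

The body only lists the classes being added or extended. Sending it to the service still requires an authenticated request, which is omitted here.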

    Ensure that your documents are preprocessed correctly. High-quality scans or text-based PDFs can significantly improve the model's ability to classify documents accurately. Proper preprocessing helps in extracting clear and consistent features from the documents.

    Also refer to Build and train a custom classifier - Document Intelligence - Azure AI services | Microsoft Learn.

    I hope you understand! Thank you.


1 additional answer

  1. Patrick Gonzalez 0 Reputation points
    2025-02-04T02:54:29.8966667+00:00

    Maher, I'm also seeing very poor results using the custom classification model to recognize invoices. Have you seen any better results, or did you stop attempting to use the model?
