When do I use a composed model vs. a custom model in Azure AI Document Intelligence Studio?

Erich O 0 Reputation points

Our company receives invoices from various vendors, each with a unique layout, but they all contain similar key data fields, such as invoice number, line items, and total amounts. We need an automated solution to extract these fields accurately from invoices, regardless of their varying formats.

My question is: do I create one model and train it on all of the different PDFs from the different vendors? Or do I create one model per vendor and then create a composed model from all of the per-vendor models? I feel that with one model per vendor, to keep each model specific to one invoice layout, there will be too many models. Thoughts?

Manas Mohanty 635 Reputation points Microsoft Vendor

2025-02-13T10:12:23.4833333+00:00

Hi Erich O!

I agree with Marcin Policht's pointers on using a single custom model with a diverse dataset. You can test with a custom neural model, which is based on deep learning and can handle structured, semi-structured, and unstructured documents.

Thank you.

Manas Mohanty 635 Reputation points Microsoft Vendor

2025-02-17T05:04:50.0033333+00:00

Hi Erich O!

We have not heard from you. Hope the pointers provided were useful to you. Thank you.

1 answer

Marcin Policht 36,265 Reputation points MVP

2025-02-12T21:52:45.3966667+00:00

AFAIK, for your use case, the most suitable approach depends on the complexity and variability of your invoice layouts. You might want to consider the following options:

Single model (generalized model)

Pros:

Easier to manage and scale.

A well-trained model can generalize across multiple invoice formats.

Less overhead compared to maintaining multiple models.

Cons:

If the invoice layouts vary significantly, generalization might be harder.

Requires a larger, well-annotated training dataset covering all variations.

Could result in lower accuracy if the layouts are drastically different.

Multiple models (one per vendor) + composed model

Pros:

Higher accuracy per vendor since each model is tailored to a specific format.

Easier to troubleshoot errors for a specific vendor.

Composed models allow automatic routing to the correct model.

Cons:

Managing and maintaining multiple models is complex.

Increased training and retraining effort.

If new vendors are introduced, a new model must be created.

Hybrid approach (composed model for outliers)

A single general model should be your starting point. If accuracy drops for certain vendors, introduce specific models only for those vendors and use a composed model that routes invoices accordingly.

Use a general model for most invoices.

For problematic vendors/layouts, train specialized models only where necessary.

In Azure AI Document Intelligence (formerly Azure Form Recognizer), classify invoices first and then route them to the best-fit model.

Effectively, you might want to:

Start with a single model trained on diverse invoices.

If the model struggles with certain vendors, create vendor-specific models only where necessary.

Use a composed model to auto-route invoices to the right model if needed.
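To make the hybrid idea concrete, here is a minimal sketch of the routing logic only. The model IDs, the vendor name, and the `analyze` callable are all hypothetical placeholders (they are not part of any Azure SDK); in practice `analyze` would wrap whatever client call you use to run a model against a document.

```python
from typing import Callable, Dict, Optional

# Hypothetical model IDs; replace with the IDs of your own trained models.
GENERAL_MODEL_ID = "invoices-general"
VENDOR_MODEL_IDS: Dict[str, str] = {
    "contoso": "invoices-contoso",  # only vendors the general model struggles with
}


def pick_model_id(vendor: Optional[str]) -> str:
    """Return the vendor-specific model ID if one exists, else the general model."""
    if vendor is None:
        return GENERAL_MODEL_ID
    return VENDOR_MODEL_IDS.get(vendor.strip().lower(), GENERAL_MODEL_ID)


def extract_invoice(
    document: bytes,
    vendor: Optional[str],
    analyze: Callable[[str, bytes], dict],
) -> dict:
    """Run the chosen model; `analyze` is a stand-in for the real service call."""
    model_id = pick_model_id(vendor)
    fields = analyze(model_id, document)
    return {"model_id": model_id, "fields": fields}
```

Keeping the vendor-to-model mapping in one place means a new vendor-specific model is a one-line change, and every vendor not listed keeps using the general model.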

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin
Erich O 0 Reputation points

2025-02-13T14:51:46.09+00:00

For the single model approach, do I train the model on 1 invoice from each vendor or do I need 5 of the same style format invoices from each vendor?

Marcin Policht 36,265 Reputation points MVP

2025-02-13T15:12:03.8266667+00:00

For the single model approach, you should train the model using multiple invoices per vendor, rather than just one. A good starting point is at least 5 invoices per vendor, but the ideal number depends on the variability within each vendor’s invoices.

There are several reasons for it:

Accounting for layout variations – Even within the same vendor, invoices may have slight differences in structure, font sizes, spacing, or additional fields. Training with multiple samples helps the model generalize better.

Improving accuracy – The model learns patterns and reduces the risk of misclassification or missing fields when extracting data.

Handling noisy or uncommon cases – Some invoices may have extra fields, missing values, or different line item structures. A single invoice might not capture all possible variations.

Validating model performance – With more examples, you can better test and validate the model to ensure it performs well across different layouts.

Effectively, consider the following:

Collect at least 5 invoices per vendor (more if possible).

Ensure these invoices represent a mix of common variations.

If a vendor has drastically different layouts, consider increasing the sample size to cover all formats.

Use a diverse dataset that includes invoices with different line item counts, amounts, and possible edge cases.

If your model struggles with a particular vendor’s invoices, you can always add more samples from that vendor to fine-tune the model.
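As a rough illustration of the "at least 5 per vendor" check, the sketch below counts training PDFs per vendor and reports which vendors fall short. It assumes a hypothetical folder layout of `<vendor>/<file>.pdf`; your storage container may well be organized differently, so treat the path handling as an assumption.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, Iterable

MIN_SAMPLES_PER_VENDOR = 5  # the starting point suggested above, not a hard rule


def samples_per_vendor(paths: Iterable[str]) -> Counter:
    """Count PDFs per vendor, assuming paths look like '<vendor>/<file>.pdf'."""
    counts: Counter = Counter()
    for p in paths:
        path = PurePosixPath(p)
        # Skip non-PDF files and files that are not inside a vendor folder.
        if path.suffix.lower() == ".pdf" and len(path.parts) >= 2:
            counts[path.parts[0]] += 1
    return counts


def under_sampled(
    paths: Iterable[str], minimum: int = MIN_SAMPLES_PER_VENDOR
) -> Dict[str, int]:
    """Map each under-sampled vendor to the number of extra samples it needs."""
    counts = samples_per_vendor(paths)
    return {v: minimum - n for v, n in sorted(counts.items()) if n < minimum}
```

Running this before each training round gives a quick list of vendors to collect more invoices from.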

hth

Marcin

Erich O 0 Reputation points

2025-02-13T16:30:37.9333333+00:00

How do I add more samples to the dataset after I have already trained my model? I do not see the option to retrain a model on more data. The only option I get is to train a new model.

Erich O 0 Reputation points

2025-02-13T16:39:13.0533333+00:00

If I have already trained my model, how do I retrain it on more invoices? When I add more invoices and label them and go to train, I am only able to train a new model.

Manas Mohanty 635 Reputation points Microsoft Vendor

2025-02-13T16:43:48.6933333+00:00

Hi Erich O

Yes, you have to train a new model once you have added and labeled the additional samples (a minimum of 5 per vendor). There is no option to retrain an existing model.

Thank you.

Erich O 0 Reputation points

2025-02-13T16:49:23.12+00:00

But you said that you can add more samples to the dataset for the original model? I am confused now. If I train my model on 5 original invoices per vendor, and it struggles, I want to be able to retrain it on more data.

Marcin Policht 36,265 Reputation points MVP

2025-02-13T22:56:38.21+00:00

It appears that you misinterpreted my response. Apologies for the lack of clarity. In this context, "incremental" retraining on an existing model is not supported. Instead, you need to train a new model with the expanded dataset, which means including both the original and newly added invoices.

Incremental retraining is available with document classifiers. More at

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept/incremental-classifier?view=doc-intel-4.0.0

hth

Marcin

Erich O 0 Reputation points

2025-02-17T19:03:14.4866667+00:00

How can you create the new model without having to re-label all of the original invoices?

Marcin Policht 36,265 Reputation points MVP

2025-02-17T23:00:23.9066667+00:00

As I mentioned, incremental training applies only to document classifier models, not to custom extraction models. I'm not aware of an option to accomplish this.
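That said, one thing that may be worth checking (this is an assumption on my part, not something confirmed in this thread): Studio projects typically keep label data in the project's storage container next to each document, in files named like `<document>.labels.json`. If that holds for your project, adding the new invoices to the same project and training a new model should reuse the existing labels, so only the new files would need labeling. A minimal sketch for spotting documents that still lack a label file, given a plain listing of file names:

```python
from typing import Iterable, List

LABEL_SUFFIX = ".labels.json"  # assumed naming convention for label files


def unlabeled_documents(file_names: Iterable[str]) -> List[str]:
    """Return PDFs with no '<name>.pdf.labels.json' companion in the listing."""
    names = set(file_names)
    return sorted(
        n
        for n in names
        if n.lower().endswith(".pdf") and (n + LABEL_SUFFIX) not in names
    )
```

The listing can come from anywhere (a local folder, a container listing); the function only compares names and does not read the files.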

hth

Marcin