Hi Patrick Gonzalez,
I understand your frustration with the model's performance after increasing the number of training documents. Let's delve a bit deeper into the issue and explore some additional strategies to improve your custom classification model for invoices.
When you initially trained the model with 10 documents, it performed decently because it had a small, consistent set of samples to learn from. When you increased the training set to 99 documents, the model likely encountered more layout variability and noise (for example, inconsistently labeled or low-quality documents), which can lead to misclassification. Here are some additional steps to address this:
- Incremental Training: Instead of training the model with all 99 documents at once, try incremental training. Start with a smaller subset of high-quality, diverse documents and gradually add more data while monitoring the model's performance after each addition. This makes it easier to identify which batch of documents causes the performance to drop.
- Data Augmentation: Enhance your training dataset by including variations of the same document type. This can involve slight modifications in layout, text, or format. Data augmentation helps the model generalize better and reduces the risk of overfitting.
- Feature Engineering: Focus on extracting more relevant features from your documents. For instance, consider using additional metadata or contextual information that can help the model distinguish between invoices and non-invoices more accurately.
- Model Versioning and Monitoring: Keep track of different versions of your model and their performance metrics. This allows you to compare and roll back to a previous version if the new one doesn't perform as expected. Additionally, continuously monitor the model's performance in production to detect any degradation over time.
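As a rough illustration of the incremental training and versioning steps above, here is a minimal Python sketch of how you could compare model versions on one fixed held-out set of documents. It does not call the Document Intelligence service. The names (VersionResult, accuracy, find_regressions, best_version), the model IDs, and the sample labels are all hypothetical and made up for the example. In practice the predicted document types would come from running each trained model on your own held-out documents.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VersionResult:
    """Evaluation record for one trained model version (hypothetical structure)."""
    model_id: str            # the ID you assigned when building the model
    num_training_docs: int   # how many documents this version was trained on
    accuracy: float          # fraction correct on the fixed held-out set


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of held-out documents whose predicted type matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("held-out set is empty")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def find_regressions(history: list[VersionResult],
                     tolerance: float = 0.02) -> list[VersionResult]:
    """Return versions scoring below the best earlier version by more than `tolerance`."""
    regressions = []
    best_so_far = None
    for result in history:
        if best_so_far is not None and result.accuracy < best_so_far - tolerance:
            regressions.append(result)
        if best_so_far is None or result.accuracy > best_so_far:
            best_so_far = result.accuracy
    return regressions


def best_version(history: list[VersionResult]) -> VersionResult:
    """Pick the version with the highest held-out accuracy (candidate to roll back to)."""
    return max(history, key=lambda r: r.accuracy)


# Made-up labels for five held-out documents that are never used for training.
expected = ["invoice", "invoice", "other", "invoice", "other"]

# Made-up predictions from three versions trained on growing subsets.
predictions = {
    ("classifier-v1", 10): ["invoice", "invoice", "other", "other", "other"],
    ("classifier-v2", 50): ["invoice", "invoice", "other", "invoice", "other"],
    ("classifier-v3", 99): ["invoice", "other", "invoice", "other", "other"],
}

history = [
    VersionResult(model_id, n_docs, accuracy(predicted, expected))
    for (model_id, n_docs), predicted in predictions.items()
]

for bad in find_regressions(history):
    print(f"{bad.model_id} ({bad.num_training_docs} docs) regressed: {bad.accuracy:.2f}")
print("Best version so far:", best_version(history).model_id)
```

The key design choice is that the held-out set stays the same across versions, so a drop in accuracy can be attributed to the newly added training documents rather than to a change in the evaluation data.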
For more detailed guidance, you can refer to the "Transparency note for Document Intelligence" article (Azure AI services) on Microsoft Learn.
I hope these suggestions help you improve your model's performance. If you have any further questions or need additional assistance, please feel free to ask.
If this answer was helpful, please mark it as accepted so that it can help others in the community.
Thanks!