Hi @Swami Nawale,
Welcome to Q&A forum!
- Unsupported PDF Formats & Multiple Tables on a Single Page
- Ensure PDFs Are in a Supported Format: Syntex works best with PDFs that have an embedded text layer. If your PDFs are scanned images or generated in a non‑standard format (for example, using unusual fonts or encodings), the model may have difficulty extracting data. Consider pre‑processing these files with an OCR tool (such as Adobe Acrobat’s OCR or a dedicated OCR service) to convert them into searchable, text‑based PDFs.
- Simplify the Layout: When a page contains multiple tables, the extraction engine may not be able to tell where one table ends and the next begins. If possible, reformat the source documents so each table sits on its own page, and reduce overlapping elements. If you can’t change the source, try splitting the page into separate sections before ingestion.
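To find out which files actually need OCR before uploading them, you could check each page for extractable text. The sketch below assumes the third-party `pypdf` library (not part of Syntex) and treats a page with fewer than 20 visible characters as image-only. That threshold is an arbitrary assumption you would tune for your documents.

```python
from __future__ import annotations


def needs_ocr(page_text: str | None, min_chars: int = 20) -> bool:
    """Heuristic: a page with no or very little extractable text is probably a scan."""
    if page_text is None:
        return True
    # Count only non-whitespace characters so blank lines don't hide an empty page.
    visible = sum(1 for ch in page_text if not ch.isspace())
    return visible < min_chars


def pages_needing_ocr(pdf_path: str, min_chars: int = 20) -> list[int]:
    """Return 0-based indexes of pages that appear to lack a text layer."""
    # Third-party dependency (pip install pypdf), imported lazily so needs_ocr works without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [
        index
        for index, page in enumerate(reader.pages)
        if needs_ocr(page.extract_text(), min_chars)
    ]
```

For example, `pages_needing_ocr("invoice.pdf")` (a hypothetical file name) returns the page indexes worth sending through an OCR tool first. An empty list suggests the PDF already has a usable text layer.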
- Data Extraction from Multi-Page PDFs
- Split or Pre-Process Documents: If the model consistently extracts data from only the first few pages, consider splitting large PDFs into smaller chunks (for example, one PDF per page or per table section). This can help the model process each segment individually, which may lead to more complete extraction.
- Augment Your Training Samples: Syntex’s extraction quality depends on the examples you provide during training. Include multi‑page documents and samples with large tables (100+ rows) in your training set so the model learns the structure and nuances of your documents.
- Ensure Consistent Formatting: Inconsistent table structures (such as merged cells, variable column widths, or split headers) can cause extraction errors. If you have control over the document generation process, standardizing table formats can improve the consistency of extraction.
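If you decide to split large PDFs before uploading them, a small script can do it. This is a minimal sketch assuming the third-party `pypdf` library; `plan_chunks` and `split_pdf` are hypothetical helper names, and the output naming scheme is only an example.

```python
from __future__ import annotations


def plan_chunks(page_count: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) page ranges that cover every page once."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk, page_count))
        for start in range(0, page_count, pages_per_chunk)
    ]


def split_pdf(src_path: str, out_prefix: str, pages_per_chunk: int = 1) -> list[str]:
    """Write each chunk of pages to its own PDF and return the output paths."""
    # Third-party dependency (pip install pypdf), imported lazily so plan_chunks works without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    out_paths: list[str] = []
    chunks = plan_chunks(len(reader.pages), pages_per_chunk)
    for number, (start, end) in enumerate(chunks, start=1):
        writer = PdfWriter()
        for page_index in range(start, end):
            writer.add_page(reader.pages[page_index])
        out_path = f"{out_prefix}_part{number:03d}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        out_paths.append(out_path)
    return out_paths
```

For example, `split_pdf("report.pdf", "report", pages_per_chunk=1)` would write `report_part001.pdf`, `report_part002.pdf`, and so on. Keeping the range planning separate from the file I/O makes it easy to switch to per-table-section boundaries later if your tables span a predictable number of pages.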
- Automatic Processing Issues
- Review Library and Content Type Settings: Automatic classification and extraction in Syntex depend on how your document library is configured. Confirm that the model has been applied to the library and that the correct content type or metadata is in place, so that newly added files are routed to the model.
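- Monitor for Known Service Issues: Check the Microsoft 365 Message Center for any advisories. In some cases, intermittent issues with the Syntex service may require you to run a manual “Classify and extract” action on the affected files. Because Syntex is a cloud service, updates are rolled out by Microsoft, so the Message Center is also where you’ll see changes that might affect processing.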
- Model Publishing Delays
- Expect Propagation Delays: After publishing changes to your Syntex model, it may take up to 30 minutes (or sometimes longer) for the new configuration to be fully propagated. This delay can be due to internal queueing and processing within the Syntex pipeline.
- Plan Updates During Off-Peak Hours: If possible, schedule model changes when there is less activity so that the delay doesn’t impact critical processing windows.
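If you automate post-publish checks, it can help to poll with a generous timeout rather than assume changes are live immediately. This is a generic sketch: `is_applied` is a hypothetical callable you would supply yourself (for example, one that uploads a known test document and checks whether the extracted columns were filled in). The 45-minute default timeout and the backoff intervals are assumptions, not Syntex guarantees.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_propagation(
    is_applied: Callable[[], bool],
    timeout_s: float = 45 * 60,
    initial_interval_s: float = 60.0,
    max_interval_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_applied() with exponential backoff until it returns True or time runs out."""
    deadline = clock() + timeout_s
    interval = initial_interval_s
    while True:
        if is_applied():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the wait each round, up to the cap.
        sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval_s)
```

The `sleep` and `clock` parameters are injectable so the function can be tested without actually waiting. In normal use you would leave them at their defaults.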
- Low Confidence Scores for Multi-Page PDFs
- Improve Document Quality: Ensure your PDFs are high resolution and that the text is clear. Poor-quality scans can lead to lower confidence scores.
- Increase Training Data: Augment your model’s training set with a variety of multi‑page PDFs—especially ones with complex tables—to help the model better understand how to extract data from these layouts.
- Use Feedback Loops: Correct extraction errors through the model’s training workflow, for example by adding documents that were extracted incorrectly as new example files, labeling them, and retraining. Over time, these corrections can improve model performance on similar documents.
- Custom Extraction Rules: If your documents have predictable patterns (for instance, large tables that always follow a specific header format), check whether you can apply custom extraction rules or templates within Syntex to improve accuracy.
These challenges are not uncommon when processing complex documents with Syntex.
Hope this helps.
Please do let us know if you have any further queries.
Kindly consider accepting the answer if the information provided is helpful. This can assist other community members in resolving similar issues.