Requirements and limitations for models in Microsoft Syntex
Applies to: ✓ All custom models | ✓ All prebuilt models
Microsoft Syntex lets you create custom models and prebuilt models. Requirements differ by model type. They include supported file types and sizes, supported languages, geographical considerations, and other factors. Use them to help you decide which type of model to use.
Custom models:
- Unstructured document processing
- Freeform document processing
- Structured document processing
Prebuilt models:
- Contract processing
- Invoice processing
- Receipt processing
- Sensitive information processing
- Simple document processing
Custom models
Unstructured document processing
Requirement | Description |
---|---|
Supported file types | This model supports the following file types: .csv, .doc, .docx, .eml, .heic, .heif, .htm, .html, .jpeg, .jpg, .md, .msg, .pdf, .png, .ppt, .pptx, .rtf, .tif, .tiff, .txt, .xls, and .xlsx. Formulas in .xls and .xlsx files aren't run. |
Supported languages | This model supports all languages written in the Latin alphabet, including English, French, German, Italian, and Spanish. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements: - File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are better because they avoid errors in character extraction and location. - Password-locked .pdf files must be unlocked before you submit them. - The combined size of the training documents in each collection must not exceed 50 MB, and PDF documents shouldn't have more than 500 pages. - Image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels. Images that are very wide or have unusual dimensions (for example, floor plans) might be truncated during OCR and lose accuracy. - For .pdf files, page dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller. - Scans of paper documents should be high-quality images. - Text must use the Latin alphabet (English characters). Note the following differences between Microsoft Office text-based files and OCR-scanned files (.pdf, image, or .tiff): - All files: Text is truncated at 64,000 characters, both in training and when the model runs against files in a document library. - OCR-scanned files: There's a 500-page limit. Only PDF and image file types are processed by OCR. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. |
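Before uploading a training collection for an unstructured model, it can help to check files against the limits listed above. The following is a minimal local sketch, not a Syntex API: `TrainingFile`, `check_collection`, and `truncate_text` are hypothetical names, you would have to gather the metadata (size, page count, pixel dimensions) with your own tooling, and reading "50 MB" as 50 MiB is an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import List, Optional, Tuple

# Limits taken from the unstructured document processing table above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".eml", ".heic", ".heif", ".htm", ".html",
    ".jpeg", ".jpg", ".md", ".msg", ".pdf", ".png", ".ppt", ".pptx",
    ".rtf", ".tif", ".tiff", ".txt", ".xls", ".xlsx",
}
IMAGE_EXTENSIONS = {".heic", ".heif", ".jpeg", ".jpg", ".png", ".tif", ".tiff"}
MAX_COLLECTION_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB
MAX_PDF_PAGES = 500
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_CHARS = 64_000


@dataclass
class TrainingFile:
    """Hypothetical metadata record for one training file (not a Syntex type)."""
    name: str
    size_bytes: int
    pages: Optional[int] = None               # page count, for .pdf files
    pixels: Optional[Tuple[int, int]] = None  # (width, height), for images


def check_collection(files: List[TrainingFile]) -> List[str]:
    """Return the documented limits a training collection breaks (empty if none)."""
    problems = []
    total = sum(f.size_bytes for f in files)
    if total > MAX_COLLECTION_BYTES:
        problems.append(f"collection: {total} bytes exceeds the 50 MB limit")
    for f in files:
        ext = PurePath(f.name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{f.name}: unsupported file type")
        if ext == ".pdf" and f.pages is not None and f.pages > MAX_PDF_PAGES:
            problems.append(f"{f.name}: {f.pages} pages exceeds {MAX_PDF_PAGES}")
        if ext in IMAGE_EXTENSIONS and f.pixels is not None:
            if not all(MIN_SIDE_PX <= side <= MAX_SIDE_PX for side in f.pixels):
                problems.append(f"{f.name}: image dimensions out of range")
    return problems


def truncate_text(text: str) -> str:
    """Mirror the documented truncation of extracted text at 64,000 characters."""
    return text[:MAX_CHARS]


if __name__ == "__main__":
    sample = [
        TrainingFile("contract.pdf", 2_000_000, pages=12),
        TrainingFile("floor-plan.png", 500_000, pixels=(20_000, 800)),
        TrainingFile("notes.odt", 10_000),
    ]
    for problem in check_collection(sample):
        print(problem)
```

For the sample collection, the check reports the 20,000-pixel-wide floor plan and the unsupported .odt file. The PDF passes. Passing this check only means no documented limit is exceeded. It says nothing about how accurately the model will extract text.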
Freeform document processing
Requirement | Description |
---|---|
Supported file types | For the supported file types, see file type requirements. |
Supported languages | For the supported languages, see Model for General documents. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet these requirements. |
Optimization tips | If your model isn't performing as well as you want, try these steps to improve its performance. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Custom Power Platform environments | If you use a custom environment (rather than the default environment) for Power Platform processing, additional setup is required. For more information, see Custom Power Platform environments. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. You can have only one freeform or one structured model per library. |
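The multi-model rule in the table above picks the model with the highest *average* confidence score for each file and keeps only that model's entities. The sketch below illustrates that selection rule locally. `ModelResult`, `pick_model`, and `classify_file` are hypothetical names, not Syntex APIs. The table doesn't say what the average is taken over, so this sketch assumes it is one score per prediction the model made.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ModelResult:
    """Hypothetical output of one trained model on one file (not a Syntex type)."""
    model_name: str
    confidences: List[float]  # assumption: one score per prediction the model made
    entities: Dict[str, str] = field(default_factory=dict)


def pick_model(results: List[ModelResult]) -> Optional[ModelResult]:
    """Return the result whose average confidence is highest (first one wins ties)."""
    scored = [r for r in results if r.confidences]
    if not scored:
        return None
    return max(scored, key=lambda r: mean(r.confidences))


def classify_file(results: List[ModelResult]) -> Dict[str, object]:
    """Classify with the winning model and keep only that model's entities."""
    winner = pick_model(results)
    if winner is None:
        return {"model": None, "entities": {}}
    return {"model": winner.model_name, "entities": dict(winner.entities)}


if __name__ == "__main__":
    results = [
        ModelResult("Invoices", [0.91, 0.62, 0.80], {"Total": "1,200.00"}),
        ModelResult("Purchase orders", [0.75, 0.70], {"PO number": "4471"}),
    ]
    print(classify_file(results))
```

In the sample, "Invoices" averages about 0.78 and "Purchase orders" 0.725. The file is therefore classified as an invoice, and the "PO number" entity is discarded. Because the rule uses the average rather than the single best score, a model with one very confident prediction and several weak ones can lose to a model that is moderately confident throughout.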
Structured document processing
Requirement | Description |
---|---|
Supported file types | For the supported file types, see file type requirements. |
Supported languages | For the supported languages, see Model for Fixed-template documents. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet these requirements. |
Optimization tips | If your model isn't performing as well as you want, try these steps to improve its performance. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Custom Power Platform environments | If you use a custom environment (rather than the default environment) for Power Platform processing, additional setup is required. For more information, see Custom Power Platform environments. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. You can have only one freeform or one structured model per library. |
Prebuilt models
Contract processing
Requirement | Description |
---|---|
Supported file types | This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff. |
Supported languages | This model supports only English-language contracts. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements: - File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are better because they avoid errors in character extraction and location. - For .pdf and .tiff files, up to 2,000 pages can be processed. - The file size must be less than 50 MB. - Image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels. - For .pdf files, page dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller. - The total training data must be 500 pages or less. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. |
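The prebuilt models share the same OCR limits: the supported file types, a 50 MB size cap, 2,000 pages per .pdf or .tiff file, pixel bounds for images, an 11 x 17 inch page size for .pdf files, and 500 pages of training data. The sketch below checks these limits before upload. `InputFile`, `check_file`, `check_training_set`, and `pdf_page_fits` are hypothetical names, not Syntex APIs. Two points are assumptions: "50 MB" is read as 50 MiB, and the 11 x 17 inch limit is assumed to apply in either orientation. PDF page sizes are measured in points, and this code assumes the default of 72 points per inch.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import List, Optional, Tuple

# Limits taken from the prebuilt model tables (contract, invoice, receipt, ...).
SUPPORTED_EXTENSIONS = {".bmp", ".jpeg", ".pdf", ".png", ".tiff"}
PAGED_EXTENSIONS = {".pdf", ".tiff"}
MAX_PAGES = 2_000
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PDF_INCHES = (11.0, 17.0)
MAX_TRAINING_PAGES = 500
POINTS_PER_INCH = 72  # default PDF unit: 1 point = 1/72 inch


@dataclass
class InputFile:
    """Hypothetical metadata record for one file (not a Syntex type)."""
    name: str
    size_bytes: int
    pages: int = 1
    pixels: Optional[Tuple[int, int]] = None           # images: (width, height)
    pdf_page_pt: Optional[Tuple[float, float]] = None  # largest PDF page, in points


def pdf_page_fits(width_pt: float, height_pt: float) -> bool:
    """Assumption: the 11 x 17 inch limit applies in either orientation."""
    shorter, longer = sorted((width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH))
    return shorter <= MAX_PDF_INCHES[0] and longer <= MAX_PDF_INCHES[1]


def check_file(f: InputFile) -> List[str]:
    """Return the documented limits a single file breaks (empty if none)."""
    ext = PurePath(f.name).suffix.lower()
    problems = []
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"{f.name}: unsupported file type")
    if f.size_bytes >= MAX_FILE_BYTES:
        problems.append(f"{f.name}: file must be smaller than 50 MB")
    if ext in PAGED_EXTENSIONS and f.pages > MAX_PAGES:
        problems.append(f"{f.name}: {f.pages} pages exceeds {MAX_PAGES}")
    if f.pixels is not None and not all(MIN_SIDE_PX <= s <= MAX_SIDE_PX for s in f.pixels):
        problems.append(f"{f.name}: image dimensions out of range")
    if ext == ".pdf" and f.pdf_page_pt is not None and not pdf_page_fits(*f.pdf_page_pt):
        problems.append(f"{f.name}: PDF page larger than 11 x 17 inches")
    return problems


def check_training_set(files: List[InputFile]) -> List[str]:
    """Per-file checks plus the 500-page cap on the whole training set."""
    problems = [p for f in files for p in check_file(f)]
    total_pages = sum(f.pages for f in files)
    if total_pages > MAX_TRAINING_PAGES:
        problems.append(f"training set: {total_pages} pages exceeds {MAX_TRAINING_PAGES}")
    return problems
```

For example, a US Letter page (612 x 792 points, or 8.5 x 11 inches) passes `pdf_page_fits`. An 18 x 18 inch page (1,296 x 1,296 points) fails it.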
Invoice processing
Requirement | Description |
---|---|
Supported file types | This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff. |
Supported languages | This model supports invoices in English, Spanish, German, French, Italian, Portuguese, and Dutch. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements: - File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are better because they avoid errors in character extraction and location. - For .pdf and .tiff files, up to 2,000 pages can be processed. - The file size must be less than 50 MB. - Image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels. - For .pdf files, page dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller. - The total training data must be 500 pages or less. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. |
Receipt processing
Requirement | Description |
---|---|
Supported file types | This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff. |
Supported languages | This model supports receipts in English, Croatian, Czech, Danish, Dutch, Finnish, German, Hungarian, Italian, Japanese, Latvian, Lithuanian, Norwegian, Portuguese, Spanish, Swedish, and Vietnamese. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements: - File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are better because they avoid errors in character extraction and location. - For .pdf and .tiff files, up to 2,000 pages can be processed. - The file size must be less than 50 MB. - Image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels. - For .pdf files, page dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller. - The total training data must be 500 pages or less. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. |
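The contract, invoice, and receipt tables list their supported languages inline. The other prebuilt models refer to separate lists. When routing documents, one option is to look up which of these three models accept a given language before applying one. The sketch below copies the three lists from the tables above into a dictionary. `SUPPORTED_LANGUAGES`, `is_supported`, and `models_for` are hypothetical names, and the sketch assumes the document's language is already known.

```python
from typing import Dict, FrozenSet, List

# Language lists copied from the contract, invoice, and receipt tables above.
SUPPORTED_LANGUAGES: Dict[str, FrozenSet[str]] = {
    "contract": frozenset({"English"}),
    "invoice": frozenset({
        "English", "Spanish", "German", "French", "Italian", "Portuguese", "Dutch",
    }),
    "receipt": frozenset({
        "English", "Croatian", "Czech", "Danish", "Dutch", "Finnish", "German",
        "Hungarian", "Italian", "Japanese", "Latvian", "Lithuanian", "Norwegian",
        "Portuguese", "Spanish", "Swedish", "Vietnamese",
    }),
}


def is_supported(model: str, language: str) -> bool:
    """Return True if the prebuilt model lists the language (case-insensitive)."""
    languages = SUPPORTED_LANGUAGES.get(model.lower())
    if languages is None:
        raise KeyError(f"no inline language list for model {model!r}")
    return language.strip().lower() in {name.lower() for name in languages}


def models_for(language: str) -> List[str]:
    """List the prebuilt models (of these three) that accept the language."""
    return sorted(m for m in SUPPORTED_LANGUAGES if is_supported(m, language))


if __name__ == "__main__":
    print(models_for("Dutch"))
```

For Dutch, `models_for` returns the invoice and receipt models but not the contract model, which supports English only.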
Sensitive information processing
Requirement | Description |
---|---|
Supported file types | This model supports the following file types: .csv, .doc, .docx, .eml, .heic, .heif, .htm, .html, .jpeg, .jpg, .md, .msg, .pdf, .png, .ppt, .pptx, .rtf, .tif, .tiff, .txt, .xls, and .xlsx. |
Supported languages | For the supported languages, see supported languages. This model supports both handwritten and printed text in those languages. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements: - File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are better because they avoid errors in character extraction and location. - For .pdf and .tiff files, up to 2,000 pages can be processed. - The file size must be less than 50 MB. - Image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels. - For .pdf files, page dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller. - The total training data must be 500 pages or less. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. |
Simple document processing
Requirement | Description |
---|---|
Supported file types | This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff. |
Supported languages | This model supports documents in more than 100 languages. |
OCR considerations | This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements: - File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are better because they avoid errors in character extraction and location. - For .pdf and .tiff files, up to 2,000 pages can be processed. - The file size must be less than 50 MB. - Image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels. - For .pdf files, page dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller. - The total training data must be 500 pages or less. |
Multi-Geo environments | In a Microsoft 365 Multi-Geo environment, you can configure Syntex to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support. |
Multi-model libraries | If two or more trained models are applied to the same library, each file is classified by the model with the highest average confidence score. Extracted entities come only from that model. |