Requirements and limitations for prebuilt document processing in SharePoint

01/22/2025

The following sections outline key factors to consider when planning to use a prebuilt document processing model.

Contract processing

Consideration	Description
Supported file types	This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff.
Supported languages	This model supports only English-language contracts.
OCR considerations	This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements:
	- File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are preferred because they avoid errors in character extraction and location.
	- For .pdf and .tiff files, up to 2,000 pages can be processed.
	- The file size must be less than 50 MB.
	- For images, dimensions must be between 50 x 50 and 10,000 x 10,000 pixels.
	- For .pdf files, dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.
	- The total size of the training data is 500 pages or less.
Multi-Geo environments	When you set up Syntex in a Microsoft 365 Multi-Geo environment, you can configure it to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support.
Multi-model libraries	If two or more trained models are applied to the same library, each file is classified using the model that has the highest average confidence score. Entities are extracted by the applied model only.
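
The OCR requirements listed above can be checked before a file is uploaded. The following Python snippet is a minimal sketch of such a preflight check, not part of Syntex or any Microsoft API. The `FileInfo` class and the `ocr_issues` function are hypothetical names. The sketch also assumes that 1 MB means 1,048,576 bytes, that each image side must fall within the 50 to 10,000 pixel range, and that the 11 x 17 inch limit applies regardless of page orientation. None of these details is stated in the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Limits taken from the OCR considerations row.
MAX_FILE_BYTES = 50 * 1024 * 1024      # assumption: 1 MB = 1,048,576 bytes
MAX_PAGES = 2000                       # applies to .pdf and .tiff files
MIN_IMAGE_PX, MAX_IMAGE_PX = 50, 10_000
MAX_PDF_INCHES = (11.0, 17.0)          # short side, long side


@dataclass
class FileInfo:
    """Hypothetical description of a file; fields that are unknown stay None."""
    extension: str
    size_bytes: int
    pages: Optional[int] = None
    image_px: Optional[Tuple[int, int]] = None        # (width, height)
    pdf_inches: Optional[Tuple[float, float]] = None  # (width, height)


def ocr_issues(info: FileInfo) -> List[str]:
    """Return the documented OCR limits that the file appears to exceed."""
    issues = []
    ext = info.extension.lower()

    if info.size_bytes >= MAX_FILE_BYTES:
        issues.append("file size must be less than 50 MB")

    if ext in {".pdf", ".tiff"} and info.pages is not None and info.pages > MAX_PAGES:
        issues.append("more than 2,000 pages")

    if info.image_px is not None:
        width, height = info.image_px
        # Assumption: both sides must be within the documented range.
        if not (MIN_IMAGE_PX <= width <= MAX_IMAGE_PX
                and MIN_IMAGE_PX <= height <= MAX_IMAGE_PX):
            issues.append("image dimensions outside 50 x 50 to 10,000 x 10,000 pixels")

    if ext == ".pdf" and info.pdf_inches is not None:
        # Assumption: sort the sides so portrait and landscape are treated alike.
        short_side, long_side = sorted(info.pdf_inches)
        if short_side > MAX_PDF_INCHES[0] or long_side > MAX_PDF_INCHES[1]:
            issues.append("page larger than 11 x 17 inches")

    return issues
```

For example, `ocr_issues(FileInfo(".pdf", 10_000_000, pages=12, pdf_inches=(8.5, 11.0)))` returns an empty list, because a 12-page Letter-size file of about 10 MB is within every limit.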

Invoice processing

Consideration	Description
Supported file types	This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff.
Supported languages	This model supports invoices in English, Spanish, German, French, Italian, Portuguese, and Dutch.
OCR considerations	This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements:
	- File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are preferred because they avoid errors in character extraction and location.
	- For .pdf and .tiff files, up to 2,000 pages can be processed.
	- The file size must be less than 50 MB.
	- For images, dimensions must be between 50 x 50 and 10,000 x 10,000 pixels.
	- For .pdf files, dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.
	- The total size of the training data is 500 pages or less.
Multi-Geo environments	When you set up Syntex in a Microsoft 365 Multi-Geo environment, you can configure it to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support.
Multi-model libraries	If two or more trained models are applied to the same library, each file is classified using the model that has the highest average confidence score. Entities are extracted by the applied model only.
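
The multi-model rule above can be illustrated with a short Python sketch. This article does not describe how the service computes the average confidence score, so the sketch assumes a plain arithmetic mean over a list of per-model scores. The `pick_model` function and the model names in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def pick_model(scores_by_model: Dict[str, List[float]]) -> str:
    """Return the name of the model with the highest average confidence.

    Assumption: "average" is the arithmetic mean of the scores supplied
    for each model. Models with no scores are ignored.
    """
    averages = {
        name: mean(scores)
        for name, scores in scores_by_model.items()
        if scores
    }
    if not averages:
        raise ValueError("no confidence scores were provided")
    return max(averages, key=averages.get)
```

With `{"invoice": [0.9, 0.7], "receipt": [0.95, 0.5]}` as input, the averages are 0.8 and 0.725, so the function returns `"invoice"`. Under the rule in the table, only that model's extracted entities would then apply to the file.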

Receipt processing

Consideration	Description
Supported file types	This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff.
Supported languages	This model supports receipts in English, Croatian, Czech, Danish, Dutch, Finnish, German, Hungarian, Italian, Japanese, Latvian, Lithuanian, Norwegian, Portuguese, Spanish, Swedish, and Vietnamese.
OCR considerations	This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements:
	- File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are preferred because they avoid errors in character extraction and location.
	- For .pdf and .tiff files, up to 2,000 pages can be processed.
	- The file size must be less than 50 MB.
	- For images, dimensions must be between 50 x 50 and 10,000 x 10,000 pixels.
	- For .pdf files, dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.
	- The total size of the training data is 500 pages or less.
Multi-Geo environments	When you set up Syntex in a Microsoft 365 Multi-Geo environment, you can configure it to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support.
Multi-model libraries	If two or more trained models are applied to the same library, each file is classified using the model that has the highest average confidence score. Entities are extracted by the applied model only.

Sensitive information processing

Consideration	Description
Supported file types	This model supports the following file types: .csv, .doc, .docx, .eml, .heic, .heif, .htm, .html, .jpeg, .jpg, .md, .msg, .pdf, .png, .ppt, .pptx, .rtf, .tif, .tiff, .txt, .xls, and .xlsx.
Supported languages	For the languages that this model supports, see supported languages. This model supports languages for both handwritten text and print text.
OCR considerations	This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements:
	- File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are preferred because they avoid errors in character extraction and location.
	- For .pdf and .tiff files, up to 2,000 pages can be processed.
	- The file size must be less than 50 MB.
	- For images, dimensions must be between 50 x 50 and 10,000 x 10,000 pixels.
	- For .pdf files, dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.
	- The total size of the training data is 500 pages or less.
Multi-Geo environments	When you set up Syntex in a Microsoft 365 Multi-Geo environment, you can configure it to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support.
Multi-model libraries	If two or more trained models are applied to the same library, each file is classified using the model that has the highest average confidence score. Entities are extracted by the applied model only.

Simple document processing

Consideration	Description
Supported file types	This model supports the following file types: .bmp, .jpeg, .pdf, .png, and .tiff.
Supported languages	This model supports documents in more than 100 languages.
OCR considerations	This model uses optical character recognition (OCR) technology to scan .pdf files, image files, and .tiff files. OCR processing works best on documents that meet the following requirements:
	- File format of .jpg, .png, or .pdf (text or scanned). Text-embedded .pdf files are preferred because they avoid errors in character extraction and location.
	- For .pdf and .tiff files, up to 2,000 pages can be processed.
	- The file size must be less than 50 MB.
	- For images, dimensions must be between 50 x 50 and 10,000 x 10,000 pixels.
	- For .pdf files, dimensions must be at most 11 x 17 inches, corresponding to Legal or A3 paper sizes and smaller.
	- The total size of the training data is 500 pages or less.
Multi-Geo environments	When you set up Syntex in a Microsoft 365 Multi-Geo environment, you can configure it to use this model type only in the central location. To use this model type in a satellite location, contact Microsoft support.
Multi-model libraries	If two or more trained models are applied to the same library, each file is classified using the model that has the highest average confidence score. Entities are extracted by the applied model only.
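
Because the supported file types differ between the sensitive information model and the other four models, a lookup can help when routing files to a library. The following Python sketch copies the extension lists from the tables in this article. The dictionary keys and the `is_supported` function are hypothetical names chosen for illustration, not identifiers from Syntex.

```python
from pathlib import PurePath
from typing import Dict, FrozenSet

# File types listed for the contract, invoice, receipt, and simple document models.
COMMON_TYPES: FrozenSet[str] = frozenset({".bmp", ".jpeg", ".pdf", ".png", ".tiff"})

# File types listed for the sensitive information model.
SENSITIVE_TYPES: FrozenSet[str] = frozenset({
    ".csv", ".doc", ".docx", ".eml", ".heic", ".heif", ".htm", ".html",
    ".jpeg", ".jpg", ".md", ".msg", ".pdf", ".png", ".ppt", ".pptx",
    ".rtf", ".tif", ".tiff", ".txt", ".xls", ".xlsx",
})

# Hypothetical short keys for the five prebuilt models.
SUPPORTED_TYPES: Dict[str, FrozenSet[str]] = {
    "contract": COMMON_TYPES,
    "invoice": COMMON_TYPES,
    "receipt": COMMON_TYPES,
    "simple": COMMON_TYPES,
    "sensitive": SENSITIVE_TYPES,
}


def is_supported(model: str, filename: str) -> bool:
    """Check the file extension against the list published for the model.

    The check follows the "Supported file types" rows literally, so ".jpg"
    matches only where the table lists it, even though the OCR rows mention it.
    """
    extension = PurePath(filename).suffix.lower()
    return extension in SUPPORTED_TYPES[model]
```

For example, `is_supported("sensitive", "notes.docx")` is true, while `is_supported("invoice", "notes.docx")` is false, because .docx appears only in the sensitive information list.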