Native document support for Azure AI Language (preview)

03/06/2025

Important

Azure AI Language public preview releases provide early access to features that are in active development.
Features, approaches, and processes can change before General Availability (GA) based on user feedback.

Azure AI Language is a cloud-based service that applies Natural Language Processing (NLP) features to text-based data. The native document support capability lets you send API requests asynchronously: an HTTP POST request body carries your data, and an HTTP GET request query string retrieves the status results. Processed documents are written to your Azure Blob Storage target container.
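The submit-then-poll pattern described above can be sketched as a small polling loop. This is a minimal illustration, not the service's client library: `fetch_status` is a hypothetical callable that you would implement to issue the HTTP GET request and return the job status as a string, and the terminal status names `succeeded` and `failed` are assumptions that should be checked against the API reference.

```python
import time
from typing import Callable, FrozenSet


def wait_for_job(
    fetch_status: Callable[[], str],
    terminal: FrozenSet[str] = frozenset({"succeeded", "failed"}),  # assumed names
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Call fetch_status() until it returns a terminal status or attempts run out.

    fetch_status is a hypothetical hook: it should perform the HTTP GET
    status request and return the status string from the response.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in terminal:
            return status
        # Pause between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"job did not finish after {max_attempts} status checks")
```

Passing the status check in as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.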

A native document refers to the file format used to create the original document, such as Microsoft Word (docx) or Portable Document Format (pdf). Native document support eliminates the need for text preprocessing before using Azure AI Language resource capabilities. Currently, native document support is available for the following capabilities:

Personally Identifiable Information (PII). The PII detection feature can identify, categorize, and redact sensitive information in unstructured text. The PiiEntityRecognition API supports native document processing.
Document summarization. Document summarization uses natural language processing to generate extractive (salient sentence extraction) or abstractive (contextual word extraction) summaries for documents. Both AbstractiveSummarization and ExtractiveSummarization APIs support native document processing.

Supported document formats

Applications use native file formats to create, save, or open native documents. Currently, the PII and document summarization capabilities support the following native document formats:

File type	File extension	Description
Text	`.txt`	An unformatted text document.
Adobe PDF	`.pdf`	A Portable Document Format (PDF) file.
Microsoft Word	`.docx`	A Microsoft Word document file.
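As a rough client-side pre-check, the extensions in the table above can be tested before a file is uploaded. This is a sketch under assumptions: the helper name `is_supported_format` is hypothetical, and treating extensions as case-insensitive is a choice made here, not something the table states.

```python
from pathlib import Path

# Extensions copied from the supported-formats table.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".docx"}


def is_supported_format(filename: str) -> bool:
    """Return True if the file's extension is one of the listed native formats.

    Assumption: extensions are compared case-insensitively.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check only looks at the file name. It can't tell whether a PDF is fully scanned, so it doesn't replace the input guidelines.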

Input guidelines

Supported file formats

Type	Support and limitations
PDFs	Fully scanned PDFs aren't supported.
Text within images	Digital images with embedded text aren't supported.
Digital tables	Tables in scanned documents aren't supported.

Document size

Attribute	Input limit
Total number of documents per request	≤ 20
Total content size per request	≤ 10 MB
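The two limits above can be checked locally before a request is sent. The following is a minimal sketch: the function name `check_request_limits` is hypothetical, and interpreting 10 MB as 10 × 1,048,576 bytes is an assumption, since the table doesn't say whether decimal or binary megabytes are meant.

```python
from typing import List

MAX_DOCUMENTS_PER_REQUEST = 20
# Assumption: 1 MB is taken as 1,048,576 bytes; the table gives only "10 MB".
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def check_request_limits(document_sizes: List[int]) -> List[str]:
    """Return a list of limit violations for a batch of document sizes in bytes.

    An empty list means the batch is within both documented limits.
    """
    problems = []
    if len(document_sizes) > MAX_DOCUMENTS_PER_REQUEST:
        problems.append(
            f"{len(document_sizes)} documents exceeds the limit of "
            f"{MAX_DOCUMENTS_PER_REQUEST} per request"
        )
    total = sum(document_sizes)
    if total > MAX_REQUEST_BYTES:
        problems.append(
            f"total size {total} bytes exceeds the limit of "
            f"{MAX_REQUEST_BYTES} bytes per request"
        )
    return problems
```

Returning every violation, rather than stopping at the first one, lets a caller report both problems in a single pass.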

Request headers and parameters

Parameter	Description
`-X POST <endpoint>`	Specifies your Language resource endpoint for accessing the API.
`--header "Content-Type: application/json"`	The content type for sending JSON data.
`--header "Ocp-Apim-Subscription-Key: <key>"`	Specifies the Language resource key for accessing the API.
`--data`	The JSON file containing the data you want to pass with your request.
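The same headers and body can be assembled in Python with the standard library. This is a sketch rather than an official client: `build_post_request` is a hypothetical helper, the URL and request body are supplied by the caller because the table above doesn't spell out the path or schema, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_post_request(url: str, key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with the headers listed above.

    url:  your Language resource endpoint, including whatever path and query
          string the API reference specifies (not assumed here).
    key:  your Language resource key.
    body: the JSON payload to send, as a Python dict.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Note that `urllib` stores header names in a normalized capitalization. HTTP header names are case-insensitive, so this doesn't change how the request is interpreted.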

