Document Models - Analyze Document

Analyzes a document with a document model.

POST {endpoint}/documentintelligence/documentModels/{modelId}:analyze?_overload=analyzeDocument&api-version=2024-11-30
POST {endpoint}/documentintelligence/documentModels/{modelId}:analyze?_overload=analyzeDocument&api-version=2024-11-30&pages={pages}&locale={locale}&stringIndexType={stringIndexType}&features={features}&queryFields={queryFields}&outputContentFormat={outputContentFormat}&output={output}

URI Parameters

Name In Required Type Description
endpoint
path True

string (uri)

The Document Intelligence service endpoint.

modelId
path True

string

maxLength: 64
pattern: ^[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}$

Unique document model name.

api-version
query True

string

minLength: 1

The API version to use for this operation.

features
query

DocumentAnalysisFeature[]

List of optional analysis features.

locale
query

string

Locale hint for text recognition and document analysis. The value may be either a language code alone (e.g. "en", "fr") or a BCP 47 language tag (e.g. "en-US").

output
query

AnalyzeOutputOption[]

Additional outputs to generate during analysis.

outputContentFormat
query

DocumentContentFormat

Format of the top-level content in the analyze result.

pages
query

string

pattern: ^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$

1-based page numbers to analyze. Example: "1-3,5,7-9"

queryFields
query

string[]

List of additional fields to extract. Example: "NumberOfGuests,StoreNumber"

stringIndexType
query

StringIndexType

Method used to compute string offset and length.
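
The parameters above can be combined into a request URL. The following is a minimal Python sketch rather than an official client: the helper name `build_analyze_url` is hypothetical, the two regular expressions are copied from the table, and joining array values with commas is an assumption based on the queryFields example ("NumberOfGuests,StoreNumber").

```python
import re
from urllib.parse import quote, urlencode

# Patterns copied from the URI Parameters table.
MODEL_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}$")
PAGES_RE = re.compile(r"^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$")


def build_analyze_url(endpoint, model_id, api_version="2024-11-30", pages=None,
                      locale=None, string_index_type=None, features=None,
                      query_fields=None, output_content_format=None, output=None):
    """Build the analyze URL (hypothetical helper, not part of any SDK)."""
    if not MODEL_ID_RE.match(model_id):
        raise ValueError(f"invalid modelId: {model_id!r}")
    if pages is not None and not PAGES_RE.match(pages):
        raise ValueError(f"invalid pages: {pages!r}")

    # Required query parameters come first, as in the URI template.
    params = [("_overload", "analyzeDocument"), ("api-version", api_version)]
    optional = [
        ("pages", pages),
        ("locale", locale),
        ("stringIndexType", string_index_type),
        ("features", features),
        ("queryFields", query_fields),
        ("outputContentFormat", output_content_format),
        ("output", output),
    ]
    for name, value in optional:
        if value is None:
            continue
        # Assumption: array-valued parameters are sent comma-separated.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params.append((name, value))

    path = f"/documentintelligence/documentModels/{quote(model_id, safe='')}:analyze"
    # Leave commas unescaped so the URL reads like the sample requests.
    return endpoint.rstrip("/") + path + "?" + urlencode(params, safe=",")
```

Validating `modelId` and `pages` on the client only catches malformed values early; the service remains the authority on what it accepts.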

Request Body

Name Type Description
base64Source

string (byte)

Base64 encoding of the document to analyze. Either urlSource or base64Source must be specified.

urlSource

string (uri)

Document URL to analyze. Either urlSource or base64Source must be specified.
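
As a sketch of how the request body might be produced, the snippet below serializes exactly one of the two properties. The function name `build_request_body` is hypothetical, and rejecting a call that supplies both sources is an assumption; the table only states that one of them must be specified.

```python
import base64
import json


def build_request_body(url_source=None, document_bytes=None):
    """Return the JSON request body as a string (hypothetical helper)."""
    # Assumption: treat "both" the same as "neither" and reject it.
    if (url_source is None) == (document_bytes is None):
        raise ValueError("specify exactly one of url_source or document_bytes")
    if url_source is not None:
        return json.dumps({"urlSource": url_source})
    # base64Source carries the raw document bytes, Base64-encoded.
    encoded = base64.b64encode(document_bytes).decode("ascii")
    return json.dumps({"base64Source": encoded})
```

For a local file, `document_bytes` would typically come from `open(path, "rb").read()`.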

Responses

Name Type Description
202 Accepted

The request has been accepted for processing, but processing has not yet completed.

Headers

  • Operation-Location: string
  • Retry-After: integer

Other Status Codes

DocumentIntelligenceErrorResponse

An unexpected error response.
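
Because a 202 response carries its useful information in headers, a client usually reads them before doing anything else. A minimal sketch, assuming the caller already has the status code and a dict of response headers; the function name and the one-second fallback delay are hypothetical choices, not values defined by this operation.

```python
def parse_accepted_response(status, headers, default_delay=1):
    """Return (operation_location, retry_after_seconds) for a 202 response."""
    if status != 202:
        raise RuntimeError(f"expected 202 Accepted, got {status}")
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("operation-location")
    if location is None:
        raise RuntimeError("202 response is missing Operation-Location")
    retry_after = lowered.get("retry-after")
    # Retry-After is documented as an integer; fall back if it is absent.
    delay = int(retry_after) if retry_after is not None else default_delay
    return location, delay
```

The returned URL is where the analysis result can later be requested, and the delay is a hint for how long to wait before the first attempt.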

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

OAuth2Auth

Type: oauth2
Flow: accessCode
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Token URL: https://login.microsoftonline.com/common/oauth2/token

Scopes

Name Description
https://cognitiveservices.azure.com/.default
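
The two schemes above translate into different request headers. The sketch below is illustrative only: `auth_headers` is a hypothetical helper, acquiring the OAuth2 token is out of scope, the `Authorization: Bearer` form is the standard way to present an OAuth2 access token, and the JSON `Content-Type` is an assumption that follows from the JSON request body.

```python
def auth_headers(api_key=None, bearer_token=None):
    """Build request headers for one of the two security schemes."""
    if (api_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of api_key or bearer_token")
    # Assumption: the request body is JSON, so declare it as such.
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # apiKey scheme: the key travels in the Ocp-Apim-Subscription-Key header.
        headers["Ocp-Apim-Subscription-Key"] = api_key
    else:
        # oauth2 scheme: present the access token as a bearer token.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```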

Examples

Analyze Document from Base64
Analyze Document from Url

Analyze Document from Base64

Sample request

POST https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?_overload=analyzeDocument&api-version=2024-11-30&pages=1-2,4&locale=en-US&stringIndexType=textElements

{
  "base64Source": "e2Jhc2U2NEVuY29kZWRQZGZ9"
}

Sample response

Operation-Location: https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout/analyzeResults/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2024-11-30

Analyze Document from Url

Sample request

POST https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/customModel:analyze?_overload=analyzeDocument&api-version=2024-11-30&pages=1-2,4&locale=en-US&stringIndexType=textElements

{
  "urlSource": "http://host.com/doc.pdf"
}

Sample response

Operation-Location: https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/customModel/analyzeResults/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2024-11-30
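
Both sample responses return an Operation-Location of the same shape. If a client needs the model name or result identifier separately (for logging, say), it could split the URL as sketched below. The path layout is inferred from the two samples only, and `parse_operation_location` is a hypothetical helper; for the follow-up request itself it is generally safer to use the returned URL unchanged.

```python
from urllib.parse import parse_qs, urlsplit


def parse_operation_location(url):
    """Split an Operation-Location URL shaped like the samples above."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Shape inferred from the samples:
    # documentintelligence/documentModels/{modelId}/analyzeResults/{resultId}
    if (len(segments) != 5 or segments[1] != "documentModels"
            or segments[3] != "analyzeResults"):
        raise ValueError(f"unexpected Operation-Location path: {parts.path!r}")
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return {"modelId": segments[2], "resultId": segments[4], "apiVersion": api_version}
```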

Definitions

Name Description
AnalyzeDocumentRequest

Document analysis parameters.

AnalyzeOutputOption

Additional outputs to generate during analysis.

DocumentAnalysisFeature

Document analysis features to enable.

DocumentContentFormat

Format of the content in the analyzed result.

DocumentIntelligenceError

The error object.

DocumentIntelligenceErrorResponse

Error response object.

DocumentIntelligenceInnerError

An object containing more specific information about the error.

StringIndexType

Method used to compute string offset and length.

AnalyzeDocumentRequest

Document analysis parameters.

Name Type Description
base64Source

string (byte)

Base64 encoding of the document to analyze. Either urlSource or base64Source must be specified.

urlSource

string (uri)

Document URL to analyze. Either urlSource or base64Source must be specified.

AnalyzeOutputOption

Additional outputs to generate during analysis.

Value Description
figures

Generate cropped images of detected figures.

pdf

Generate searchable PDF output.

DocumentAnalysisFeature

Document analysis features to enable.

Value Description
barcodes

Enable the detection of barcodes in the document.

formulas

Enable the detection of mathematical expressions in the document.

keyValuePairs

Enable the detection of general key-value pairs (form fields) in the document.

languages

Enable the detection of the text content language.

ocrHighResolution

Perform OCR at a higher resolution to handle documents with fine print.

queryFields

Enable the extraction of additional fields via the queryFields query parameter.

styleFont

Enable the recognition of various font styles.
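
A client can check feature names against the values listed above before sending a request. The sketch below is a client-side guard under two assumptions: the hypothetical helper `features_param` joins values with commas, and it treats the queryFields feature as a prerequisite for the queryFields query parameter, which is one reading of the description above rather than documented service behavior.

```python
# Values copied from the DocumentAnalysisFeature table.
KNOWN_FEATURES = {
    "barcodes", "formulas", "keyValuePairs", "languages",
    "ocrHighResolution", "queryFields", "styleFont",
}


def features_param(features, query_fields=None):
    """Validate feature names and return the `features` query value."""
    unknown = sorted(set(features) - KNOWN_FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    # Assumption: queryFields values are only honored when the feature is on.
    if query_fields and "queryFields" not in features:
        raise ValueError("queryFields given without the queryFields feature")
    return ",".join(features)
```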

DocumentContentFormat

Format of the content in the analyzed result.

Value Description
markdown

Markdown representation of the document content with section headings, tables, etc.

text

Plain text representation of the document content without any formatting.

DocumentIntelligenceError

The error object.

Name Type Description
code

string

One of a server-defined set of error codes.

details

DocumentIntelligenceError[]

An array of details about specific errors that led to this reported error.

innererror

DocumentIntelligenceInnerError

An object containing more specific information than the current object about the error.

message

string

A human-readable representation of the error.

target

string

The target of the error.

DocumentIntelligenceErrorResponse

Error response object.

Name Type Description
error

DocumentIntelligenceError

Error info.

DocumentIntelligenceInnerError

An object containing more specific information about the error.

Name Type Description
code

string

One of a server-defined set of error codes.

innererror

DocumentIntelligenceInnerError

Inner error.

message

string

A human-readable representation of the error.
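
Since innererror nests recursively, the most specific code sits at the end of a chain. A small sketch of walking that chain in a parsed error response follows; `error_chain` is a hypothetical helper, and the codes and messages in the usage note are invented placeholders, not real service error codes.

```python
def error_chain(response_body):
    """Return [(code, message), ...] from the outer error to the innermost."""
    error = response_body["error"]
    chain = [(error.get("code"), error.get("message"))]
    inner = error.get("innererror")
    # Follow nested innererror objects until none is left.
    while inner is not None:
        chain.append((inner.get("code"), inner.get("message")))
        inner = inner.get("innererror")
    return chain
```

With a made-up payload such as `{"error": {"code": "OuterCode", "message": "outer", "innererror": {"code": "InnerCode", "message": "inner"}}}`, the last element of the returned list is `("InnerCode", "inner")`.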

StringIndexType

Method used to compute string offset and length.

Value Description
textElements

User-perceived display character, or grapheme cluster, as defined by Unicode 8.0.0.

unicodeCodePoint

Character unit represented by a single Unicode code point. Used by Python 3.

utf16CodeUnit

Character unit represented by a 16-bit Unicode code unit. Used by JavaScript, Java, and .NET.
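
The choice matters whenever text contains characters outside the Basic Multilingual Plane. The Python sketch below compares the two code-based units and shows one way to slice a string by UTF-16 offsets; the function names are hypothetical. Counting textElements (grapheme clusters) is not covered because the Python standard library has no grapheme segmentation.

```python
def unit_lengths(text):
    """Length of `text` under two of the StringIndexType values."""
    return {
        # Python 3 strings index by Unicode code point.
        "unicodeCodePoint": len(text),
        # Each UTF-16 code unit is two bytes in UTF-16-LE.
        "utf16CodeUnit": len(text.encode("utf-16-le")) // 2,
    }


def slice_utf16(text, offset, length):
    """Extract a substring given an offset and length in UTF-16 code units."""
    data = text.encode("utf-16-le")
    return data[2 * offset: 2 * (offset + length)].decode("utf-16-le")
```

For the string `"a\U0001F600"` the code-point length is 2 and the UTF-16 length is 3, so offsets computed with one method cannot be applied directly in a language that uses the other.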