IndexingParametersConfiguration interface

Package: @azure/search-documents

A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.

Properties

allowSkillsetToReadFileData	If true, creates a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.
dataToExtract	Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs.
delimitedTextDelimiter	For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").
delimitedTextHeaders	For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
documentRoot	For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
excludedFileNameExtensions	Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
executionEnvironment	Specifies the environment in which the indexer should execute.
failOnUnprocessableDocument	For Azure blobs, set to false to continue indexing when a document fails to index.
failOnUnsupportedContentType	For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
firstLineContainsHeaders	For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.
imageAction	Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer.
indexedFileNameExtensions	Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
indexStorageMetadataOnlyForOversizedDocuments	For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://docs.microsoft.com/azure/search/search-limits-quotas-capacity.
parsingMode	Represents the parsing mode for indexing from an Azure blob data source.
pdfTextRotationAlgorithm	Determines the algorithm for text extraction from PDF files in Azure blob storage.
queryTimeout	Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".

Property Details

allowSkillsetToReadFileData

If true, creates a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.

allowSkillsetToReadFileData?: boolean

Property Value

boolean

dataToExtract

Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs.

dataToExtract?: "storageMetadata" | "allMetadata" | "contentAndMetadata"

Property Value

"storageMetadata" | "allMetadata" | "contentAndMetadata"

delimitedTextDelimiter

For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").

delimitedTextDelimiter?: string

Property Value

string

delimitedTextHeaders

For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.

delimitedTextHeaders?: string

Property Value

string
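
The CSV-related properties are usually set together. The following is a minimal sketch, not the library's own sample: the type CsvIndexingConfiguration is a local mirror of the documented properties (application code would use IndexingParametersConfiguration from @azure/search-documents instead), and buildCsvConfiguration and the column names are hypothetical.

```typescript
// Local mirror of the CSV-related properties documented on this page.
// Assumption: real code imports IndexingParametersConfiguration from
// "@azure/search-documents" rather than declaring this type.
type CsvIndexingConfiguration = {
  parsingMode?: "delimitedText";
  delimitedTextDelimiter?: string;
  delimitedTextHeaders?: string;
  firstLineContainsHeaders?: boolean;
};

// Hypothetical helper: builds a configuration for CSV blobs that carry no header row.
function buildCsvConfiguration(
  headers: string[],
  delimiter: string
): CsvIndexingConfiguration {
  // The delimiter is documented as a single character.
  if (delimiter.length !== 1) {
    throw new Error("delimitedTextDelimiter must be a single character");
  }
  return {
    parsingMode: "delimitedText",
    delimitedTextDelimiter: delimiter,
    // delimitedTextHeaders is a comma-delimited string, not an array.
    delimitedTextHeaders: headers.join(","),
    firstLineContainsHeaders: false,
  };
}

const csvConfiguration = buildCsvConfiguration(["id", "title", "price"], "|");
```

Keeping the headers as an array until the last step avoids hand-editing the comma-delimited string.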

documentRoot

For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.

documentRoot?: string

Property Value

string
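
As an illustration, the sketch below pairs documentRoot with the "jsonArray" parsing mode. The blob shape and the path "/level1/level2" are assumptions chosen for the example, and JsonArrayIndexingConfiguration is a local stand-in for the documented interface.

```typescript
// Local stand-in for the two documented properties used here.
type JsonArrayIndexingConfiguration = {
  parsingMode?: "jsonArray";
  documentRoot?: string;
};

// Assumed blob content for this example:
// { "level1": { "level2": [ { "id": "1" }, { "id": "2" } ] } }
const jsonArrayConfiguration: JsonArrayIndexingConfiguration = {
  parsingMode: "jsonArray",
  // Path to the nested array whose elements should become documents.
  documentRoot: "/level1/level2",
};
```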

excludedFileNameExtensions

Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.

excludedFileNameExtensions?: string

Property Value

string
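
Because this property (like the related indexedFileNameExtensions) takes a single comma-delimited string, it can help to build the value from an array. The helper toExtensionList below is hypothetical and only a sketch; it trims entries and adds a leading dot where one is missing.

```typescript
// Hypothetical helper: turns ["png", " .mp4 "] into ".png,.mp4".
function toExtensionList(extensions: string[]): string {
  return extensions
    .map((ext) => ext.trim())
    .filter((ext) => ext.length > 0)
    .map((ext) => (ext.charAt(0) === "." ? ext : "." + ext))
    .join(",");
}

// Plain object holding only the two file-filter properties.
const fileFilterConfiguration = {
  excludedFileNameExtensions: toExtensionList(["png", " .mp4 "]),
  indexedFileNameExtensions: toExtensionList([".docx", ".pptx", ".msg"]),
};
```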

executionEnvironment

Specifies the environment in which the indexer should execute.

executionEnvironment?: "standard" | "private"

Property Value

"standard" | "private"

failOnUnprocessableDocument

For Azure blobs, set to false to continue indexing when a document fails to index.

failOnUnprocessableDocument?: boolean

Property Value

boolean

failOnUnsupportedContentType

For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.

failOnUnsupportedContentType?: boolean

Property Value

boolean
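
The failure-handling flags can be combined when a blob container holds mixed or unknown content. This is a sketch under that assumption; BlobFailureConfiguration is a local mirror of the documented properties, not a type exported by the package.

```typescript
// Local mirror of the failure-handling properties documented on this page.
type BlobFailureConfiguration = {
  failOnUnsupportedContentType?: boolean;
  failOnUnprocessableDocument?: boolean;
  indexStorageMetadataOnlyForOversizedDocuments?: boolean;
};

// A lenient configuration: keep indexing past unsupported or unprocessable
// blobs, and index storage metadata for blobs that are too large to process.
const lenientBlobConfiguration: BlobFailureConfiguration = {
  failOnUnsupportedContentType: false,
  failOnUnprocessableDocument: false,
  indexStorageMetadataOnlyForOversizedDocuments: true,
};
```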

firstLineContainsHeaders

For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.

firstLineContainsHeaders?: boolean

Property Value

boolean

imageAction

Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer.

imageAction?: "none" | "generateNormalizedImages" | "generateNormalizedImagePerPage"

Property Value

"none" | "generateNormalizedImages" | "generateNormalizedImagePerPage"
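
The skillset requirement can be checked on the client before an indexer is submitted. The sketch below is illustrative only: IndexerSketch is a hypothetical, minimal view of an indexer definition, isMissingRequiredSkillset is not part of the library, and treating an unset imageAction as "none" is an assumption.

```typescript
type ImageAction =
  | "none"
  | "generateNormalizedImages"
  | "generateNormalizedImagePerPage";

// Hypothetical minimal view of an indexer definition.
type IndexerSketch = {
  skillsetName?: string;
  configuration?: { imageAction?: ImageAction };
};

// Returns true when imageAction asks for image processing
// but no skillset is attached to the indexer.
function isMissingRequiredSkillset(indexer: IndexerSketch): boolean {
  // Assumption: an unset imageAction behaves like "none".
  const action = indexer.configuration?.imageAction ?? "none";
  return action !== "none" && !indexer.skillsetName;
}
```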

indexedFileNameExtensions

Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.

indexedFileNameExtensions?: string

Property Value

string

indexStorageMetadataOnlyForOversizedDocuments

For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://docs.microsoft.com/azure/search/search-limits-quotas-capacity.

indexStorageMetadataOnlyForOversizedDocuments?: boolean

Property Value

boolean

parsingMode

Represents the parsing mode for indexing from an Azure blob data source.

parsingMode?: "text" | "default" | "delimitedText" | "json" | "jsonArray" | "jsonLines"

Property Value

"text" | "default" | "delimitedText" | "json" | "jsonArray" | "jsonLines"

pdfTextRotationAlgorithm

Determines the algorithm for text extraction from PDF files in Azure blob storage.

pdfTextRotationAlgorithm?: "none" | "detectAngles"

Property Value

"none" | "detectAngles"

queryTimeout

Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".

queryTimeout?: string

Property Value

string
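
Since queryTimeout is a string in "hh:mm:ss" format, a small formatter can reduce mistakes. The helper toQueryTimeout below is hypothetical and only a sketch; it converts a whole number of seconds into that format.

```typescript
// Hypothetical helper: formats a whole number of seconds as "hh:mm:ss".
function toQueryTimeout(totalSeconds: number): string {
  if (totalSeconds < 0 || Math.floor(totalSeconds) !== totalSeconds) {
    throw new RangeError("totalSeconds must be a non-negative integer");
  }
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  // Pad each component to two digits.
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return pad(hours) + ":" + pad(minutes) + ":" + pad(seconds);
}

// Raise the timeout from the 5-minute default to 10 minutes.
const sqlConfiguration = { queryTimeout: toQueryTimeout(10 * 60) }; // "00:10:00"
```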
