IndexingParametersConfiguration interface
A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.
Properties
allowSkillsetToReadFileData |
If true, creates a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. |
dataToExtract |
Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. |
delimitedTextDelimiter |
For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|"). |
delimitedTextHeaders |
For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index. |
documentRoot |
For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property. |
excludedFileNameExtensions |
Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing. |
executionEnvironment |
Specifies the environment in which the indexer should execute. |
failOnUnprocessableDocument |
For Azure blobs, set to false to continue indexing when a document fails indexing. |
failOnUnsupportedContentType |
For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance. |
firstLineContainsHeaders |
For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. |
imageAction |
Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. |
indexedFileNameExtensions |
Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types. |
indexStorageMetadataOnlyForOversizedDocuments |
For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://docs.microsoft.com/azure/search/search-limits-quotas-capacity. |
parsingMode |
Represents the parsing mode for indexing from an Azure blob data source. |
pdfTextRotationAlgorithm |
Determines the algorithm for text extraction from PDF files in Azure blob storage. |
queryTimeout |
Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". |
Property Details
allowSkillsetToReadFileData
If true, creates a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.
allowSkillsetToReadFileData?: boolean
Property Value
boolean
dataToExtract
Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs.
dataToExtract?: "storageMetadata" | "allMetadata" | "contentAndMetadata"
Property Value
"storageMetadata" | "allMetadata" | "contentAndMetadata"
delimitedTextDelimiter
For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").
delimitedTextDelimiter?: string
Property Value
string
delimitedTextHeaders
For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
delimitedTextHeaders?: string
Property Value
string
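The CSV-related properties can be combined as in the following minimal sketch. It assumes blobs without a header row, so the column names are supplied through delimitedTextHeaders. The CsvIndexingConfiguration interface is a local stand-in for the documented properties, and csvConfigWithExplicitHeaders is a hypothetical helper, not part of the SDK.

```typescript
// Local stand-in mirroring the CSV-related properties documented on this page.
interface CsvIndexingConfiguration {
  parsingMode?: "delimitedText";
  delimitedTextDelimiter?: string;
  delimitedTextHeaders?: string;
  firstLineContainsHeaders?: boolean;
}

// Hypothetical helper: builds a configuration that supplies the headers explicitly.
function csvConfigWithExplicitHeaders(
  headers: string[],
  delimiter: string
): CsvIndexingConfiguration {
  // The property is documented as a single-character delimiter.
  if (delimiter.length !== 1) {
    throw new Error("delimitedTextDelimiter must be a single character");
  }
  return {
    parsingMode: "delimitedText",
    delimitedTextDelimiter: delimiter,
    // delimitedTextHeaders is a comma-delimited list of column names.
    delimitedTextHeaders: headers.join(","),
  };
}

const csvConfig = csvConfigWithExplicitHeaders(["id", "name", "price"], "|");
```

If the blobs do carry a header row, setting firstLineContainsHeaders to true is the documented way to say so.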
documentRoot
For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
documentRoot?: string
Property Value
string
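As an illustration of what a documentRoot path points at, the sketch below pairs a sample blob with a configuration whose documentRoot is "/level1/level2". The sample data, the JsonArrayConfiguration stand-in interface, and the resolveDocumentRoot helper are all hypothetical. The helper only mimics the path lookup locally to show which array would be selected and is not the service's implementation.

```typescript
// Local stand-in mirroring the JSON-array-related properties documented on this page.
interface JsonArrayConfiguration {
  parsingMode?: "jsonArray";
  documentRoot?: string;
}

// Hypothetical blob content: the array of documents is nested two levels deep.
const sampleBlob = {
  level1: {
    level2: [
      { id: "1", text: "first" },
      { id: "2", text: "second" },
    ],
  },
};

const documentRootPath = "/level1/level2";
const jsonArrayConfig: JsonArrayConfiguration = {
  parsingMode: "jsonArray",
  documentRoot: documentRootPath,
};

// Illustration only: walks a slash-separated path and returns the array it names.
function resolveDocumentRoot(blob: unknown, documentRoot: string): unknown[] {
  let node: unknown = blob;
  for (const segment of documentRoot.split("/").filter((s) => s.length > 0)) {
    if (typeof node !== "object" || node === null || !(segment in node)) {
      throw new Error(`Path segment "${segment}" not found`);
    }
    node = (node as Record<string, unknown>)[segment];
  }
  if (!Array.isArray(node)) {
    throw new Error("documentRoot does not point to an array");
  }
  return node;
}

const documents = resolveDocumentRoot(sampleBlob, documentRootPath);
```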
excludedFileNameExtensions
Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
excludedFileNameExtensions?: string
Property Value
string
executionEnvironment
Specifies the environment in which the indexer should execute.
executionEnvironment?: "standard" | "private"
Property Value
"standard" | "private"
failOnUnprocessableDocument
For Azure blobs, set to false to continue indexing when a document fails indexing.
failOnUnprocessableDocument?: boolean
Property Value
boolean
failOnUnsupportedContentType
For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
failOnUnsupportedContentType?: boolean
Property Value
boolean
firstLineContainsHeaders
For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.
firstLineContainsHeaders?: boolean
Property Value
boolean
imageAction
Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer.
imageAction?: "none" | "generateNormalizedImages" | "generateNormalizedImagePerPage"
Property Value
"none" | "generateNormalizedImages" | "generateNormalizedImagePerPage"
indexedFileNameExtensions
Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
indexedFileNameExtensions?: string
Property Value
string
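Because both extension filters take a single comma-delimited string, it can be convenient to build them from arrays. The following is a minimal sketch under that assumption. ExtensionFilterConfiguration is a local stand-in for the two documented properties, toExtensionList is a hypothetical helper, and the choice to join without spaces after the commas is an assumption rather than a documented requirement.

```typescript
// Local stand-in mirroring the two extension-filter properties documented on this page.
interface ExtensionFilterConfiguration {
  indexedFileNameExtensions?: string;
  excludedFileNameExtensions?: string;
}

// Hypothetical helper: trims each entry, ensures a leading dot, and joins with commas.
function toExtensionList(extensions: string[]): string {
  return extensions
    .map((ext) => ext.trim())
    .map((ext) => (ext.startsWith(".") ? ext : `.${ext}`))
    .join(",");
}

const filterConfig: ExtensionFilterConfiguration = {
  indexedFileNameExtensions: toExtensionList([".docx", "pptx", " .msg"]),
  excludedFileNameExtensions: toExtensionList([".png", ".mp4"]),
};
```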
indexStorageMetadataOnlyForOversizedDocuments
For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://docs.microsoft.com/azure/search/search-limits-quotas-capacity.
indexStorageMetadataOnlyForOversizedDocuments?: boolean
Property Value
boolean
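The three blob error-handling flags documented on this page (failOnUnprocessableDocument, failOnUnsupportedContentType, and indexStorageMetadataOnlyForOversizedDocuments) can be set together when an indexing run should keep going past problem blobs. The sketch below shows one such lenient combination. FaultToleranceConfiguration is a local stand-in for the documented properties, and the combination itself is only an example, not a recommendation.

```typescript
// Local stand-in mirroring the three error-handling properties documented on this page.
interface FaultToleranceConfiguration {
  failOnUnprocessableDocument?: boolean;
  failOnUnsupportedContentType?: boolean;
  indexStorageMetadataOnlyForOversizedDocuments?: boolean;
}

const lenientBlobConfig: FaultToleranceConfiguration = {
  // Continue indexing when a document fails indexing.
  failOnUnprocessableDocument: false,
  // Continue indexing when an unsupported content type is encountered.
  failOnUnsupportedContentType: false,
  // Still index storage metadata for blobs that are too large to process.
  indexStorageMetadataOnlyForOversizedDocuments: true,
};
```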
parsingMode
Represents the parsing mode for indexing from an Azure blob data source.
parsingMode?: "text" | "default" | "delimitedText" | "json" | "jsonArray" | "jsonLines"
Property Value
"text" | "default" | "delimitedText" | "json" | "jsonArray" | "jsonLines"
pdfTextRotationAlgorithm
Determines the algorithm for text extraction from PDF files in Azure blob storage.
pdfTextRotationAlgorithm?: "none" | "detectAngles"
Property Value
"none" | "detectAngles"
queryTimeout
Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".
queryTimeout?: string
Property Value
string
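Since queryTimeout is a string in "hh:mm:ss" format, a small formatter helps avoid malformed values. The sketch below converts a number of seconds into that format. formatQueryTimeout is a hypothetical helper, not part of the SDK, and the 10-minute value is only an example.

```typescript
// Hypothetical helper: formats a non-negative whole number of seconds as "hh:mm:ss".
function formatQueryTimeout(totalSeconds: number): string {
  if (!Number.isInteger(totalSeconds) || totalSeconds < 0) {
    throw new Error("totalSeconds must be a non-negative integer");
  }
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  // Zero-pad each component to two digits.
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}

// Example: raise the timeout to 10 minutes for a slow Azure SQL query.
const sqlConfig: { queryTimeout?: string } = {
  queryTimeout: formatQueryTimeout(10 * 60),
};
```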