IndexingParametersConfiguration Class
A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.
Inheritance
azure.search.documents.indexes._generated._serialization.Model -> IndexingParametersConfiguration
Constructor
IndexingParametersConfiguration(*, additional_properties: Dict[str, Any] | None = None, parsing_mode: str | _models.BlobIndexerParsingMode = 'default', excluded_file_name_extensions: str = '', indexed_file_name_extensions: str = '', fail_on_unsupported_content_type: bool = False, fail_on_unprocessable_document: bool = False, index_storage_metadata_only_for_oversized_documents: bool = False, delimited_text_headers: str | None = None, delimited_text_delimiter: str | None = None, first_line_contains_headers: bool = True, document_root: str | None = None, data_to_extract: str | _models.BlobIndexerDataToExtract = 'contentAndMetadata', image_action: str | _models.BlobIndexerImageAction = 'none', allow_skillset_to_read_file_data: bool = False, pdf_text_rotation_algorithm: str | _models.BlobIndexerPDFTextRotationAlgorithm = 'none', execution_environment: str | _models.IndexerExecutionEnvironment = 'standard', query_timeout: str = '00:05:00', **kwargs: Any)
Keyword-Only Parameters
Name | Description |
---|---|
additional_properties
|
Unmatched properties from the message are deserialized to this collection. |
parsing_mode
|
Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", and "jsonLines". Default value: default
|
excluded_file_name_extensions
|
Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing. |
indexed_file_name_extensions
|
Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types. |
fail_on_unsupported_content_type
|
For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance. |
fail_on_unprocessable_document
|
For Azure blobs, set to false if you want to continue indexing if a document fails indexing. |
index_storage_metadata_only_for_oversized_documents
|
For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://learn.microsoft.com/azure/search/search-limits-quotas-capacity. |
delimited_text_headers
|
For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index. |
delimited_text_delimiter
|
For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|"). |
first_line_contains_headers
|
For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. Default value: True
|
document_root
|
For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property. |
data_to_extract
|
Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata". Default value: contentAndMetadata
|
image_action
|
Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage". Default value: none
|
allow_skillset_to_read_file_data
|
If true, creates a path /document/file_data, an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. |
pdf_text_rotation_algorithm
|
Determines the algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles". Default value: none
|
execution_environment
|
Specifies the environment in which the indexer should execute. Known values are: "standard" and "private". Default value: standard
|
query_timeout
|
Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". Default value: 00:05:00
|
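As a rough illustration of how these keyword arguments combine for delimited-text blobs, the sketch below assembles them in a plain dict. The helper names `join_extensions` and `csv_blob_config_kwargs` are hypothetical and not part of the SDK. The resulting dict is meant to be unpacked into `IndexingParametersConfiguration(**kwargs)`; that call is left out so the snippet runs without the package installed.

```python
from typing import Any, Dict, Iterable


def join_extensions(extensions: Iterable[str]) -> str:
    """Build a comma-delimited extension string such as ".png, .mp4"."""
    cleaned = []
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext:
            continue
        # Normalize so that every entry carries a leading dot.
        cleaned.append(ext if ext.startswith(".") else "." + ext)
    return ", ".join(cleaned)


def csv_blob_config_kwargs(
    delimiter: str = ",", excluded: Iterable[str] = ()
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for a delimited-text blob indexer."""
    if len(delimiter) != 1:
        raise ValueError("delimited_text_delimiter must be a single character")
    return {
        "parsing_mode": "delimitedText",
        "delimited_text_delimiter": delimiter,
        "first_line_contains_headers": True,
        "excluded_file_name_extensions": join_extensions(excluded),
        # Keep indexing when a blob has an unsupported type or cannot be processed.
        "fail_on_unsupported_content_type": False,
        "fail_on_unprocessable_document": False,
    }


kwargs = csv_blob_config_kwargs(delimiter="|", excluded=["png", ".MP4"])
```

Keeping the arguments in a dict makes it easy to log or validate the configuration before the model object is constructed.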
Variables
Name | Description |
---|---|
additional_properties
|
Unmatched properties from the message are deserialized to this collection. |
parsing_mode
|
Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", and "jsonLines". |
excluded_file_name_extensions
|
Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing. |
indexed_file_name_extensions
|
Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types. |
fail_on_unsupported_content_type
|
For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance. |
fail_on_unprocessable_document
|
For Azure blobs, set to false if you want to continue indexing if a document fails indexing. |
index_storage_metadata_only_for_oversized_documents
|
For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://learn.microsoft.com/azure/search/search-limits-quotas-capacity. |
delimited_text_headers
|
For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index. |
delimited_text_delimiter
|
For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|"). |
first_line_contains_headers
|
For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. |
document_root
|
For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property. |
data_to_extract
|
Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata". |
image_action
|
Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage". |
allow_skillset_to_read_file_data
|
If true, creates a path /document/file_data, an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. |
pdf_text_rotation_algorithm
|
Determines the algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles". |
execution_environment
|
Specifies the environment in which the indexer should execute. Known values are: "standard" and "private". |
query_timeout
|
Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". |
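
The query_timeout value is a string in "hh:mm:ss" form. A minimal sketch of producing that string from a `datetime.timedelta` follows; the helper name `format_query_timeout` is hypothetical, and the sketch assumes durations under 100 hours so the hour field stays two digits wide.

```python
from datetime import timedelta


def format_query_timeout(timeout: timedelta) -> str:
    """Render a timedelta as "hh:mm:ss" (hypothetical helper, not part of the SDK)."""
    total_seconds = int(timeout.total_seconds())
    if total_seconds < 0:
        raise ValueError("timeout must be non-negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Ten minutes, i.e. double the documented 5-minute default.
query_timeout = format_query_timeout(timedelta(minutes=10))
```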
Methods
as_dict |
Return a dict that can be serialized using json.dump. Advanced usage can optionally pass a callback as the key_transformer parameter. The callback receives three arguments: key, the attribute name used in Python; attr_desc, a dict of metadata that currently contains 'type' with the msrest type and 'key' with the RestAPI encoded key; and value, the current value in this object. The string returned is used to serialize the key. If a list is returned, the result dict is treated as hierarchical. Three example transformers are provided: attribute_transformer, full_restapi_key_transformer, and last_restapi_key_transformer.
If you want XML serialization, you can pass the kwargs is_xml=True. |
deserialize |
Parse a str using the RestAPI syntax and return a model. |
enable_additional_properties_sending | |
from_dict |
Parse a dict using the given key extractors and return a model. By default, the following key extractors are considered: rest_key_case_insensitive_extractor, attribute_key_case_insensitive_extractor, and last_rest_key_case_insensitive_extractor. |
is_xml_model | |
serialize |
Return the JSON that would be sent to server from this model. This is an alias to as_dict(full_restapi_key_transformer, keep_readonly=False). If you want XML serialization, you can pass the kwargs is_xml=True. |
as_dict
Return a dict that can be serialized using json.dump.
Advanced usage can optionally pass a callback as the key_transformer parameter. The callback receives three arguments:
Key is the attribute name used in Python. Attr_desc is a dict of metadata. Currently contains 'type' with the msrest type and 'key' with the RestAPI encoded key. Value is the current value in this object.
The string returned is used to serialize the key. If a list is returned, the result dict is treated as hierarchical.
See the three example transformers in this file:
attribute_transformer
full_restapi_key_transformer
last_restapi_key_transformer
If you want XML serialization, you can pass the kwargs is_xml=True.
as_dict(keep_readonly: bool = True, key_transformer: Callable[[str, Dict[str, Any], Any], Any] = attribute_transformer, **kwargs: Any) -> MutableMapping[str, Any]
Parameters
Name | Description |
---|---|
key_transformer
|
A key transformer function. |
keep_readonly
|
Default value: True
|
Returns
Type | Description |
---|---|
A JSON-compatible dict |
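
To make the callback shape concrete, here is a minimal sketch of a key transformer that takes (key, attr_desc, value) and returns the string to use as the output key. The metadata table, the sample values, and `apply_key_transformer` are hypothetical stand-ins so the snippet runs on its own; they are not the SDK's internals.

```python
from typing import Any, Callable, Dict


def rest_key_transformer(key: str, attr_desc: Dict[str, Any], value: Any) -> str:
    """Return the RestAPI encoded key stored in the attribute metadata."""
    return attr_desc["key"]


# Hypothetical stand-ins for a model's attribute metadata and current values.
attribute_map = {
    "parsing_mode": {"key": "parsingMode", "type": "str"},
    "query_timeout": {"key": "queryTimeout", "type": "str"},
}
values = {"parsing_mode": "json", "query_timeout": "00:10:00"}


def apply_key_transformer(
    transformer: Callable[[str, Dict[str, Any], Any], str]
) -> Dict[str, Any]:
    # Simplified illustration of how a transformer renames keys; not the SDK's code.
    return {
        transformer(name, attribute_map[name], value): value
        for name, value in values.items()
    }


result = apply_key_transformer(rest_key_transformer)
```

With the package installed, a callback of this shape would presumably be passed as `as_dict(key_transformer=rest_key_transformer)` on a model instance.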
deserialize
Parse a str using the RestAPI syntax and return a model.
deserialize(data: Any, content_type: str | None = None) -> ModelType
Parameters
Name | Description |
---|---|
data
Required
|
A str using RestAPI structure. JSON by default. |
content_type
|
JSON by default; set to application/xml for XML. Default value: None
|
Returns
Type | Description |
---|---|
An instance of this model |
Exceptions
Type | Description |
---|---|
DeserializationError
|
If something went wrong. |
enable_additional_properties_sending
enable_additional_properties_sending() -> None
from_dict
Parse a dict using the given key extractors and return a model.
By default, the following key extractors are considered: rest_key_case_insensitive_extractor, attribute_key_case_insensitive_extractor, and last_rest_key_case_insensitive_extractor.
from_dict(data: Any, key_extractors: Callable[[str, Dict[str, Any], Any], Any] | None = None, content_type: str | None = None) -> ModelType
Parameters
Name | Description |
---|---|
data
Required
|
A dict using RestAPI structure |
content_type
|
JSON by default; set to application/xml for XML. Default value: None
|
key_extractors
|
Default value: None
|
Returns
Type | Description |
---|---|
An instance of this model |
Exceptions
Type | Description |
---|---|
DeserializationError
|
If something went wrong. |
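
The default extractors named above are case-insensitive. The idea can be sketched with a small lookup function; `case_insensitive_get` is a hypothetical illustration of the matching rule, not the SDK's extractor implementation.

```python
from typing import Any, Dict, Optional


def case_insensitive_get(data: Dict[str, Any], wanted: str) -> Optional[Any]:
    """Return the value whose key matches `wanted`, ignoring case, or None."""
    wanted_lower = wanted.lower()
    for key, value in data.items():
        if key.lower() == wanted_lower:
            return value
    return None


# Keys in the incoming dict differ in case from the REST names.
payload = {"ParsingMode": "jsonLines", "querytimeout": "00:10:00"}

parsing_mode = case_insensitive_get(payload, "parsingMode")
query_timeout = case_insensitive_get(payload, "queryTimeout")
missing = case_insensitive_get(payload, "documentRoot")
```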
is_xml_model
is_xml_model() -> bool
serialize
Return the JSON that would be sent to server from this model.
This is an alias to as_dict(full_restapi_key_transformer, keep_readonly=False).
If you want XML serialization, you can pass the kwargs is_xml=True.
serialize(keep_readonly: bool = False, **kwargs: Any) -> MutableMapping[str, Any]
Parameters
Name | Description |
---|---|
keep_readonly
|
Whether to serialize the read-only attributes. Default value: False
|
Returns
Type | Description |
---|---|
A JSON-compatible dict |
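
The practical difference from as_dict is the key naming: serialize uses the REST names (for example "imageAction") rather than the Python attribute names (image_action). The sketch below approximates that renaming with a snake_case to camelCase conversion. This is an assumption for illustration only: the SDK relies on its own attribute metadata, and `snake_to_camel` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def snake_to_camel(name: str) -> str:
    """Convert a snake_case attribute name to a camelCase REST-style name."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


# Python-side attribute names with sample values taken from the known values above.
python_side: Dict[str, Any] = {
    "image_action": "generateNormalizedImages",
    "data_to_extract": "contentAndMetadata",
}

# Approximation of the REST-keyed mapping that serialize would return.
rest_side = {snake_to_camel(key): value for key, value in python_side.items()}

# The mapping is JSON compatible, so it can go straight to json.dumps.
body = json.dumps(rest_side, sort_keys=True)
```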