[Azure AI Search] Index Management

Question

이소영 20

Hello! I have some questions about implementing a search service through Azure AI Search.

[Requirements for the service to be developed]

There are multiple categories (e.g., HR, purchase) in each project, and multiple documents are managed per category.
The metadata managed differs by category, and search conditions such as filtering and faceting must be applicable to each metadata item.
Each document can be embedded with a different embedding model.
There are not many documents that use the same schema and embedding model.
The overall amount of data that needs to be embedded is not large.

[Inquiry]

(Support for multiple schemas for the same index)

I would like to inquire about whether multiple schemas are supported in one index of Azure AI Search.

Fields for storing metadata are also required in the schema, and when searching, filtering and faceting should be possible on each metadata field (e.g., title, owner).

(Is it possible to manage vector data with various dimensions vectorized through various embedding models within the same index?)

I would like to ask how I can manage vector data with various dimensions vectorized through various embedding models within a single index.

I would like to ask if there is any other efficient way other than defining vector fields by dimension within a single index and storing/retrieving data in different vector fields by dimension.

(Flexible management of storage size and maximum number of indexes per service)

I would like to ask if there is a way to flexibly manage the allowed storage size and maximum number of indexes per service.

If this cannot be managed flexibly and the limits are fixed, I would like to ask how to get a smaller storage size and a larger number of indexes per service. (The Azure AI Search service currently in use is on the Standard S1 plan, with 160 GB of storage per service and a maximum of 50 indexes. I want to use a smaller storage size and a larger maximum number of indexes per service. I would like to ask if there is another way other than moving to an expensive plan like Standard S3.)

In addition, if there is an efficient way to configure Azure AI Search to meet the service requirements to be developed above, please share it.

Answer 1

Hi 이소영,

Here are some insights on each of your questions.

Multiple Schemas in a Single Index

Union Schema: Create a single index whose schema is the union of all possible fields across your categories. Documents that don’t use a field can simply leave it empty.
Separate Indexes: Alternatively, you might decide to maintain separate indexes per category if the schemas are drastically different. This approach can simplify filtering and faceting since each index contains a consistent set of fields.
Considerations for Filtering and Faceting: Any field you want to use for filtering or faceting must be explicitly defined in the index schema. If you choose a union schema, you must ensure that the fields you plan to filter on (such as title, owner, etc.) are consistently defined even if some documents do not use them.
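As a rough illustration of the union-schema option, the sketch below merges per-category metadata into a single field list in the shape used by the REST index definition. The category names, metadata fields, and the index name "documents" are hypothetical; only the field attributes (key, searchable, filterable, facetable) come from the service's schema.

```python
import json

# Hypothetical per-category metadata; field names and types are illustrative only.
CATEGORY_FIELDS = {
    "HR": {"title": "Edm.String", "owner": "Edm.String", "department": "Edm.String"},
    "purchase": {"title": "Edm.String", "owner": "Edm.String", "vendor": "Edm.String"},
}


def build_union_fields(category_fields):
    """Merge per-category metadata into one filterable, facetable field list."""
    fields = [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ]
    merged = {}
    for metadata in category_fields.values():
        for name, edm_type in metadata.items():
            # A field shared by several categories must have one type in the index.
            if merged.setdefault(name, edm_type) != edm_type:
                raise ValueError(f"field {name!r} has conflicting types")
    for name in sorted(merged):
        fields.append(
            {"name": name, "type": merged[name], "filterable": True, "facetable": True}
        )
    return fields


# Assumed index name; this dict is what would be sent as the index definition.
index_definition = {"name": "documents", "fields": build_union_fields(CATEGORY_FIELDS)}
print(json.dumps(index_definition, indent=2))
```

Documents from a given category then populate only the fields that apply to them and omit the rest.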

Managing Vector Data with Various Dimensions

Multiple Vector Fields: Define separate vector fields in the same index for each embedding model/dimension. For example, if one model outputs 768-dimensional vectors and another 1024-dimensional vectors, you would add two fields accordingly. When indexing a document, you populate only the vector field that corresponds to its embedding.
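Post-processing to a Common Dimension: As an alternative, you might consider a post-processing step (e.g., dimensionality reduction, zero-padding, or truncation) that standardizes the output vectors to a common dimension. This would allow you to use a single vector field, but be sure the transformation does not adversely affect your search quality. Keep in mind that vectors produced by different embedding models are not comparable with one another even when they share a dimension, so a query vector should only be matched against documents embedded with the same model.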
Note: There isn’t a built-in feature to dynamically handle vectors of varying dimensions in a single field. The multi-field approach is the standard method given the current product capabilities.
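The multi-field approach could look roughly like the sketch below, which builds one vector field per dimension and routes each document's embedding to the matching field. The field names, the profile and algorithm names, and the assumption that each dimension maps to exactly one model are all hypothetical; if two models share a dimension, the mapping would need to be keyed by model instead.

```python
# Hypothetical mapping from embedding dimension to vector field name.
VECTOR_FIELDS = {768: "embedding_768", 1024: "embedding_1024"}

# Assumed names for the vector search configuration referenced by the fields.
VECTOR_SEARCH = {
    "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
    "profiles": [{"name": "default-profile", "algorithm": "default-hnsw"}],
}


def vector_field_definitions(vector_fields, profile="default-profile"):
    """Return one vector field definition per embedding dimension."""
    return [
        {
            "name": name,
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": dims,
            "vectorSearchProfile": profile,
        }
        for dims, name in sorted(vector_fields.items())
    ]


def attach_embedding(document, embedding):
    """Store the embedding in the field that matches its dimension."""
    field = VECTOR_FIELDS.get(len(embedding))
    if field is None:
        raise ValueError(f"no vector field for dimension {len(embedding)}")
    # Only the matching vector field is populated; the others stay absent.
    return {**document, field: list(embedding)}


doc = attach_embedding({"id": "1", "category": "HR"}, [0.0] * 768)
print(sorted(doc))
```

At query time, the vector query would target the field that corresponds to the model used to embed the query text.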

Flexible Storage and Index Count Management

Fixed Tier Limits: The quotas (storage and index count) are fixed based on the pricing tier you choose. If you require a higher index count or different storage characteristics, the typical approach is to move to a different tier (e.g., Standard S3), but that may come at a higher cost.
Multi-Service Approach: If you need to stay on a lower-cost plan like S1 but want to effectively support more indexes with smaller sizes, you can run multiple search service instances. This way, you can distribute your indexes across services, each with its own quota.
Index Consolidation: Evaluate whether you can consolidate multiple logical collections into a single index by introducing a discriminator field. This reduces the number of indexes you need to manage but may require additional filtering logic at query time.
Note: There isn’t a built-in configuration to allow a “smaller” storage size with a higher index count on a per-service basis within a single tier. The best strategy is either to consolidate indexes where possible or distribute them across multiple service instances.
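For the index consolidation option, a minimal sketch of the query-time filtering logic is shown below. It assumes the discriminator is a filterable string field named "category" (a hypothetical name) and builds the body of a search request with an OData filter; the helper names are illustrative as well.

```python
def odata_string(value):
    """Quote a string literal for an OData filter, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_search_request(text, category, extra_filters=None, facets=()):
    """Build a search request body restricted to one logical collection."""
    # The discriminator clause always comes first so every query is scoped.
    clauses = [f"category eq {odata_string(category)}"]
    for field, value in (extra_filters or {}).items():
        clauses.append(f"{field} eq {odata_string(value)}")
    return {
        "search": text,
        "filter": " and ".join(clauses),
        "facets": list(facets),
    }


request = build_search_request(
    "leave policy", "HR", extra_filters={"owner": "O'Neil"}, facets=["owner"]
)
print(request["filter"])
```

Every query must include the discriminator clause, otherwise results from other categories sharing the index would be returned.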

https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-create-index?tabs=config-2024-07-01%2Crest-2024-07-01%2Cpush%2Cportal-check-index

https://learn.microsoft.com/en-us/azure/search/vector-search-overview

https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

If you need any further assistance, do let me know.

If the answer is helpful, please click Accept Answer and kindly upvote it so that other people who face a similar issue can benefit from it.
