Hi 이소영,
Please find my insights on each of your inquiry points below:
- Multiple Schemas in a Single Index
- Union Schema: Create a single index whose schema is the union of all possible fields across your categories. Documents that don’t use a field can simply leave it empty.
- Separate Indexes: Alternatively, you might decide to maintain separate indexes per category if the schemas are drastically different. This approach can simplify filtering and faceting since each index contains a consistent set of fields.
- Considerations for Filtering and Faceting: Any field you want to use for filtering or faceting must be explicitly defined in the index schema and marked as filterable or facetable. If you choose a union schema, make sure the fields you plan to filter on (such as title, owner, etc.) are defined consistently, even if some documents leave them empty.
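
To make the union-schema idea concrete, here is a minimal sketch that merges per-category field lists into one field list shaped like the `fields` array of an index definition. The category names, the field names, the `category` discriminator field, and the choice of `Edm.String` for every field are assumptions for illustration only; adjust the types and attributes to your data.

```python
from typing import Dict, List

# Hypothetical per-category field sets; replace with your own categories.
CATEGORY_FIELDS: Dict[str, List[str]] = {
    "reports": ["title", "owner", "department"],
    "tickets": ["title", "owner", "priority"],
}


def build_union_fields(category_fields: Dict[str, List[str]]) -> List[dict]:
    """Return one field list that is the union of all category fields."""
    # Key field plus an assumed discriminator field for per-category filtering.
    fields = [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "category", "type": "Edm.String",
         "filterable": True, "facetable": True},
    ]
    seen = {"id", "category"}
    for names in category_fields.values():
        for name in names:
            if name in seen:
                continue  # shared fields are defined once
            seen.add(name)
            # Assumption: every field is a string that you want to filter
            # and facet on; documents that do not use it leave it empty.
            fields.append({
                "name": name,
                "type": "Edm.String",
                "searchable": True,
                "filterable": True,
                "facetable": True,
            })
    return fields


index_definition = {
    "name": "union-index",  # hypothetical index name
    "fields": build_union_fields(CATEGORY_FIELDS),
}
```

Fields shared by several categories (here `title` and `owner`) appear once, so filters and facets on them behave the same across all categories.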
- Managing Vector Data with Various Dimensions
- Multiple Vector Fields: Define separate vector fields in the same index for each embedding model/dimension. For example, if one model outputs 768-dimensional vectors and another 1024-dimensional vectors, you would add two fields accordingly. When indexing a document, you populate only the vector field that corresponds to its embedding.
- Preprocessing to a Common Dimension: As an alternative, you could transform the embeddings before indexing (e.g., dimensionality reduction, zero-padding, or truncation) so that all vectors share a common dimension. This approach would allow you to use a single vector field, but verify that the transformation does not adversely affect your search quality.
- Note: There isn’t a built-in feature to dynamically handle vectors of varying dimensions in a single field. The multi-field approach is the standard method given the current product capabilities.
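
Both options above can be sketched in a few lines. This is an illustration under assumptions, not a definitive implementation: the field names `vector_768` and `vector_1024` are hypothetical, and the padding helper only changes vector length; it does not make embeddings from different models comparable with each other.

```python
from typing import Dict, List

# Hypothetical vector field names, one per embedding dimension.
VECTOR_FIELDS: Dict[int, str] = {768: "vector_768", 1024: "vector_1024"}


def build_document(doc_id: str, embedding: List[float]) -> dict:
    """Multi-field approach: populate only the field matching the dimension."""
    field = VECTOR_FIELDS.get(len(embedding))
    if field is None:
        raise ValueError(f"No vector field for dimension {len(embedding)}")
    return {"id": doc_id, field: embedding}


def pad_or_truncate(embedding: List[float], target_dim: int) -> List[float]:
    """Single-field approach: force every vector to target_dim values."""
    if len(embedding) >= target_dim:
        return embedding[:target_dim]  # truncation drops trailing values
    return embedding + [0.0] * (target_dim - len(embedding))  # zero-padding
```

With the multi-field approach, a query vector should target the field that matches the model that produced it. With the single-field approach, measure search quality before and after the transformation.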
- Flexible Storage and Index Count Management
- Fixed Tier Limits: The quotas (storage and index count) are fixed based on the pricing tier you choose. If you require a higher index count or different storage characteristics, the typical approach is to move to a different tier (e.g., Standard S3), but that may come at a higher cost.
- Multi-Service Approach: If you need to stay on a lower-cost plan like S1 but want to effectively support more indexes with smaller sizes, you can run multiple search service instances. This way, you can distribute your indexes across services, each with its own quota.
- Index Consolidation: Evaluate whether you can consolidate multiple logical collections into a single index by introducing a discriminator field. This reduces the number of indexes you need to manage but may require additional filtering logic at query time.
- Note: There isn’t a built-in configuration to allow a “smaller” storage size with a higher index count on a per-service basis within a single tier. The best strategy is either to consolidate indexes where possible or distribute them across multiple service instances.
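
For the index consolidation option, the additional query-time logic is mostly a filter on the discriminator field. The sketch below builds a search request body with an OData filter; the field name `category` and the sample values are assumptions for illustration.

```python
def odata_string(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_search_body(query: str, category: str) -> dict:
    """Restrict a search to one logical collection inside a consolidated index."""
    return {
        "search": query,
        # 'category' is a hypothetical discriminator field; it must be
        # marked filterable in the index schema for this filter to work.
        "filter": f"category eq {odata_string(category)}",
    }


body = build_search_body("budget", "reports")
```

Here `body` carries the query text together with the filter `category eq 'reports'`, so one physical index can serve several logical collections.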
https://learn.microsoft.com/en-us/azure/search/vector-search-overview
https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity
If you need any further assistance, do let me know.
If the answer is helpful, please click Accept Answer and kindly upvote it so that other people who face a similar issue can benefit from it.