Hi @Nuno Rodrigues, what are the typical sizes and formats of the images you'll be indexing? Are there any specific image formats you need to support (e.g., JPEG, PNG, TIFF)?
One way to achieve this is to create a skillset with a custom skill that extracts the base64-encoded data for each image chunk during indexing. Here's a general breakdown:
- Define a skillset that includes your custom skill.
- Develop a custom skill that takes a chunk of image data as input and outputs the base64-encoded data for that chunk (see the sketch after this list).
- Attach the skillset to your indexer when you create or update it. This ensures the custom skill runs on each chunk during indexing.
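Here's a minimal sketch of what that custom skill could look like as a small web API, assuming the indexer passes an `imageUrl` input and expects a `base64Data` output (both field names are illustrative, not part of any built-in contract). The request/response shape is the standard custom skill interface:

```python
# Minimal custom skill sketch: fetches each image chunk by URL and returns
# its base64-encoded bytes. Field names "imageUrl" and "base64Data" are
# illustrative; adjust them to match your skillset's input/output mappings.
import base64

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/encode-image", methods=["POST"])
def encode_image():
    # Azure AI Search sends {"values": [{"recordId": ..., "data": {...}}, ...]}
    records = request.get_json().get("values", [])
    results = []
    for record in records:
        try:
            url = record["data"]["imageUrl"]  # illustrative input field
            image_bytes = requests.get(url, timeout=30).content
            results.append({
                "recordId": record["recordId"],  # must echo recordId back
                "data": {"base64Data": base64.b64encode(image_bytes).decode("ascii")},
                "errors": None,
                "warnings": None,
            })
        except Exception as exc:
            # Per-record errors are reported back to the indexer, not raised
            results.append({
                "recordId": record["recordId"],
                "data": {},
                "errors": [{"message": str(exc)}],
                "warnings": None,
            })
    return jsonify({"values": results})

if __name__ == "__main__":
    app.run()
```

You'd then register this endpoint in the skillset as a `WebApiSkill`, mapping its inputs and outputs to the fields above.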
Another option is to leverage Azure AI Search's built-in vectorization capabilities. Here's a high-level process:
- Configure your indexer to use integrated vectorization.
- Consider using the built-in Text Split skill to split large documents at content boundaries before vectorization (see the sketch after this list).
- Choose an embedding model that can vectorize image content, such as a multimodal embedding model; text-only embedding models won't work for images.
- The indexing process then automatically chunks large content, extracts image features using the embedding model, and indexes the resulting vectors for efficient retrieval.
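To illustrate the Text Split step, here's a minimal sketch that defines a skillset containing the built-in Text Split skill and pushes it to the service via the REST API. The service URL, admin key, skillset name, and the `/document/content` source path are placeholders, and a real pipeline would add an embedding skill after the split:

```python
# Sketch: create a skillset with the built-in Text Split skill via the
# Azure AI Search REST API. Replace the placeholders with your own values.
import requests

SERVICE = "https://<your-service>.search.windows.net"  # placeholder
API_KEY = "<admin-api-key>"                            # placeholder

skillset = {
    "name": "image-chunking-skillset",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "context": "/document",
            "textSplitMode": "pages",   # split at content boundaries
            "maximumPageLength": 2000,  # max characters per chunk
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "chunks"}],
        }
    ],
}

# PUT creates or updates the skillset under the given name
resp = requests.put(
    f"{SERVICE}/skillsets/{skillset['name']}",
    params={"api-version": "2024-07-01"},
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    json=skillset,
)
resp.raise_for_status()
```

Once the skillset exists, reference it from your indexer via its `skillsetName` property so the split runs on every document during indexing.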
Best,
-Grace