How to index JSON array where I have text and embedding inside?

Sherwin Shoujing ZHU 0 Reputation points
2024-11-19T09:02:19.3333333+00:00

Hi,

My data is a JSON array that looks like [{"chunk_id": "C:/Users/chunk_path", "parent_id": "C:/Users/parent_document_path", "chunk": "text", "title": "filename.xlsx", "text_vector": [a_series_of_embedding_numbers]}, {"chunk_id": "C:/Users/chunk_path", "parent_id": "C:/Users/parent_document_path", "chunk": "text", "title": "filename.xlsx", "text_vector": [a_series_of_embedding_numbers]}]. The embeddings were generated from the text with text-embedding-ada-002. Could someone suggest what I should do next to use Azure AI Search with my data and embeddings? I tried importing the JSON array from blob storage through the portal, but received the error 'Sampling data source: Error detecting index schema from data source: "The requested name is valid, but no data of the requested type was found."'.

Azure AI Search

1 answer

  1. SnehaAgrawal-MSFT 21,851 Reputation points
    2024-11-19T13:19:00.61+00:00

    @Sherwin Shoujing ZHU Thanks for reaching out here!

    We need a few more details to understand the approach you are taking. If you use the "Import and vectorize data" wizard, it automatically generates the chunks and chunk_ids needed for everything to work.

    The "Import data" wizard, however, does not support mapping vector fields, so it is not clear whether the wizard you are using fits this scenario.

    If the goal is to import pre-chunked data, you would need a programmatic method to upload the chunks as individual documents, rather than relying on the one-to-many document structure that the "Import and vectorize data" feature provides.
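    For example, here is a minimal sketch of that programmatic approach using the azure-search-documents Python SDK and the push API. The endpoint, index name, key, and file name are placeholders, and the index is assumed to already define the fields shown in your JSON (chunk_id, parent_id, chunk, title, text_vector).

    ```python
    import json

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    # Assumptions: replace with your own service endpoint, index name, and admin key.
    endpoint = "https://<your-search-service>.search.windows.net"
    index_name = "chunks-index"
    credential = AzureKeyCredential("<admin-api-key>")

    client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

    # Load the JSON array of chunk objects (chunk_id, parent_id, chunk, title, text_vector).
    with open("chunks.json", encoding="utf-8") as f:  # assumption: a local copy of the blob
        chunks = json.load(f)

    # chunk_id is used as the document key here; keys may only contain URL-safe characters,
    # so file paths such as "C:/Users/..." need to be encoded or replaced first.
    docs = [
        {
            "chunk_id": c["chunk_id"],
            "parent_id": c["parent_id"],
            "chunk": c["chunk"],
            "title": c["title"],
            "text_vector": c["text_vector"],
        }
        for c in chunks
    ]

    results = client.upload_documents(documents=docs)
    print(f"Uploaded {sum(1 for r in results if r.succeeded)} of {len(docs)} documents")
    ```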

    Alternatively, if all the required chunks and embeddings are already prepared in blob storage, a regular indexer can be set up without integrated vectorization or skillsets. Field mappings can then be configured as needed so they correspond to the correct fields in the index schema.
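    As a rough sketch of that setup (the names chunks-ds, chunks-indexer, and chunks-index are placeholders, and the target index is assumed to already exist with matching fields), the data source and indexer can be created through the REST API with parsingMode set to jsonArray and a field mapping that base64-encodes chunk_id so it can serve as the document key:

    ```python
    import requests

    # Assumptions: replace the endpoint, admin key, and storage connection details with your own.
    endpoint = "https://<your-search-service>.search.windows.net"
    api_version = "2024-07-01"
    headers = {"Content-Type": "application/json", "api-key": "<admin-api-key>"}

    # Blob data source pointing at the container that holds the JSON array files.
    data_source = {
        "name": "chunks-ds",
        "type": "azureblob",
        "credentials": {"connectionString": "<storage-connection-string>"},
        "container": {"name": "<blob-container>"},
    }
    requests.put(
        f"{endpoint}/datasources/chunks-ds?api-version={api_version}",
        headers=headers, json=data_source,
    ).raise_for_status()

    # Regular indexer: no skillset, jsonArray parsing so each array element becomes a document,
    # and a field mapping that base64-encodes chunk_id because raw paths are not valid keys.
    indexer = {
        "name": "chunks-indexer",
        "dataSourceName": "chunks-ds",
        "targetIndexName": "chunks-index",
        "parameters": {"configuration": {"parsingMode": "jsonArray"}},
        "fieldMappings": [
            {
                "sourceFieldName": "chunk_id",
                "targetFieldName": "chunk_id",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
    }
    requests.put(
        f"{endpoint}/indexers/chunks-indexer?api-version={api_version}",
        headers=headers, json=indexer,
    ).raise_for_status()
    ```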

    Search over JSON blobs - Azure AI Search | Microsoft Learn: https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs

    Field mappings and transformations - Azure AI Search | Microsoft Learn: https://learn.microsoft.com/en-us/azure/search/search-indexer-field-mappings

    Please let us know.

