How to index JSON array where I have text and embedding inside?

Sherwin Shoujing ZHU 0 Reputation points
2024-11-19T09:02:19.3333333+00:00

Hi,

My data is a JSON array that looks like [{"chunk_id": "C:/Users/chunk_path", "parent_id": "C:/Users/parent_document_path", "chunk": "text", "title": "filename.xlsx", "text_vector": [a_series_of_embedding_numbers]}, {"chunk_id": "C:/Users/chunk_path", "parent_id": "C:/Users/parent_document_path", "chunk": "text", "title": "filename.xlsx", "text_vector": [a_series_of_embedding_numbers]}]. The embeddings were generated from the text with text-embedding-ada-002. Could someone suggest what I should do next to use Azure AI Search with my data and embeddings? I tried importing the JSON array from blob storage through the portal, but received the error 'Sampling data source: Error detecting index schema from data source: "The requested name is valid, but no data of the requested type was found."'.

Azure AI Search

1 answer

  1. SnehaAgrawal-MSFT 21,851 Reputation points
    2024-11-19T13:19:00.61+00:00

    @Sherwin Shoujing ZHU Thanks for reaching out here!

    We need a few more details to understand the approach you are taking. If you use the "Import and vectorize data" wizard, it automatically generates the chunks and chunk_ids needed for everything to work.

    The "Import data" wizard, however, does not support mapping vector fields, so it is not clear whether the wizard you are using fits this scenario.

    If the goal is to import pre-chunked data, you would need a programmatic method to upload the chunks as individual documents, rather than relying on the one-to-many document structure that the "Import and vectorize data" feature provides.
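    For example, here is a minimal sketch of that programmatic approach using the azure-search-documents Python SDK and the push API. The endpoint, index name, key, and file name are placeholders, and the index is assumed to already define the fields shown in your JSON (chunk_id, parent_id, chunk, title, text_vector).

    ```python
    import json

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    # Assumptions: replace with your own service endpoint, index name, and admin key.
    endpoint = "https://<your-search-service>.search.windows.net"
    index_name = "chunks-index"
    credential = AzureKeyCredential("<admin-api-key>")

    client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

    # Load the JSON array of chunk objects (chunk_id, parent_id, chunk, title, text_vector).
    with open("chunks.json", encoding="utf-8") as f:  # assumption: a local copy of the blob
        chunks = json.load(f)

    # chunk_id is used as the document key here; keys may only contain URL-safe characters,
    # so file paths such as "C:/Users/..." need to be encoded or replaced first.
    docs = [
        {
            "chunk_id": c["chunk_id"],
            "parent_id": c["parent_id"],
            "chunk": c["chunk"],
            "title": c["title"],
            "text_vector": c["text_vector"],
        }
        for c in chunks
    ]

    results = client.upload_documents(documents=docs)
    print(f"Uploaded {sum(1 for r in results if r.succeeded)} of {len(docs)} documents")
    ```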

    Alternatively, if all the required chunks and embeddings are already prepared in blob storage, a regular indexer can be set up without integrated vectorization or skillsets. Field mappings can then be configured as needed so they correspond to the correct fields in the index schema.
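    As a rough sketch of that setup (the names chunks-ds, chunks-indexer, and chunks-index are placeholders, and the target index is assumed to already exist with matching fields), the data source and indexer can be created through the REST API with parsingMode set to jsonArray and a field mapping that base64-encodes chunk_id so it can serve as the document key:

    ```python
    import requests

    # Assumptions: replace the endpoint, admin key, and storage connection details with your own.
    endpoint = "https://<your-search-service>.search.windows.net"
    api_version = "2024-07-01"
    headers = {"Content-Type": "application/json", "api-key": "<admin-api-key>"}

    # Blob data source pointing at the container that holds the JSON array files.
    data_source = {
        "name": "chunks-ds",
        "type": "azureblob",
        "credentials": {"connectionString": "<storage-connection-string>"},
        "container": {"name": "<blob-container>"},
    }
    requests.put(
        f"{endpoint}/datasources/chunks-ds?api-version={api_version}",
        headers=headers, json=data_source,
    ).raise_for_status()

    # Regular indexer: no skillset, jsonArray parsing so each array element becomes a document,
    # and a field mapping that base64-encodes chunk_id because raw paths are not valid keys.
    indexer = {
        "name": "chunks-indexer",
        "dataSourceName": "chunks-ds",
        "targetIndexName": "chunks-index",
        "parameters": {"configuration": {"parsingMode": "jsonArray"}},
        "fieldMappings": [
            {
                "sourceFieldName": "chunk_id",
                "targetFieldName": "chunk_id",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
    }
    requests.put(
        f"{endpoint}/indexers/chunks-indexer?api-version={api_version}",
        headers=headers, json=indexer,
    ).raise_for_status()
    ```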

    Search over JSON blobs - Azure AI Search | Microsoft Learn: https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs

    Field mappings and transformations - Azure AI Search | Microsoft Learn: https://learn.microsoft.com/en-us/azure/search/search-indexer-field-mappings

    Please let us know.

