How to create index using custom chunking within the enrichment pipeline in Azure AI Search

Question

When using the built-in SplitSkill in an Azure indexer pipeline, Azure AI Search automatically provides a chunk_id field for each chunk, allowing the chunks to be indexed individually. However, when this step is replaced with a custom Web API skill that returns multiple chunks, the lack of a chunk_id at the root of each chunk prevents the indexer from creating separate documents in the index. Although the pipeline runs error-free, no documents appear in the portal because the indexing process can't form properly keyed documents from the provided JSON structure.

How can I introduce chunk_id to the pipeline without getting the wrong output type error?

Answer

Hi @Filiz Camuz
Thanks for the question and for using the MS Q&A platform.
1. Ensure that your custom Web API skill returns each chunk with a unique chunk_id. You can do this by modifying the JSON structure of the response.
Here’s an example of how your API should format the output (add more chunks to the array as needed):

{
 "values": [
   {
     "chunk_id": "1",
     "content": "This is the first chunk of text."
   },
   {
     "chunk_id": "2",
     "content": "This is the second chunk of text."
   }
 ]
}

2. Make sure the JSON has the shape Azure expects: each chunk is an object under the values array, and every chunk_id is a unique string or number. A string is the safer choice, because the key field of an Azure AI Search index must be of type Edm.String.
3. If you get an output type error, check that the data types of chunk_id and the other fields match those Azure expects, and that the response is valid JSON with proper syntax (for example, no trailing commas).
4. Before integrating it back into the Azure pipeline, test your API independently: use a tool such as Postman or curl to send requests to your API and verify that it returns the expected JSON structure with chunk_id.
5. After confirming that your API returns the correct format, update your Azure indexer configuration if needed, so that the indexer is configured to handle and process the chunk_id field correctly.
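As a rough illustration of steps 1 to 4, here is a minimal Python sketch that builds a skill response with a unique string chunk_id per chunk and checks it before you wire it into the pipeline. The function names, the "text" input name, the "chunks" output name, and the fixed-width splitter are all assumptions made for this example, not part of your skill. Note also that the documented custom Web API skill contract wraps each record's outputs in recordId and data (with errors and warnings), so the sketch nests the chunks under data rather than placing them directly under values.

```python
import json


def split_into_chunks(text, max_chars=200):
    """Naive fixed-width splitter standing in for a real chunking strategy."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def build_skill_response(request_body, max_chars=200):
    """Build a custom-skill response with a unique chunk_id on every chunk."""
    results = []
    for record in request_body["values"]:
        record_id = record["recordId"]
        text = record["data"].get("text", "")  # "text" is an assumed input name
        chunks = [
            # chunk_id combines the record id and the chunk position
            {"chunk_id": f"{record_id}_{n}", "content": piece}
            for n, piece in enumerate(split_into_chunks(text, max_chars))
        ]
        results.append({
            "recordId": record_id,
            "data": {"chunks": chunks},  # "chunks" is an assumed output name
            "errors": [],
            "warnings": [],
        })
    return {"values": results}


def validate_response(response):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    seen = set()
    for record in response.get("values", []):
        if "recordId" not in record:
            problems.append("record is missing recordId")
        for chunk in record.get("data", {}).get("chunks", []):
            chunk_id = chunk.get("chunk_id")
            if not isinstance(chunk_id, str) or not chunk_id:
                problems.append(f"chunk_id must be a non-empty string: {chunk_id!r}")
            elif chunk_id in seen:
                problems.append(f"duplicate chunk_id: {chunk_id}")
            else:
                seen.add(chunk_id)
    return problems


sample_request = {
    "values": [
        {"recordId": "0", "data": {"text": "a" * 450}},
        {"recordId": "1", "data": {"text": "b" * 100}},
    ]
}
sample_response = build_skill_response(sample_request)
json.dumps(sample_response)  # raises TypeError if anything is not serializable
```

Running validate_response on whatever your API returns to Postman or curl gives you a quick check for duplicate or non-string chunk_id values before the indexer ever sees the payload.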
References:
https://learn.microsoft.com/en-us/azure/search/tutorial-rag-build-solution-pipeline

https://learn.microsoft.com/en-us/azure/search/search-how-to-semantic-chunking

I hope this information is helpful.
