How to create index using custom chunking within the enrichment pipeline in Azure AI Search

Filiz Camuz

When using the built-in SplitSkill in an Azure AI Search indexer pipeline, Azure AI Search automatically provides a chunk_id field for each chunk, so each chunk can be indexed as its own document. However, when this step is replaced with a custom Web API skill that returns multiple chunks, there is no chunk_id at the root of each chunk, and the indexer does not create separate documents in the index. The pipeline runs without errors, but no documents appear in the portal, because the indexer can't form properly keyed documents from the JSON structure the skill returns.

How can I introduce a chunk_id into the pipeline without getting an output type error?

Laxman Reddy Revuri, Microsoft Vendor

2024-12-20T04:35:59.5733333+00:00
Hi @Filiz Camuz,
Thanks for the question, and for using the Microsoft Q&A platform.
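1. Make sure your custom Web API skill returns each chunk with a unique chunk_id. The response must follow the custom skill interface: a top-level values array with one output record per input record, each echoing the recordId it received, with the skill's outputs inside data. In this example the chunks are returned in an output named chunks.
Here’s an example of how your API should format the output:

    {
      "values": [
        {
          "recordId": "0",
          "data": {
            "chunks": [
              { "chunk_id": "1", "content": "This is the first chunk of text." },
              { "chunk_id": "2", "content": "This is the second chunk of text." }
            ]
          },
          "errors": [],
          "warnings": []
        }
      ]
    }

2. Make sure the JSON has the shape Azure expects: each chunk is an object in the array returned for its record, and no chunk_id is repeated within that array. Use strings for chunk_id, because document keys in Azure AI Search are always of type Edm.String.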
3. If you get an output type error, check that the data types of chunk_id and the other fields match the types of the index fields they are mapped to, and that the response is valid JSON (no trailing commas and no comments).
4. Before integrating the skill back into the Azure pipeline, test your API on its own: use a tool such as Postman or curl to send a request in the skill input format, and verify that the response has the expected structure, with a chunk_id on every chunk.
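5. After confirming that your API returns the correct format, update the skillset so that each chunk becomes its own search document. Returning chunk_id alone does not split one source document into many index documents; that one-to-many mapping is configured with index projections (indexProjections in the skillset), where a selector sets sourceContext to the chunk array (for example /document/chunks/*), names the parent key field, and maps chunk properties to index fields.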
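A minimal sketch of the skill side, in Python: a plain function that takes a request body in the custom skill input format and returns one output record per input record, with the chunks under data.chunks and a chunk_id on each chunk. The input name text, the output name chunks, the fixed-size character chunker and the "<recordId>_<n>" chunk_id scheme are all assumptions for illustration; plug in your own chunking logic and wire the function into whatever HTTP host (for example an Azure Function) serves the skill.

```python
import json
from typing import Any


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    Illustrative only: replace with your own chunking logic.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def handle_skill_request(body: dict[str, Any]) -> dict[str, Any]:
    """Turn a custom skill request into a response with one output record per input record."""
    output_records = []
    for record in body.get("values", []):
        record_id = record["recordId"]
        text = (record.get("data") or {}).get("text") or ""  # hypothetical input name "text"
        chunks = [
            # chunk_id is unique within this record; content holds the chunk text
            {"chunk_id": f"{record_id}_{n}", "content": piece}
            for n, piece in enumerate(chunk_text(text))
        ]
        output_records.append({
            "recordId": record_id,        # must echo the input recordId
            "data": {"chunks": chunks},   # hypothetical output name "chunks"
            "errors": [],
            "warnings": [],
        })
    return {"values": output_records}


if __name__ == "__main__":
    request = {"values": [{"recordId": "0", "data": {"text": "lorem ipsum " * 100}}]}
    print(json.dumps(handle_skill_request(request), indent=2))
```

In this sketch chunk_id is only unique within one record. If you also store it in the index, combine it with the parent document's key rather than relying on it as a global identifier.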
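For steps 2 to 4, a small checker like the following can be run against the raw body your API returns to Postman or curl, together with the recordId values you sent. It is a hypothetical helper, not part of any Azure SDK, and it assumes the response shape of the custom skill interface with the chunks returned in a data.chunks array (the output name chunks is an assumption).

```python
import json


def validate_skill_response(raw_body: str, expected_record_ids: set[str]) -> list[str]:
    """Return the problems found in a custom skill response body; an empty list means it looks OK."""
    try:
        body = json.loads(raw_body)  # catches trailing commas, // comments, etc.
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    values = body.get("values") if isinstance(body, dict) else None
    if not isinstance(values, list):
        return ["top-level 'values' array is missing"]

    problems: list[str] = []
    seen_ids: set[str] = set()
    for record in values:
        if not isinstance(record, dict):
            problems.append("a record in 'values' is not a JSON object")
            continue
        rid = record.get("recordId")
        if not isinstance(rid, str) or rid not in expected_record_ids:
            problems.append(f"unexpected or missing recordId: {rid!r}")
        else:
            seen_ids.add(rid)
        data = record.get("data")
        chunks = data.get("chunks") if isinstance(data, dict) else None  # assumed output name
        if not isinstance(chunks, list):
            problems.append(f"record {rid!r}: data.chunks is not an array")
            continue
        ids = [c.get("chunk_id") if isinstance(c, dict) else None for c in chunks]
        string_ids = [cid for cid in ids if isinstance(cid, str) and cid]
        if len(string_ids) != len(ids):
            problems.append(f"record {rid!r}: every chunk needs a non-empty string chunk_id")
        if len(set(string_ids)) != len(string_ids):
            problems.append(f"record {rid!r}: duplicate chunk_id values")

    missing = expected_record_ids - seen_ids
    if missing:
        problems.append(f"no output record for recordId(s): {sorted(missing)}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to see every issue in a single test run.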
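A sketch of the skillset side, written as a Python dict that serializes to the JSON body of a skillset definition. It registers the custom skill (#Microsoft.Skills.Custom.WebApiSkill) with an output named chunks, and adds an indexProjections selector so that each element of /document/chunks becomes its own search document. The skillset name, skill name, URI, index name and the index fields chunk, title and parent_id are placeholders, and metadata_storage_name assumes a Blob Storage data source. The target index is assumed to have a string key field (chunk_id in the linked RAG tutorial, using the keyword analyzer) whose value the projection generates, plus a filterable parent_id field; adjust to your own schema.

```python
import json


def build_skillset(skill_uri: str, index_name: str) -> dict:
    """Build a skillset body with a custom chunking skill and an index projection (placeholders)."""
    return {
        "name": "custom-chunking-skillset",  # hypothetical skillset name
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "name": "custom-chunker",  # hypothetical skill name
                "uri": skill_uri,
                "httpMethod": "POST",
                "timeout": "PT60S",
                "batchSize": 1,
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                # The skill's "chunks" output is written to /document/chunks
                "outputs": [{"name": "chunks", "targetName": "chunks"}],
            }
        ],
        "indexProjections": {
            "selectors": [
                {
                    "targetIndexName": index_name,
                    "parentKeyFieldName": "parent_id",
                    # One search document per element of the chunk array
                    "sourceContext": "/document/chunks/*",
                    "mappings": [
                        {"name": "chunk", "source": "/document/chunks/*/content"},
                        {"name": "title", "source": "/document/metadata_storage_name"},
                    ],
                }
            ],
            # Index only the chunk documents, not the parent documents
            "parameters": {"projectionMode": "skipIndexingParentDocuments"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_skillset("https://example.invalid/api/chunk", "chunks-index"), indent=2))
```

The key design point is that sourceContext matches the path where the custom skill writes its output; if you rename the output or change the skill's context, the selector and its mapping sources have to change with it.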
References:
https://learn.microsoft.com/en-us/azure/search/tutorial-rag-build-solution-pipeline

https://learn.microsoft.com/en-us/azure/search/search-how-to-semantic-chunking

I hope this information is helpful.
Laxman Reddy Revuri, Microsoft Vendor

2024-12-23T00:55:55.8433333+00:00

Hi @Filiz Camuz,
Following up to see whether you have had a chance to check my previous response. Please share the requested information so we can look into this and assist you further.
