SplitSkill in Azure Cognitive Search retrieve chunk_id

Andrea Quarta 30 Reputation points
2025-02-04T16:07:53.2833333+00:00

When using the SplitSkill in Azure Cognitive Search, I need to know how to retrieve the unique chunk ID for each split section of the document. Since the skill divides the text into chunks (pages), I want to understand where the chunk ID is stored and how I can access it. Is the chunk ID available as metadata, or do I need to explicitly map it in the index schema?

link to skill: https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-textsplit

{
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "textSplitMode": "pages",
    "maximumPageLength": 1000,
    "pageOverlapLength": 100,
    "maximumPagesToTake": 1,
    "defaultLanguageCode": "en",
    "inputs": [
        {
            "name": "text",
            "source": "/document/content"
        },
        {
            "name": "languageCode",
            "source": "/document/language"
        }
    ],
    "outputs": [
        {
            "name": "textItems",
            "targetName": "mypages"
        }
    ]
}

Is the chunk ID available as metadata? How can I map it?


Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

1 answer

  1. Shree Hima Bindu Maganti 2,895 Reputation points Microsoft Vendor
    2025-02-04T17:57:16.47+00:00

    Hi @Andrea Quarta
    Thanks for the question and for using the Microsoft Q&A platform.
    The SplitSkill in Azure Cognitive Search (now Azure AI Search) does not generate a unique chunk ID for each section of a document: its textItems output is only an array of text chunks. To store and retrieve a chunk ID, you need to define a field for it explicitly in your index schema and have the enrichment pipeline write each chunk to the index as its own search document with that field populated.
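
    As an illustration of what such an index schema could look like, here is a minimal sketch of a chunk-level index definition. The index name (my-chunk-index) and the field names (chunk_id, parent_id, chunk) are assumptions chosen for this example and do not come from the question. The sketch assumes one search document per chunk, with the key field acting as the chunk ID.

    ```json
    {
      "name": "my-chunk-index",
      "fields": [
        {
          "name": "chunk_id",
          "type": "Edm.String",
          "key": true,
          "searchable": true,
          "filterable": true,
          "retrievable": true,
          "analyzer": "keyword"
        },
        {
          "name": "parent_id",
          "type": "Edm.String",
          "filterable": true,
          "retrievable": true
        },
        {
          "name": "chunk",
          "type": "Edm.String",
          "searchable": true,
          "retrievable": true
        }
      ]
    }
    ```

    Because chunk_id is marked retrievable, it comes back in query results like any other field, and parent_id can be used in a filter to fetch all chunks that belong to one source document.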

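    SplitSkill's documented outputs do not include a chunk ID, so the ID cannot come from the outputs section of the skill itself. Instead, as described in the articles below, add an index projection to your skillset so that each chunk is written to the target index as a separate document. The key field of that index then serves as the chunk ID, and you can access it later when querying the indexed data.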
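
    One way this mapping is commonly set up is with an index projection in the skillset definition. The fragment below is a sketch under assumptions, not a drop-in configuration: the target index name (my-chunk-index) and the field names (parent_id, chunk) are hypothetical, and only /document/mypages comes from the targetName in the question. It belongs at the top level of the skillset, next to the skills array.

    ```json
    {
      "indexProjections": {
        "selectors": [
          {
            "targetIndexName": "my-chunk-index",
            "parentKeyFieldName": "parent_id",
            "sourceContext": "/document/mypages/*",
            "mappings": [
              {
                "name": "chunk",
                "source": "/document/mypages/*"
              }
            ]
          }
        ],
        "parameters": {
          "projectionMode": "skipIndexingParentDocuments"
        }
      }
    }
    ```

    With a projection like this, the indexer is expected to populate the key field of the target index for every projected chunk, so no explicit mapping for the key appears in mappings. Also note that the skill in the question sets maximumPagesToTake to 1, which returns only the first page of each document. To get one ID per chunk across the whole document, that setting would likely need to be removed or set to 0.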
    Chunk large documents for vector search solutions in Azure AI Search

    Chunk and vectorize by document layout or structure
    Let me know if you need any further assistance. If the answer is helpful, please click Accept Answer and upvote it so that other people facing a similar issue can benefit from it.

