Advice on a Table with a Dynamic Header / Double Header

Chiew Lerk Qing 0 Reputation points
2025-03-03T08:47:56.15+00:00

I have a table as follows: User's image

I am using the "Custom Extraction Model" to extract the data from the table above, as follows: User's image

However, when I test the model after training, all the data is cluttered into a single row and the other rows are left empty, as shown here:
User's image
User's image

Is there a better way to label the data for extraction, or is there something that can be changed to fix this situation?
Note:

  1. The MM-yyyy value in the top header is dynamic.
  2. I tried a few different strategies (such as labeling by "region" instead of the data) but still face the same issue.
  3. Several other tables of a similar format in the same documents do not have this issue.
  4. Free-tier, API ver: 2024-11-30.

Separately, if we subscribe, will any technical support be provided for issues we encounter, other than raising them on this platform?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Chakaravarthi Rangarajan Bhargavi 1,030 Reputation points MVP
    2025-03-03T17:34:32.9733333+00:00

    Hi Chiew Lerk Qing,

    Thanks for the question and welcome to Microsoft Q&A.

    It looks like you're trying to extract tabular data where the header (MM-yyyy) is dynamic, and you're running into issues with labeling the data. Here are a few approaches that may help you get the results you expect:

    Use "Key-Value Pair Extraction" Instead of Labeling Data:

    • Instead of relying on labeled data, use Azure AI Document Intelligence's key-value pair extraction to dynamically detect column headers and map them.
    • Reference: 🔗 Extract tables and key-value pairs
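
    As a rough illustration of working with extracted tables rather than labeled fields, the sketch below rebuilds a grid from one table entry and joins a double header into a single label per column. It assumes the analysis result has been loaded as JSON and that each table looks like `{"rowCount", "columnCount", "cells": [{"rowIndex", "columnIndex", "content", "kind", "columnSpan"}]}`; that shape is an assumption here and should be checked against your own API response. The function names `table_to_grid` and `merge_header_rows` are hypothetical helpers, not part of any SDK.

```python
def table_to_grid(table):
    """Rebuild a 2-D grid from one table dict (assumed shape, see the note above)."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    header_rows = set()
    for cell in table["cells"]:
        row, col = cell["rowIndex"], cell["columnIndex"]
        if cell.get("kind") == "columnHeader":
            header_rows.add(row)
        # Copy a spanning cell's text into every column it covers,
        # so "01-2025" over two sub-columns labels both of them.
        for offset in range(cell.get("columnSpan", 1)):
            grid[row][col + offset] = cell.get("content", "")
    return grid, sorted(header_rows)


def merge_header_rows(grid, header_rows):
    """Join stacked header rows column by column into one header per column."""
    width = len(grid[0]) if grid else 0
    headers = []
    for col in range(width):
        parts = [grid[r][col] for r in header_rows if grid[r][col]]
        headers.append(" ".join(parts))
    body = [row for i, row in enumerate(grid) if i not in header_rows]
    return headers, body


# Hypothetical sample: a two-row header with a dynamic MM-yyyy top row.
sample_table = {
    "rowCount": 3,
    "columnCount": 3,
    "cells": [
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0, "rowSpan": 2, "content": "Item"},
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 1, "columnSpan": 2, "content": "01-2025"},
        {"kind": "columnHeader", "rowIndex": 1, "columnIndex": 1, "content": "Qty"},
        {"kind": "columnHeader", "rowIndex": 1, "columnIndex": 2, "content": "Amount"},
        {"rowIndex": 2, "columnIndex": 0, "content": "Widget"},
        {"rowIndex": 2, "columnIndex": 1, "content": "3"},
        {"rowIndex": 2, "columnIndex": 2, "content": "30.00"},
    ],
}

grid, header_rows = table_to_grid(sample_table)
headers, body = merge_header_rows(grid, header_rows)
```

    Because the merged header carries the month text itself, the downstream code does not need a fixed field name for each month, which is the main appeal of this route when the top header changes from document to document.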

    Region-Based Extraction (But Define Specific Anchors):

    • Since "region" didn’t work well, try defining fixed reference points in the document, like a specific static text near your table, to ensure consistent extraction.
    • Reference: 🔗 Region-based form recognition

    Dynamic Table Header Handling (Regex for MM-yyyy Format):

    • Since headers change dynamically, implement a custom pre-processing step to detect and normalize headers before passing to the AI model.
    • Example: Use regex \b(0[1-9]|1[0-2])-\d{4}\b to extract MM-yyyy headers programmatically before AI processing.
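
    The regex step above could be sketched as follows. This is a minimal example that assumes the header cells are already available as a list of strings; `find_month_headers`, `normalize_headers`, and the `month_N` placeholder names are hypothetical choices for illustration.

```python
import re

# Pattern from the answer above: a two-digit month (01-12), a hyphen,
# then a four-digit year.
MM_YYYY = re.compile(r"\b(0[1-9]|1[0-2])-\d{4}\b")


def find_month_headers(header_cells):
    """Return (column_index, 'MM-yyyy') for each cell containing a month label."""
    found = []
    for index, text in enumerate(header_cells):
        match = MM_YYYY.search(text)
        if match:
            found.append((index, match.group(0)))
    return found


def normalize_headers(header_cells):
    """Replace dynamic month labels with stable placeholders (month_1, month_2, ...)."""
    normalized = list(header_cells)
    for position, (index, _label) in enumerate(find_month_headers(header_cells), start=1):
        normalized[index] = f"month_{position}"
    return normalized


headers = ["Item", "01-2025", "02-2025", "Total"]
months = find_month_headers(headers)
stable = normalize_headers(headers)
```

    Keeping the original labels from `find_month_headers` alongside the placeholders lets you map each extracted value back to its real month after processing.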

    Custom Model Training for Complex Tables:

    • If standard models fail, train a custom Document Intelligence model with labeled datasets covering various formats.
    • Reference: 🔗 Train a custom model

    Regarding the Subscription & Support: Yes, technical support beyond this community forum is available through Azure Support. Azure offers paid support plans (for example, Developer and Standard) that provide faster response times and technical assistance. These plans are purchased separately from the service's pricing tier. More details: 🔗 Azure Support Plans

    Let me know if you need further guidance. If you find this answer useful, please accept it and vote yes so that it helps the community. If you have further questions, I would be happy to help.

    Regards,

    Chakravarthi Rangarajan Bhargavi

