Issue with Extracting Table with Merged Cells in Azure Document Intelligence Custom Model

Udit Sati 0 Reputation points
2025-02-10T09:59:34.21+00:00

Hi Community
I have trained a Custom AI Model in Azure Document Intelligence to extract tables from PDFs. The model works well for most tables, but it's failing to extract one specific table that contains:

  • Merged cells in the header
  • Multi-line text in some columns
  • Arrows and phase indicators above the table that I don't need

When I test the model using Power Automate, I don’t get any JSON output for this table. Other tables in the same document are extracted correctly.
User's image

Here is a sample of the table I need to extract (first 4 columns).

Troubleshooting Steps I Tried:

  ✔ Trained the model with multiple variations of the table.
  ✔ Enabled "Advanced Table Extraction" mode.
  ✔ Ensured proper labeling during model training.
  ✔ Checked whether the issue is related to Power Automate by testing directly in Azure AI Studio.

Questions:

  • How can I improve table extraction for merged cells?
  • Is there a way to filter out non-table elements (like arrows) automatically before AI processing?
  • Should I preprocess the document using OCR in Power Automate to extract clean text first?

Any insights or suggestions would be greatly appreciated! 🚀

Azure OpenAI Service
Azure AI services

1 answer

  1. Vikram Singh 1,550 Reputation points Microsoft Employee
    2025-02-11T07:53:00.2+00:00

    Hi Udit Sati,

    Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    It sounds like you've put in a lot of effort to train your custom AI model in Azure Document Intelligence. Here are some suggestions to address the issues you're facing:

    How can I improve table extraction for merged cells?

    Ensure that your training set includes diverse samples of tables with merged cells. Explicitly annotate row and column boundaries to cover edge cases. Consider using fixed table fields for structured layouts, as they provide stricter column mapping.

    • Train your model with diverse table variations, including merged and non-merged headers.
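    • Try the Prebuilt Layout Model (prebuilt-layout), which detects tables without labeled training data and reports merged cells through each cell's rowSpan and columnSpan, so it often handles merged headers better than a custom model.
    • Post-process the extracted data using Python (Pandas) to reconstruct tables if needed.

    Microsoft Docs: Prebuilt Layout Model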
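
    As a rough illustration of the post-processing step, here is a minimal sketch that expands merged cells into a regular grid. It assumes you have the analyze result as REST JSON, where each table has rowCount, columnCount and a list of cells with rowIndex, columnIndex, content and optional rowSpan/columnSpan. The function name and the sample data are made up for the example and are not taken from your document.

```python
def table_to_grid(table):
    """Expand one table from analyzeResult["tables"] into a dense grid.

    Merged cells carry rowSpan/columnSpan; their text is copied into every
    position they cover, so each row ends up with the same number of columns.
    """
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row_span = cell.get("rowSpan", 1)      # absent means "not merged"
        col_span = cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], cell["rowIndex"] + row_span):
            for c in range(cell["columnIndex"], cell["columnIndex"] + col_span):
                grid[r][c] = cell.get("content", "")
    return grid


# Hypothetical sample: "Phase 1" is merged across two columns,
# "Task" is merged across the two header rows.
sample_table = {
    "rowCount": 3,
    "columnCount": 3,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "rowSpan": 2, "content": "Task"},
        {"rowIndex": 0, "columnIndex": 1, "columnSpan": 2, "content": "Phase 1"},
        {"rowIndex": 1, "columnIndex": 1, "content": "Start"},
        {"rowIndex": 1, "columnIndex": 2, "content": "End"},
        {"rowIndex": 2, "columnIndex": 0, "content": "Design\nreview"},
        {"rowIndex": 2, "columnIndex": 1, "content": "Jan"},
        {"rowIndex": 2, "columnIndex": 2, "content": "Feb"},
    ],
}

grid = table_to_grid(sample_table)
for row in grid:
    print(row)
```

    The resulting list of rows can be passed to pandas.DataFrame if you want to continue in Pandas. Multi-line cell text comes through with its line breaks, so you may want to replace "\n" with a space before exporting.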

    Is there a way to filter out non-table elements (like arrows) automatically before AI processing?

    You can preprocess the document to remove non-table elements like arrows and phase indicators. This can be done using custom scripts or tools that clean up the document before feeding it into the AI model.

    • Preprocess documents with Azure AI Vision or OpenCV to remove non-table elements before processing.
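    • If the unwanted elements are outside the table and your document is an invoice or receipt, a document-specific prebuilt model (prebuilt-invoice, prebuilt-receipt) may extract the structured fields without the surrounding graphics. These models apply only to those document types.

    Microsoft Docs: Optimize Document Preprocessing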
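
    One possible way to decide what to cut away is to run prebuilt-layout first and use the table's bounding region as a crop rectangle, so the arrows above the table fall outside it. The sketch below only computes the rectangle; it assumes the REST JSON shape, where boundingRegions[].polygon is a flat list [x1, y1, x2, y2, ...] in inches for PDFs and pixels for images. The function name, margin value and sample coordinates are hypothetical.

```python
def table_crop_box(table, page_number=1, margin=0.1):
    """Return (left, top, right, bottom) enclosing the table on one page.

    Returns None if the table has no bounding region on that page.
    The margin is in the same unit as the polygon coordinates.
    """
    xs, ys = [], []
    for region in table.get("boundingRegions", []):
        if region["pageNumber"] != page_number:
            continue
        polygon = region["polygon"]
        xs.extend(polygon[0::2])  # even positions are x coordinates
        ys.extend(polygon[1::2])  # odd positions are y coordinates
    if not xs:
        return None
    return (
        max(min(xs) - margin, 0.0),
        max(min(ys) - margin, 0.0),
        max(xs) + margin,
        max(ys) + margin,
    )


# Hypothetical table located below some arrows at the top of page 1.
sample_table = {
    "boundingRegions": [
        {"pageNumber": 1, "polygon": [1.0, 3.0, 7.5, 3.0, 7.5, 6.0, 1.0, 6.0]}
    ]
}

box = table_crop_box(sample_table, page_number=1, margin=0.25)
print(box)
```

    You would then apply the rectangle with whatever image or PDF library you already use (for example OpenCV, as mentioned above) and send only the cropped region to the custom model.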

    Should I preprocess the document using OCR in Power Automate to extract clean text first?

    Preprocessing the document with OCR in Power Automate may help extract clean text and improve the accuracy of table extraction. OCR quality matters most for scanned PDFs, so check it first for those documents.

    • While Azure Document Intelligence includes OCR, using Azure AI Vision OCR in Power Automate may enhance text clarity before extraction.
    • Compare results from the Document Intelligence Read model (prebuilt-read) and your custom model to identify the best approach.

    Microsoft Docs: Azure AI Vision OCR
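
    To judge whether OCR is actually the bottleneck, you could look at the word confidence values that prebuilt-read returns. This is a minimal sketch, assuming the REST JSON shape where analyzeResult["pages"] contains words with content and confidence. The 0.8 threshold, the function name and the sample data are arbitrary choices for the example.

```python
def low_confidence_words(analyze_result, threshold=0.8):
    """List (pageNumber, content, confidence) for words below the threshold."""
    flagged = []
    for page in analyze_result.get("pages", []):
        for word in page.get("words", []):
            confidence = word.get("confidence", 1.0)
            if confidence < threshold:
                flagged.append((page["pageNumber"], word["content"], confidence))
    return flagged


# Hypothetical excerpt of a prebuilt-read result.
sample_result = {
    "pages": [
        {
            "pageNumber": 1,
            "words": [
                {"content": "Phase", "confidence": 0.99},
                {"content": "Dellvery", "confidence": 0.42},
            ],
        },
        {"pageNumber": 2, "words": [{"content": "Total", "confidence": 0.95}]},
    ]
}

flagged = low_confidence_words(sample_result)
print(flagged)
```

    If few or no words are flagged inside the problem table, the text is being read correctly and the missing output is more likely a labeling or table-structure issue than an OCR issue.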

    I hope these suggestions help improve your model's performance. If you have any further questions or need more assistance, feel free to ask!

    If the response helped, please click Accept Answer and select Yes for "Was this answer helpful".

    Doing so helps other community members with similar issues identify the solution. I highly appreciate your contribution to the community.

    Thank You

