Is any AI module smart enough to tell me if a table spanning multiple pages is the same

Bijaya Rai 40 Reputation points
2024-12-23T11:07:56.7466667+00:00

Is any AI model smart enough to tell me whether table fragments that span multiple pages are parts of a single table?

I have a few use cases:

  • A table continues onto a new page and the table header is repeated on the new page.
  • A table continues onto a new page but the header is present only on the first page.
  • A document uses a two-column structure, e.g. a journal or academic paper, and the table continues from the end of the first column to the start of the next column.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 11,530 Reputation points Microsoft Vendor
    2024-12-23T12:26:36.0466667+00:00

    Hi @Bijaya Rai,

    Thank you for posting your query. The prebuilt layout model in Azure AI Document Intelligence can detect tables that span multiple pages, whether the header is repeated on each page or appears only on the first. The model uses the spatial position of cells to preserve table structure, which helps maintain continuity across pages.
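If the layout result still lists the fragments of a multi-page table as separate tables, a post-processing pass can flag likely continuations. The following is a minimal sketch, not part of the service: it assumes each table is a dict shaped like the layout model's JSON output (`columnCount`, `cells` with `rowIndex`, `columnIndex`, `content` and an optional `kind`, and `boundingRegions` with `pageNumber`). The function names and the `demo_table` helper are hypothetical, and the rule (consecutive pages, equal column count, repeated or absent header) is a heuristic that does not check where on the page each fragment sits.

```python
from typing import Any

Table = dict[str, Any]


def table_pages(table: Table) -> list[int]:
    """Sorted page numbers covered by the table's bounding regions."""
    return sorted({r["pageNumber"] for r in table.get("boundingRegions", [])})


def row_texts(table: Table, row_index: int) -> list[str]:
    """Cell texts of one row, ordered by column index."""
    cells = [c for c in table["cells"] if c["rowIndex"] == row_index]
    cells.sort(key=lambda c: c["columnIndex"])
    return [c.get("content", "").strip() for c in cells]


def has_header(table: Table) -> bool:
    """True if any cell is tagged as a column header."""
    return any(c.get("kind") == "columnHeader" for c in table["cells"])


def likely_continuation(prev: Table, nxt: Table) -> bool:
    """Heuristic: does `nxt` continue `prev` on the following page?"""
    prev_pages, next_pages = table_pages(prev), table_pages(nxt)
    if not prev_pages or not next_pages:
        return False
    if next_pages[0] != prev_pages[-1] + 1:
        return False  # not on consecutive pages
    if prev["columnCount"] != nxt["columnCount"]:
        return False  # different shape, probably a different table
    # Use case 2: no header on the new page.
    # Use case 1: header repeated, so the first rows match.
    return (not has_header(nxt)) or row_texts(nxt, 0) == row_texts(prev, 0)


def group_tables(tables: list[Table]) -> list[list[Table]]:
    """Group tables (in reading order) into runs of likely continuations."""
    groups: list[list[Table]] = []
    for table in tables:
        if groups and likely_continuation(groups[-1][-1], table):
            groups[-1].append(table)
        else:
            groups.append([table])
    return groups


def demo_table(page: int, rows: list[list[str]], header: bool = True) -> Table:
    """Hypothetical helper: build a minimal table dict in the assumed shape."""
    cells = []
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            cell = {"rowIndex": r, "columnIndex": c, "content": text}
            if header and r == 0:
                cell["kind"] = "columnHeader"
            cells.append(cell)
    return {
        "rowCount": len(rows),
        "columnCount": len(rows[0]),
        "cells": cells,
        "boundingRegions": [{"pageNumber": page}],
    }
```

In practice the list passed to `group_tables` would come from the `tables` array of the analysis result. When merging a group, the repeated header row of each continuation would need to be dropped.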

    For more complex scenarios, such as multi-column layouts in journals or academic papers, a custom extraction model can be trained. This allows you to define relationships between rows and columns explicitly, making it easier to handle cases where tables span columns or lack repeated headers.
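For the two-column case, a geometric check on the layout output may be worth trying alongside a custom model. This is a rough sketch under stated assumptions, not a documented feature: tables are dicts shaped like the layout JSON, each bounding region carries a `polygon` as a flat list of x, y coordinates, the page has exactly two columns split at its vertical midline, and the `edge_frac` threshold is an arbitrary illustrative value. The function names are hypothetical.

```python
from typing import Any

Table = dict[str, Any]


def bbox(region: dict[str, Any]) -> tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) of a flat x,y polygon."""
    xs = region["polygon"][0::2]
    ys = region["polygon"][1::2]
    return min(xs), min(ys), max(xs), max(ys)


def likely_column_continuation(
    prev: Table,
    nxt: Table,
    page_width: float,
    page_height: float,
    edge_frac: float = 0.25,
) -> bool:
    """Heuristic: `prev` ends low in the left column and `nxt` starts
    high in the right column of the same page, with the same column count."""
    prev_region = prev["boundingRegions"][-1]
    next_region = nxt["boundingRegions"][0]
    if prev_region["pageNumber"] != next_region["pageNumber"]:
        return False
    if prev["columnCount"] != nxt["columnCount"]:
        return False

    _, _, prev_x_max, prev_y_max = bbox(prev_region)
    next_x_min, next_y_min, _, _ = bbox(next_region)
    midline = page_width / 2  # assumption: two equal-width text columns

    in_left_column = prev_x_max <= midline
    in_right_column = next_x_min >= midline
    ends_near_bottom = prev_y_max >= page_height * (1 - edge_frac)
    starts_near_top = next_y_min <= page_height * edge_frac
    return in_left_column and in_right_column and ends_near_bottom and starts_near_top
```

The page width and height would come from the matching entry in the result's `pages` array, in the same unit as the polygons. Tables that sit mid-column or span the full page width are not handled by this check.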

    Custom models provide additional flexibility for irregular structures such as merged cells and varying header formats, which can improve accuracy when extracting structured data from complex documents.

    I hope this helps. If you have any further questions, please let us know.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

