Azure Document Intelligence Behavior and Interface Questions

Tommy He 5 Reputation points
2025-02-20T02:47:16.56+00:00

A few questions regarding Azure Document Intelligence:

  1. To confirm, is https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0&tabs=rest%2Csample-code the only main documentation site? It would be great to have more detail on the interface design and the expected ways of using it. I know of quite a few YC companies that use it and are all dissatisfied with the docs, and I think it's super useful as a good PDF parser, especially with AI interfaces nowadays!
  2. Are all bounding regions guaranteed to come from the same page when they all come from the same paragraph, table, or figure?
  3. Are all polygons expressed with alternating X and Y coordinates within the bounding boxes?
  4. I want to get a list of all the textual data and figures in reading order from the document. What's the expected way to iterate over the API result to achieve this? I'm currently looping through all sections of the result and processing the paragraphs, tables, and figures (skipping nested sections). It seems to go in reading order but does not cover all pieces of text - we end up missing some headers.
  5. Will this regex match all the ways a section type / index can be given, and is it the expected way to do so? We apply it and then index into each section type, and have only seen these types so far:

    class DocumentSectionType(StrEnum):
        PARAGRAPHS = "paragraphs"
        TABLES = "tables"
        FIGURES = "figures"
        SECTIONS = "sections"

    The regex:

    /([^/]+)/(\d+)
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Chakaravarthi Rangarajan Bhargavi 950 Reputation points MVP
    2025-02-20T03:07:31.7+00:00

    Hi Tommy He,

    Welcome to Microsoft Q&A forum! Thanks for your question. I'll address your queries regarding Azure Document Intelligence and provide references to the documentation where applicable.

    Question 1: Is this the only main documentation site?

    Yes, the primary documentation for Azure Document Intelligence is: 🔗 Azure Document Intelligence Documentation

    For API and SDK references, check: 🔗 Azure SDK for Document Intelligence (GitHub)

    While this covers most functionalities, if you're looking for deeper interface details, consider exploring the Azure AI Services Blog for additional insights.

    Question 2: Are all bounding regions from the same page when extracted from the same paragraph, table, or figure?

    Not necessarily. Most paragraphs, tables, and figures sit on a single page, but an element that spans multiple pages can have bounding regions on different pages. Each entry in boundingRegions carries its own pageNumber, so check that attribute in the response JSON rather than assuming one page per element.

    Example:

    {
      "paragraphs": [
        {
          "content": "Sample text",
          "boundingRegions": [
            {
              "pageNumber": 1,
              "polygon": [ ... ]
            }
          ]
        }
      ]
    }
    

    Reference: 🔗 Azure Document Intelligence Layout Model
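
To check whether a given element crosses a page boundary, you can collect the page numbers of its bounding regions. The following is a minimal sketch that assumes the response has been loaded into plain Python dictionaries; `pages_of` and the sample `table` are hypothetical names and data used only for illustration.

```python
def pages_of(element):
    """Return the sorted, de-duplicated page numbers an element touches."""
    regions = element.get("boundingRegions", [])
    return sorted({region["pageNumber"] for region in regions})


# Hypothetical table with one bounding region on each page it spans.
table = {
    "boundingRegions": [
        {"pageNumber": 2, "polygon": [1.0, 9.0, 7.5, 9.0, 7.5, 10.5, 1.0, 10.5]},
        {"pageNumber": 3, "polygon": [1.0, 1.0, 7.5, 1.0, 7.5, 3.0, 1.0, 3.0]},
    ]
}

print(pages_of(table))  # [2, 3]
```

An element whose list has more than one entry spans pages, so any code that draws or crops regions should handle each bounding region separately.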

    Question 3: Are polygons expressed with alternating X and Y coordinates?

    Yes, the polygon array contains alternating X and Y coordinates, defining the bounding region of an extracted element. Example:

    "polygon": [ 100, 200, 150, 200, 150, 250, 100, 250 ]
    

    This represents four points, (100, 200), (150, 200), (150, 250), and (100, 250), which are the corners of the bounding box.
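
If it is easier to work with coordinate pairs, the flat array can be split into (x, y) tuples. This is a small sketch; `polygon_to_points` and `bounding_box` are hypothetical helper names, not part of any SDK.

```python
def polygon_to_points(polygon):
    """Turn a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    if len(polygon) % 2 != 0:
        raise ValueError("polygon must contain an even number of values")
    # Even positions hold X values, odd positions hold Y values.
    return list(zip(polygon[0::2], polygon[1::2]))


def bounding_box(polygon):
    """Return the axis-aligned box (min_x, min_y, max_x, max_y) of a polygon."""
    points = polygon_to_points(polygon)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


polygon = [100, 200, 150, 200, 150, 250, 100, 250]
print(polygon_to_points(polygon))  # [(100, 200), (150, 200), (150, 250), (100, 250)]
print(bounding_box(polygon))       # (100, 200, 150, 250)
```

The axis-aligned box is only an approximation when the text is rotated, since the polygon itself can be an arbitrary quadrilateral.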

    Question 4: How can I extract all text and figures in reading order?

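    The prebuilt layout model returns text in reading order. However, page headers and footers may not be referenced from the sections, which would explain the missing headers when you walk sections only. A more complete approach is:

    1. Loop through the top-level paragraphs, tables, and figures collections in the response instead of only the sections.
    2. Sort by the pageNumber of each element's first bounding region to maintain page order; within a page, the offset of the element's first span into the top-level content string can serve as a tiebreaker.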

    Example Python code:

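    # Assumes `result` is the analyzeResult object parsed into a dict.
    # Paragraphs and tables are top-level collections, not children of each page.
    for paragraph in result.get("paragraphs", []):
        print(paragraph["content"])
    for table in result.get("tables", []):
        for cell in table["cells"]:
            # Each cell carries its grid position and its text.
            print(cell["rowIndex"], cell["columnIndex"], cell["content"])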
    

    Reference: 🔗 Extracting Text Using Azure Document Intelligence
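
To merge paragraphs, tables, and figures into a single ordered list, one option is to sort them by page number and then by their offset into the document content. This is only a sketch: it assumes each element carries `boundingRegions` and `spans` in the parsed JSON, and `in_reading_order` and the sample `result` are hypothetical.

```python
def first_page(element):
    """Page number of the element's first bounding region (inf if missing)."""
    regions = element.get("boundingRegions", [])
    return regions[0]["pageNumber"] if regions else float("inf")


def first_offset(element):
    """Offset of the element's first span into the content (inf if missing)."""
    spans = element.get("spans", [])
    return spans[0]["offset"] if spans else float("inf")


def in_reading_order(result):
    """Return (kind, index) pairs sorted by page, then by content offset."""
    items = []
    for kind in ("paragraphs", "tables", "figures"):
        for index, element in enumerate(result.get(kind, [])):
            items.append((first_page(element), first_offset(element), kind, index))
    return [(kind, index) for _, _, kind, index in sorted(items)]


# Hypothetical, heavily trimmed result used only to exercise the function.
result = {
    "paragraphs": [
        {"content": "Header", "role": "pageHeader",
         "spans": [{"offset": 0, "length": 6}],
         "boundingRegions": [{"pageNumber": 1}]},
        {"content": "Body",
         "spans": [{"offset": 7, "length": 4}],
         "boundingRegions": [{"pageNumber": 1}]},
        {"content": "Next page",
         "spans": [{"offset": 40, "length": 9}],
         "boundingRegions": [{"pageNumber": 2}]},
    ],
    "tables": [
        {"spans": [{"offset": 12, "length": 20}],
         "boundingRegions": [{"pageNumber": 1}]},
    ],
}

print(in_reading_order(result))
# [('paragraphs', 0), ('paragraphs', 1), ('tables', 0), ('paragraphs', 2)]
```

Because this walks the top-level collections rather than the sections, paragraphs that no section references are still included.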

    Question 5: Regex to match section types

    Your regex /([^/]+)/(\d+) should work fine for extracting the section type and index from each element reference, provided you match the whole string. The expected document structure aligns with:

    from enum import StrEnum  # Python 3.11+

    class DocumentSectionType(StrEnum):
        PARAGRAPHS = "paragraphs"
        TABLES = "tables"
        FIGURES = "figures"
        SECTIONS = "sections"
    

    If you're missing elements, consider using debugging logs to validate section mappings.
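
As a rough illustration of how the regex can be combined with indexing, the sketch below resolves each reference in a section's `elements` list and recurses into nested sections. It assumes references look like `/paragraphs/0`; `resolve`, `walk`, and the sample `result` are hypothetical.

```python
import re

# Same pattern as in the question, applied with fullmatch so that
# trailing or leading junk is rejected instead of silently ignored.
ELEMENT_REF = re.compile(r"/([^/]+)/(\d+)")


def resolve(result, ref):
    """Split a reference such as '/tables/3' and look up the element."""
    match = ELEMENT_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"unrecognized element reference: {ref!r}")
    kind, index = match.group(1), int(match.group(2))
    return kind, result[kind][index]


def walk(result, section):
    """Yield (kind, element) pairs for a section, descending into subsections."""
    for ref in section.get("elements", []):
        kind, element = resolve(result, ref)
        if kind == "sections":
            yield from walk(result, element)
        else:
            yield kind, element


# Hypothetical, heavily trimmed result used only to exercise the functions.
result = {
    "paragraphs": [{"content": "Title"}, {"content": "Body"}],
    "tables": [{"rowCount": 1}],
    "sections": [
        {"elements": ["/paragraphs/0", "/sections/1"]},
        {"elements": ["/paragraphs/1", "/tables/0"]},
    ],
}

for kind, element in walk(result, result["sections"][0]):
    print(kind, element)
```

Raising on an unmatched reference, rather than skipping it, makes it obvious if the service ever returns a section type the regex or the enum does not cover.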

    Hope this helps! Please try these suggestions and let me know if you need further assistance.

    Regards,

    Chakravarthi Rangarajan Bhargavi

