How can I extract flowchart shapes like process, decision, etc. from a PDF using Document Intelligence?

sayen vv 0 Reputation points
2025-02-24T09:36:50.3466667+00:00

I have a scenario where I need to extract flowchart data, including each node, its content, and the connection details between nodes. I am considering using Azure Document Intelligence's Custom Extraction Model for this task.

Would this approach be effective for extracting such structured data from a flowchart?

If so, how should I label each shape? Since a flowchart typically contains multiple process nodes, multiple decision nodes, and various connectors, I am facing challenges in labeling all the shapes consistently. Is there a proper way to handle this within the Azure Labeling Tool, or is there an alternative method to accurately capture and extract flowchart details?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Vikram Singh 1,980 Reputation points Microsoft Employee
    2025-02-25T08:28:48.5766667+00:00

    Hi @sayen vv

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    To extract flowchart shapes such as process and decision nodes from a PDF with Azure Document Intelligence, you can use a custom extraction model. The key steps are:

    1. Labeling Shapes: Create labels for each shape type (e.g., "Process", "Decision", "Connector") and apply them consistently across all training documents.
    2. Handling Multiple Nodes: Label each instance of shapes separately (e.g., "Process1", "Process2") to help the model distinguish between them.
    3. Using the Azure Labeling Tool: Manually label shapes by drawing bounding boxes and assigning appropriate labels. Consistency is crucial for accurate model training.
    4. Alternative Methods: Consider using prebuilt models or auto-labeling features to assist with labeling and improve consistency.
    5. Training and Testing: Train the custom model with labeled documents and test it with new flowchart documents. Refine labels or provide additional training data if needed.
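
As a rough illustration of steps 2 and 5, the sketch below shows one way the per-instance labels could be read back and grouped by shape type once a custom model has been trained. It assumes the `azure-ai-formrecognizer` Python package and a model trained with labels named like "Process1" or "Decision2"; the function names `group_fields_by_shape` and `analyze_flowchart`, and the endpoint, key, model ID, and file path arguments, are hypothetical placeholders. It only collects the text of labeled fields and does not recover connections between nodes.

```python
import re
from collections import defaultdict


def group_fields_by_shape(fields):
    """Group labels such as 'Process1', 'Decision2' by shape type.

    `fields` maps a label name to its extracted text. Labels without a
    trailing number are skipped (assumption: every node label is numbered).
    """
    grouped = defaultdict(list)
    for name, content in fields.items():
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
        if match is None:
            continue
        shape, index = match.group(1), int(match.group(2))
        grouped[shape].append((index, content))
    # Order each shape's nodes by their label number.
    return {
        shape: [text for _, text in sorted(items, key=lambda pair: pair[0])]
        for shape, items in grouped.items()
    }


def analyze_flowchart(endpoint, key, model_id, pdf_path):
    """Run a trained custom model on a PDF and group its fields by shape."""
    # Imported here so the grouping helper works without the Azure SDK.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as pdf_file:
        poller = client.begin_analyze_document(model_id, document=pdf_file)
        result = poller.result()

    fields = {}
    for document in result.documents:
        for name, field in document.fields.items():
            fields[name] = field.content
    return group_fields_by_shape(fields)
```

For example, passing `{"Process2": "Ship order", "Process1": "Receive order"}` to `group_fields_by_shape` groups both entries under "Process", ordered by label number.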

    For detailed guidance, refer to the Azure Document Intelligence documentation.

    If the reply was helpful, please don't forget to upvote and/or accept it as the answer; this can be beneficial to other community members.

    Thanks

