Hi @sayen vv
Welcome to Microsoft Q&A, and thank you for posting your question here.
To extract flowchart shapes (process, decision, and so on) from a PDF using Azure Document Intelligence, you can train a custom extraction model. Here are the key steps:
- Labeling Shapes: Create labels for each shape type (e.g., "Process", "Decision", "Connector") and apply them consistently across all training documents.
- Handling Multiple Nodes: When a flowchart contains several shapes of the same type, label each instance separately (e.g., "Process1", "Process2") so the model can distinguish between them.
- Using the Labeling Tool: In the labeling tool in Document Intelligence Studio, manually label shapes by drawing bounding boxes and assigning the appropriate labels. Consistency is crucial for accurate model training.
- Alternative Methods: Consider using prebuilt models or the auto-labeling feature to generate initial labels, then review and correct them. This reduces manual effort and improves consistency.
- Training and Testing: Train the custom model with the labeled documents and test it with new flowchart documents it has not seen. If shapes are missed or mislabeled, refine the labels or add more training data and retrain.
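Once a model is trained, testing it from code could look roughly like the sketch below. It assumes the `azure-ai-formrecognizer` Python package (3.2 or later) and its `DocumentAnalysisClient`; the endpoint, key, model ID, and file path are placeholders you would supply, and the label names ("Process1", "Decision1", ...) are only the examples from the steps above. The two helper functions are hypothetical conveniences for grouping numbered labels back into shape types, not part of the SDK.

```python
from collections import defaultdict


def split_label(label):
    """Split a numbered label such as 'Process2' into ('Process', 2).

    Labels without a trailing number get instance index 0.
    """
    name = label.rstrip("0123456789")
    suffix = label[len(name):]
    return name, int(suffix) if suffix else 0


def group_by_shape_type(fields):
    """Group extracted field text by shape type, ordered by instance number."""
    grouped = defaultdict(list)
    for label in sorted(fields, key=split_label):
        name, _ = split_label(label)
        grouped[name].append(fields[label])
    return dict(grouped)


def extract_shape_fields(endpoint, key, model_id, pdf_path):
    """Run a trained custom model on one PDF and return {label: text}.

    All four arguments are placeholders for your own resource and model.
    """
    # Imported lazily so the helpers above work without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as pdf:
        poller = client.begin_analyze_document(model_id, document=pdf)
        result = poller.result()

    fields = {}
    for document in result.documents:
        for label, field in document.fields.items():
            if field is not None:
                fields[label] = field.content
    return fields


# Example with hand-written data in place of a real service response.
sample = {
    "Process2": "Ship order",
    "Decision1": "In stock?",
    "Process1": "Receive order",
}
print(group_by_shape_type(sample))
```

With real credentials, you would pass the output of `extract_shape_fields(...)` to `group_by_shape_type` and compare it against the shapes you expect in each test flowchart. Each returned field also carries a `confidence` value, which can help you decide where more training data is needed.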
For detailed guidance, refer to the Azure Document Intelligence documentation:
- Read model OCR data extraction - Document Intelligence - Azure AI services | Microsoft Learn
- What's new in Document Intelligence - Azure AI services | Microsoft Learn
If this reply was helpful, please don't forget to upvote and/or accept it as the answer. This can be beneficial to other community members.
Thanks