How to extract Figures with labels from the image

Anchit Gupta 0 Reputation points
2024-11-21T05:36:50.6933333+00:00


Problem Statement:
I have a PDF that contains questions with their options. Some questions also have an associated figure, which may or may not contain labels.

  1. I want to extract the questions and their respective options along with figures (if available).
  2. How can I extract only the figures with labels?
Azure Computer Vision
An Azure artificial intelligence service that analyzes content in images and video.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Pavankumar Purilla 1,230 Reputation points Microsoft Vendor
    2024-11-21T17:22:21.1933333+00:00

    Hi Anchit Gupta,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    To extract the questions, their options, and any figures with labels from a PDF, follow these steps:

    • Convert the PDF to images: Use a library such as pdf2image to render each page of the PDF as an image, so that figure regions can be cropped and processed later.
    • Use Azure AI Document Intelligence: Run the prebuilt-layout model to extract the text, paragraphs, and layout elements of each page. The questions and options come back as text, and the layout output helps locate figures and the text associated with them.
    • Extract labels embedded in figures: For text that is part of the figure itself, use the Azure Computer Vision Read API to perform OCR on the cropped figure regions.
    • Optional customization: If the figures or labels follow a distinctive pattern, consider training a custom model with Azure Custom Vision or a Document Intelligence custom model for better accuracy.
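
A minimal sketch of how these steps might fit together in Python. It assumes the `azure-ai-documentintelligence` (1.0) and `pdf2image` packages, an API version in which the layout result includes `figures`, and that polygons for PDF input are reported in inches. The helper names (`polygon_to_pixel_box`, `labels_inside`, `extract_figures`), the `endpoint` and `key` parameters, and the 200 DPI default are illustrative choices, not part of either library. To decide whether a figure has labels, the sketch reuses the words already returned by the layout model rather than making a separate Read API call; if those words miss text embedded in a figure, the cropped image can still be sent to the Read API.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def polygon_to_pixel_box(polygon: Sequence[float], dpi: int) -> Box:
    """Convert a flat [x1, y1, x2, y2, ...] polygon in inches to a pixel box."""
    xs, ys = polygon[0::2], polygon[1::2]
    return (round(min(xs) * dpi), round(min(ys) * dpi),
            round(max(xs) * dpi), round(max(ys) * dpi))


def contains(outer: Box, inner: Box) -> bool:
    """Return True if the inner box lies entirely within the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def labels_inside(figure_box: Box, word_boxes: List[Tuple[str, Box]]) -> List[str]:
    """Return the text of every word whose box falls inside the figure box."""
    return [text for text, box in word_boxes if contains(figure_box, box)]


def extract_figures(pdf_path: str, endpoint: str, key: str, dpi: int = 200):
    """Crop each detected figure and list the words found inside it."""
    # Third-party imports are local so the helpers above work without them.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential
    from pdf2image import convert_from_path

    client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as f:
        result = client.begin_analyze_document("prebuilt-layout", f).result()

    # Render pages at the same DPI used to scale the polygons (assumed inches).
    pages = convert_from_path(pdf_path, dpi=dpi)

    # Index OCR words by page so they can be matched against figure regions.
    words_by_page = {
        page.page_number: [
            (w.content, polygon_to_pixel_box(w.polygon, dpi))
            for w in (page.words or []) if w.polygon
        ]
        for page in result.pages
    }

    figures = []
    for figure in result.figures or []:
        for region in figure.bounding_regions or []:
            box = polygon_to_pixel_box(region.polygon, dpi)
            figures.append({
                "page": region.page_number,
                "caption": figure.caption.content if figure.caption else None,
                "labels": labels_inside(box, words_by_page.get(region.page_number, [])),
                "image": pages[region.page_number - 1].crop(box),
            })
    return figures
```

To keep only the figures with labels, filter the returned list, for example `[f for f in extract_figures(path, endpoint, key) if f["labels"]]`. Because box edges are rounded, a word touching the figure border may be missed; padding the figure box by a few pixels is one way to make the match more forgiving.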

    I hope this helps. Let us know if you have any further queries.

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".
