How can I extract both images and tables as separate sections from a PDF with Azure?

Andrii Batutin 5

Hi.

I need to extract both of the following from a PDF, as separate sections:

tables
images (such as block diagrams, charts, and schematics)

For images, I need to get the bounding box of each image.

I also use Azure Document Intelligence, and it detects the text that belongs to images. Instead, I need it to detect that there is an image on the page, without creating separate sections for the text inside the images.

Is there a way to do this with Azure, for example with Azure Vision?

santoshkc 9,400 Reputation points Microsoft Vendor

2024-01-29T12:56:11.2233333+00:00
Hi @Andrii Batutin,

Thank you for reaching out to Microsoft Q&A forum!

Regarding your queries:

Use Azure Document Intelligence to extract tables from the PDF. Its prebuilt layout model recognizes tables and extracts their data, and you can also train a custom model if needed.
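A minimal sketch of reading the tables back out, assuming you saved the layout model's JSON response and pass its `analyzeResult` object in as a dict. The field names (`tables`, `rowCount`, `columnCount`, `cells`, `rowIndex`, `columnIndex`, `rowSpan`, `columnSpan`, `content`) follow the REST response; the function names are hypothetical. Merged cells are copied into every grid position they cover:

```python
from typing import Any


def table_to_grid(table: dict[str, Any]) -> list[list[str]]:
    """Turn one entry of analyzeResult["tables"] into a row-major grid of cell text."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # rowSpan/columnSpan are omitted for ordinary cells, so default to 1.
        row_span = cell.get("rowSpan", 1)
        col_span = cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], cell["rowIndex"] + row_span):
            for c in range(cell["columnIndex"], cell["columnIndex"] + col_span):
                grid[r][c] = cell.get("content", "")
    return grid


def tables_from_result(analyze_result: dict[str, Any]) -> list[list[list[str]]]:
    """Return one grid per table detected in the document."""
    return [table_to_grid(t) for t in analyze_result.get("tables", [])]
```

For example, `tables_from_result(json.load(f)["analyzeResult"])` on a saved response gives one list of rows per detected table.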

Use Azure Computer Vision for image analysis and to extract the images.

Regarding bounding boxes, you can try using Computer Vision itself.
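If you stay with Document Intelligence instead, its layout results describe regions as flat polygons `[x1, y1, x2, y2, ...]` in page units (inches for PDFs). A minimal sketch, with a hypothetical helper name, that reduces such a polygon to an axis-aligned x/y/w/h box; `scale` is an assumed parameter for converting units, for example 72 to go from inches to PDF points:

```python
from typing import Sequence


def polygon_to_box(polygon: Sequence[float], scale: float = 1.0) -> dict[str, float]:
    """Reduce a flat polygon [x1, y1, x2, y2, ...] to an axis-aligned x/y/w/h box."""
    if len(polygon) < 2 or len(polygon) % 2:
        raise ValueError("polygon must be a flat list of x, y pairs")
    xs = polygon[0::2]  # every even position is an x coordinate
    ys = polygon[1::2]  # every odd position is a y coordinate
    left, top = min(xs) * scale, min(ys) * scale
    return {"x": left, "y": top, "w": max(xs) * scale - left, "h": max(ys) * scale - top}
```

An axis-aligned box is enough for cropping; if a figure is rotated on the page, the box will include some surrounding area.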

I hope this helps! Thank you.
santoshkc 9,400 Reputation points Microsoft Vendor

2024-01-30T08:22:03.3533333+00:00

Hi @Andrii Batutin,

Following up to see if the given response was helpful. Thank you!
santoshkc 9,400 Reputation points Microsoft Vendor

2024-01-31T10:01:24.64+00:00

Hi @Andrii Batutin, we haven’t heard from you since the last response and were just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
Ali Moeen 25 Reputation points Microsoft Employee

2024-08-31T12:47:24.23+00:00

The marked answer does not focus on the actual question.

You can extract tables as markdown.
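A sketch of requesting markdown output with only the Python standard library against the REST API. It assumes the v4.0 GA api-version `2024-11-30`, the `outputContentFormat=markdown` query parameter, and key-based auth through the `Ocp-Apim-Subscription-Key` header; verify these against the current REST reference. The helper names are hypothetical. The analyze call is asynchronous, so the code polls the `Operation-Location` URL until the status is `succeeded`:

```python
import json
import time
import urllib.parse
import urllib.request

API_VERSION = "2024-11-30"  # assumed GA version of the Document Intelligence REST API


def build_analyze_url(endpoint: str, model_id: str = "prebuilt-layout",
                      api_version: str = API_VERSION) -> str:
    """Build the analyze URL, asking for markdown instead of plain text."""
    query = urllib.parse.urlencode({"api-version": api_version, "outputContentFormat": "markdown"})
    return f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze?{query}"


def analyze_pdf_as_markdown(endpoint: str, key: str, pdf_path: str,
                            poll_seconds: float = 2.0) -> dict:
    """Submit a PDF, poll until the service finishes, and return analyzeResult."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_analyze_url(endpoint),
        data=body,
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        # The service answers 202 Accepted and points at the result via this header.
        operation_url = response.headers["Operation-Location"]
    while True:
        poll = urllib.request.Request(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(poll) as response:
            status = json.load(response)
        if status["status"] == "succeeded":
            return status["analyzeResult"]
        if status["status"] == "failed":
            raise RuntimeError(status.get("error"))
        time.sleep(poll_seconds)
```

The returned dict's `content` field holds the markdown for the whole document, including the tables.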

But you CANNOT extract images.

You can get the structure of the document and the locations of the images, and then use a Python library such as pdf2image to extract the images.
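A minimal sketch of that approach, assuming a Document Intelligence API version that returns a `figures` array in `analyzeResult` (the 2024 preview versions and later), PDF coordinates in inches, and pdf2image with poppler installed. `figure_boxes` converts each figure's `boundingRegions` polygon into a pixel crop box for pages rendered at `dpi`; `paragraphs_outside_figures` drops the paragraphs the service lists in a figure's `elements`, which addresses text inside diagrams showing up as separate sections; `crop_figures` renders each needed page once and crops the figures. All three function names are hypothetical:

```python
from typing import Any


def figure_boxes(analyze_result: dict[str, Any], dpi: int = 200) -> list[dict[str, Any]]:
    """Return one pixel crop box (left, top, right, bottom) per figure region."""
    units = {p["pageNumber"]: p.get("unit", "inch") for p in analyze_result.get("pages", [])}
    boxes = []
    for index, figure in enumerate(analyze_result.get("figures", [])):
        for region in figure.get("boundingRegions", []):
            page = region["pageNumber"]
            if units.get(page, "inch") != "inch":
                raise ValueError(f"page {page} is not measured in inches")
            xs, ys = region["polygon"][0::2], region["polygon"][1::2]
            # inches * dots-per-inch = pixels in the rendered page image
            boxes.append({
                "figure": figure.get("id", str(index)),
                "page": page,
                "box": (round(min(xs) * dpi), round(min(ys) * dpi),
                        round(max(xs) * dpi), round(max(ys) * dpi)),
            })
    return boxes


def paragraphs_outside_figures(analyze_result: dict[str, Any]) -> list[str]:
    """Keep only paragraphs that no figure references (e.g. as "/paragraphs/7")."""
    inside = {
        element
        for figure in analyze_result.get("figures", [])
        for element in figure.get("elements", [])
    }
    return [
        p["content"]
        for i, p in enumerate(analyze_result.get("paragraphs", []))
        if f"/paragraphs/{i}" not in inside
    ]


def crop_figures(pdf_path: str, analyze_result: dict[str, Any], dpi: int = 200) -> list:
    """Return (figure id, PIL image) pairs cropped from the rendered PDF pages."""
    # Imported here so the helpers above work without pdf2image/poppler installed.
    from pdf2image import convert_from_path

    rendered = {}
    crops = []
    for item in figure_boxes(analyze_result, dpi):
        page = item["page"]
        if page not in rendered:
            # first_page/last_page are 1-based, like pageNumber in the analyze result.
            rendered[page] = convert_from_path(pdf_path, dpi=dpi, first_page=page, last_page=page)[0]
        crops.append((item["figure"], rendered[page].crop(item["box"])))
    return crops
```

Rendering at the same `dpi` that `figure_boxes` uses is what keeps the inch-based polygons aligned with the page image.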
Patrick Kelly 0 Reputation points

2024-09-12T19:11:53.9766667+00:00

Hello,

I am encountering this exact same problem.

Has anyone found a solution yet?

2 answers

Sedat SALMAN 14,065 Reputation points MVP

2024-01-29T12:45:49.77+00:00

Form Recognizer (now Azure AI Document Intelligence) is a very powerful service that can help you. You can review the following article, which shows how to build a solution for your question: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/extract-data-from-pdfs-using-form-recognizer-with-code-or/ba-p/2214299