Text Analysis in PDF Documents with Azure AI Services

SIMON Witali 1

Hello everyone,

I am working on a project where I regularly handle extensive collections of PDF documents filled with specific technical terms and abbreviations. These documents also vary greatly in layout and formulation, presenting a unique challenge for text analysis.

After trying a Retrieval Augmented Generation (RAG) solution and running into difficulties with Azure Search, I’ve started looking at Azure Document Intelligence, which seems promising. However, I am also open to other options and would welcome more general advice.

Tutorial Link: https://learn.microsoft.com/en-us/training/modules/use-own-data-azure-openai/

I am looking for ideas on which Azure services could help me further and what a potential target architecture for such a system might look like. Specifically, I aim to:

Cluster documents by technology using a predefined list of terms and abbreviations.
Search documents for specific terms effectively.
Develop a Q&A system to answer general questions about the document contents.
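
To make the first two goals more concrete, here is a minimal sketch of what I have in mind, in plain Python. It assumes the text has already been extracted from the PDFs (for example by Azure Document Intelligence) and is available as plain strings. The term list, the file names, and the function names are all hypothetical placeholders, and this is simple keyword matching rather than real clustering or a real search index.

```python
import re
from collections import defaultdict

# Hypothetical term list: technology -> terms and abbreviations (illustrative only).
TECH_TERMS = {
    "hydraulics": ["hydraulic pump", "HPU"],
    "networking": ["Ethernet", "VLAN"],
}


def compile_terms(tech_terms):
    """Build one case-insensitive, whole-word regex per technology."""
    patterns = {}
    for tech, terms in tech_terms.items():
        # Longer terms first so multi-word terms are preferred over shorter ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(t) for t in ordered)
        patterns[tech] = re.compile(
            rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE
        )
    return patterns


def tag_documents(docs, patterns):
    """Return {technology: [doc_id, ...]} for documents mentioning at least one term."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for tech, pattern in patterns.items():
            if pattern.search(text):
                groups[tech].append(doc_id)
    return dict(groups)


def search_term(docs, term):
    """Return ids of documents containing the term as a whole word (case-insensitive)."""
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if pattern.search(text)]


# Placeholder documents standing in for text extracted from PDFs.
docs = {
    "a.pdf": "The HPU feeds the main hydraulic pump.",
    "b.pdf": "Each VLAN is carried over Ethernet.",
    "c.pdf": "Maintenance schedule for the HPU and the VLAN switch.",
}

groups = tag_documents(docs, compile_terms(TECH_TERMS))
vlan_hits = search_term(docs, "vlan")
```

A document can land in more than one technology group here, which matches my data, since many documents touch several technologies. Case-insensitive matching is convenient but may produce false positives for short abbreviations, so that would probably need tuning.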

If anyone has experience with similar challenges or can suggest specific Azure services that would be useful for these needs, I would greatly appreciate your insights or any references to helpful resources.

Thank you in advance for your support!
