Text Analysis in PDF Documents with Azure AI Services

SIMON Witali 1 Reputation point
2025-01-23T19:41:45.4866667+00:00

Hello everyone,

I am working on a project where I regularly handle large collections of PDF documents that are dense with domain-specific technical terms and abbreviations. The documents also vary widely in layout and wording, which makes text analysis difficult.

After attempting a Retrieval Augmented Generation (RAG) solution and running into difficulties with Azure AI Search, I started looking at Azure AI Document Intelligence, which seems promising. However, I am also open to other options and would welcome more general advice.

Tutorial Link: https://learn.microsoft.com/en-us/training/modules/use-own-data-azure-openai/

I am looking for ideas on which Azure services could help me further and what a target architecture for such a system might look like. Specifically, I want to:

  1. Cluster documents by technology using a predefined list of terms and abbreviations.
  2. Search documents for specific terms effectively.
  3. Develop a Q&A system to answer general questions about the document contents.
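
To make goal 1 more concrete, here is a minimal sketch of term-based grouping in plain Python. It assumes the text has already been extracted from each PDF (for example with Azure AI Document Intelligence) and is available as a string. The term list, function names, and file names are hypothetical placeholders, and this is simple keyword matching rather than statistical clustering.

```python
import re
from collections import defaultdict

# Hypothetical term list: technology -> terms and abbreviations (illustrative only).
TERMS = {
    "hydraulics": ["hydraulic pump", "HPU"],
    "networking": ["VLAN", "BGP"],
}


def tag_document(text, terms=TERMS):
    """Return {technology: hit count} using whole-term, case-insensitive matching."""
    counts = {}
    for tech, words in terms.items():
        total = 0
        for word in words:
            # Lookarounds keep "BGP" from matching inside a longer token.
            pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
            total += len(re.findall(pattern, text, flags=re.IGNORECASE))
        if total:
            counts[tech] = total
    return counts


def group_by_technology(docs, terms=TERMS):
    """docs maps a document id to its extracted text.

    Each document is assigned to the technology with the most term hits;
    documents with no hits are collected under "unassigned".
    """
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        counts = tag_document(text, terms)
        if counts:
            groups[max(counts, key=counts.get)].append(doc_id)
        else:
            groups["unassigned"].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    sample = {
        "a.pdf": "The HPU drives the hydraulic pump.",
        "b.pdf": "Configure BGP on each VLAN before adding BGP peers.",
        "c.pdf": "General remarks with no listed terms.",
    }
    print(group_by_technology(sample))
```

Case-insensitive matching is a deliberate simplification here; short abbreviations that collide with ordinary words would likely need case-sensitive matching or a stricter pattern.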

If anyone has experience with similar challenges or can suggest specific Azure services that would suit these needs, I would greatly appreciate your insights or any pointers to helpful resources.

Thank you in advance for your support!

Tags: Azure AI Search, Azure OpenAI Service, Azure AI Document Intelligence, Azure AI services