Text Analysis in PDF Documents with Azure AI Services
Hello everyone,
I am working on a project where I regularly handle large collections of PDF documents full of domain-specific technical terms and abbreviations. The documents also vary widely in layout and wording, which makes text analysis difficult.
After trying a Retrieval Augmented Generation (RAG) solution and running into problems with Azure Search, I've started looking at Azure Document Intelligence, which seems promising. However, I'd like to consider broader options and am looking for more general advice.
Tutorial Link: https://learn.microsoft.com/en-us/training/modules/use-own-data-azure-openai/
I am looking for ideas on which Azure services could help me further and what a target architecture for such a system might look like. Specifically, I aim to:
- Cluster documents by technology using a predefined list of terms and abbreviations.
- Search the documents effectively for specific terms.
- Develop a Q&A system to answer general questions about the document contents.
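A minimal sketch of what the first goal could look like, assuming the PDF text has already been extracted into plain strings (for example by a layout/OCR service such as Document Intelligence). The technology names, term lists (`TECH_TERMS`) and helper functions are hypothetical placeholders, not part of any Azure API; each document is assigned to the technology whose predefined terms it mentions most often.

```python
import re
from collections import Counter

# Hypothetical technology -> term/abbreviation lists; replace with your own predefined list.
TECH_TERMS = {
    "networking": ["TCP/IP", "VLAN", "BGP", "router"],
    "storage": ["SAN", "NAS", "RAID", "SSD"],
}


def build_patterns(tech_terms):
    """Compile one regex per technology that matches any of its terms as a whole token."""
    patterns = {}
    for tech, terms in tech_terms.items():
        alternatives = "|".join(re.escape(term) for term in terms)
        # (?<!\w) and (?!\w) act as word boundaries that also work for terms like "TCP/IP".
        # Matching is case-sensitive so abbreviations such as "SAN" do not match "san".
        patterns[tech] = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return patterns


def classify(text, patterns):
    """Return (best technology or None, hit counts per technology) for one document."""
    counts = Counter({tech: len(p.findall(text)) for tech, p in patterns.items()})
    if not counts:
        return None, {}
    best, hits = counts.most_common(1)[0]
    return (best if hits > 0 else None), dict(counts)


def cluster(docs, patterns):
    """Group document names by their best-matching technology ("unassigned" if no hits)."""
    groups = {}
    for name, text in docs.items():
        best, _ = classify(text, patterns)
        groups.setdefault(best or "unassigned", []).append(name)
    return groups


if __name__ == "__main__":
    # Stand-ins for text already extracted from the PDFs.
    docs = {
        "a.pdf": "Configure the VLAN and BGP peering on each router.",
        "b.pdf": "The SAN uses RAID 6 over SSD shelves.",
        "c.pdf": "General meeting notes.",
    }
    print(cluster(docs, build_patterns(TECH_TERMS)))
```

Counting hits is deliberately simple: a document that mixes technologies lands in whichever group has the most matches, so keeping the full per-technology counts may be more useful than a single label.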
If anyone has experience with similar challenges or can suggest specific Azure services that could meet these needs, I would greatly appreciate your insights or any pointers to helpful resources.
Thank you in advance for your support!