Indexing Multilingual Documents with Azure AI Foundry: OCR, Language Detection, and Translation

Jonathanwow 0 Reputation points
2024-12-03T12:39:52.77+00:00

Hey everyone,

I have a few questions about document indexing. I'm using Azure AI Foundry to build an AI chatbot and have uploaded my documents to Blob Storage through Azure AI Studio (using the hub resources created directly from Foundry).

My documents are a mix of types: PDFs, Excel files, Word documents, and images. Some of them, such as the images and PDFs, contain text in languages other than English. When I create the index (I'm considering a vector index), will it automatically handle OCR (Document Intelligence extracting text from the images and PDFs), language detection, and translation into the target language? For example, some of the images contain French text, and I want to extract that text and translate it into English before creating embeddings.
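
For concreteness, the flow described above (OCR, then language detection, then translation to English before embedding) could be sketched as a custom Azure AI Search skillset. This only illustrates the intended pipeline and does not confirm what the Foundry indexing option does by default. The skillset name and the target field names (merged_text, language_code, translated_text) are placeholders made up for this example. It also assumes the indexer is configured to generate normalized images so that /document/normalized_images exists, and that an embedding step runs afterward on the translated text.

```python
import json


def build_skillset(name: str, target_language: str = "en") -> dict:
    """Build a skillset payload: OCR -> merge -> detect language -> translate.

    The returned dict is only the JSON body; sending it to a search service
    (and creating the indexer and index) is outside this sketch.
    """
    skills = [
        {
            # Extract text from each image found in the document.
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "detectOrientation": True,
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "text"}],
        },
        {
            # Merge OCR text back into the document's native text content.
            "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
            "context": "/document",
            "insertPreTag": " ",
            "insertPostTag": " ",
            "inputs": [
                {"name": "text", "source": "/document/content"},
                {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
                {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
            ],
            # "merged_text" is a placeholder field name.
            "outputs": [{"name": "mergedText", "targetName": "merged_text"}],
        },
        {
            # Detect the language of the merged text (e.g. "fr").
            "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/merged_text"}],
            # "language_code" is a placeholder field name.
            "outputs": [{"name": "languageCode", "targetName": "language_code"}],
        },
        {
            # Translate the merged text into the target language.
            "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
            "context": "/document",
            "defaultToLanguageCode": target_language,
            "inputs": [
                {"name": "text", "source": "/document/merged_text"},
                {"name": "fromLanguageCode", "source": "/document/language_code"},
            ],
            # "translated_text" is a placeholder; embeddings would be built from it.
            "outputs": [{"name": "translatedText", "targetName": "translated_text"}],
        },
    ]
    return {
        "name": name,
        "description": "OCR, language detection, and translation before embedding",
        "skills": skills,
    }


if __name__ == "__main__":
    # "multilingual-docs-skillset" is a hypothetical name.
    print(json.dumps(build_skillset("multilingual-docs-skillset"), indent=2))
```

The steps are kept as separate skills so that each intermediate result (OCR text, detected language, translated text) can be inspected or mapped to its own index field.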


Can anyone confirm if this process is supported?

Thanks in advance for your help!
