Indexing Multilingual Documents with Azure AI Foundry: OCR, Language Detection, and Translation
Hey everyone,
I have a few questions about document indexing. I'm building an AI chatbot with Azure AI Foundry and have uploaded my documents to Blob Storage through Azure AI Studio (using the hub resources created directly from Foundry).
My documents are a mix of PDFs, Excel files, Word documents, and images. Some of the images and PDFs contain text in languages other than English. When I create the index (I'm planning on a vector index), will it automatically handle OCR (using Document Intelligence to extract the text from those images and PDFs), language detection, and translation into the target language? For example, some of the images contain French text, and I want to extract that text and translate it to English before creating the embeddings.
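To make the flow I have in mind concrete, here is a rough Python sketch of the order of steps. The names `prepare_for_index`, `extract_text`, `detect_language`, `translate`, and `embed` are placeholders I made up for illustration, not real Azure SDK calls; in practice each one would be backed by whatever service ends up doing that step (for example Document Intelligence for the OCR part).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IndexedChunk:
    source_language: str    # language detected in the extracted text
    english_text: str       # text after translation (unchanged if already in the target language)
    embedding: List[float]  # vector computed from the translated text


def prepare_for_index(
    raw_document: bytes,
    extract_text: Callable[[bytes], str],       # hypothetical OCR / text extraction step
    detect_language: Callable[[str], str],      # hypothetical: returns a code such as "fr"
    translate: Callable[[str, str, str], str],  # hypothetical: (text, source, target) -> text
    embed: Callable[[str], List[float]],        # hypothetical embedding step
    target_language: str = "en",
) -> IndexedChunk:
    # 1. Pull the text out of the image or PDF.
    text = extract_text(raw_document)
    # 2. Work out which language the extracted text is in.
    language = detect_language(text)
    # 3. Translate only when the text is not already in the target language.
    if language != target_language:
        text = translate(text, language, target_language)
    # 4. Create the embedding from the translated text.
    return IndexedChunk(language, text, embed(text))
```

The point of the sketch is only the ordering: extraction, then detection, then translation, and only then embedding, so that the vectors are built from the English text rather than the original French.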
Can anyone confirm whether this process is supported?
Thanks in advance for your help!