Azure Document Intelligence unable to read XFA PDFs

mv_gibson 1 Reputation point
2025-01-24T14:11:55.94+00:00

I am running into an issue where the Azure AI Document Intelligence interface is unable to read certain PDF files and displays "To view the full contents of this document, you need a later version of the PDF viewer..." I believe this is only happening for newer PDFs that use XFA (XML Forms Architecture).

Has anyone run into this issue? How did you get around the problem?

Thanks so much,

Mike

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Pavankumar Purilla 3,005 Reputation points Microsoft Vendor
    2025-01-24T19:03:54.6366667+00:00

    Hi mv_gibson,

    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    Yes, it's possible that the Azure Document Intelligence service is unable to read XFA PDFs. XFA is a proprietary format that is not fully supported by all PDF viewers and parsers.

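    To work around this issue, you can try converting the XFA PDFs to a format that the Azure Document Intelligence service supports, such as a standard (flattened) PDF or an image format like PNG or JPEG. The conversion tool has to render the XFA form content itself; a tool without XFA support will typically reproduce only the placeholder message. Adobe Acrobat is one option, and general-purpose converters such as Ghostscript or ImageMagick are worth testing on a sample file first to confirm that they produce the actual form content.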

    Alternatively, you can try a different PDF parsing library or service that supports XFA PDFs. Open source and commercial libraries such as iText, PDFTron, or Apache PDFBox offer some XFA handling, though the level of support varies, so check each one against your own files.

    Hope this helps. Do let us know if you have any further queries.
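
Before converting anything, it can help to find out which files are actually XFA forms. The following is a minimal sketch using only the Python standard library, not an official Azure or Adobe method. It assumes that an XFA form can be recognized by the `/XFA` key that such files carry in their form dictionary, and it looks for that key both in the raw bytes and inside Flate-compressed streams. The function name `looks_like_xfa` is hypothetical, and the check is a heuristic: encrypted files or streams that use other filters may be missed.

```python
import re
import zlib

# Captures the raw bytes that follow a "stream" keyword up to "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_xfa(pdf_bytes: bytes) -> bool:
    """Heuristic check (an assumption, not a full PDF parse):
    return True if the /XFA key appears anywhere in the file."""
    # Case 1: the form dictionary is stored uncompressed.
    if b"/XFA" in pdf_bytes:
        return True
    # Case 2: the dictionary sits inside a Flate-compressed stream.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, or a filter this sketch does not handle
        if b"/XFA" in inflated:
            return True
    return False
```

A possible use is to read each file with `open(path, "rb").read()`, pass the bytes to `looks_like_xfa`, and route only the flagged files through a flattening or conversion step before sending them to Document Intelligence.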



