Document Intelligence Service .DOCX UnsupportedContent Error

Jason Bassos
2024-09-06T12:52:17.74+00:00

I'm using the prebuilt-layout model (API version 2024-02-29-preview) to extract data from a .docx document. According to the documentation, this model and version support .docx input.

Most Word documents process fine, but one particular .docx returns an UnsupportedContent error:

{
    "status": "failed",
    "createdDateTime": "2024-09-05T21:21:07Z",
    "lastUpdatedDateTime": "2024-09-05T21:21:07Z",
    "error": {
        "code": "InvalidRequest",
        "message": "Invalid request.",
        "details": [
            {
                "code": "UnsupportedContent",
                "message": "Content is not supported: The input content is corrupted or format is invalid.",
                "target": "0"
            }
        ]
    }
}
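For anyone handling this response in code, here is a minimal Python sketch of pulling the nested error codes out of the result, since the top-level code ("InvalidRequest") is less specific than the one under "details". The function name `error_codes` is hypothetical and not part of any SDK; it only assumes the JSON shape shown above.

```python
import json


def error_codes(result):
    """Return (code, message) pairs from an analyze result, outermost first.

    Assumes the response shape shown in this post: an optional "error"
    object that may carry a "details" list of more specific errors.
    """
    error = result.get("error") or {}
    pairs = []
    if error:
        pairs.append((error.get("code"), error.get("message")))
    for detail in error.get("details") or []:
        pairs.append((detail.get("code"), detail.get("message")))
    return pairs


# The failed response from this post, trimmed to the fields used here.
SAMPLE = json.loads("""
{
    "status": "failed",
    "error": {
        "code": "InvalidRequest",
        "message": "Invalid request.",
        "details": [
            {
                "code": "UnsupportedContent",
                "message": "Content is not supported: The input content is corrupted or format is invalid.",
                "target": "0"
            }
        ]
    }
}
""")

for code, message in error_codes(SAMPLE):
    print(f"{code}: {message}")
```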

If I open the file in Word and use Save As before submitting it, the document is processed successfully. This leads me to believe that the original file has some incompatibility that Word repairs when it loads and re-saves the document.
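One way to look for the difference locally is to inspect the file as a ZIP package, since a .docx is an Open Packaging Conventions (ZIP) container. The sketch below is only a rough diagnostic under that assumption and does not reproduce whatever validation the service performs, which I don't know. The function name `inspect_docx` is hypothetical, and `word/document.xml` is the conventional location of the main document part rather than a guaranteed one.

```python
import zipfile

# Parts that an OPC package is expected to contain.
REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels")


def inspect_docx(path_or_file):
    """Collect basic package-level facts about a .docx file.

    Accepts a filesystem path or a seekable binary file object.
    """
    report = {
        "is_zip": False,
        "missing_parts": [],
        "has_main_document": False,
        "bad_member": None,
    }
    if not zipfile.is_zipfile(path_or_file):
        return report
    report["is_zip"] = True
    with zipfile.ZipFile(path_or_file) as zf:
        names = set(zf.namelist())
        report["missing_parts"] = [p for p in REQUIRED_PARTS if p not in names]
        # Conventional location only; the real target is named in _rels/.rels.
        report["has_main_document"] = "word/document.xml" in names
        # testzip() returns the first member with a bad CRC or header, else None.
        report["bad_member"] = zf.testzip()
    return report
```

Running `inspect_docx` on both the failing file and the copy re-saved from Word, then comparing the two reports (and the member lists from `ZipFile.namelist()`), may show what Word changed.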

My question is: why can't the Document Intelligence service recognize this file as a valid .docx? Even in Document Intelligence Studio, I can import the document and view it, but the analysis fails with the same error.

How does the service check the file format? I can't attach the original document because the file type is not allowed, so I've uploaded it here: https://www.dropbox.com/scl/fi/q5n138p9btcrjkd3ton7x/GeicoAuto.docx?rlkey=2e40ygjd3ik01gsv58r1kgnlb&st=fml8m96v&dl=0

Azure AI Document Intelligence