Document Intelligence Service .DOCX UnsupportedContent Error
I'm attempting to use the 2024-02-29-preview prebuilt-layout model to extract data from a .docx document, which according to the documentation is supported by this model and version.
For most Word documents, everything works fine. However, for one particular .docx I receive an UnsupportedContent error:
{
  "status": "failed",
  "createdDateTime": "2024-09-05T21:21:07Z",
  "lastUpdatedDateTime": "2024-09-05T21:21:07Z",
  "error": {
    "code": "InvalidRequest",
    "message": "Invalid request.",
    "details": [
      {
        "code": "UnsupportedContent",
        "message": "Content is not supported: The input content is corrupted or format is invalid.",
        "target": "0"
      }
    ]
  }
}
If I open the file in Word and use Save As before submitting it, the document is processed successfully. This leads me to believe that there is some incompatibility in this specific file, and that Word fixes it when it loads and re-saves the document.
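To narrow down what Word changes on re-save, one option is to compare the two files as ZIP packages, since a .docx is a ZIP archive of XML parts. The following is only a rough sketch using Python's standard zipfile module. The helper names package_summary and diff_packages are my own, and the paths in the usage note are placeholders. It assumes both files can be opened as ZIP archives at all; if the original cannot, zipfile.BadZipFile is raised, which would itself be a useful clue.

```python
import zipfile


def package_summary(path):
    """Map each archive member to (compression method, uncompressed size)."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: (info.compress_type, info.file_size)
                for info in zf.infolist()}


def diff_packages(original_path, resaved_path):
    """Report members that exist in only one package or differ between them."""
    original = package_summary(original_path)
    resaved = package_summary(resaved_path)
    shared = original.keys() & resaved.keys()
    return {
        "only_in_original": sorted(original.keys() - resaved.keys()),
        "only_in_resaved": sorted(resaved.keys() - original.keys()),
        "changed": sorted(n for n in shared if original[n] != resaved[n]),
    }
```

For example, diff_packages("GeicoAuto.docx", "GeicoAuto-resaved.docx") would list the parts that Word added, dropped, or rewrote. It does not say which of those differences the service objects to.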
My question is: why isn't the Document Intelligence service able to recognize this file as a valid .docx file? Even in Document Intelligence Studio, I'm able to import the document and view it, but the analysis fails with the same error.
How does the service check the file format? I can't attach the original document in question because the file type is not allowed, but I've uploaded it here: https://www.dropbox.com/scl/fi/q5n138p9btcrjkd3ton7x/GeicoAuto.docx?rlkey=2e40ygjd3ik01gsv58r1kgnlb&st=fml8m96v&dl=0
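I don't know what validation the service actually performs, so as a local sanity check I put together the sketch below. It only tests a few basics of an Open XML package: that the file is a ZIP archive, that member checksums are intact, that the parts a Word document normally contains are present, and that the XML parts parse. The function name check_docx and the REQUIRED_PARTS list are my own assumptions. The main document part is conventionally word/document.xml, but the format does not strictly require that name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Parts a typical Word-produced package contains (assumed, not exhaustive).
REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def check_docx(path):
    """Return a list of problems found; an empty list means none were found."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive"]

    problems = []
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the first member with a bad CRC, or None.
        bad_member = zf.testzip()
        if bad_member is not None:
            problems.append(f"CRC error in member: {bad_member}")

        names = set(zf.namelist())
        for part in REQUIRED_PARTS:
            if part not in names:
                problems.append(f"missing part: {part}")

        # Every .xml and .rels part should be well-formed XML.
        for name in sorted(names):
            if name.endswith((".xml", ".rels")):
                try:
                    ET.fromstring(zf.read(name))
                except (ET.ParseError, zipfile.BadZipFile) as exc:
                    problems.append(f"unreadable XML in {name}: {exc}")
    return problems
```

Calling check_docx("GeicoAuto.docx") returns a list of strings describing anything that failed these checks. An empty list only means these basic checks passed, not that the service will accept the file.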