How to fix a request timeout issue with large PDFs in the Document Intelligence API?

Question

Hello guys, I am getting a request timeout when calling the Document Intelligence API. The catch is that the timeout doesn't occur when I extract the details using Document Intelligence Studio. I am sending the request to my custom extraction model.

    const bufferNode = Buffer.from(fileBuffer);
    let reqObj = {
      base64Source: bufferNode.toString("base64"),
    };

    const invoiceResponse = await azureClient
      .path("/documentModels/{modelId}:analyze", exceptionBank.model)
      .post({
        contentType: "application/json",
        body: reqObj,
        timeout: 1000 * 60 * 20,
      });

    if (isUnexpected(invoiceResponse)) {
      throw invoiceResponse.body.error;
    }

    const poller = await getLongRunningPoller(azureClient, invoiceResponse);
    let response = (await poller.pollUntilDone()).body;

I have increased the timeout parameter to 20 minutes, but it still times out at around 8-9 minutes. Due to our specific business requirements, we can't split the large PDFs into smaller ones. Any suggestions?

Answer

Hello Udit,

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that you would like to know how to fix a request timeout issue with large PDFs in the Document Intelligence API.

Consider the following best-practice steps:

Make sure there are no network or server-level timeout settings that could be causing the issue. This includes checking firewall settings, load balancers, and any intermediate proxies.
Review the custom model to ensure it is optimized for the type of documents being processed. This may involve retraining the model with a more diverse dataset or adjusting model parameters. Also consider using the latest version of the Document Intelligence API, as newer versions may have performance improvements.
Process large files asynchronously: submit the document, let the service process it in the background, and poll for the result periodically. The poller in your code already follows this pattern, so it is worth checking whether the timeout occurs on the initial POST or while polling.
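
To make the submit-then-poll idea concrete, here is a minimal sketch of a polling loop with its own interval and overall deadline. It is not part of any Azure SDK: `pollForResult`, `checkStatus`, `intervalMs` and `deadlineMs` are hypothetical names, and the status strings `"succeeded"` and `"failed"` are assumptions about what the status check returns.

```javascript
// Hypothetical helper (not an Azure SDK function).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `checkStatus` repeatedly until it reports a terminal state
// or the overall deadline has passed.
async function pollForResult(
  checkStatus,
  { intervalMs = 5000, deadlineMs = 20 * 60 * 1000 } = {}
) {
  const startedAt = Date.now();
  for (;;) {
    const result = await checkStatus();
    // Assumed terminal states; any other status is treated as "still working".
    if (result.status === "succeeded") return result;
    if (result.status === "failed") {
      throw new Error("Analysis failed");
    }
    if (Date.now() - startedAt >= deadlineMs) {
      throw new Error(`Gave up after ${deadlineMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

Keeping the deadline in your own loop makes it easier to see whether the job itself is slow or whether a single HTTP call is being cut off somewhere along the network path.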

Implement a retry mechanism to handle transient errors. Here is an updated version of the code with retry logic:

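   // Assumes azureClient and exceptionBank are set up as in the question, and that
   // isUnexpected and getLongRunningPoller are already imported (for example from
   // the @azure-rest/ai-document-intelligence package).
   const bufferNode = Buffer.from(fileBuffer);
   const reqObj = {
     base64Source: bufferNode.toString("base64"),
   };

   // Submits the document and polls until the analysis finishes.
   // On any error, waits briefly and tries again, up to `retries` more times.
   async function analyzeDocument(retries = 3) {
     try {
       const invoiceResponse = await azureClient
         .path("/documentModels/{modelId}:analyze", exceptionBank.model)
         .post({
           contentType: "application/json",
           body: reqObj,
           timeout: 1000 * 60 * 20, // 20 minutes
         });
       if (isUnexpected(invoiceResponse)) {
         throw invoiceResponse.body.error;
       }
       const poller = await getLongRunningPoller(azureClient, invoiceResponse);
       return (await poller.pollUntilDone()).body;
     } catch (error) {
       if (retries > 0) {
         console.log(`Retrying... Attempts left: ${retries}`);
         // Pause before retrying so the service is not hit again immediately.
         await new Promise((resolve) => setTimeout(resolve, 5000));
         return analyzeDocument(retries - 1);
       }
       console.error("Error analyzing document:", error);
       throw error;
     }
   }

   analyzeDocument()
     .then((response) => {
       console.log("Document analysis completed:", response);
     })
     .catch((error) => {
       console.error("Document analysis failed:", error);
     });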
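
The retry logic above retries on every error. As a variation, here is a minimal sketch that retries only errors that look transient and waits longer after each failed attempt (exponential backoff). The helper names (`withRetry`, `isTransient`, `delay`) are hypothetical, and the `statusCode` and `code` fields checked in `isTransient` are assumptions about the shape of the thrown error, so adjust them to what your client actually throws.

```javascript
// Hypothetical helpers (not part of any Azure SDK).
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Assumption: errors carry an HTTP `statusCode` or a Node.js network `code`.
function isTransient(error) {
  const status = error?.statusCode;
  const code = error?.code;
  return (
    status === 408 ||
    status === 429 ||
    (status >= 500 && status < 600) ||
    code === "ETIMEDOUT" ||
    code === "ECONNRESET"
  );
}

// Runs `operation`, retrying up to `retries` times on transient errors.
// The wait doubles after each failed attempt: baseDelayMs, 2x, 4x, ...
async function withRetry(
  operation,
  { retries = 3, baseDelayMs = 1000, shouldRetry = isTransient } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries || !shouldRetry(error)) throw error;
      await delay(baseDelayMs * 2 ** attempt);
    }
  }
}
```

It could be used as `withRetry(() => submitAndPoll())`, where `submitAndPoll` is a hypothetical function wrapping the POST and the poller from the code above. Non-transient failures, such as a 400 for an invalid request, are rethrown immediately instead of being retried.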

If the issue persists, raise a support ticket via the Azure portal.

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.
