DeepSeek-R1 deployed in Azure AI Hub got timeout

Question

Jikun Chen 0

I deployed a DeepSeek-R1 model in Azure AI hub. It worked well for a month. But recently, the endpoint of the model is unavailable, showing "Request failed with status code 429. Clear the output to start a new dialog.", or sometimes "Azure deepseek timeout of 120000ms exceeded. Clear the output to start a new dialog." As shown in the following capture, my content is not long. I also didn't request my model frequently.

User's image

By the way, I also found the following doc from https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/deploy-models-deepseek?pivots=programming-language-python, not sure if it's my issue.

"Cost and quota considerations for DeepSeek models deployed as serverless API endpoints

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios."

Could you let me know how I can resolve this issue? Thanks!

1 answer

Answer 1

Hello Jikun Chen,

Thank you for posting your question in the Microsoft Q&A forum.

You deployed a DeepSeek-R1 model in Azure AI Hub and, after it had worked well for a month, began seeing errors such as "Request failed with status code 429" and "Azure deepseek timeout of 120000ms exceeded". These errors typically indicate rate limiting, resource constraints, or service timeouts.

Status Code 429 (Too Many Requests): This error occurs when the number of requests to the Azure AI service exceeds the allowed rate limit. Azure enforces rate limits to ensure fair usage and prevent overloading the service.

Timeout of 120000ms Exceeded: This error indicates that the request took longer than 120 seconds to process, causing the service to time out. Timeouts can occur due to high latency, insufficient compute resources, or complex model computations.

Please confirm the steps below:

Step 1: Verify Request Frequency and Rate Limits - Check your application’s request frequency to ensure it does not exceed Azure’s rate limits.

Steps 2 and 3: Monitor Resource Utilization - High resource utilization can lead to timeouts and degraded performance. Use Azure Monitor to track the compute and memory usage of your deployed model.

Step 4: Optimize Model Performance - If your model is computationally intensive, it may cause timeouts. Optimize the model by reducing its complexity or using techniques like quantization and pruning. The Microsoft documentation offers strategies for improving model performance.

Step 5: Check Network Latency - High network latency between your application and the Azure AI Hub endpoint can cause timeouts. Use tools like Azure Network Watcher to diagnose network issues.

Step 6: Implement Retry Logic - To handle transient errors like 429 and timeouts, implement retry logic in your application.

Step 7: Cost and Quota Considerations for DeepSeek Models - Check the DeepSeek deployment documentation to confirm whether your application exceeds the documented limits; if it does, you may encounter rate-limiting errors. If the current rate limits are insufficient for your scenario, contact Microsoft Azure Support to request an increase in your quota. Link with more info for your reference - https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/deploy-models-deepseek?pivots=programming-language-python
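For the retry logic mentioned in Step 6, a minimal Python sketch could look like the following. It is an illustration under stated assumptions, not an Azure SDK API: `RetryableError` and `call_with_retry` are hypothetical names, the caller is assumed to raise `RetryableError` when the endpoint returns 429 or the request times out, and whether this endpoint sends a `Retry-After` header is also an assumption.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error the caller raises for HTTP 429 or a client timeout."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        # Seconds suggested by the server (e.g. a Retry-After header), if any.
        self.retry_after = retry_after


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RetryableError as err:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # If the server asked for a longer wait, honor it.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            # Up to 10% jitter so several clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

To use it, wrap the HTTP call to the deployment in a function passed as `send`, and have that function raise `RetryableError` on a 429 response or a timeout. Note that retries only help with transient failures; they will not fix an outage that lasts for days.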

If the above answer helped, please do not forget to "Accept Answer" as this may help other community members to refer the info if facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated.

Jikun Chen 0 Reputation points

2025-03-06T16:01:08.17+00:00

Hi Suwarna, thanks for your support! The timeout issue appears to have self-healed today, approximately four days after the issue initially occurred. While I haven't found the root cause yet, I'd like to discuss your suggestions further.

Clarifications - The error messages "Request failed with status code 429" and "Azure DeepSeek timeout of 120000ms exceeded" appear to be related, as these errors alternated during the outage period.

Regarding Step 1: Verify Request Frequency and Rate Limits - The AI endpoints are accessed only by my demo web app, which generates minimal traffic (<50 requests per day). Should the issue recur, I'm prepared to:

Disable the web app.

Test the endpoint directly via playground chat in Azure AI Hub.

Monitor for recovery
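To back up a traffic estimate like the one above, one option is to log each request on the client side and tally the log per minute against the limits quoted in the documentation (1,000 requests and 200,000 tokens per minute). The sketch below is only an approximation: `minutes_over_limit` and the record format are hypothetical, and it buckets by calendar minute, whereas the service may measure its window differently.

```python
from collections import Counter
from datetime import datetime

# Limits quoted from the DeepSeek serverless deployment documentation.
REQUESTS_PER_MINUTE = 1_000
TOKENS_PER_MINUTE = 200_000


def minutes_over_limit(records):
    """Find calendar minutes whose traffic exceeded either documented limit.

    records: iterable of (timestamp, tokens) pairs, one per logged request,
    where timestamp is a datetime and tokens is an int.
    Returns a sorted list of (minute, request_count, token_count).
    """
    requests = Counter()
    tokens = Counter()
    for timestamp, token_count in records:
        # Truncate to the start of the minute to bucket the request.
        minute = timestamp.replace(second=0, microsecond=0)
        requests[minute] += 1
        tokens[minute] += token_count
    return [
        (minute, requests[minute], tokens[minute])
        for minute in sorted(requests)
        if requests[minute] > REQUESTS_PER_MINUTE
        or tokens[minute] > TOKENS_PER_MINUTE
    ]
```

An empty result for the outage window would support the view that the 429 responses were not caused by this client's own traffic.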

Regarding Steps 2 and 3: Monitor Resource Utilization - I've checked the Kusto logs in the Azure AI hub where the model is deployed, but the tables are empty. Could you share some advice on how I can monitor utilization and requests?

About Step 5: Check Network Latency - Given that the timeout occurred within Azure's internal environment (AI Hub playground chat), network latency seems an unlikely culprit.

Looking forward to your further suggestions :)

Saideep Anchuri 4,020 Reputation points Microsoft External Staff

2025-03-11T09:50:35.3933333+00:00

Hello Jikun Chen,

I recommend reporting this issue to the Azure support team. They will be able to investigate the issue further and provide a more targeted solution. You can report the issue by following these steps:

Go to the Azure portal and navigate to the Azure AI hub resource that hosts your DeepSeek-R1 deployment.

Click on the "Support + troubleshooting" tab.

Fill out the required information, including a detailed description of the issue and any steps you have taken to troubleshoot it.

Submit the support request.

The Azure support team will review your request and provide assistance as soon as possible.

Thank You.
