DeepSeek-R1 deployed in Azure AI Hub got timeout

Jikun Chen 0 Reputation points
2025-03-05T14:23:47.1466667+00:00

I deployed a DeepSeek-R1 model in Azure AI Hub, and it worked well for a month. Recently, however, the model's endpoint has become unavailable. It shows "Request failed with status code 429. Clear the output to start a new dialog." or sometimes "Azure deepseek timeout of 120000ms exceeded. Clear the output to start a new dialog." As the following capture shows, my content is not long, and I do not send requests to the model frequently.

User's image

I also found the following passage in the documentation at https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/deploy-models-deepseek?pivots=programming-language-python, but I'm not sure whether it applies to my issue.

"Cost and quota considerations for DeepSeek models deployed as serverless API endpoints

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios."

Could you let me know how I can resolve this issue? Thanks!

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Suwarna S Kale 1,191 Reputation points
    2025-03-05T15:04:57.5933333+00:00

    Hello Jikun Chen,

    Thank you for posting your question in the Microsoft Q&A forum.

    Your DeepSeek-R1 deployment in Azure AI Hub worked well for a month and now returns "Request failed with status code 429" or "Azure deepseek timeout of 120000ms exceeded". These errors typically indicate rate limiting, resource constraints, or service timeouts.

    Status Code 429 (Too Many Requests): This error occurs when the number of requests to the Azure AI service exceeds the allowed rate limit. Azure enforces rate limits to ensure fair usage and prevent overloading the service.

    Timeout of 120000ms Exceeded: This error indicates that the request took longer than 120 seconds (120,000 ms) to process, so it timed out. Timeouts can occur due to high latency, insufficient compute resources, or complex model computations.
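
    Because the two errors call for different responses, it can help to tell them apart in client code. The following is a minimal sketch, not an Azure API: the function names `retry_after_seconds` and `classify_failure` are hypothetical, and it assumes that a 429 response may carry a standard HTTP `Retry-After` header, which I have not confirmed for this endpoint.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, now=None):
    """Return the wait (in seconds) suggested by a Retry-After header, or None.

    Assumption: the service may send a standard HTTP Retry-After header,
    either as a number of seconds or as an HTTP date.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # Not a number and not a parseable date.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def classify_failure(status_code=None, timed_out=False, headers=None):
    """Classify a failed call as 'timeout', 'rate_limited', or 'other'.

    Returns a (kind, suggested_wait_seconds) tuple; the wait is None
    unless the response was a 429 with a usable Retry-After header.
    """
    if timed_out:
        return ("timeout", None)
    if status_code == 429:
        return ("rate_limited", retry_after_seconds(headers or {}))
    return ("other", None)
```

    A "rate_limited" result suggests waiting before the next call, while a "timeout" result points toward checking latency or the size of the request.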

    Please work through the steps below:

    Step 1: Verify Request Frequency and Rate Limits - Check your application’s request frequency to ensure it does not exceed the per-deployment limits quoted in your question (200,000 tokens per minute and 1,000 API requests per minute).

    Step 2: Monitor Resource Utilization - High resource utilization can lead to timeouts and degraded performance. Use Azure Monitor to track the compute and memory usage of your deployed model.

    Step 3: Optimize Model Performance - If your model is computationally intensive, it may cause timeouts. Optimize the model by reducing its complexity or by using techniques such as quantization and pruning. The Microsoft documentation offers strategies for improving model performance.

    Step 4: Check Network Latency - High network latency between your application and the Azure AI Hub endpoint can cause timeouts. Use tools such as Azure Network Watcher to diagnose network issues.

    Step 5: Implement Retry Logic - To handle transient errors such as 429 responses and timeouts, implement retry logic in your application.

    Step 6: Cost and Quota Considerations for DeepSeek Models - Check the DeepSeek deployment documentation to confirm whether your application exceeds the documented limits; if it does, you may encounter rate-limiting errors. If the current rate limits are insufficient for your scenario, contact Microsoft Azure Support to request a quota increase. More information: https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/deploy-models-deepseek?pivots=programming-language-python
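
    The request-frequency check and the retry logic mentioned in the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not an Azure SDK feature: `RateLimitError`, `SlidingWindowBudget`, `call_with_retry`, and the `send` callable are hypothetical names, the default limits are the 1,000 requests and 200,000 tokens per minute quoted in the question, and token counts are assumed to be estimated by the caller.

```python
import random
import time
from collections import deque


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the endpoint."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if any


class SlidingWindowBudget:
    """Client-side count of requests and tokens used in the last `window` seconds."""

    def __init__(self, max_requests=1000, max_tokens=200_000,
                 window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) per recorded request
        self.tokens_in_window = 0

    def try_acquire(self, tokens):
        """Record a request and return True if it fits both limits, else False."""
        now = self.clock()
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, old_tokens = self.events.popleft()
            self.tokens_in_window -= old_tokens
        if len(self.events) + 1 > self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, rng=random.random):
    """Call send(); on RateLimitError or TimeoutError, back off and retry.

    The delay doubles on each attempt (capped at max_delay), is scaled by
    random jitter between 50% and 100%, and is never shorter than a
    server-suggested retry_after. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except (RateLimitError, TimeoutError) as exc:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay *= 0.5 + rng() / 2
            retry_after = getattr(exc, "retry_after", None)
            if retry_after is not None:
                delay = max(delay, retry_after)
            sleep(delay)
```

    In a real client, `send` would wrap the actual request and raise `RateLimitError` on a 429 response. If `try_acquire` keeps returning True and the endpoint still answers with 429, the limit is probably being hit somewhere other than your own traffic, which is worth mentioning when you contact Azure Support.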

    If the above answer helped, please do not forget to "Accept Answer", as this may help other community members who face a similar issue find the information. Your contribution to the Microsoft Q&A community is highly appreciated.

