Hello Jikun Chen,
Thank you for posting your question in the Microsoft Q&A forum.
When a DeepSeek-R1 model deployed in Azure AI Hub starts returning errors such as "Request failed with status code 429" or "Azure deepseek timeout of 120000ms exceeded" after functioning well for a month, these errors typically indicate rate limiting, resource constraints, or service timeouts.
Status code 429 (Too Many Requests): this error occurs when the number of requests to the Azure AI service exceeds the allowed rate limit. Azure enforces rate limits to ensure fair usage and to prevent overloading the service.
Timeout of 120000 ms exceeded: this error indicates that the request took longer than 120 seconds to complete, so the call timed out. Timeouts can occur due to high latency, insufficient compute resources, or long-running model computations.
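As a first check, it helps to confirm whether the 120000 ms limit comes from your own client configuration rather than the service. Below is a minimal sketch of a direct call to the deployment with a longer client-side timeout; the endpoint URL, API version, authentication header, and model name are placeholders (assumptions), so substitute the values shown on your deployment's details page.

```python
import requests

# Placeholders: copy the real endpoint URL and key from your deployment in Azure AI Foundry.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview"
API_KEY = "<your-api-key>"

payload = {
    "model": "DeepSeek-R1",  # deployment/model name as configured in your hub
    "messages": [{"role": "user", "content": "Give a one-sentence summary of rate limiting."}],
    "max_tokens": 256,
}

# Reasoning models can take a long time to respond, and 120 s is a common
# client default. Raising the client-side read timeout (here to 300 s) helps
# rule out the client as the source of the "timeout of 120000ms exceeded" error.
response = requests.post(
    ENDPOINT,
    # Depending on the deployment type, the header may instead need to be
    # "Authorization": f"Bearer {API_KEY}".
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=(10, 300),  # (connect timeout, read timeout) in seconds
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```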
Please work through the steps below:
Step 1: Verify Request Frequency and Rate Limits - Check your application’s request frequency to ensure it does not exceed Azure’s rate limits.
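If you control the calling code, a simple client-side throttle can keep the request frequency below the published limit. The sketch below is an illustration only: the 60-requests-per-minute figure is a placeholder, not your deployment's actual quota, so replace it with the limit documented for your deployment.

```python
import threading
import time

class RequestThrottle:
    """Allow at most `max_calls` requests per `period` seconds (rolling window)."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self.calls: list[float] = []
        self.lock = threading.Lock()

    def wait(self) -> None:
        with self.lock:
            now = time.monotonic()
            # Drop timestamps that have fallen outside the rolling window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) >= self.max_calls:
                # Sleep until the oldest call leaves the window.
                time.sleep(self.period - (now - self.calls[0]))
                now = time.monotonic()
                self.calls = [t for t in self.calls if now - t < self.period]
            self.calls.append(now)

# Example: stay under an assumed 60-requests-per-minute limit.
throttle = RequestThrottle(max_calls=60, period=60.0)
# throttle.wait()  # call this before each request to the deployment
```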
Step 2: Monitor Resource Utilization - High resource utilization can lead to timeouts and degraded performance. Use Azure Monitor to track the compute and memory usage of your deployed model.
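If you prefer to check utilization programmatically rather than in the portal, the azure-monitor-query package can pull metrics for the resource behind your deployment. The resource ID and metric names below are assumptions; use the metric names your resource actually exposes (listed under Monitoring > Metrics in the Azure portal).

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Placeholder resource ID: replace with the ID of the Azure AI services
# resource that backs your DeepSeek-R1 deployment.
RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.CognitiveServices/accounts/<account-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Metric names are assumptions; pick the ones your resource exposes.
result = client.query_resource(
    RESOURCE_ID,
    metric_names=["TotalCalls", "TotalErrors", "Latency"],
    timespan=timedelta(hours=24),
    aggregations=["Average"],
)

for metric in result.metrics:
    print(metric.name)
    for series in metric.timeseries:
        for point in series.data:
            print(f"  {point.timestamp}: avg={point.average}")
```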
Step 3: Optimize Model Performance - If your model is computationally intensive, it may cause timeouts. Optimize the model by reducing its complexity or, where you host the model yourself, by using techniques like quantization and pruning. The Microsoft documentation below offers further strategies for improving model performance, and a short sketch after the links shows one practical client-side option for hosted deployments.
- https://techcommunity.microsoft.com/blog/azure-ai-services-blog/announcing-public-preview-of-direct-preference-optimization-capabilities-with-az/4358164
- https://techcommunity.microsoft.com/blog/educatordeveloperblog/fine-tuning-language-models-with-azure-ai-foundry-a-detailed-guide/4369281
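If your DeepSeek-R1 deployment is a hosted (serverless) endpoint, quantization and pruning are not directly available to you, so a practical client-side lever is to bound the work each request asks for: cap the output length and trim long chat histories. The parameter names below follow the common chat-completions payload; the specific limits are placeholders to tune for your scenario.

```python
MAX_HISTORY_TURNS = 6  # assumption: keep only the most recent turns

def build_payload(history: list[dict], user_message: str) -> dict:
    """Build a chat-completions payload that bounds per-request work."""
    trimmed = history[-MAX_HISTORY_TURNS:]
    return {
        "model": "DeepSeek-R1",  # placeholder deployment/model name
        "messages": trimmed + [{"role": "user", "content": user_message}],
        "max_tokens": 1024,  # bound response length to keep latency predictable
        "temperature": 0.6,
    }
```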
Step 4: Check Network Latency - High network latency between your application and the Azure AI Hub endpoint can cause timeouts. Use tools like Azure Network Watcher to diagnose network issues.
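Before opening Network Watcher, a quick way to see whether the network itself is slow is to time a bare TCP handshake to the endpoint host, which separates network latency from model compute time. The hostname below is a placeholder for your deployment's endpoint.

```python
import socket
import time

# Placeholder: replace with your deployment endpoint's hostname (no scheme or path).
HOST = "<your-resource>.services.ai.azure.com"
PORT = 443

# Time several TCP handshakes; consistently high values point to a network issue
# rather than slow model inference.
samples = []
for _ in range(5):
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=10):
        pass
    samples.append((time.perf_counter() - start) * 1000)

print(f"TCP connect latency (ms): min={min(samples):.1f}, max={max(samples):.1f}")
```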
Step 5: Implement Retry Logic - To handle transient errors like 429s and timeouts, implement retry logic with exponential backoff in your application, as sketched below.
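Here is a minimal retry sketch that uses exponential backoff with jitter, honors the Retry-After header on 429 responses, and also retries client-side timeouts. The endpoint, key, authentication header, and model name are placeholders (the same assumptions as the earlier sketch), not values from your deployment.

```python
import random
import time

import requests

# Placeholders: substitute your deployment's endpoint and key.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview"
API_KEY = "<your-api-key>"

def call_with_retries(payload: dict, max_attempts: int = 5) -> dict:
    """Retry transient failures (429s, 5xx, timeouts) with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.post(
                ENDPOINT,
                headers={"api-key": API_KEY, "Content-Type": "application/json"},
                json=payload,
                timeout=120,  # client-side read timeout in seconds
            )
            if response.status_code == 429:
                # Honor the Retry-After header if the service returns one.
                wait = float(response.headers.get("Retry-After", 2 ** attempt))
            elif response.status_code >= 500:
                wait = 2 ** attempt
            else:
                response.raise_for_status()  # non-transient 4xx errors propagate
                return response.json()
        except requests.Timeout:
            wait = 2 ** attempt

        # Add jitter so concurrent clients do not retry in lockstep.
        time.sleep(wait + random.uniform(0, 1))

    raise RuntimeError(f"Request still failing after {max_attempts} attempts")

# Example usage (model name is a placeholder):
# result = call_with_retries({
#     "model": "DeepSeek-R1",
#     "messages": [{"role": "user", "content": "Hello"}],
# })
```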
Step 6: Cost and Quota Considerations for DeepSeek Models - Review the DeepSeek deployment documentation to confirm the rate limits and quotas that apply to your deployment; if your application exceeds them, you will see rate-limiting errors. If the current limits are insufficient for your scenario, contact Microsoft Azure Support to request a quota increase. Link with more info for your reference - https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/deploy-models-deepseek?pivots=programming-language-python
If the above answer helped, please do not forget to "Accept Answer", as this helps other community members find this information when facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated.