Azure AI Foundry - Rate Limit Exceeded Try again in 86400 seconds

Gareth Jayne 5 Reputation points
2025-01-23T22:31:49.7+00:00

Hi

I have deployed a gpt-4o model in Azure AI Foundry and am trying to test it in the Chat Playground. I've used the Add your data section to create a vector index that points to my Azure Blob storage location where the files for the gpt model to reference are stored. This vector index was successfully created and I can see this in the Azure ML Studio.

However, when I try testing this in the Chat Playground I keep getting the following error:

Server responded with status 429. Error message: {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 86400 seconds.'}}

I waited 24 hours and tried again, but the same error appeared. I've had rate limit errors in the past with other projects, but there the number of seconds decreases each time you retry, whereas this error always says 86400 seconds.

I assume it has something to do with the index, because if I remove the data source from the playground and use the model without my data connected, I don't get the rate limit error. My region is UK South, and I've already increased the Tokens per Minute Rate Limit for this model to the maximum of 30K TPM.

Does anyone know why this might be happening?

Azure OpenAI Service
Azure AI services

1 answer

  1. SriLakshmi C 2,010 Reputation points Microsoft Vendor
    2025-01-24T18:42:31.1666667+00:00

    Hello Gareth Jayne,

    Greetings and Welcome to Microsoft Q&A! Thanks for posting the question.

    I understand that you have increased the rate limit to the maximum and are still encountering the issue.

    I attempted to reproduce the issue in my environment, and it is working correctly for me.

    To give more context: as each request is received, Azure OpenAI computes an estimated max processed-token count that includes the following:

    • Prompt text and count
    • The max_tokens parameter setting
    • The best_of parameter setting

    As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets. For more details, see Understanding rate limits.
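    The per-minute accounting described above can be sketched as a toy model. This is only an illustration, not Azure's actual implementation: the class name `TpmWindow`, the method `admit`, and the estimate formula (prompt tokens plus `max_tokens` times `best_of`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TpmWindow:
    """Toy model of a tokens-per-minute budget (hypothetical, for illustration)."""

    tpm_limit: int
    used: int = 0              # running estimated-token count for this window
    window_start: float = 0.0  # time (seconds) at which the window began

    def admit(self, prompt_tokens: int, max_tokens: int,
              best_of: int = 1, now: float = 0.0) -> bool:
        """Return True if the request is accepted, False if it would get a 429."""
        # The running count is reset once a minute has passed.
        if now - self.window_start >= 60:
            self.window_start = now
            self.used = 0
        # Once the limit has been reached, further requests are rejected
        # until the counter resets.
        if self.used >= self.tpm_limit:
            return False
        # Assumed estimate: prompt size plus the worst-case completion size.
        self.used += prompt_tokens + max_tokens * best_of
        return True


window = TpmWindow(tpm_limit=30_000)
first = window.admit(prompt_tokens=20_000, max_tokens=4_000, now=0)
second = window.admit(prompt_tokens=5_000, max_tokens=4_000, now=10)
third = window.admit(prompt_tokens=100, max_tokens=100, now=20)
after_reset = window.admit(prompt_tokens=100, max_tokens=100, now=61)
print(first, second, third, after_reset)
```

    In this model a few requests with long prompts or a large max_tokens value can use up a 30K budget within a single minute, even when the actual responses are short.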

    Please see Manage Azure OpenAI Service quota for more details.

    If you are using Azure AI Studio:

    Have you checked whether you have exceeded the quota limit for your Azure OpenAI resources? You can view your quotas and limits in the Model Quota section of Azure AI Studio.

    Please see Manage and increase quotas for resources with Azure AI Studio for more details.

    If you are using Azure OpenAI Studio:

    To view your quota allocations across deployments in a given region, select Shared Resources > Quota in Azure OpenAI Studio, then click the link to increase the quota.

    Also, to minimize issues related to rate limits, it's a good idea to use the following techniques:

    • Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don’t set a large max_tokens value if you expect your responses to be small.
    • Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
    • Implement retry logic in your application.
    • Avoid sharp changes in the workload. Increase the workload gradually.
    • Test different load increase patterns.
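    As a minimal sketch of the retry logic mentioned in the list above: the helper `call_with_retry`, the `send` callable, and `RateLimitError` are hypothetical names for this example, and it assumes the 429 response may carry a `Retry-After` header giving a wait in seconds. It backs off exponentially with jitter when no usable header is present, and gives up immediately when the server asks for a longer wait than the caller allows (as with the 86400 seconds in the question).

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by the caller's own
# HTTP code. This shape is an assumption made for the sketch.
Response = Tuple[int, Mapping[str, str], str]


class RateLimitError(Exception):
    """Raised when the call is still rate limited and retrying is pointless."""


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    max_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body

        # Exact-key lookup for simplicity; real header names are
        # case-insensitive.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
            if delay > max_delay:
                # A very long wait usually signals an exhausted quota rather
                # than a short burst, so retrying in-process will not help.
                raise RateLimitError(f"server asked to wait {retry_after} seconds")
        else:
            # Exponential backoff with full jitter, capped at max_delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))

        if attempt < max_attempts - 1:
            sleep(delay)

    raise RateLimitError(f"still rate limited after {max_attempts} attempts")
```

    Passing `sleep` as a parameter keeps the helper easy to test without real waiting; in an application, `send` would wrap whatever client call you already make to the deployment endpoint.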

    Hope this helps. Do let me know if you have any further queries.


    If the response helped, please click Accept Answer and select Yes for "Was this answer helpful".

    Thank you!

