I have an AI model deployed in Azure AI Foundry. When I call it via the API, I get 'TooManyRequests' after a couple of requests.
In Azure AI Foundry, I have a gpt-4o model deployed. In the UI it is grouped under the Azure AI service “ai-sig6-azure-ai-services_aoai”, and in the Azure Portal I have an Azure AI Services resource called ai-sig6-azure-ai-services. The gpt-4o deployment has limits of 30K TPM (tokens per minute) and 180 RPM (requests per minute). When I send several requests in a row, 1 or 2 succeed and then I get HTTP status ‘TooManyRequests’ (429). I should not be anywhere near those limits, so I think there must be another limit I am hitting, but I cannot find it in the Azure Portal or in Azure AI Foundry.
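For reference, here is roughly how I am calling the deployment (a minimal sketch; the endpoint URL, deployment name, API version, and key below are placeholders for my actual values):

```python
import requests

# Placeholders - substitute the actual resource endpoint, deployment name, and key.
ENDPOINT = "https://ai-sig6-azure-ai-services.openai.azure.com"
DEPLOYMENT = "gpt-4o"
API_VERSION = "2024-06-01"
API_KEY = "<my-api-key>"

url = f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
headers = {"api-key": API_KEY, "Content-Type": "application/json"}

# Send a handful of small requests back to back; 1 or 2 succeed, the rest return 429.
for i in range(5):
    resp = requests.post(url, headers=headers, json={
        "messages": [{"role": "user", "content": f"Test request {i}"}],
        "max_tokens": 50,
    })
    print(i, resp.status_code, resp.headers.get("x-ratelimit-remaining-requests"))
```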
Here are the response headers I get with the ‘TooManyRequests’ error:
Retry-After: 49
x-ratelimit-reset-tokens: 49
apim-request-id: 8ef18262-d6c3-4b3b-a2bf-7cf1ccdddfee
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
policy-id: DeploymentRatelimit-Token
x-ms-region: East US 2
x-ratelimit-remaining-requests: 24
Date: Wed, 12 Feb 2025 14:14:46 GMT
Request failed with status code: TooManyRequests
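I could work around the error by honoring the Retry-After header on 429 responses, along these lines (a minimal sketch using plain requests, not what I want as the final fix):

```python
import time
import requests

def post_with_retry(url, headers, payload, max_retries=5):
    """POST to the deployment, sleeping for the Retry-After interval on 429 responses."""
    resp = None
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:  # anything other than TooManyRequests
            return resp
        # Azure returns Retry-After in seconds (49 in the headers above).
        wait = int(resp.headers.get("Retry-After", "1"))
        time.sleep(wait)
    return resp
```

But backing off for 49 seconds after only a couple of small requests defeats the purpose, so I would rather fix whatever limit is actually being hit.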
What do I need to change so I don’t get this error?