I have an AI model deployed in Azure AI Foundry. When I call it via the API, I get 'TooManyRequests' after a couple of requests.
In Azure AI Foundry, I have a gpt-4o model deployed. In the UI it is grouped under the Azure AI service “ai-sig6-azure-ai-services_aoai”, and in the Azure Portal I have an Azure AI Services resource called ai-sig6-azure-ai-services. The gpt-4o deployment has limits of 30K TPM (tokens per minute) and 180 RPM (requests per minute). When I send several requests in a row, 1 or 2 succeed and then I get HTTP status ‘TooManyRequests’ (429). I should not be anywhere near those limits, so I think there must be another limit I am hitting, but I cannot find it in the Azure Portal or in Azure AI Foundry.
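For reference, here is roughly how I am calling the deployment (a minimal sketch; the endpoint URL, deployment name, API version, and key below are placeholders for my actual values):

```python
import requests

# Placeholders - substitute the actual resource endpoint, deployment name, and key.
ENDPOINT = "https://ai-sig6-azure-ai-services.openai.azure.com"
DEPLOYMENT = "gpt-4o"
API_VERSION = "2024-06-01"
API_KEY = "<my-api-key>"

url = f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
headers = {"api-key": API_KEY, "Content-Type": "application/json"}

# Send a handful of small requests back to back; 1 or 2 succeed, the rest return 429.
for i in range(5):
    resp = requests.post(url, headers=headers, json={
        "messages": [{"role": "user", "content": f"Test request {i}"}],
        "max_tokens": 50,
    })
    print(i, resp.status_code, resp.headers.get("x-ratelimit-remaining-requests"))
```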
Here are the response headers I get with the ‘TooManyRequests’ error:
Retry-After: 49
x-ratelimit-reset-tokens: 49
apim-request-id: 8ef18262-d6c3-4b3b-a2bf-7cf1ccdddfee
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
policy-id: DeploymentRatelimit-Token
x-ms-region: East US 2
x-ratelimit-remaining-requests: 24
Date: Wed, 12 Feb 2025 14:14:46 GMT
Request failed with status code: TooManyRequests
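I could work around the error by honoring the Retry-After header on 429 responses, along these lines (a minimal sketch using plain requests, not what I want as the final fix):

```python
import time
import requests

def post_with_retry(url, headers, payload, max_retries=5):
    """POST to the deployment, sleeping for the Retry-After interval on 429 responses."""
    resp = None
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:  # anything other than TooManyRequests
            return resp
        # Azure returns Retry-After in seconds (49 in the headers above).
        wait = int(resp.headers.get("Retry-After", "1"))
        time.sleep(wait)
    return resp
```

But backing off for 49 seconds after only a couple of small requests defeats the purpose, so I would rather fix whatever limit is actually being hit.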
What do I need to change so I don’t get this error?