Azure AI Foundry Serverless Model Rate Limits

Rishab Mehta 60 Reputation points
2025-02-21T21:02:08.97+00:00

Hello,

This page: https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/quotas-limits

says that the rate limits for deployed serverless models are 200.000 tokens per minute and 1.000 requests per minute, but this page: https://learn.microsoft.com/en-us/azure/ai-studio/how-to/fine-tune-serverless?tabs=chat-completion says that each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute.

So is it 200 tokens and 1 request per minute, or 200,000 tokens and 1,000 requests per minute?
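For context, here is a minimal sketch of how I could probe the deployment myself by watching for HTTP 429 throttling. The endpoint URL, header names, and request shape are placeholders/assumptions based on the usual chat-completions pattern, not values taken from either documentation page:

```python
import time
import requests

# Placeholders: substitute your serverless deployment's endpoint and key.
ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com/chat/completions"
API_KEY = "<your-api-key>"

headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
body = {"messages": [{"role": "user", "content": "ping"}], "max_tokens": 1}

# Fire a handful of small requests and watch for HTTP 429 (throttled).
# A limit of 1,000 requests/minute should not trip on a few calls,
# while a limit of 1 request/minute would throttle almost immediately.
for i in range(5):
    resp = requests.post(ENDPOINT, headers=headers, json=body, timeout=30)
    print(i, resp.status_code, resp.headers.get("Retry-After"))
    if resp.status_code == 429:
        print("Throttled; the Retry-After header indicates how long to back off.")
    time.sleep(1)
```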

Azure AI services
