Azure AI Foundry Serverless Model Rate Limits
Rishab Mehta
Hello,
This page: https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/quotas-limits
says the rate limits for deployed serverless models are 200.000 tokens per minute and 1.000 requests per minute. However, this page: https://learn.microsoft.com/en-us/azure/ai-studio/how-to/fine-tune-serverless?tabs=chat-completion says that each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute.
So is it 200 and 1 or 200,000 and 1,000?
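For context, whichever figure is correct, exceeding the per-deployment limit shows up as an HTTP 429 response from the endpoint. Below is a minimal sketch of how one might back off and retry when that happens; the endpoint URL, API key, and payload shape are placeholders, not values taken from either documentation page.

```python
import time
import requests

# Placeholder values -- substitute your own serverless deployment endpoint and key.
ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com/chat/completions"
API_KEY = "<your-api-key>"


def chat(messages, max_retries=3):
    """Call a serverless deployment, backing off when the per-deployment
    rate limit (tokens per minute or requests per minute) is hit."""
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {"messages": messages}

    for attempt in range(max_retries):
        resp = requests.post(ENDPOINT, json=payload, headers=headers, timeout=60)
        if resp.status_code == 429:
            # Respect the Retry-After header if the service returns one,
            # otherwise fall back to a simple exponential backoff.
            wait = int(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp.json()

    raise RuntimeError("Rate limited on every attempt; consider reducing request volume.")


if __name__ == "__main__":
    print(chat([{"role": "user", "content": "Hello"}]))
```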