Hello Deepak Agarwal,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you would like clarification on Azure OpenAI usage tokens, limits, and deployment types.
Above all, AI architecture requires experience, a clear view of business requirements, and the right technology. This answer follows the best practices stated in the official documentation. @kothapally Snigdha has already answered most of your questions. My aim is only to build on that answer by clarifying three things: the relationship between TPM and usage tiers, how TPM is aggregated per subscription, region, and model, and how to request quota increases.
Regarding your questions: TPM stands for Tokens Per Minute, a rate limit on how many tokens can be processed each minute. Azure OpenAI quotas differ by deployment type (Standard, Global, DataZone), region, and subscription, while usage tiers are tracked at the tenant level. The documentation states that TPM limits apply per region and per model. For example, in East US the maximum TPM for gpt-4o might be 1M. If you deploy a service with 0.5M TPM, that 0.5M counts against the regional cap, leaving 0.5M for other gpt-4o deployments in that subscription and region.
Scenario:
| Tenant | Subscription | Region | gpt-4o TPM limit |
|---|---|---|---|
| MyTenant1.com | MyAzureSubscription1 | East US | 0.5M (existing) + up to 0.5M additional |
| MyTenant1.com | MyAzureSubscription1 | East US 2 | 1M |
| MyTenant1.com | MyAzureSubscription2 | East US | 1M |
| MyTenant1.com | MyAzureSubscription2 | East US 2 | 1M |
| MyTenant2.com | MyAzureSubscription1 | East US | 1M |
NOTE:
First, TPM limits are per subscription, per region, and per model. The sum of TPM across all deployments of the same model in one subscription-region cannot exceed the regional cap (1M for gpt-4o in East US in this example).
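To make the aggregation rule concrete, here is a minimal Python sketch that sums the TPM of every deployment per (subscription, region, model) and reports the unallocated headroom. The deployment list, the 1M cap, and the `remaining_tpm` helper are hypothetical illustrations built from the example figures above, not values or APIs from Azure; your actual quota is shown in the portal.

```python
from collections import defaultdict

# Hypothetical example cap: 1M TPM for gpt-4o per subscription-region
# (the illustrative figure used above, not an official limit).
REGIONAL_CAP = 1_000_000

# Hypothetical deployments: (subscription, region, model, tpm).
# The tenant is deliberately absent: TPM is not aggregated per tenant.
deployments = [
    ("MyAzureSubscription1", "East US", "gpt-4o", 500_000),
    ("MyAzureSubscription1", "East US 2", "gpt-4o", 1_000_000),
    ("MyAzureSubscription2", "East US", "gpt-4o", 250_000),
    ("MyAzureSubscription2", "East US", "gpt-4o", 250_000),
]


def remaining_tpm(deployments, cap):
    """Return unallocated TPM per (subscription, region, model).

    Raises ValueError if the deployments sharing one subscription-region-model
    allocate more than the cap in total.
    """
    used = defaultdict(int)
    for subscription, region, model, tpm in deployments:
        used[(subscription, region, model)] += tpm

    remaining = {}
    for key, total in used.items():
        if total > cap:
            raise ValueError(f"{key} allocates {total} TPM, above the cap of {cap}")
        remaining[key] = cap - total
    return remaining


for key, headroom in remaining_tpm(deployments, REGIONAL_CAP).items():
    print(key, headroom)
```

Here MyAzureSubscription2 in East US has two 250K deployments that together use 0.5M, so 0.5M remains. Adding a 600K deployment to MyAzureSubscription1 in East US would raise `ValueError`, because 0.5M + 0.6M exceeds the 1M cap.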
Second, TPM limits are not tenant-level. Tenants only aggregate usage tiers (monthly token caps); TPM is enforced per subscription-region-model.
Third, deployments are created at the subscription level, inside an Azure OpenAI resource. A tenant cannot host deployments directly.
Fourth, each deployment type (Standard, Global, DataZone) has its own independent TPM limit. For example:
- Global deployments: 30M TPM in total, which you can split across deployments (e.g., two deployments of 15M each).
- DataZone deployments: a separate quota.
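Because each deployment type draws on its own pool, allocating Global quota does not reduce Standard quota. The minimal Python sketch below models this under stated assumptions: the pool sizes (30M Global from the example above, 1M Standard as a placeholder) and the helpers `split_evenly` and `allocate` are hypothetical and not part of any Azure SDK.

```python
# Hypothetical quota pools for one subscription, region and model.
# 30M Global is the example figure above; 1M Standard is a placeholder.
pools = {"Standard": 1_000_000, "Global": 30_000_000}


def split_evenly(pool_tpm: int, n: int) -> list[int]:
    """Split one pool across n deployments; any remainder goes to the first ones."""
    base, extra = divmod(pool_tpm, n)
    return [base + 1 if i < extra else base for i in range(n)]


def allocate(pools: dict[str, int], deployment_type: str, tpm: int) -> dict[str, int]:
    """Return a copy of the pools with tpm taken from one deployment type only."""
    if tpm > pools[deployment_type]:
        raise ValueError(
            f"{deployment_type} has {pools[deployment_type]} TPM left, cannot allocate {tpm}"
        )
    updated = dict(pools)
    updated[deployment_type] -= tpm
    return updated


sizes = split_evenly(pools["Global"], 2)
print(sizes)  # [15000000, 15000000]

after = pools
for size in sizes:
    after = allocate(after, "Global", size)
print(after)  # {'Standard': 1000000, 'Global': 0}
```

After both Global deployments are created, the Global pool is exhausted while the Standard pool is untouched, which is what "independent TPM limits" means in practice.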
Fifth, the usage tier (12B tokens/month for gpt-4o):
- (A) This is a hard monthly cap across all deployments under the tenant (MyTenant1.com).
- (B) It ensures your total usage across subscriptions and regions does not exceed 12B tokens/month for gpt-4o.
- (C) It cannot override TPM limits. Even with unused monthly tokens, you are still rate-limited by TPM.
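The relationship between the two limits can be illustrated with simple arithmetic. The Python sketch below assumes a simplified fixed one-minute window (the actual service rate limiter may evaluate shorter intervals) and uses the 1M TPM and 12B tokens/month example figures from above; `tokens_allowed_now` and `minutes_to_reach_tier` are hypothetical helpers.

```python
# Example figures from above (illustrative, not official limits).
TPM_LIMIT = 1_000_000          # tokens per minute for one subscription-region-model
USAGE_TIER = 12_000_000_000    # 12B tokens per month for gpt-4o
MINUTES_PER_DAY = 60 * 24


def tokens_allowed_now(tpm_limit: int, used_this_minute: int) -> int:
    """Tokens still accepted in the current (simplified, fixed) one-minute window.

    Monthly usage is not an input: unused monthly tokens never raise this value.
    """
    return max(0, tpm_limit - used_this_minute)


def minutes_to_reach_tier(usage_tier: int, tpm_limit: int) -> int:
    """Minutes of sustained traffic at the full TPM limit needed to reach the tier."""
    return -(-usage_tier // tpm_limit)  # ceiling division


minutes = minutes_to_reach_tier(USAGE_TIER, TPM_LIMIT)
print(minutes, round(minutes / MINUTES_PER_DAY, 1))  # 12000 8.3
print(tokens_allowed_now(TPM_LIMIT, 900_000))        # 100000
```

So a single deployment running flat out at 1M TPM would reach 12B tokens in about 12,000 minutes (roughly 8.3 days), yet in any given minute it can never send more than 1M tokens, however much of the monthly tier is left.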
Limits can be increased as follows:
- TPM: request an increase through the Azure portal (up to the regional maximum).
- Usage tier: contact Microsoft to negotiate (the default for gpt-4o is 12B tokens/month).
In summary:
- TPM limits: the table above shows how TPM is shared within a subscription-region.
- Tenant influence: tenants do not control TPM; they track usage tiers.
- Usage tier: a tenant-wide monthly token cap, separate from TPM.
I hope this is helpful! Do not hesitate to let me know if you have any other questions.
Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.