Clarity on understanding Azure OpenAI usage tokens, limits and deployment types

Question

Hi,

I have read the documentation on Azure OpenAI service quotas and limits and on deployment types, and I have a few follow-up questions to help me understand it better.

Consider the following deployment setup:

Azure Tenant Directory: MyTenant1.com
Subscription: MyAzureSubscription1
Azure OpenAI (AOAI) service: MyAzureOpenAIService1
Model: gpt-4o, TPM configured: 0.5M (Max is 1M for EastUS region as per documentation)
Location/Region of AOAI: EastUS

Queries for Deployment Type - Standard (or Regional)

If I want to deploy another AOAI service, MyAzureOpenAIService2 (with the same model), can you help fill in the blanks for the scenarios below? (Per the documentation, the EastUS2 region has the same 1M TPM limit for the same model, i.e. gpt-4o.)

Tenant	Subscription	Region	TPM Limit?
MyTenant1.com	MyAzureSubscription1	East US	?
MyTenant1.com	MyAzureSubscription1	East US 2	?
MyTenant1.com	MyAzureSubscription2	East US	?
MyTenant1.com	MyAzureSubscription2	East US 2	?
MyTenant2.com	MyAzureSubscription1	East US	?

Are all these TPM limits at the tenant level? How does the Azure tenant factor into these limits?
Can I deploy AOAI at the tenant level? If yes, how, and what are the TPM limits?
Do Standard (or Regional) deployments eat into the quota for other deployment types (GlobalStandard, DataZone, etc.)? Do the different deployment types have self-contained TPM limits, as mentioned in the docs, irrespective of any other AOAI deployment types?
Can I have two AOAI deployments of type GlobalStandard for gpt-4o (total TPM limit is 30M) with 15M TPM assigned to each? Similarly for DataZoneStandard?
There is a Usage Tier mentioned in your docs at the tenant level. For gpt-4o, I see it is listed as 12B tokens per month at the tenant level (which is MyTenant1.com in the example above). If the TPM limits are already capped at the region/subscription level (based on the answers above), what is the significance of this number? What does it tell me? Can it be increased and used to go beyond the capped TPM limits, and if so, how?

Appreciate your help and support in clarifying this.

Accepted Answer

Hello Deepak Agarwal,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you would like clarification on Azure OpenAI token usage, limits, and deployment types.

Above all, AI architecture depends on experience, business requirements, and technology. As for the best practices stated in the official documentation, @kothapally Snigdha has already answered most of the questions; my aim is only to add to that by clarifying the relationship between TPM and the usage tier, the per-subscription, per-region, per-model TPM aggregation, and the possibility of requesting quota increases.

Regarding all your questions: TPM stands for Tokens Per Minute, which is a rate limit. Azure OpenAI has different quotas based on deployment types (Standard, Global, DataZone), regions, subscriptions, and tenants. The documentation mentions that TPM limits are per region and per model. For example, in East US the maximum TPM for gpt-4o might be 1M; if you deploy a service with 0.5M TPM, that 0.5M counts against the regional cap.

Scenario:

Tenant	Subscription	Region	TPM Limit
MyTenant1.com	MyAzureSubscription1	East US	0.5M (existing) + 0.5M max additional
MyTenant1.com	MyAzureSubscription1	East US 2	1M
MyTenant1.com	MyAzureSubscription2	East US	1M
MyTenant1.com	MyAzureSubscription2	East US 2	1M
MyTenant2.com	MyAzureSubscription1	East US	1M

NOTE:

TPM limits are per subscription, per region, per model. The sum of TPM across all deployments of the same model in a subscription-region cannot exceed the regional cap (1M for gpt-4o in EastUS).
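The aggregation rule above can be sketched as simple bookkeeping. This is only an illustration, not an Azure API: the function name, the tuple layout, and the cap table are hypothetical, and the 1M figures are the ones quoted in this thread rather than values read from the service. Note also that real subscriptions are identified by unique IDs, so a subscription in another tenant is always a separate pool even if its display name matches.

```python
# Assumed regional caps (Standard deployments), taken from the thread:
# 1M TPM for gpt-4o in both East US and East US 2. Check your own quota page.
REGIONAL_CAP_TPM = {
    ("eastus", "gpt-4o"): 1_000_000,
    ("eastus2", "gpt-4o"): 1_000_000,
}


def remaining_tpm(deployments, subscription, region, model):
    """Return the TPM still unallocated in one subscription/region/model pool.

    `deployments` is a list of (subscription, region, model, tpm) tuples.
    Only deployments that match all three keys count against the pool.
    """
    used = sum(
        tpm
        for sub, reg, mod, tpm in deployments
        if (sub, reg, mod) == (subscription, region, model)
    )
    return REGIONAL_CAP_TPM[(region, model)] - used


# The existing deployment from the question: 0.5M TPM in East US.
existing = [("MyAzureSubscription1", "eastus", "gpt-4o", 500_000)]

print(remaining_tpm(existing, "MyAzureSubscription1", "eastus", "gpt-4o"))   # 500000
print(remaining_tpm(existing, "MyAzureSubscription1", "eastus2", "gpt-4o"))  # 1000000
print(remaining_tpm(existing, "MyAzureSubscription2", "eastus", "gpt-4o"))   # 1000000
```

Only the first lookup shares a pool with the existing deployment, which is why it is the only row in the table with less than the full 1M available.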

Secondly, TPM limits are not tenant-level. Tenants only aggregate usage tiers (monthly token caps). TPM is enforced per subscription-region-model.

Thirdly, deployments are subscription-level. Tenants cannot host deployments directly.

Fourth, each deployment type (Standard, Global, DataZone) has independent TPM limits. For example:

Global deployments: 30M TPM total. You can split this across deployments (e.g., 15M each).
DataZone: Separate quota.
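As a rough sketch of the "independent pools" idea, the check below treats each deployment type as its own budget and asks whether a proposed split fits inside it. The pool sizes are the figures quoted in this thread (1M for Standard in a region, 30M for Global); the dictionary and function names are hypothetical and do not correspond to any Azure SDK call.

```python
# Assumed gpt-4o pool sizes, as quoted in the thread. These are illustrative
# values only; actual quota depends on subscription, region, and model version.
POOL_LIMIT_TPM = {
    "Standard": 1_000_000,         # per region
    "GlobalStandard": 30_000_000,  # separate pool, unaffected by Standard
}


def split_fits(deployment_type, allocations_tpm):
    """Return True if the per-deployment TPM values fit in that type's pool."""
    return sum(allocations_tpm) <= POOL_LIMIT_TPM[deployment_type]


print(split_fits("GlobalStandard", [15_000_000, 15_000_000]))  # True
print(split_fits("GlobalStandard", [20_000_000, 15_000_000]))  # False
print(split_fits("Standard", [500_000, 500_000]))              # True
```

Because each deployment type is looked up separately, allocating the full Standard pool leaves the Global pool untouched.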

Fifth, usage Tier (12B Tokens/Month):

(A) This is a hard monthly cap across all deployments under the tenant (MyTenant1.com).
(B) It ensures your total usage (across subscriptions/regions) does not exceed 12B tokens/month for gpt-4o.
(C) It cannot override TPM limits. Even with unused tokens, you’re rate-limited by TPM.
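To see why a per-minute rate and a monthly token figure are separate numbers, it can help to convert between them. The arithmetic below assumes a 30-day month and perfectly sustained traffic, which real workloads rarely have; the function names are hypothetical, and the 1M TPM and 12B tokens/month values are the ones quoted in this thread.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming 30 days


def tokens_at_sustained_rate(tpm):
    """Tokens consumed in a 30-day month if `tpm` were used every minute."""
    return tpm * MINUTES_PER_30_DAY_MONTH


def sustained_tpm_for_monthly_volume(tokens_per_month):
    """Constant per-minute rate (rounded down) that adds up to the monthly volume."""
    return tokens_per_month // MINUTES_PER_30_DAY_MONTH


print(tokens_at_sustained_rate(1_000_000))               # 43200000000
print(sustained_tpm_for_monthly_volume(12_000_000_000))  # 277777
```

Under these assumptions, a single 1M TPM pool running flat out would pass 12B tokens well before the month ends, while 12B tokens spread evenly over the month is only about 0.28M TPM. The TPM value governs how fast tokens can be consumed in any given minute; the monthly figure describes total volume across the tenant.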

Increasing limits falls into two categories:

TPM: Request increases via Azure portal (up to regional maxima).
Usage Tier: Contact Microsoft to negotiate (default is 12B for gpt-4o).

In summary:

TPM Limits: The table above reflects how TPM is shared within a subscription and region.
Tenant Influence: Tenants do not control TPM; they track usage tiers.
Usage Tier: Explicitly stated as a tenant-wide monthly token cap, separate from TPM.

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close out the thread by upvoting and accepting this as the answer if it is helpful.

Answer

Hi Deepak Agarwal,

Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.

Tenant | Subscription | Region | TPM Limit
MyTenant1.com | MyAzureSubscription1 | East US | 0.5M (already configured)
MyTenant1.com | MyAzureSubscription1 | East US | 0.5M (if deploying another service with the same configuration)
MyTenant1.com | MyAzureSubscription1 | East US 2 | 1M (max limit for the region)
MyTenant1.com | MyAzureSubscription2 | East US | 1M (max limit for the region)
MyTenant1.com | MyAzureSubscription2 | East US 2 | 1M (max limit for the region)
MyTenant2.com | MyAzureSubscription1 | East US | 1M (max limit for the region)
Yes, the TPM limits are generally at the subscription level but can be influenced by the tenant. Each subscription within a tenant can have its own limits based on the region and model.
You cannot deploy AOAI directly at the tenant level; deployments are made at the subscription level. The TPM limits are defined per subscription and per region.
Different deployment types (Standard, Global Standard, Data Zone, etc.) have self-contained TPM limits. The quotas for each deployment type do not affect each other.
Yes, you can have two AOAI deployments with type Global Deployments for gpt-4o, dividing the total TPM limit of 30M between them. The same applies for Data Zone Standard deployments.
The Usage Tier indicates the total number of tokens you can consume per month at the tenant level. For gpt-4o, this is capped at 12 billion tokens per month. This limit is separate from the TPM limits and represents the overall usage across all deployments. It cannot be increased beyond the specified limit, but it allows for tracking and managing usage across multiple deployments.

I hope this helps. Thank you!
