Clarity on understanding Azure OpenAI usage tokens, limits and deployment types

Deepak Agarwal 20 Reputation points
2025-02-12T17:27:17.1366667+00:00

Hi,

I have read the documentation on Azure OpenAI service quotas and limits and on deployment types, and I have a few follow-up questions to better understand them.

Consider the following setup of my deployment:

  1. Azure Tenant Directory: MyTenant1.com
  2. Subscription: MyAzureSubscription1
  3. Azure OpenAI (AOAI) service: MyAzureOpenAIService1
  4. Model: gpt-4o, TPM configured: 0.5M (max is 1M for the EastUS region, per the documentation)
  5. Location/Region of AOAI: EastUS

Queries for Deployment Type - Standard (or Regional)

If I want to deploy another AOAI service, MyAzureOpenAIService2, with the same model, can you help fill in the blanks for the scenarios below? (Per the documentation, the EastUS2 region has the same TPM limit of 1M for gpt-4o.)

Tenant | Subscription | Region | TPM limit?
MyTenant1.com | MyAzureSubscription1 | East US | ?
MyTenant1.com | MyAzureSubscription1 | East US 2 | ?
MyTenant1.com | MyAzureSubscription2 | East US | ?
MyTenant1.com | MyAzureSubscription2 | East US 2 | ?
MyTenant2.com | MyAzureSubscription1 | East US | ?

  • Are all these TPM limits at the tenant level? How does the Azure tenant factor into these limits?
  • Can I deploy AOAI at the tenant level? If so, how, and what are the TPM limits?
  • Do Standard (or Regional) deployments eat into the quota for other deployment types (GlobalStandard, DataZone, etc.)? Do the different deployment types have self-contained TPM limits, as mentioned in the docs, irrespective of any other AOAI deployment types?
  • Can I have two AOAI deployments of type GlobalStandard for gpt-4o (total TPM limit of 30M), with the 30M split as 15M TPM each? Similarly for DataZoneStandard?
  • There is a usage tier mentioned in your docs at the tenant level. For gpt-4o, it is listed as 12B tokens per month at the tenant level (MyTenant1.com in the example above). If TPM limits are already capped at the region/subscription level (based on the answers above), what is the significance of this number? What does it tell me? Can it be increased and used to go beyond the capped TPM limits, and if so, how?

Appreciate your help and support in clarifying this.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. Sina Salam 17,571 Reputation points
    2025-02-12T22:36:02.0066667+00:00

    Hello Deepak Agarwal,

    Welcome to Microsoft Q&A, and thank you for posting your questions here.

    I understand that you would like clarification on Azure OpenAI usage tokens, limits, and deployment types.

    Designing an AI architecture depends on experience, business requirements, and technology, so the notes below stick to the best practices stated in the official documentation. @kothapally Snigdha has already answered most of the questions; this answer adds clarification on the relationship between TPM and the usage tier, how TPM aggregates per subscription, region, and model, and how to request quota increases.

    Regarding all your questions: TPM stands for Tokens Per Minute, which is a rate limit. Azure OpenAI limits depend on deployment type (Standard, Global Standard, Data Zone Standard), region, subscription, and, for usage tiers, tenant. The documentation states that TPM quota is assigned per subscription, per region, and per model. For example, in East US the maximum TPM for gpt-4o might be 1M; if you deploy a service with 0.5M TPM, that 0.5M counts against the same regional cap.

    Scenario:

    Tenant | Subscription | Region | TPM limit
    MyTenant1.com | MyAzureSubscription1 | East US | 0.5M (existing) + 0.5M max additional
    MyTenant1.com | MyAzureSubscription1 | East US 2 | 1M
    MyTenant1.com | MyAzureSubscription2 | East US | 1M
    MyTenant1.com | MyAzureSubscription2 | East US 2 | 1M
    MyTenant2.com | MyAzureSubscription1 | East US | 1M

    NOTE:

    TPM limits are per subscription, per region, per model. The sum of TPM across all deployments of the same model in a subscription-region cannot exceed the regional cap (1M for gpt-4o in EastUS).
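    To make the pooling rule concrete, here is a minimal sketch that models quota as a ledger keyed by (subscription, region, model, deployment type). It is an illustration under stated assumptions, not an Azure API: the cap values are the example numbers from the question, and the Deployment class, CAP_TPM table, and remaining_tpm/fits helpers are hypothetical names. Note that the AOAI resource name is deliberately left out of the key, which is why a second resource in the same subscription and region draws on the same pool.

    ```python
    from dataclasses import dataclass

    # Hypothetical example caps from the question (gpt-4o, Standard). Real quotas
    # vary by subscription and change over time; check the Azure portal for actual values.
    CAP_TPM = {
        ("gpt-4o", "Standard", "eastus"): 1_000_000,
        ("gpt-4o", "Standard", "eastus2"): 1_000_000,
    }

    @dataclass(frozen=True)
    class Deployment:
        subscription: str
        resource: str          # the AOAI resource; NOT part of the quota key
        region: str
        model: str
        deployment_type: str
        tpm: int

        def pool(self):
            # Every deployment with the same key shares one quota pool,
            # regardless of which AOAI resource hosts it.
            return (self.subscription, self.region, self.model, self.deployment_type)

    def remaining_tpm(deployments, subscription, region, model, deployment_type):
        """TPM still unallocated in one subscription/region/model/deployment-type pool."""
        key = (subscription, region, model, deployment_type)
        cap = CAP_TPM[(model, deployment_type, region)]
        used = sum(d.tpm for d in deployments if d.pool() == key)
        return cap - used

    def fits(deployments, new):
        """True if the new deployment's TPM fits in what is left of its pool."""
        return new.tpm <= remaining_tpm(deployments, *new.pool())

    existing = [Deployment("MyAzureSubscription1", "MyAzureOpenAIService1",
                           "eastus", "gpt-4o", "Standard", 500_000)]

    for sub in ("MyAzureSubscription1", "MyAzureSubscription2"):
        for region in ("eastus", "eastus2"):
            print(sub, region, remaining_tpm(existing, sub, region, "gpt-4o", "Standard"))

    # A second resource in the same subscription and region uses the same pool.
    new = Deployment("MyAzureSubscription1", "MyAzureOpenAIService2",
                     "eastus", "gpt-4o", "Standard", 600_000)
    print(fits(existing, new))  # False: 500k + 600k > 1M
    ```

    The tenant does not appear in the key at all, matching the point that TPM is not enforced at tenant level; a subscription belongs to exactly one tenant, so the subscription already identifies the pool.
    
    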

    Secondly, TPM limits are not tenant-level. Tenants only aggregate usage tiers (monthly token caps). TPM is enforced per subscription-region-model.

    Thirdly, deployments are subscription-level. Tenants cannot host deployments directly.

    Fourth, each deployment type (Standard, Global, DataZone) has independent TPM limits. For example:

    • Global deployments: 30M TPM total. You can split this across deployments (e.g., 15M each).
    • DataZone: Separate quota.
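    A rough way to check a split like the one above is to sum the planned TPM per (model, deployment type, region) pool and compare each sum with its own cap. This sketch assumes the example caps from the question (1M Standard, 30M Global Standard) for a single subscription, and assumes both pools are keyed to the resource region (eastus here); the CAPS table and check_plan helper are hypothetical, not part of any Azure SDK.

    ```python
    from collections import defaultdict

    # Hypothetical caps for ONE subscription, taken from the question. Keying the
    # Global Standard pool by the resource region (eastus) is an assumption here.
    CAPS = {
        ("gpt-4o", "Standard", "eastus"): 1_000_000,
        ("gpt-4o", "GlobalStandard", "eastus"): 30_000_000,
    }

    def check_plan(plan):
        """Sum TPM per (model, deployment type, region) pool and compare with its cap.

        plan: list of (model, deployment_type, region, tpm) tuples in one subscription.
        Returns {pool: (allocated_tpm, cap_tpm, within_cap)}.
        """
        totals = defaultdict(int)
        for model, deployment_type, region, tpm in plan:
            totals[(model, deployment_type, region)] += tpm
        return {pool: (used, CAPS[pool], used <= CAPS[pool]) for pool, used in totals.items()}

    plan = [
        ("gpt-4o", "Standard", "eastus", 500_000),           # existing Standard deployment
        ("gpt-4o", "GlobalStandard", "eastus", 15_000_000),  # Global deployment A
        ("gpt-4o", "GlobalStandard", "eastus", 15_000_000),  # Global deployment B
    ]

    for pool, (used, cap, ok) in check_plan(plan).items():
        print(pool, f"{used:,} / {cap:,}", "OK" if ok else "OVER")
    ```

    Because each deployment type is summed in its own pool, the 0.5M Standard allocation does not reduce the 30M Global Standard pool, while a third 1M Global deployment would push that pool to 31M and fail the check.
    
    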

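    Fifth, the usage tier (12B tokens/month):

    • (A) Per the Azure OpenAI quotas documentation, this is not a hard cap. It is the monthly usage level, counted across all deployments under the tenant (MyTenant1.com), above which responses may show greater latency variability. It applies to Standard, Global Standard, and Data Zone Standard deployments.
    • (B) Usage is counted per model across all subscriptions and regions in the tenant, so going past 12B tokens/month for gpt-4o does not by itself block requests, but latency may become less predictable.
    • (C) It cannot override TPM limits. Even if the tenant is well below the tier, you are still rate-limited by TPM.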
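    The two numbers measure different things: TPM caps the rate per minute for one pool, while the usage tier counts total tokens per month across the tenant. A back-of-the-envelope calculation shows how they relate, assuming a single deployment sustaining the full 1M TPM from the question for a 30-day month (a simplification; real traffic is rarely flat).

    ```python
    TPM = 1_000_000                     # assumed sustained rate: the EastUS cap from the question
    USAGE_TIER = 12_000_000_000         # gpt-4o monthly usage tier quoted in the question
    MINUTES_PER_30_DAYS = 60 * 24 * 30  # 43,200 minutes

    # Upper bound on tokens one pool could consume in 30 days at its TPM limit.
    max_monthly_tokens = TPM * MINUTES_PER_30_DAYS

    # How long full-rate traffic would take to reach the tenant-wide usage tier.
    minutes_to_tier = USAGE_TIER / TPM

    print(f"Max tokens in 30 days at {TPM:,} TPM: {max_monthly_tokens:,}")
    print(f"Minutes of full-rate traffic to reach the tier: {minutes_to_tier:,.0f}")
    print(f"That is about {minutes_to_tier / (60 * 24):.1f} days")
    ```

    At 1M TPM a pool could in principle consume about 43.2B tokens in 30 days, so sustained full-rate traffic would cross the 12B tier in roughly 8.3 days. Raising the tier would not change that per-minute ceiling, which is why TPM remains the binding limit on throughput.
    
    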

    Limits can be increased as follows:

    • TPM: Request increases through the Azure portal (up to regional maxima).
    • Usage Tier: Contact Microsoft to negotiate (default is 12B for gpt-4o).

    In summary:

    • TPM Limits: The table above shows how TPM quota is shared within a subscription-region.
    • Tenant Influence: Tenants do not control TPM; they track usage tiers.
    • Usage Tier: A tenant-wide monthly token threshold, measured separately from TPM.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.


1 additional answer

  1. kothapally Snigdha 1,505 Reputation points Microsoft Vendor
    2025-02-12T18:53:59.6966667+00:00

    Hi Deepak Agarwal,

    Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.

    • MyTenant1.com | MyAzureSubscription1 | East US | 0.5M (already configured)
    • MyTenant1.com | MyAzureSubscription1 | East US | 0.5M (if deploying another service with the same configuration)
    • MyTenant1.com | MyAzureSubscription1 | East US 2 | 1M (max limit for the region)
    • MyTenant1.com | MyAzureSubscription2 | East US | 1M (max limit for the region)
    • MyTenant1.com | MyAzureSubscription2 | East US 2 | 1M (max limit for the region)
    • MyTenant2.com | MyAzureSubscription1 | East US | 1M (max limit for the region)
    • No, TPM limits are not set at the tenant level; they apply at the subscription level. Each subscription within a tenant has its own limits based on the region and model.
    • You cannot deploy AOAI directly at the tenant level; deployments are made at the subscription level. The TPM limits are defined per subscription and per region.
    • Different deployment types (Standard, Global Standard, Data Zone, etc.) have self-contained TPM limits. The quotas for each deployment type do not affect each other.
    • Yes, you can have two AOAI deployments of type Global Standard for gpt-4o and divide the total TPM limit of 30M between them. The same applies to Data Zone Standard deployments.
    • The usage tier reflects the total number of tokens consumed per month across the tenant. For gpt-4o, the tier is 12 billion tokens per month. It is separate from the TPM limits and measures overall usage across all deployments in all subscriptions and regions. Per the Azure OpenAI quotas documentation, it is not a hard cap: above this level, Standard, Global Standard, and Data Zone Standard deployments may see greater variability in response latency.

    I hope this helps. Thank you!

