450K of 450K TPM

Nico Vincent 0 Reputation points
2025-01-17T23:23:10.0433333+00:00

Hello,

We are utilizing Azure Sponsorship and currently working with the GPT-4 model. It appears that our quota of 450K/450K TPM (Tokens Per Minute) has been fully utilized.

However, I am confused: TPM is a per-minute limit, yet the API remains unresponsive, with requests either hanging or timing out.

Could you please clarify how we can reset the quota or resolve this issue?

Thank you for your assistance!

Best regards,

Azure OpenAI Service

1 answer

  1. Marcin Policht 32,735 Reputation points MVP
    2025-01-18T00:50:31.46+00:00

    When your Tokens Per Minute (TPM) quota is fully utilized in Azure OpenAI, the deployment has reached the maximum number of tokens it can process within a minute. Requests over the limit are rejected with a rate-limit error (HTTP 429), and a client that keeps retrying them can appear to hang or time out. To address the issue, try the following:

    1. Wait for the quota to reset. The TPM limit applies per minute, so capacity becomes available again automatically within about a minute and no manual reset is needed. However, if you consistently hit the limit, the disruptions will recur.
    2. Reduce token usage per request. Keep prompts and expected responses as concise as possible, and cap the response length (for example, with `max_tokens`) at no more than you need. Streaming returns long outputs in smaller chunks, which improves responsiveness, though it does not reduce the number of tokens counted against the quota. Where possible, consolidate several small requests into fewer, larger ones so that shared instructions are not repeated in every call.
    3. Request a quota increase:
      1. Navigate to your Azure OpenAI resource.
      2. Go to the Quota + Usage tab.
      3. Submit a support request for a higher TPM limit.
    4. Regularly monitor token usage using Azure metrics to identify spikes or trends that may need attention. Adjust application logic to throttle or queue requests during high usage periods.
    5. Implement retry mechanisms in your application to handle cases when the service becomes temporarily unavailable due to quota exhaustion.
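
The retry advice in step 5 can be sketched roughly as follows. This is a minimal illustration, not the SDK's own mechanism: `RateLimitError`, `call_with_backoff`, and the `send` callable are hypothetical names, and in a real application you would map your client library's HTTP 429 exception and its `Retry-After` header onto them. Some client libraries already retry internally, so check that before layering another retry loop on top.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the HTTP 429 error raised by your client."""

    def __init__(self, retry_after=None):
        super().__init__("rate limit exceeded")
        # Seconds the service asked us to wait (Retry-After), if it said so.
        self.retry_after = retry_after


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry on RateLimitError.

    Waits for the server-suggested delay when one is given; otherwise uses
    exponential backoff with jitter. Re-raises after the last attempt.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                # 1s, 2s, 4s, ... capped at max_delay, then randomized
                # so that parallel workers do not retry in lockstep.
                ceiling = min(max_delay, base_delay * 2 ** attempt)
                delay = random.uniform(ceiling / 2, ceiling)
            sleep(delay)
```

You would wrap each API call in a zero-argument function and pass it as `send`. Bounding the number of attempts matters here: an unbounded retry loop against an exhausted quota is one way a client ends up looking as if it hangs.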

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

