Settings used from Microsoft Learn challenge but gpt-4o still shows exceeded token limit after one question

Thomas Frei 20 Reputation points
2024-12-19T13:44:11.1266667+00:00

Hi guys,

I'm following the Microsoft Learn Challenge for "Trustworthy AI". Unfortunately, every time I set up a gpt-4o playground, I get the following error after asking just one question:

"Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-10-01-preview have exceeded token rate limit of your current AIServices S0 pricing tier. Please retry after 86400 seconds. Please contact Azure support service if you would like to further increase the default rate limit."

I'm doing the challenge to familiarize myself with AI in general and Azure in particular, but I don't know how to fix this issue. I hope you can point me in the right direction.

I'm located in Europe but use the US East location, as recommended by the learning script, in case that is relevant.

Thanks for your help!

Tommy

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
SriLakshmi C 1,375 Reputation points Microsoft Vendor
2024-12-19T18:27:15.03+00:00

    Hello Thomas Frei,

    Welcome to Microsoft Q&A! Thanks for posting the question.

    The error message indicates that you’ve exceeded the token rate limit of your current AI Services S0 pricing tier.

Azure OpenAI’s quota feature enables the assignment of rate limits to your deployments, up to a global limit called your “quota.” Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM).

    You can check this documentation for more details.

To give more context, each deployment has both Tokens-Per-Minute (TPM) and Requests-Per-Minute (RPM) rate limits.

    TPM rate limits are based on the maximum number of tokens that are estimated to be processed by a request at the time the request is received.
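To make this concrete, here is a minimal sketch (in Python) of how a request's token cost can be estimated up front. The function name and the ~4-characters-per-token heuristic are assumptions for illustration, not Azure's actual tokenizer; the key point from the docs is that both the prompt tokens and the reserved completion tokens (max_tokens) count against the TPM limit when the request is received.

```python
def estimate_request_tokens(prompt: str, max_tokens: int = 800) -> int:
    """Rough estimate of the tokens one request counts against TPM.

    Assumption: ~4 characters per token for English text. The service
    also reserves the full max_tokens for the completion at the time
    the request is received, so both parts count toward the limit.
    """
    prompt_tokens = max(1, len(prompt) // 4)  # crude heuristic, not a real tokenizer
    return prompt_tokens + max_tokens

# Even a short prompt with a default max_tokens can consume a large
# share of a small TPM allocation in a single request.
cost = estimate_request_tokens("Explain responsible AI in one paragraph.", max_tokens=800)
```

This is why a playground with a very low TPM quota can hit the limit after a single question: the reserved completion tokens alone may exceed the per-minute budget.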

    RPM rate limits are based on the number of requests received over time. The rate limit expects that requests be evenly distributed over a one-minute period. If this average flow isn't maintained, then requests may receive an error response even though the limit isn't met when measured over the course of a minute.
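When a request is rejected with HTTP 429 (as in the error above), the usual client-side remedy is to honor the Retry-After value and back off. Below is a hedged sketch of such a retry loop; `send_request` is a hypothetical callable standing in for your actual HTTP call, and the tuple it returns is an assumption for illustration.

```python
import random
import time


def call_with_retry(send_request, max_attempts=5, base_delay=1.0):
    """Retry a request when the service signals rate limiting.

    `send_request` is a hypothetical callable returning
    (status_code, retry_after_seconds_or_None, body). On HTTP 429 we
    honor the Retry-After value when the server provides one, and
    otherwise back off exponentially with a little jitter so retries
    spread out over the one-minute window.
    """
    for attempt in range(max_attempts):
        status, retry_after, body = send_request()
        if status != 429:
            return body
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        time.sleep(delay + random.uniform(0, 0.1))
    raise RuntimeError("rate limit: retries exhausted")
```

Note that a Retry-After of 86400 seconds (24 hours), as in the error message, usually means a daily quota or a very small allocation was exhausted rather than a momentary burst, so backoff alone won't help; the quota itself needs attention as described below.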

Please refer to this document, Manage Azure OpenAI Service quota, for more details.

To view your quota allocations across deployments in a given region, select Shared Resources > Quota in Azure OpenAI Studio, and click the link to request a quota increase.

Also ensure that the resource and the resource group were created in the same region.

I hope this helps. Do let me know if you have any further queries.


If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    Thank you!


0 additional answers
