Hi Marco Moroni
Rate-limit errors indicate that, at some point during inference, you exceeded the estimated maximum processed tokens per minute for your deployment.
This can happen if you are sending long prompts, generating long outputs, or retrieving a large amount of context (for example, from a large index).
Possible solutions:
- Increase the tokens-per-minute quota on your model deployment, or lower the `max_tokens` parameter so each request reserves fewer tokens (rate limiting estimates usage from `max_tokens`).
- Adjust your prompts to be shorter, more precise, and clear.
- Adjust the system message to keep answers within smaller chunks.
- Implement a retry mechanism with a sleep time between attempts.
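The retry suggestion above can be sketched as an exponential-backoff wrapper. This is a minimal, client-agnostic sketch: `RateLimitError` here is a placeholder for whatever rate-limit exception your client library actually raises, and the delay values are illustrative defaults, not recommendations from the service.

```python
import time
import random

class RateLimitError(Exception):
    """Placeholder for the rate-limit exception your client library raises (assumption)."""

def call_with_retry(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Sleep base_delay, 2*base_delay, 4*base_delay, ... plus random
            # jitter so many clients do not all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage sketch: a call that fails twice with a rate-limit error, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

print(call_with_retry(flaky_call, base_delay=0.01))
```

In practice you would replace `flaky_call` with your actual inference call and catch the real exception type (for example, the 429 error your SDK surfaces).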
Thank you.