The Azure OpenAI service enforces quota per model deployment per region, expressed as tokens per minute (TPM) and requests per minute (RPM), rather than a flat monthly token cap. The GPT-4o-mini model has its own default quota, which you can view and adjust under Quotas in the Azure portal. To achieve your goal of processing 15,000 requests of roughly 50,000 tokens each within 8 hours, you need an average of about 31.25 requests per minute (15,000 / 480 minutes), which works out to roughly 1.56 million tokens per minute, so your deployment's TPM quota must be at least that high. However, this is just a theoretical average, and in practice you should budget extra headroom for retries, throttling, and uneven request latency.
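The arithmetic above can be sketched as a quick back-of-the-envelope check (the request count, tokens per request, and time window are taken from your scenario; adjust them to match your actual workload):

```python
# Throughput requirement estimate for the batch job described above.
total_requests = 15_000        # number of documents / API calls
tokens_per_request = 50_000    # approximate tokens per call (input + output)
window_minutes = 8 * 60        # 8-hour processing window

# Average rates the deployment quota must sustain.
requests_per_minute = total_requests / window_minutes
tokens_per_minute = total_requests * tokens_per_request / window_minutes

print(f"~{requests_per_minute:.2f} requests/min, ~{tokens_per_minute:,.0f} tokens/min")
```

Compare the printed tokens-per-minute figure against the TPM quota shown for your GPT-4o-mini deployment in the Azure portal; if it exceeds the quota, you will need a quota increase or a longer processing window.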
To optimize the performance of your solution, you can try the following techniques:
- Implement retry logic in your application to handle any errors or timeouts.
- Avoid sharp changes in the workload; increase the workload gradually.
- Test different load increase patterns to find the optimal rate limit for your solution.
- Create another OpenAI service resource in the same or different regions and distribute the workload among them.
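The first technique above can be sketched as a small retry helper with exponential backoff and jitter. This is a minimal, generic sketch (the `with_retries` wrapper and the flat `Exception` catch are illustrative assumptions); in production you would retry only on throttling responses such as HTTP 429 and honour the `Retry-After` header when the service returns one:

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter.

    Illustrative sketch: real code should catch only transient errors
    (e.g. rate-limit / timeout exceptions), not every Exception.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Double the delay each attempt; jitter spreads out retries
            # so many clients don't hammer the service in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

You would wrap each API call, e.g. `with_retries(lambda: client.chat.completions.create(...))`. The jitter matters at your scale: without it, thousands of throttled requests would all retry at the same instant and trigger another wave of 429s.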
Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.
I hope this helps! Let me know if you have any further questions.