Throttling issue with OpenAI API

Tung Nguyen Xuan 40 Reputation points
2025-01-15T08:08:03.7866667+00:00

Hi, I'm using the Azure OpenAI API in my RAG project and noticed something unusual. In one step, I send 3-4 chat completion requests concurrently, yet the total response time is much longer than for a single request, which suggests the requests may not be processed concurrently on Azure OpenAI's side. Could you explain why this happens and what options I have to address it?

Deployment details: gpt-4o, model version 2024-08-06, region eastus, deployment type Standard, tokens-per-minute rate limit set to the maximum (400K).

I've attached the actual log from making 4 requests; there's a 9-second gap between the first and the last response.
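For context, this is roughly how the requests are dispatched. It's a minimal sketch using asyncio, with `fake_completion` standing in for the real chat completion call (the stub name and the simulated latency are placeholders for illustration, not the actual client code). If the requests were being serialized somewhere, the elapsed time would be about 4x the per-request latency instead of about 1x.

```python
import asyncio
import time

# Stub standing in for a real chat completion request.
# In the actual code this would be an awaited call on an async
# Azure OpenAI client; the name and latency here are assumptions.
async def fake_completion(request_id: int, latency: float = 0.2) -> str:
    await asyncio.sleep(latency)  # simulated round-trip latency
    return f"response-{request_id}"

async def main() -> float:
    start = time.perf_counter()
    # Fire all 4 requests at once and wait for them together.
    results = await asyncio.gather(*(fake_completion(i) for i in range(4)))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} responses in {elapsed:.2f}s")
    return elapsed

elapsed = asyncio.run(main())
```

With true concurrency, the 4 stubbed requests complete in roughly one request's latency, which is the behavior I expected from the service.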

[Attachment: request/response log screenshot]

