Throttling issue with Azure OpenAI API (concurrent chat completion requests)
Hi, I'm using the Azure OpenAI API in my RAG project and I've noticed something unusual. In one of the steps, I send 3-4 chat completion requests concurrently. However, the total response time is much longer than that of a single request, which suggests the requests may not actually be processed concurrently on Azure OpenAI's side. Could you explain why this happens and what options I have to address it?
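For context, here is a minimal sketch of how I issue the concurrent calls (simplified from my actual pipeline; the endpoint, API key, and prompts below are placeholders):

```python
import asyncio
import time

from openai import AsyncAzureOpenAI

# Placeholder endpoint and key; real values come from my config.
client = AsyncAzureOpenAI(
    azure_endpoint="https://<my-resource>.openai.azure.com",
    api_key="<my-api-key>",
    api_version="2024-08-01-preview",
)

async def ask(i: int) -> None:
    """Send one chat completion request and log its latency."""
    start = time.perf_counter()
    await client.chat.completions.create(
        model="gpt-4o",  # deployment name
        messages=[{"role": "user", "content": f"Placeholder question {i}"}],
    )
    print(f"request {i} finished in {time.perf_counter() - start:.1f}s")

async def main() -> None:
    # Fire all 4 requests at once rather than awaiting them sequentially.
    await asyncio.gather(*(ask(i) for i in range(4)))

asyncio.run(main())
```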
Deployment details:
- Model: gpt-4o (version 2024-08-06)
- Region: eastus
- Deployment type: Standard
- Tokens-per-minute rate limit: 400K (the maximum)
I've attached the actual log from a run with 4 requests; there's a 9-second gap between the first and the last response.