Facing a similar issue: the o1 model takes more than 2 minutes to respond to a "Hi" prompt. Same behavior with both the Azure OpenAI SDK and a plain RESTful API call.
Severe Latency in Azure OpenAI Services (o1 and o3-mini Models) – Response Times Over 2 Minutes for Simple Queries
We are experiencing significant performance issues with the OpenAI models (o1 and o3-mini) on Azure, even within the Azure Playground. For simple queries like "Who are you?", the response time exceeds 2 minutes, which is far from normal. This delay is causing considerable disruption, and I have verified that the issue persists consistently.
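For reference, a minimal timing sketch (not official guidance) using the openai Python package's AzureOpenAI client; the endpoint, API key, and deployment name below are placeholders for your own resource:

```python
# Minimal repro sketch: times one short prompt against an o-series deployment.
# Endpoint, key, and deployment name are placeholders for your own resource.
import time
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-12-01-preview",
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="o3-mini",  # name of *your* deployment, not the base model
    messages=[{"role": "user", "content": "Who are you?"}],
    max_completion_tokens=1000,  # o-series models use max_completion_tokens
)
elapsed = time.perf_counter() - start

print(f"Latency: {elapsed:.1f}s")
print(response.choices[0].message.content)
# If usage.completion_tokens_details.reasoning_tokens is present, it shows how
# much of the time went to hidden reasoning tokens rather than visible output.
```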
Could you please investigate the cause of this latency and provide a resolution? The performance seems to be abnormally slow for trivial tasks, and I would appreciate guidance on resolving this.
Thank you for your assistance!
5 answers
-
Juan Garassino 0 Reputation points
2025-02-19T20:58:01.88+00:00 Same issue here; it seems like there aren't too many people talking about it.
-
Subhrajit Bhowmik 20 Reputation points
2025-02-19T21:06:28.0133333+00:00 Same issue here. Our company's RAG pipeline is failing because of severe delays in response generation from the o3-mini model, resulting in timeouts. Any advice from MSFT? Any known workarounds from the community?
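One community-level workaround (a sketch only, assuming the openai Python client; it hides the latency rather than fixing it) is to raise the request timeout and allow retries so slow responses don't surface as hard failures in the pipeline. The endpoint, key, and deployment name are placeholders:

```python
# Workaround sketch, not a fix: give the client a generous timeout and a few
# retries so slow o3-mini responses don't bubble up as hard failures in the
# RAG pipeline. Endpoint, key, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-12-01-preview",
    timeout=300.0,   # seconds; keep this above the worst-case latency you observe
    max_retries=2,   # retry transient failures instead of failing the whole job
)

response = client.chat.completions.create(
    model="o3-mini",  # your deployment name
    messages=[{"role": "user", "content": "Summarize the retrieved context ..."}],
)
print(response.choices[0].message.content)
```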
-
Subhrajit Bhowmik 20 Reputation points
2025-02-19T21:22:44.2433333+00:00 It seems like they have internally reduced the rate limit. Our deployment shows a rate limit (tokens per minute) of 3,570,000, but when I press Edit it shows 1K. Is there anyone from the Azure team who can tell us why this was changed? A lower rate limit will cause generation delays, leading to timeouts.
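If anyone wants to verify the provisioned quota outside the portal UI, here is a sketch using the Azure management SDK (assuming the azure-identity and azure-mgmt-cognitiveservices packages; all IDs below are placeholders) that reads back the deployment's SKU capacity:

```python
# Sketch for checking what capacity the deployment actually has provisioned.
# Subscription, resource group, account, and deployment names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",   # placeholder
)

deployment = client.deployments.get(
    resource_group_name="<resource-group>",  # placeholder
    account_name="<aoai-account>",           # placeholder
    deployment_name="o3-mini",               # your deployment name
)

# For standard Azure OpenAI deployments, sku.capacity is typically expressed in
# units of 1,000 TPM, so a value of 3570 would correspond to the 3,570,000
# tokens-per-minute figure shown in the portal.
print(deployment.sku.name, deployment.sku.capacity)
```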
-
Pedro Castelo Branco Lourenço 6 Reputation points
2025-02-21T13:19:48.01+00:00 Any ETA on this issue? I see many different threads of people facing the very same issue. The lack of transparency and clear communication is depressing. If I look at Azure Status, it shows there are no issues... but the truth is something else.