Severe Latency in Azure OpenAI Services (o1 and o3-mini Models) – Response Times Over 2 Minutes for Simple Queries
We are experiencing significant performance issues with the OpenAI models (o1 and o3-mini) on Azure, even within the Azure Playground. For simple queries like "Who are you?", the response time exceeds 2 minutes, which is far from normal. This delay is causing considerable disruption, and I have verified that the issue persists consistently.
Could you please investigate the cause of this latency and provide a resolution? The performance seems to be abnormally slow for trivial tasks, and I would appreciate guidance on resolving this.
Thank you for your assistance!
Azure OpenAI Service
-
Patel, Harshal • 0 Reputation points
2025-02-19T14:36:38.76+00:00 Facing a similar issue: the o1 model takes more than 2 minutes to respond to a "Hi" message. Same behavior with both the Azure OpenAI SDK and a plain RESTful API call.
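A simple way to reproduce and quantify this is to time the same prompt through whichever path you use. A minimal sketch (the SDK client and deployment name in the comment are placeholders, not from this thread):

```python
import time
from typing import Callable, Tuple

def timed(call: Callable[[], object]) -> Tuple[object, float]:
    """Run `call` and return (result, elapsed seconds), so the same
    prompt can be timed via the SDK and via a raw REST request."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start

# Example with the Azure OpenAI SDK (placeholder client/deployment):
#   reply, seconds = timed(lambda: client.chat.completions.create(
#       model="o1", messages=[{"role": "user", "content": "Hi"}]))
#   print(f"o1 answered in {seconds:.1f}s")
```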
-
Josh Sirota • 40 Reputation points
2025-02-19T19:25:06.5166667+00:00 I'm seeing the same issue as of February 18, 2025. Also seeing openai.TimeoutErrors.
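Until the underlying latency is fixed, timeouts like these can at least be retried client-side. A minimal backoff wrapper (illustrative only; when using the Python SDK, pass its timeout exception type, e.g. `openai.APITimeoutError`, in `timeout_excs`):

```python
import time

def with_retries(call, attempts=3, base_delay=2.0,
                 timeout_excs=(TimeoutError,)):
    """Retry `call` with exponential backoff when it raises one of
    `timeout_excs`; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except timeout_excs:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```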
-
Salem Elmrayed • 25 Reputation points
2025-02-19T19:28:01.26+00:00 Same issue here; o1 takes more than 10 minutes to generate a response.
-
Juan Garassino • 0 Reputation points
2025-02-19T20:58:01.88+00:00 Same issue here; it seems like there aren't many people talking about it.
-
Subhrajit Bhowmik • 25 Reputation points
2025-02-19T21:06:28.0133333+00:00 Same issue; our company's RAG pipeline is failing because of severe delays in response generation from the o3-mini model, resulting in timeouts. Any advice from MSFT? Any known workarounds from the community?
-
Patel, Harshal • 0 Reputation points
2025-02-19T21:10:04.5+00:00 From what I can tell, downgrading to gpt-4o or gpt-4o-mini helps, as those models are working as expected. It seems only the o* family is affected by this.
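If downgrading works for you, that workaround can be automated with a simple fallback order. A sketch only; the deployment names below are assumptions, so substitute your own:

```python
# o-series deployments reported as affected in this thread.
O_SERIES = {"o1", "o1-mini", "o3-mini"}

def fallback_chain(preferred: str) -> list:
    """Return deployment names to try in order: the preferred model
    first, with the unaffected 4o family appended for o-series models."""
    if preferred in O_SERIES:
        return [preferred, "gpt-4o", "gpt-4o-mini"]
    return [preferred]
```

The caller would iterate over the chain and move to the next deployment whenever a request times out.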
-
Martin Dreyer • 21 Reputation points
2025-02-19T21:15:57.5733333+00:00 We too have been experiencing very slow responses from o3-mini over the last few days.
-
Subhrajit Bhowmik • 25 Reputation points
2025-02-19T21:22:44.2433333+00:00 It seems they have internally reduced the rate limit: our deployment shows a rate limit of 3,570,000 tokens per minute, but when I press Edit it shows 1K. Is there anyone from the Azure team who can tell us why this was changed? A lower rate limit will delay generation and cause timeouts.
-
Manas Mohanty • 945 Reputation points • Microsoft Vendor
2025-02-20T05:51:08.4566667+00:00 Hi Movin Silva and everyone,
We are able to replicate this scenario for o1 models and are reaching out to the product group; we will get back to you as soon as we have an update.
As part of my trials, I tested queries in Sweden Central:
For o1-mini, it takes a few seconds to answer but hits rate limits at lower TPM.
For o3-mini, responses are faster if we change the max completion tokens and the reasoning effort.
But please let us know which regions are affected for you.
Thank you.
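The tuning mentioned above can be expressed as request parameters. A hedged sketch (parameter names follow the o-series chat-completions API; the deployment name is a placeholder):

```python
def o3_mini_kwargs(prompt, max_completion_tokens=1024,
                   reasoning_effort="low"):
    """Build chat-completion arguments for an o3-mini deployment.
    o-series models take `max_completion_tokens` (not `max_tokens`)
    and a `reasoning_effort` of "low", "medium", or "high"; lower
    effort and a smaller token cap generally return faster."""
    if reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    return {
        "model": "o3-mini",  # substitute your deployment name
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
        "reasoning_effort": reasoning_effort,
    }
```

The resulting dict would be unpacked into `client.chat.completions.create(**kwargs)` when using the Python SDK.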
-
Arthur • 15 Reputation points
2025-02-20T08:27:05.0166667+00:00 Same issue on o3-mini. I tested the same requests on Azure OpenAI versus the OpenAI API directly, and Azure takes 5-10x longer to complete. Global Standard East US 2 deployment.
-
DB • 5 Reputation points
2025-02-20T10:25:02.8333333+00:00 Same issue here. Also getting 504s while waiting for a response. Region: Sweden Central.
-
Pietro Agasi • 10 Reputation points
2025-02-20T15:29:32.3433333+00:00 I have the same issues, and they are creating problems in my platform; the o3-mini model is simply not responding. Has anyone solved this?
-
Arthur FLAJOLET • 10 Reputation points
2025-02-20T15:49:02.2866667+00:00 Same issue here for the o1 and o3-mini models, in both Sweden Central and East US 2. The latency is sometimes up to 30 minutes. This started on 02/18, and it happens even for simple requests such as "What is your name?", as others have already reported.
-
Martin Dreyer • 21 Reputation points
2025-02-20T18:05:20.6166667+00:00 The issue in my case is in the East US 2 region.
-
Damian • 15 Reputation points
2025-02-20T20:34:49.14+00:00 Still a huge issue!! o3-mini, East US 2.
-
wiseGoat94 • 5 Reputation points
2025-02-20T21:06:48.4733333+00:00 Same for us since 15 Feb 2025. It was super fast before; latency has increased from 5 seconds to 2 minutes on average, with complex prompts taking over 10 minutes.
-
Jon McKinney • 30 Reputation points
2025-02-21T02:23:49.8966667+00:00 Yes, same here, and getting the run-around from Microsoft.
-
Pedro Castelo Branco Lourenço • 41 Reputation points
2025-02-21T13:19:48.01+00:00 Any ETA on this issue? I see many different threads of people facing the very same issue. The lack of transparency and clear communication is depressing. If I look at Azure Status, it shows there are no issues... but the truth is something else.
-
Pietro Agasi • 10 Reputation points
2025-02-21T16:35:32.97+00:00 Yes, exactly. At first, I thought it was an issue only on my end since there were no updates from Azure about the problem. However, it’s actually affecting everyone.
-
Salem Elmrayed • 25 Reputation points
2025-02-21T19:12:00.91+00:00 No response from MSFT. Azure AI models are DOWN.
-
Damian • 15 Reputation points
2025-02-22T12:19:34.8833333+00:00 It's crazy how bad the experience is, and that no one is doing anything.
-
Alberto Romero • 0 Reputation points
2025-02-22T22:07:02.0133333+00:00 This has been happening for three days now. Every API call to the o1 model hangs. Still no solution from Microsoft.
-
Salem Elmrayed • 25 Reputation points
2025-02-22T22:18:27.5233333+00:00 We had to roll back to 4o for now.
-
Vikram Singh • 1,980 Reputation points • Microsoft Employee
2025-02-24T04:25:44.9766667+00:00 Hi Movin Silva,
Thanks for reaching out to us.
There is an ongoing issue in the East US 2 and Sweden Central regions that has caused unexpected latency in response times. The issue in the Sweden Central region has been successfully mitigated.
The product team is actively working on the issue, and we will provide more details as soon as we receive further information. We will keep you updated.
I hope this helps.
Thanks
-
George Lubomirov • 0 Reputation points
2025-02-24T11:41:06.9933333+00:00 This is getting worse. It was in the 6-minute range; now it times out.
-
Martin Dreyer • 21 Reputation points
2025-02-24T22:27:32.76+00:00 In East US 2, as of right now, I'm getting good response times again. Back to normal.
-
Vikram Singh • 1,980 Reputation points • Microsoft Employee
2025-02-28T06:23:13.65+00:00 Hi Movin Silva & Everyone
Thank you for your patience. Here is the latest update for the East US 2 region:
Since this issue was caused by limited capacity, the product team has updated the design to enable dynamic routing, which will resolve the timeout issues.
Dynamic routing is built and will be released step by step; we hope to cover most customers by the end of this week.
There are some options for now: one is the Provisioned Managed SKU for latency-sensitive workloads, and you may want to switch to other regions as an alternative plan for this week.
Please refer to this document for the Provisioned Managed SKU: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/provisioned-throughput-onboarding
Other regions/version for o1 -
I will update here again once this issue is confirmed resolved and will let you know more. I hope this helps.