Facing a similar issue: the o1 model takes more than 2 minutes to respond to a "Hi" prompt. Same behavior with both the Azure OpenAI SDK and a plain RESTful API call.
Severe Latency in Azure OpenAI Services (o1 and o3-mini Models) – Response Times Over 2 Minutes for Simple Queries
We are experiencing significant performance issues with the OpenAI models (o1 and o3-mini) on Azure, even within the Azure Playground. For simple queries like "Who are you?", the response time exceeds 2 minutes, which is far from normal. This delay is causing considerable disruption, and I have verified that the issue persists consistently.
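For reference, a minimal timing sketch (not official guidance) using the openai Python package's AzureOpenAI client; the endpoint, API key, and deployment name below are placeholders for your own resource:

```python
# Minimal repro sketch: times one short prompt against an o-series deployment.
# Endpoint, key, and deployment name are placeholders for your own resource.
import time
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-12-01-preview",
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="o3-mini",  # name of *your* deployment, not the base model
    messages=[{"role": "user", "content": "Who are you?"}],
    max_completion_tokens=1000,  # o-series models use max_completion_tokens
)
elapsed = time.perf_counter() - start

print(f"Latency: {elapsed:.1f}s")
print(response.choices[0].message.content)
# If usage.completion_tokens_details.reasoning_tokens is present, it shows how
# much of the time went to hidden reasoning tokens rather than visible output.
```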
Could you please investigate the cause of this latency and provide a resolution? The performance seems to be abnormally slow for trivial tasks, and I would appreciate guidance on resolving this.
Thank you for your assistance!
5 answers
-
Juan Garassino 0 Reputation points
2025-02-19T20:58:01.88+00:00 Same issue here; it seems like there aren't too many people talking about it.
-
Subhrajit Bhowmik 20 Reputation points
2025-02-19T21:06:28.0133333+00:00 Same issue here. Our company's RAG pipeline is failing because of severe delays in response generation from the o3-mini model, resulting in timeouts. Any advice from MSFT? Any known workarounds from the community?
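One community-level workaround (a sketch only, assuming the openai Python client; it hides the latency rather than fixing it) is to raise the request timeout and allow retries so slow responses don't surface as hard failures in the pipeline. The endpoint, key, and deployment name are placeholders:

```python
# Workaround sketch, not a fix: give the client a generous timeout and a few
# retries so slow o3-mini responses don't bubble up as hard failures in the
# RAG pipeline. Endpoint, key, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-12-01-preview",
    timeout=300.0,   # seconds; keep this above the worst-case latency you observe
    max_retries=2,   # retry transient failures instead of failing the whole job
)

response = client.chat.completions.create(
    model="o3-mini",  # your deployment name
    messages=[{"role": "user", "content": "Summarize the retrieved context ..."}],
)
print(response.choices[0].message.content)
```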
-
Subhrajit Bhowmik 20 Reputation points
2025-02-19T21:22:44.2433333+00:00 It seems like they have internally reduced the rate limit. Our deployment shows a rate limit (tokens per minute) of 3,570,000, but when I press Edit it shows 1K. Is there anyone from the Azure team who can tell us why this was changed? A lower rate limit will cause generation delays, leading to timeouts.
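If anyone wants to verify the provisioned quota outside the portal UI, here is a sketch using the Azure management SDK (assuming the azure-identity and azure-mgmt-cognitiveservices packages; all IDs below are placeholders) that reads back the deployment's SKU capacity:

```python
# Sketch for checking what capacity the deployment actually has provisioned.
# Subscription, resource group, account, and deployment names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",   # placeholder
)

deployment = client.deployments.get(
    resource_group_name="<resource-group>",  # placeholder
    account_name="<aoai-account>",           # placeholder
    deployment_name="o3-mini",               # your deployment name
)

# For standard Azure OpenAI deployments, sku.capacity is typically expressed in
# units of 1,000 TPM, so a value of 3570 would correspond to the 3,570,000
# tokens-per-minute figure shown in the portal.
print(deployment.sku.name, deployment.sku.capacity)
```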
-
Pedro Castelo Branco Lourenço 6 Reputation points
2025-02-21T13:19:48.01+00:00 Any ETA on this issue? I see many different threads of people facing the very same issue. The lack of transparency and clear communication is depressing. If I look at Azure Status, it shows there are no issues... but the truth is something else.