GPT-4o Deployment in West Europe - Severe Latency Issues

Guillaume Lameyse
2025-02-21T16:45:05.8533333+00:00

Hi,

We rely heavily on Azure’s GPT-4o deployment as a key component of our application. Our model deployment (August 2024 version) is hosted in the West Europe region. Typically, our API calls take:

  • 5 seconds for small requests
  • 45–90 seconds for larger requests

However, today we experienced severe performance degradation:

  • Small calls took 50 seconds instead of 5 (10x increase).
  • Large calls never returned at all: no error was raised and no response arrived.
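For anyone trying to reproduce this kind of measurement, here is a minimal sketch of timing a call under a client-side deadline, so that a hung request is recorded as a timeout instead of blocking forever. It uses only the Python standard library. `timed_call` and `deadline_s` are hypothetical names chosen for illustration, and `fn` stands in for whatever function issues the actual API request. It is not taken from our application code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def timed_call(fn, deadline_s):
    """Run fn() in a worker thread and return (status, elapsed_seconds, result).

    status is "ok" if fn returned within deadline_s, otherwise "timeout".
    Exceptions raised by fn propagate to the caller unchanged.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(fn)
    try:
        result = future.result(timeout=deadline_s)
        status = "ok"
    except FutureTimeout:
        result = None
        status = "timeout"
    finally:
        # Do not wait for a hung worker here. The thread is abandoned, not
        # killed: it keeps running until fn itself returns.
        pool.shutdown(wait=False)
    return status, time.monotonic() - start, result
```

Because the worker thread cannot be cancelled, this only suits measurement. In production code, a timeout set on the HTTP client itself is probably the better choice.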

We tested the same model using OpenAI's API, and it performed as expected, so the issue appears specific to our Azure deployment.

Additional Details:

  • We were well below our tokens per minute (TPM) and requests per minute (RPM) limits.
  • No recent changes were made to our application’s logic or request patterns.
  • This is a critical component for us, and we plan to significantly scale our usage in the coming months, so reliability is a major concern.
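To illustrate how a claim like "well below our TPM and RPM limits" can be checked on the client side, here is a minimal sketch of a trailing 60-second window over request and token counts. `RateWindow` and its methods are hypothetical names, and the timestamps and token counts are assumed to come from the caller's own request log. The service's own accounting may differ from a client-side count, so treat the numbers as an approximation.

```python
from collections import deque


class RateWindow:
    """Track requests and tokens over a trailing time window (default 60 s)."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp_seconds, tokens), oldest first

    def record(self, timestamp, tokens):
        """Log one request; timestamps must be recorded in increasing order."""
        self.events.append((timestamp, tokens))

    def usage(self, now):
        """Return (requests, tokens) seen in the window ending at `now`."""
        cutoff = now - self.window_s
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        requests = len(self.events)
        tokens = sum(t for _, t in self.events)
        return requests, tokens
```

Comparing the two values returned by `usage` against the deployment's configured limits shows roughly how much headroom remained at the time of the slowdown.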

Questions:

  1. Is there any way to check the real-time status of our deployment?
  2. Are there known issues or regional limitations affecting West Europe?
  3. What steps can we take to ensure more stable performance and avoid similar incidents?
Azure AI services