GPT-4o Deployment in West Europe - Severe Latency Issues

Guillaume Lameyse

Hi,

Our application relies heavily on a GPT-4o deployment in Azure OpenAI. The deployment uses the August 2024 model version and is hosted in the West Europe region. Typically, our API calls take:

5 seconds for small requests
45–90 seconds for larger requests

However, today we experienced severe performance degradation:

Small requests took about 50 seconds instead of 5 (a 10x increase).
Large requests never returned a response at all: there was no error, just no response.
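One way to make the second symptom visible is to put a hard client-side deadline on every call and record its latency, so a request that never returns shows up as a timeout instead of hanging silently. This is a minimal sketch, assuming each API request is wrapped in a zero-argument Python callable; `call_with_deadline` and `CallResult` are hypothetical names, not part of any Azure or OpenAI SDK.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CallResult:
    ok: bool
    latency_s: float
    value: Any = None
    error: Optional[str] = None


# Shared pool (hypothetical setup) so a hung worker never blocks the caller.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_deadline(fn: Callable[[], Any], deadline_s: float) -> CallResult:
    """Run fn with a hard deadline and report how long the caller waited.

    A timed-out call keeps running in its worker thread; the deadline only
    stops the caller from waiting on it indefinitely.
    """
    start = time.monotonic()
    future = _POOL.submit(fn)
    try:
        value = future.result(timeout=deadline_s)
        return CallResult(ok=True, latency_s=time.monotonic() - start, value=value)
    except concurrent.futures.TimeoutError:
        # No response within the deadline: record it as a failure.
        return CallResult(ok=False, latency_s=time.monotonic() - start, error="timeout")
    except Exception as exc:
        # Errors raised inside fn itself (HTTP errors, parsing errors, ...).
        return CallResult(ok=False, latency_s=time.monotonic() - start, error=repr(exc))


if __name__ == "__main__":
    fast = call_with_deadline(lambda: "ok", deadline_s=1.0)
    print(fast.ok, fast.value)  # prints: True ok
```

Logging `latency_s` for every call, successful or not, also gives a baseline against which a 5-second to 50-second jump can be shown with data.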

We tested the same model using OpenAI's API, and it performed as expected, so the issue appears specific to our Azure deployment.

Additional Details:

We were well below our tokens per minute (TPM) and requests per minute (RPM) limits.
No recent changes were made to our application’s logic or request patterns.
This is a critical component for us, and we plan to significantly scale our usage in the coming months, so reliability is a major concern.
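To back up the statement about TPM and RPM limits with client-side numbers, one option is to keep a rolling 60-second count of requests and tokens sent. This is a sketch under stated assumptions: `UsageWindow` is a hypothetical helper, the token count passed to `record` is assumed to come from each response's usage data, and the limits passed in are whatever quota is configured on the deployment.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class UsageWindow:
    """Rolling count of requests and tokens sent by this client (hypothetical helper)."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def record(self, tokens: int, now: Optional[float] = None) -> None:
        """Record one request that consumed `tokens` tokens."""
        now = time.monotonic() if now is None else now
        self._events.append((now, tokens))

    def _evict(self, now: float) -> None:
        # Drop events that are at least one window old.
        while self._events and now - self._events[0][0] >= self.window_s:
            self._events.popleft()

    def snapshot(self, now: Optional[float] = None) -> Tuple[int, int]:
        """Return (requests, tokens) seen in the current window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._events), sum(tokens for _, tokens in self._events)

    def utilization(
        self, rpm_limit: int, tpm_limit: int, now: Optional[float] = None
    ) -> Tuple[float, float]:
        """Fraction of each limit in use right now (1.0 means at the limit)."""
        requests, tokens = self.snapshot(now)
        return requests / rpm_limit, tokens / tpm_limit
```

Logging `utilization(...)` next to each slow call shows whether the slowdown coincided with traffic near the quota or, as described here, happened well below it.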

Questions:

Is there any way to check the real-time status of our deployment?
Are there known issues or regional limitations affecting West Europe?
What steps can we take to ensure more stable performance and avoid similar incidents?
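A common client-side mitigation for the last question is to retry with exponential backoff and, if a call still fails, fall back to a second deployment, for example one in another region. This sketch assumes each deployment is wrapped as a zero-argument callable that raises on failure or timeout; `call_with_fallback` is a hypothetical helper, and having a second deployment available is an assumption about the setup, not something described above.

```python
import random
import time
from typing import Any, Callable, List, Sequence


def call_with_fallback(
    deployments: Sequence[Callable[[], Any]],
    attempts_per_deployment: int = 3,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Try each deployment in order, retrying with exponential backoff and jitter."""
    errors: List[str] = []
    for index, call in enumerate(deployments):
        for attempt in range(attempts_per_deployment):
            try:
                return call()
            except Exception as exc:  # any failure, including a timeout, counts as retryable
                errors.append(f"deployment {index}, attempt {attempt + 1}: {exc!r}")
                if attempt + 1 < attempts_per_deployment:
                    # Full jitter: wait a random time up to the capped exponential delay.
                    delay = min(max_delay_s, base_delay_s * 2 ** attempt)
                    sleep(random.uniform(0, delay))
    raise RuntimeError("all deployments failed:\n" + "\n".join(errors))
```

Retries only help if each attempt has its own deadline; otherwise a request that never returns blocks the loop forever. If you use the `openai` Python package, its client constructor accepts `timeout` and `max_retries` arguments that cover the per-call part, so the piece this sketch adds is the fallback across deployments. Passing `sleep` as a parameter lets the backoff be tested without actually waiting.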
