How to fix broken azure openai model deployment?
Our deployment, gpt-4o-data-zone, was unavailable for over four hours (we noticed it at Feb 27, 2025 at 10:28:30.825 am CET), impacting our operations significantly. All calls were failing with status 500 model_error. Do you know what could be a cause of this outage? More importantly, we would appreciate your guidance on how to handle such incidents in the future to minimize potential disruptions, as this could have a substantial negative impact on our customers. We tried re-creating the deployment but that didn't help either. Is there anything we could do on our side to recover from it?
We tried using it from postman and your chat UI as well but we got the same issue:
Kind regards,
Aleksandar