Azure OpenAI: o3-mini deployment has a 1-minute hard timeout on API calls

Question

When I call o3-mini with stream: true, the API cuts the connection after waiting 1 minute for an event.

Request example:

curl --location 'https://host.openai.azure.com/openai/deployments/o3-mini/chat/completions?api-version=2025-01-01-preview' \
--header 'api-key: 123' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "user",
            "content": "complex task here"
        }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 60000,
    "n": 1,
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "responseFormat",
            "schema": {
                "type": "object",
                "properties": {
                    "files": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {
                                    "type": "string"
                                },
                                "content": {
                                    "type": "string"
                                }
                            },
                            "required": [
                                "name",
                                "content"
                            ],
                            "additionalProperties": false
                        }
                    }
                },
                "required": [
                    "files"
                ],
                "additionalProperties": false
            },
            "strict": true
        }
    },
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "user": "app_math_teacher"
}'

Response:

HTTP/1.1 408 Timeout
Content-Length: 75
Content-Type: application/json
apim-request-id: 8e3359d8-b764-490c-9de8-94ae3a55343e
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
x-ms-region: East US 2
x-ratelimit-remaining-requests: 1048
x-ratelimit-remaining-tokens: 10390000
Date: Tue, 04 Feb 2025 01:10:38 GMT
 
{ "error": { "code": "Timeout", "message": "The operation was timeout." } }

Note: This only occurs with stream: true.

Region: East US 2
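
To check whether the connection really sits idle for about a minute before the 408, it can help to measure how long the stream takes to deliver its first non-empty line. The following is only a minimal sketch: time_to_first_event is a hypothetical helper (not part of any SDK), and it assumes you already have an iterable of raw response lines, for example from requests' Response.iter_lines() on a request made with stream=True.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_event(
    lines: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, Optional[bytes]]:
    """Hypothetical helper: measure the wait for the first non-empty line.

    Returns (elapsed_seconds, first_line). If the stream ends without any
    non-empty line, first_line is None and elapsed_seconds is the total wait.
    """
    start = clock()
    for line in lines:
        if line.strip():  # skip blank separator lines between events
            return clock() - start, line
    return clock() - start, None
```

If the elapsed time is consistently close to 60 seconds and no line ever arrives, that would suggest an idle timeout on the streaming path rather than a slow client.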

Answer

Hi Andres da Silva Santos,

Thanks for posting your question on Microsoft Q&A.

This behavior could be related to the rate limits and quotas set for Azure OpenAI services. Azure OpenAI Service evaluates the rate of incoming requests over short periods, typically 1 or 10 seconds. If the number of requests exceeds the expected rate, new requests may receive a timeout or rate limit response.

To address this issue, consider the following steps:

Review Rate Limits: Ensure that your request rate is within the allowed limits. You can find more details in the documentation on managing Azure OpenAI Service quotas.
Optimize Request Distribution: Distribute your requests evenly over time to avoid exceeding the rate limits.
Increase Quotas: If necessary, you can request an increase in your service quotas through the Azure portal.
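
As a rough illustration of spacing out retries, here is a minimal sketch of exponential backoff with jitter. It assumes a hypothetical send callable that performs one request and returns the HTTP status code; the function name, the set of retryable codes (408 and 429), and the delay values are all illustrative assumptions, not values documented for the service.

```python
import random
import time
from typing import Callable

# Assumption for this sketch: treat timeouts and rate-limit responses as retryable.
RETRYABLE_STATUS = {408, 429}


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns the last status code seen.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUS:
            break
        # Double the delay on each retry, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Add up to 50% random jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        status = send()
    return status
```

Note that if the 408 is an idle timeout on the stream itself, retrying will not change the outcome; the sketch only covers the rate-limit scenario described in this answer.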

For more detailed guidance, you can refer to the Azure OpenAI Service quota management documentation.

If the reply was helpful, please don't forget to upvote and/or accept it as an answer. Let me know if you have any other queries.

Thank you!
