Azure OpenAI: o3-mini deployment has a 1-minute hard timeout on API calls

Andres da Silva Santos 135 Reputation points
2025-02-03T21:22:30.89+00:00

When using o3-mini with stream: true, the API cuts the connection after 1 minute of waiting for the first event.

Request example:

curl --location 'https://host.openai.azure.com/openai/deployments/o3-mini/chat/completions?api-version=2025-01-01-preview' \
--header 'api-key: 123' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "user",
            "content": "complex task here"
        }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 60000,
    "n": 1,
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "responseFormat",
            "schema": {
                "type": "object",
                "properties": {
                    "files": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {
                                    "type": "string"
                                },
                                "content": {
                                    "type": "string"
                                }
                            },
                            "required": [
                                "name",
                                "content"
                            ],
                            "additionalProperties": false
                        }
                    }
                },
                "required": [
                    "files"
                ],
                "additionalProperties": false
            },
            "strict": true
        }
    },
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "user": "app_math_teacher"
}'

Response:

408 Timeout

HTTP/1.1 408 Timeout
Content-Length: 75
Content-Type: application/json
apim-request-id: 8e3359d8-b764-490c-9de8-94ae3a55343e
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
x-ms-region: East US 2
x-ratelimit-remaining-requests: 1048
x-ratelimit-remaining-tokens: 10390000
Date: Tue, 04 Feb 2025 01:10:38 GMT
 
{ "error": { "code": "Timeout", "message": "The operation was timeout." } }

Note: This only occurs with stream: true.

Region: East US 2
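
To confirm where the cutoff lands, the time until the first streamed event can be measured on the client. The following is a minimal sketch, not part of the original report: the function name `time_to_first_event` is hypothetical, and it assumes the stream arrives as server-sent-event lines prefixed with `data:`, which is how streamed chat completions are delivered.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_event(
    lines: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[str], float]:
    """Return the first SSE data payload and the seconds waited for it.

    `lines` is any iterable of decoded response lines. If the stream ends
    (or is cut) before a data line arrives, the payload is None and the
    elapsed time shows how long the client waited.
    """
    start = clock()
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and SSE comment lines (those start with ":").
        if line.startswith("data:"):
            return line[len("data:"):].strip(), clock() - start
    return None, clock() - start
```

The `lines` argument could come from any HTTP client that iterates the response body line by line, for example `Response.iter_lines(decode_unicode=True)` in the `requests` library with `stream=True`. A payload of None with an elapsed time close to 60 seconds would match the behavior described above.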


1 answer

  1. Vikram Singh 1,460 Reputation points Microsoft Employee
    2025-02-04T06:50:44.27+00:00

    Hi Andres da Silva Santos,

    Thanks for posting your question on Microsoft Q&A.

    This behavior could be related to the rate limits and quotas set for Azure OpenAI Service. The service evaluates the rate of incoming requests over short windows, typically 1 or 10 seconds. If the number of requests in a window exceeds the expected rate, new requests may receive a timeout or rate-limit response.

    To address this issue, consider the following steps:

    1. Review Rate Limits: Ensure that your request rate is within the allowed limits. You can find more details in the documentation on managing Azure OpenAI Service quotas.
    2. Optimize Request Distribution: Distribute your requests evenly over time to avoid exceeding the rate limits.
    3. Increase Quotas: If necessary, you can request an increase in your service quotas through the Azure portal.
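
As an illustration of step 2, here is a minimal client-side sketch that spaces calls evenly instead of sending them in bursts. It is not an Azure API: the helper name `paced` and the `requests_per_minute` parameter are hypothetical, and the actual limit would have to be taken from the deployment's quota settings.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(
    items: Iterable[T],
    requests_per_minute: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items no faster than the given rate, evenly spaced in time."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    interval = 60.0 / requests_per_minute  # seconds between two requests
    next_slot = clock()  # earliest time the next item may be released
    for item in items:
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)  # wait until the reserved slot
        # Reserve the following slot; if the caller was slow, start from now
        # so that a backlog never turns into a burst.
        next_slot = max(next_slot, now) + interval
        yield item
```

A caller would wrap its list of prompts, for example `for prompt in paced(prompts, requests_per_minute=30): ...`, and send one request per iteration. This only smooths the request rate on the client side; it does not change the service-side limits.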

    For more detailed guidance, you can refer to the Azure OpenAI Service quota management documentation.

    If the reply was helpful, please don't forget to upvote and/or accept it as an answer. Let me know if you have any other queries.

    Thank you!

