GPT-4o has a 35k input token limit

Jon Mendizabal 20 Reputation points
2024-11-21T21:47:30.3133333+00:00

I'm using Azure OpenAI in one of my applications, and it looks like both gpt-4o and gpt-4o-mini have a 35k input token limit, even though the documentation says it's 128k.

I am checking the actual input token count in the chat completion response, and 35k seems to be the limit: if I send a request that slightly surpasses 35k tokens, the chat completion request never completes and the model never responds.
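For reference, this is roughly how I'm checking the token counts (a minimal sketch; the endpoint, deployment name, and filler prompt are placeholders, not my real values):

```python
import os
from openai import AzureOpenAI

# Placeholder configuration; the real endpoint/deployment names differ.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

# Filler text to push the prompt past ~35k tokens (rough approximation; adjust the multiplier).
long_prompt = "Summarize the following notes. " + ("lorem ipsum dolor sit amet " * 6000)

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name in my resource
    messages=[{"role": "user", "content": long_prompt}],
)

# usage reports what the service actually counted for this request.
print("prompt tokens:", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
```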

Am I missing something? Is there any scenario where this is expected? Could it be the pricing tier (I'm using Standard S0), or is there a setting I need to modify?

Azure OpenAI Service

Accepted answer
  1. Max Lacy 180 Reputation points
    2024-11-22T16:57:46.4833333+00:00

    It sounds like you're encountering an issue with the token limit for the gpt-4o and gpt-4o-mini models in Azure OpenAI.

    Documentation vs. Reality: The documentation states a 128k token limit, but you're experiencing a 35k limit. This discrepancy could be due to several factors, including model updates or specific configurations in your deployment.

    Possible Causes and Solutions

    1. Pricing Tier: The Standard S0 tier should support the documented token limits. However, it's worth verifying whether there are any specific limitations or quotas associated with your subscription or deployment. You can check this in the Azure portal under your OpenAI resource settings, or programmatically as sketched after this list.
    2. Configuration Settings: Ensure that your deployment settings are correctly configured to support the higher token limit. Sometimes, default settings might impose lower limits.
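    If you'd rather inspect the deployment programmatically than in the portal, a sketch along these lines using the azure-mgmt-cognitiveservices package should list each deployment's model version and assigned capacity (the subscription ID, resource group, and account name are placeholders, and exact attribute names can vary between SDK/API versions):

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

    # Placeholder identifiers; substitute your own subscription/resource details.
    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "<resource-group>"
    ACCOUNT_NAME = "<azure-openai-resource-name>"

    client = CognitiveServicesManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # List every model deployment under the Azure OpenAI resource and print its
    # model name/version and SKU capacity (capacity reflects the throughput quota
    # assigned to that deployment).
    for dep in client.deployments.list(RESOURCE_GROUP, ACCOUNT_NAME):
        print(dep.name, dep.properties.model.name, dep.properties.model.version, dep.sku.capacity)
    ```

    If the capacity or model version reported there doesn't match what you expect, that's the first thing to correct.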

    Context Limit vs. Completion Limit

    There is a difference between the context limit and the completion limit:

    Context Limit: This refers to the maximum number of tokens that can be included in the input prompt. For example, if the context limit is 128k tokens, you can provide up to 128k tokens worth of input data for the model to consider when generating a response.

    Completion Limit: This refers to the maximum number of tokens that the model can generate in its response. For instance, if the completion limit is 4k tokens, the model can generate up to 4k tokens in its output, regardless of the context size.

    In your scenario, it seems like you are hitting a limit at 35k tokens, which might be a combined limit for both context and completion. This could be due to specific configurations or limitations in your deployment settings.
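    One way to narrow this down is to count the prompt tokens client-side with tiktoken before sending, and compare that with the prompt_tokens the service reports back. A rough sketch (assuming a recent tiktoken release that knows gpt-4o; older releases need the o200k_base encoding requested explicitly):

    ```python
    import tiktoken

    # gpt-4o and gpt-4o-mini use the o200k_base encoding; fall back to it
    # explicitly if this tiktoken version doesn't recognize the model name.
    try:
        enc = tiktoken.encoding_for_model("gpt-4o")
    except KeyError:
        enc = tiktoken.get_encoding("o200k_base")

    prompt = "Summarize the following notes. " + ("lorem ipsum dolor sit amet " * 6000)

    # Client-side count of the prompt alone (excludes the few tokens of
    # per-message chat formatting overhead the service adds).
    print("client-side prompt tokens:", len(enc.encode(prompt)))
    ```

    If requests hang right around 35k by this count but succeed just below it, that would point toward a quota or deployment-level limit rather than the model's 128k context window.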

    Next Steps

    Configuration Settings in Azure AI Foundry

    1. Access the AI Foundry Portal: Go to the Azure AI Foundry portal and sign in with the Azure subscription that contains your Azure OpenAI Service resource.
    2. Check Model Deployments: From the Azure OpenAI Service resource view, click "Model deployments" under the "Shared Resources" group.
    3. Verify Configuration Settings: Ensure that your deployment settings, such as token limits and pricing tier, are configured as expected. If you click on a deployment and then Edit, you can adjust settings like the maximum token limit, model version, and other parameters to match your requirements. See https://learn.microsoft.com/en-us/azure/ai-studio/ai-services/how-to/connect-azure-openai

    By following these steps, you should be able to verify and adjust the configuration settings for your Azure OpenAI deployment effectively.
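    Once you've reviewed those settings, a quick way to re-test is to send a request just over the suspected boundary with a small max_tokens and a client-side timeout, so the call fails fast instead of hanging (a sketch; the deployment name, filler prompt, and timeout value are assumptions):

    ```python
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",
    )

    long_prompt = "Summarize the following notes. " + ("lorem ipsum dolor sit amet " * 6000)

    try:
        # with_options(timeout=...) applies a per-request timeout in seconds.
        response = client.with_options(timeout=60.0).chat.completions.create(
            model="gpt-4o",   # your deployment name
            max_tokens=256,   # cap the completion so only the prompt size is being tested
            messages=[{"role": "user", "content": long_prompt}],
        )
        print("prompt tokens accepted:", response.usage.prompt_tokens)
    except Exception as err:
        # A timeout or an explicit 400/429 here tells you where the boundary is.
        print("request failed:", err)
    ```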

    *Microsoft Copilot helped me organize my thoughts. I verified the results and walked through editing one of my deployments.

    1 person found this answer helpful.

0 additional answers

