It sounds like you're encountering an issue with the token limit for the gpt-4o and gpt-4o-mini models in Azure OpenAI.
Documentation vs. Reality: The documentation states a 128k-token context window, but you're hitting a limit at around 35k tokens. This discrepancy could be due to several factors, including model updates or specific configurations in your deployment.
Possible Causes and Solutions
- Pricing Tier: The Standard S0 tier should support the documented token limits. However, it's worth verifying whether any specific limitations or quotas are associated with your subscription tier. You can check this in the Azure portal under your OpenAI resource settings.
- Configuration Settings: Ensure that your deployment settings are correctly configured to support the higher token limit; default settings sometimes impose lower limits. A quick way to review a deployment's settings programmatically is sketched below.
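If you'd rather check from code than from the portal, here is a minimal sketch using the azure-identity and azure-mgmt-cognitiveservices packages. The subscription, resource group, and resource names are placeholders, and attribute names can vary slightly between SDK versions, so treat this as a starting point rather than a definitive script.

```python
# Minimal sketch: list your Azure OpenAI deployments with their model version
# and SKU/capacity so you can compare them against the documented limits.
# Subscription, resource group, and resource names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

subscription_id = "<your-subscription-id>"
resource_group = "<your-resource-group>"
account_name = "<your-openai-resource>"

client = CognitiveServicesManagementClient(DefaultAzureCredential(), subscription_id)

for dep in client.deployments.list(resource_group, account_name):
    model = dep.properties.model
    print(dep.name, model.name, model.version, dep.sku.name, dep.sku.capacity)
```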
Context Limit vs. Completion Limit
There is a difference between the context limit and the completion limit:
- Context Limit: The maximum number of tokens that can be included in the input prompt. For example, if the context limit is 128k tokens, you can provide up to 128k tokens of input for the model to consider when generating a response.
- Completion Limit: The maximum number of tokens the model can generate in its response. For instance, if the completion limit is 4k tokens, the model can generate at most 4k tokens of output, regardless of the context size.
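To make the distinction concrete, here is a minimal sketch using the openai Python package (v1.x) against an Azure deployment. The endpoint, key, API version, and deployment name are placeholders you would replace with your own values.

```python
# Minimal sketch with the openai Python package (v1.x) against an Azure OpenAI
# deployment. Endpoint, key, API version, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="<your-gpt-4o-deployment-name>",  # the deployment name, not the base model name
    messages=[{"role": "user", "content": "Summarize this document: ..."}],
    max_tokens=1024,  # completion limit: caps only the generated output
)

# The usage object reports prompt (context) and completion tokens separately.
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```

The max_tokens parameter caps only the completion; the prompt itself counts against the context window, and response.usage lets you see both counts for a given request.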
In your scenario, it seems like you are hitting a limit at 35k tokens, which might be a combined limit for both context and completion. This could be due to specific configurations or limitations in your deployment settings.
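One way to confirm which limit you're actually hitting is to count the prompt tokens locally before sending the request. This sketch assumes the tiktoken package; gpt-4o and gpt-4o-mini use the o200k_base encoding.

```python
# Count prompt tokens locally so you can tell whether the ~35k ceiling is hit
# by the input alone or by the input plus the requested completion budget.
import tiktoken

encoding = tiktoken.get_encoding("o200k_base")  # encoding used by gpt-4o / gpt-4o-mini

prompt = "...your full prompt text here..."
prompt_tokens = len(encoding.encode(prompt))
max_completion = 4096  # whatever you pass as max_tokens

print(f"prompt: {prompt_tokens} tokens, "
      f"prompt + completion budget: {prompt_tokens + max_completion} tokens")
```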
Next Steps
Configuration Settings in Azure AI Foundry
- Access the AI Foundry Portal: Go to the Azure AI Foundry portal and sign in with your Azure subscription that has your Azure OpenAI Service resource.
- Check Model Deployments: From the Azure OpenAI Service resource view, click on "Model deployments" under the "Shared Resources" group.
- Verify Configuration Settings: Ensure that your deployment settings, such as token limits and the pricing tier, are correctly configured. If you click on a deployment and then Edit, you can adjust settings such as the maximum token limit, model version, and other parameters to match your requirements. https://learn.microsoft.com/en-us/azure/ai-studio/ai-services/how-to/connect-azure-openai
By following these steps, you should be able to verify and adjust the configuration settings for your Azure OpenAI deployment effectively.
*Microsoft Copilot helped me organize my thoughts. I verified the results and walked through editing one of my deployments.