Azure AI services
Hello, I have deployed a Llama 3.3 70B model using Azure AI Foundry. According to the model card on this page, the output limit should be 8192 tokens.
The problem is that when I call the model through the Azure AI Inference completions API, the maximum is 4096 tokens, and I see no way to raise this limit in AI Foundry. If I set `max_tokens` above 4096, the call fails with `azure.core.exceptions.HttpResponseError: (Bad Request) max_tokens must be less than or equal to 4096`.
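For context, here is a minimal sketch of the failing pattern and a defensive workaround. The `SERVICE_MAX_TOKENS` value of 4096 is an assumption based on the error message above, not a documented limit, and the client call is shown only in comments since it needs a live endpoint and credentials:

```python
# Observed service-side cap from the Bad Request error; this is an
# assumption inferred from the error message, not a documented value.
SERVICE_MAX_TOKENS = 4096

def clamp_max_tokens(requested: int, cap: int = SERVICE_MAX_TOKENS) -> int:
    """Clamp a requested max_tokens to the cap the endpoint accepts,
    so the call degrades gracefully instead of raising HttpResponseError."""
    return min(requested, cap)

# Hypothetical usage against the deployment (requires azure-ai-inference):
#
# from azure.ai.inference import ChatCompletionsClient
# from azure.core.credentials import AzureKeyCredential
#
# client = ChatCompletionsClient(endpoint="<your-endpoint>",
#                                credential=AzureKeyCredential("<your-key>"))
# response = client.complete(
#     messages=[{"role": "user", "content": "Hello"}],
#     max_tokens=clamp_max_tokens(8192),  # 8192 would be rejected; 4096 passes
# )
```

This only works around the 400 response; it does not explain why the deployed endpoint enforces 4096 when the model card advertises 8192.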