Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
Hello, I have deployed a Llama 3.3 70B model using Azure AI Foundry. According to the model page (see the image below), the output limit should be 8192 tokens.
The problem is that when I call the model through Azure AI Inference completions, the effective limit is only 4096 tokens: if I set the max_tokens field above 4096 in my API call, I get an HTTP 400 Bad Request error. Is there a way to raise the maximum completion tokens to the advertised 8192?
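For reference, a minimal sketch of the request body in question, assuming the Azure AI Model Inference chat-completions payload shape; endpoint, key, and the prompt text are placeholders, and no network call is made here:

```python
import json

# Hypothetical chat-completions request body for an Azure AI Foundry
# serverless deployment (payload only; sending it requires an endpoint
# URL and API key, omitted here).
payload = {
    "messages": [
        {"role": "user", "content": "Write a long summary of this document."}
    ],
    # Setting this above 4096 is what triggers the HTTP 400 error,
    # even though the model card advertises an 8192-token output limit.
    "max_tokens": 8192,
}

body = json.dumps(payload)
print(body)
```

Sending this body with max_tokens at 8192 reproduces the Bad Request response; capping it at 4096 succeeds.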