Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
Hello, I have deployed a Llama 3.3 70B model using Azure AI Foundry. According to the model page (see the image below), the output limit should be 8192 tokens.
The problem is that when I call the model through Azure AI Inference completions, the effective limit is only 4096 tokens: if I set the max_tokens field above 4096 in my API call, I get an HTTP 400 Bad Request error. Is there a way to raise the maximum completion tokens to the advertised 8192?
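For reference, a minimal sketch of the request body in question, assuming the Azure AI Model Inference chat-completions payload shape; endpoint, key, and the prompt text are placeholders, and no network call is made here:

```python
import json

# Hypothetical chat-completions request body for an Azure AI Foundry
# serverless deployment (payload only; sending it requires an endpoint
# URL and API key, omitted here).
payload = {
    "messages": [
        {"role": "user", "content": "Write a long summary of this document."}
    ],
    # Setting this above 4096 is what triggers the HTTP 400 error,
    # even though the model card advertises an 8192-token output limit.
    "max_tokens": 8192,
}

body = json.dumps(payload)
print(body)
```

Sending this body with max_tokens at 8192 reproduces the Bad Request response; capping it at 4096 succeeds.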