Azure AI Foundry Completion Token Limit

Rishab Mehta 80 Reputation points
2025-02-21T15:47:50.8766667+00:00

Hello, I have deployed a Llama 3.3 70B model using Azure AI Foundry. According to the model card on this page, the output limit should be 8,192 tokens.

The problem is that when I call the model through the Azure AI Inference completions API, the maximum token limit is 4096, and I see no way to raise this limit in AI Foundry. If I set max tokens above 4096, the API call fails with `azure.core.exceptions.HttpResponseError: (Bad Request) max_tokens must be less than or equal to 4096`.
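For reference, the failing call can be reproduced with the `azure-ai-inference` Python SDK (the endpoint and key below are placeholders). Until the deployment's limit matches the model card, one stopgap is to clamp `max_tokens` to the enforced 4096 so the request never triggers the 400 response; the helper here is a minimal sketch of that workaround, with the SDK call shown in comments:

```python
API_ENFORCED_LIMIT = 4096  # limit currently enforced by the endpoint, per the error message


def clamp_max_tokens(requested: int, limit: int = API_ENFORCED_LIMIT) -> int:
    """Clamp a requested max_tokens value to the limit the API enforces."""
    return min(requested, limit)


# The call that raises the error, for reference (requires the
# azure-ai-inference package; endpoint and key are placeholders):
#
# from azure.ai.inference import ChatCompletionsClient
# from azure.ai.inference.models import UserMessage
# from azure.core.credentials import AzureKeyCredential
#
# client = ChatCompletionsClient(
#     endpoint="https://<deployment>.<region>.models.ai.azure.com",
#     credential=AzureKeyCredential("<api-key>"),
# )
# response = client.complete(
#     messages=[UserMessage(content="...")],
#     max_tokens=clamp_max_tokens(8192),  # passing 8192 directly raises HttpResponseError (400)
# )

print(clamp_max_tokens(8192))  # requests above the limit are reduced to 4096
print(clamp_max_tokens(1024))  # requests within the limit pass through unchanged
```

This only works around the symptom; it does not recover the 8,192-token output the model card advertises.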

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
