Azure AI Foundry Completion Token Limit

Rishab Mehta 80 Reputation points
2025-02-21T15:47:50.8766667+00:00

Hello, I have deployed a Llama 3.3 70B model using Azure AI Foundry. According to the model card on this page, the output limit should be 8,192 tokens.

The problem is that when I call the model through the Azure AI Inference completions API, the maximum token limit is 4096, and I see no way to raise this limit in AI Foundry. If I set max tokens above 4096, the API call fails with `azure.core.exceptions.HttpResponseError: (Bad Request) max_tokens must be less than or equal to 4096`.
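For reference, the failing call can be reproduced with the `azure-ai-inference` Python SDK (the endpoint and key below are placeholders). Until the deployment's limit matches the model card, one stopgap is to clamp `max_tokens` to the enforced 4096 so the request never triggers the 400 response; the helper here is a minimal sketch of that workaround, with the SDK call shown in comments:

```python
API_ENFORCED_LIMIT = 4096  # limit currently enforced by the endpoint, per the error message


def clamp_max_tokens(requested: int, limit: int = API_ENFORCED_LIMIT) -> int:
    """Clamp a requested max_tokens value to the limit the API enforces."""
    return min(requested, limit)


# The call that raises the error, for reference (requires the
# azure-ai-inference package; endpoint and key are placeholders):
#
# from azure.ai.inference import ChatCompletionsClient
# from azure.ai.inference.models import UserMessage
# from azure.core.credentials import AzureKeyCredential
#
# client = ChatCompletionsClient(
#     endpoint="https://<deployment>.<region>.models.ai.azure.com",
#     credential=AzureKeyCredential("<api-key>"),
# )
# response = client.complete(
#     messages=[UserMessage(content="...")],
#     max_tokens=clamp_max_tokens(8192),  # passing 8192 directly raises HttpResponseError (400)
# )

print(clamp_max_tokens(8192))  # requests above the limit are reduced to 4096
print(clamp_max_tokens(1024))  # requests within the limit pass through unchanged
```

This only works around the symptom; it does not recover the 8,192-token output the model card advertises.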

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
