Trouble with TPM in Azure AI

Bojana Taseva 0 Reputation points
2025-02-03T14:59:55.9333333+00:00

I have created an Azure AI bot using a trial Azure account, which provides a limit of 1,000 TPM and 6 RPM. My setup includes:

Data Source: SharePoint (indexed via Azure AI Search)

Search Index: one document with a single page

Search Service Tier: S1

Model Used: GPT-35-Turbo-16K

Despite indexing only one document, the 1,000 TPM limit appears insufficient even for a single search query. I would appreciate clarification on the following:

Does indexing affect token usage? Even when I test simple questions unrelated to the SharePoint content, I still exceed 1,000 TPM as long as the SharePoint index is attached as a data source.

Why does 1,000 TPM seem inadequate even when querying just one small document of 19 KB?

How can I calculate the actual token consumption per file or query?

What are the best practices to optimize token usage and reduce unnecessary token consumption?



1 answer

  1. Manas Mohanty 540 Reputation points Microsoft Vendor
    2025-02-04T04:51:16.54+00:00

    Hi Bojana Taseva!

    Welcome to the Azure OpenAI Q&A forum. Thank you for posting your query here.

    Sorry for the inconvenience caused.

    I see you are facing high token usage from GPT-3.5-Turbo-16K with a 19 KB SharePoint file.

    Here is the answer to your queries.

    1. On high token usage:

    From the OpenAI side:

    Token consumption rises with the size and complexity of the query. GPT-3.5-Turbo-16K can handle at most 16k tokens (its context window size). When a data source is attached, the document chunks retrieved from the index are added to the prompt behind the scenes, so even a simple question consumes more prompt tokens than the question text alone.

    From the AI Search side:

    You can optimize your index schema to reduce the size of the index and the amount of content returned per query.

    Reference on AI Search.
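
    Below is a minimal sketch (not your exact setup) of how retrieved chunks enter the request when an Azure AI Search index is attached as a data source. The endpoint, key, index name, and deployment name are placeholders, and the exact field names (data_sources, azure_search, top_n_documents) depend on the API version you target, so please verify them against the current "Azure OpenAI on your data" reference. Lowering top_n_documents reduces how much retrieved text is injected into the prompt, and therefore how many prompt tokens each question costs.

      import os
      from openai import AzureOpenAI  # pip install openai

      client = AzureOpenAI(
          azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder
          api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder
          api_version="2024-02-01",                            # verify against your resource
      )

      response = client.chat.completions.create(
          model="gpt-35-turbo-16k",  # your deployment name (placeholder)
          messages=[{"role": "user", "content": "What does the indexed document say?"}],
          max_tokens=150,
          extra_body={
              # "On your data" extension: chunks retrieved from this index are added to the prompt.
              "data_sources": [
                  {
                      "type": "azure_search",
                      "parameters": {
                          "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],  # placeholder
                          "index_name": "sharepoint-index",                 # placeholder
                          "authentication": {
                              "type": "api_key",
                              "key": os.environ["AZURE_SEARCH_KEY"],        # placeholder
                          },
                          # Fewer retrieved chunks = fewer tokens injected into the prompt.
                          "top_n_documents": 3,
                      },
                  }
              ]
          },
      )

      print(response.choices[0].message.content)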

    2. On optimizing token usage:

    • You can reduce max_tokens on the deployment or per request to limit the output size.
    • You can ask the model to answer within a specific word limit, e.g. "Please summarize the scenario and keep the word count under 50 words."

    Keep your queries simple and precise rather than long, and adopt multi-shot (few-shot) prompting with a couple of example question-answer pairs to steer the model toward the desired answer.
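
    Here is a short sketch of those two levers, capping max_tokens on the request and instructing the model to stay under a word limit. It reuses the client from the sketch above; the deployment name is a placeholder.

      # Reuses the `client` created in the previous sketch.
      response = client.chat.completions.create(
          model="gpt-35-turbo-16k",  # your deployment name (placeholder)
          messages=[
              {"role": "system", "content": "Answer concisely and keep every reply under 50 words."},
              {"role": "user", "content": "Summarize the indexed document."},
          ],
          max_tokens=100,  # hard cap on completion tokens for this request
      )
      print(response.choices[0].message.content)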

    3. Finding the tokens used by each request:

    You can find your token usage (completion, prompt, and total tokens) under the usage section of the response.

      "usage": 
             { "completion_tokens": 39, 
               "prompt_tokens": 58, 
               "total_tokens": 97 }
    

    Reference on output usage
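
    If you use the Python SDK, the same information is available on the response object, and you can also estimate the tokens in a file before sending it with the tiktoken package. This is a rough sketch; the file path is a placeholder and tokenizer counts are approximate.

      # Reading usage from the `response` object returned above.
      print(response.usage.prompt_tokens,
            response.usage.completion_tokens,
            response.usage.total_tokens)

      # Estimating tokens for a file before sending it.
      import tiktoken  # pip install tiktoken

      encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the GPT-3.5/GPT-4 family
      with open("document.txt", encoding="utf-8") as f:  # placeholder path
          text = f.read()
      print(f"~{len(encoding.encode(text))} tokens in this file")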

    Please upvote the answer and say "yes" if the answer was useful to you.

    Thank you.

