Trouble with TPM in Azure AI

Bojana Taseva 0 Reputation points
2025-02-03T14:59:55.9333333+00:00

I have created an Azure AI bot using a trial Azure account, which provides a limit of 1,000 TPM and 6 RPM. My setup includes:

Data Source: SharePoint (indexed via Azure AI Search)

Search Index: one document with a single page

Search Service Tier: S1

Model Used: GPT-35-Turbo-16K

Despite indexing only one document, the 1,000 TPM limit appears insufficient even for a single search query. I would appreciate clarification on the following:

Does indexing affect token usage? Even when I test simple questions unrelated to the SharePoint content, I still exceed 1,000 TPM as long as the SharePoint index is attached as a data source.

Why does 1,000 TPM seem inadequate even when querying just one small document of 19 KB?

How can I calculate the actual token consumption per file or query?

What are the best practices to optimize token usage and reduce unnecessary token consumption?



1 answer

  1. Manas Mohanty 540 Reputation points Microsoft Vendor
    2025-02-04T04:51:16.54+00:00

    Hi Bojana Taseva!

    Welcome to the Azure OpenAI Q&A forum. Thank you for posting your query here.

    Sorry for the inconvenience caused.

    I see you are facing high token usage from GPT-3.5-Turbo-16K with a 19 KB SharePoint file.

    Here is the answer to your queries.

    1. On high token usage:

    From the OpenAI side:

    Token consumption rises with the size and complexity of the query. GPT-3.5-Turbo-16K can handle at most 16k tokens (its context window size). When a data source is attached, the document chunks retrieved from the index are added to the prompt behind the scenes, so even a simple question consumes more prompt tokens than the question text alone.

    From the AI Search side:

    You can optimize your index schema to reduce the size of the index and the amount of content returned per query.

    Reference on AI Search.
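
    Below is a minimal sketch (not your exact setup) of how retrieved chunks enter the request when an Azure AI Search index is attached as a data source. The endpoint, key, index name, and deployment name are placeholders, and the exact field names (data_sources, azure_search, top_n_documents) depend on the API version you target, so please verify them against the current "Azure OpenAI on your data" reference. Lowering top_n_documents reduces how much retrieved text is injected into the prompt, and therefore how many prompt tokens each question costs.

      import os
      from openai import AzureOpenAI  # pip install openai

      client = AzureOpenAI(
          azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder
          api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder
          api_version="2024-02-01",                            # verify against your resource
      )

      response = client.chat.completions.create(
          model="gpt-35-turbo-16k",  # your deployment name (placeholder)
          messages=[{"role": "user", "content": "What does the indexed document say?"}],
          max_tokens=150,
          extra_body={
              # "On your data" extension: chunks retrieved from this index are added to the prompt.
              "data_sources": [
                  {
                      "type": "azure_search",
                      "parameters": {
                          "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],  # placeholder
                          "index_name": "sharepoint-index",                 # placeholder
                          "authentication": {
                              "type": "api_key",
                              "key": os.environ["AZURE_SEARCH_KEY"],        # placeholder
                          },
                          # Fewer retrieved chunks = fewer tokens injected into the prompt.
                          "top_n_documents": 3,
                      },
                  }
              ]
          },
      )

      print(response.choices[0].message.content)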

    2. On optimizing token usage:

    • You can reduce max_tokens on the deployment or per request to limit the output size.
    • You can ask the model to answer within a specific word limit, e.g. "Please summarize the scenario and keep the word count under 50 words."

    Keep your queries simple and precise rather than long, and adopt multi-shot (few-shot) prompting with a couple of example question-answer pairs to steer the model toward the desired answer.
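
    Here is a short sketch of those two levers, capping max_tokens on the request and instructing the model to stay under a word limit. It reuses the client from the sketch above; the deployment name is a placeholder.

      # Reuses the `client` created in the previous sketch.
      response = client.chat.completions.create(
          model="gpt-35-turbo-16k",  # your deployment name (placeholder)
          messages=[
              {"role": "system", "content": "Answer concisely and keep every reply under 50 words."},
              {"role": "user", "content": "Summarize the indexed document."},
          ],
          max_tokens=100,  # hard cap on completion tokens for this request
      )
      print(response.choices[0].message.content)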

    3. Finding the tokens used by each request:

    You can find your token usage (completion, prompt, and total tokens) under the usage section of the response.

      "usage": 
             { "completion_tokens": 39, 
               "prompt_tokens": 58, 
               "total_tokens": 97 }
    

    Reference on output usage
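
    If you use the Python SDK, the same information is available on the response object, and you can also estimate the tokens in a file before sending it with the tiktoken package. This is a rough sketch; the file path is a placeholder and tokenizer counts are approximate.

      # Reading usage from the `response` object returned above.
      print(response.usage.prompt_tokens,
            response.usage.completion_tokens,
            response.usage.total_tokens)

      # Estimating tokens for a file before sending it.
      import tiktoken  # pip install tiktoken

      encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the GPT-3.5/GPT-4 family
      with open("document.txt", encoding="utf-8") as f:  # placeholder path
          text = f.read()
      print(f"~{len(encoding.encode(text))} tokens in this file")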

    Please upvote the answer and say "yes" if the answer was useful to you.

    Thank you.

