gpt-4o | requested additional PTUs not showing up

Ivo Bacco 0 Reputation points
2024-12-09T11:09:11.59+00:00

We use the Azure OpenAI service, with the gpt-4o model, integrated into a custom application to process documents and extract information.

The extracted information must be processed and made available to the end users as soon as possible.

These are the features and needs of the solution:

  • the number of pages per document is variable: 15 on average, with peaks of 100 pages
  • the average number of tokens per page is 1k
  • the number of documents processed in parallel must be at least 4
  • processing times must be minimal
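To make the sizing concrete, here is a rough estimate of the required quota from the figures above. It assumes each batch of parallel documents should complete within roughly one minute (an assumption; the post does not state a target latency):

```python
# Back-of-the-envelope TPM estimate (assumed values, adjust to your workload)
AVG_PAGES = 15          # average pages per document
PEAK_PAGES = 100        # peak pages per document
TOKENS_PER_PAGE = 1_000 # average tokens per page
PARALLEL_DOCS = 4       # minimum documents processed in parallel

# If all parallel documents are average-sized and each batch must
# finish within one minute:
avg_tpm = PARALLEL_DOCS * AVG_PAGES * TOKENS_PER_PAGE

# Worst case: one peak-sized document alongside three average ones:
peak_tpm = (PEAK_PAGES + (PARALLEL_DOCS - 1) * AVG_PAGES) * TOKENS_PER_PAGE

print(avg_tpm)   # 60000
print(peak_tpm)  # 145000
```

Both figures sit well above the 30k TPM limit mentioned below, and the worst case lands inside the 100k–200k TPM range we are asking for.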


At present, the current limit of 30k TPM is too low and creates queues in the processing pipeline; therefore, a higher token rate limit (ideally between 100k and 200k TPM) is needed to allow simultaneous, latency-free processing. 

We requested PTUs under the Provisioned-Managed deployment type, but we are unable to use the additional PTUs with the gpt-4o model; they are only available for gpt-4.

Are there other ways to increase the TPM? Or how can I make PTUs available for gpt-4o as well?

Azure OpenAI Service
