gpt-4o | requested additional PTUs not showing up
We use the Azure OpenAI Service with the gpt-4o model, integrated into a custom application that processes documents and extracts information.
The extracted information must be processed and made available to users as soon as possible.
These are the features and needs of the solution:
- the number of pages per document is variable: 15 on average, with peaks of 100 pages
- each page averages about 1k tokens
- at least 4 documents must be processed in parallel
- processing times must be minimal
At present, the 30k TPM limit is too low and creates queues in the processing pipeline; we therefore need a higher token rate limit (ideally between 100k and 200k TPM) to allow simultaneous, low-latency processing.
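To make the sizing concrete, here is a rough back-of-the-envelope calculation based on the figures above. The page counts, tokens per page, and parallelism come from the stated requirements; the one-minute processing target is an assumption added for illustration.

```python
# Rough TPM sizing from the stated requirements (sketch, not an exact model).
AVG_PAGES = 15          # average document length (from requirements)
PEAK_PAGES = 100        # peak document length (from requirements)
TOKENS_PER_PAGE = 1_000 # average tokens per page (from requirements)
PARALLEL_DOCS = 4       # minimum concurrent documents (from requirements)
TARGET_MINUTES = 1      # assumed: a batch should clear within one minute

def required_tpm(pages: int) -> int:
    """Tokens per minute needed to push PARALLEL_DOCS documents of
    `pages` pages through the model within TARGET_MINUTES."""
    return pages * TOKENS_PER_PAGE * PARALLEL_DOCS // TARGET_MINUTES

print(required_tpm(AVG_PAGES))   # 60000 -> average case already exceeds 30k TPM
print(required_tpm(PEAK_PAGES))  # 400000 -> peak case; short queues are expected
```

Under these assumptions, the average case alone needs roughly 60k TPM, which is why the current 30k limit queues; the 100k-200k range we are requesting covers the average load with headroom, while 100-page peaks would still queue briefly.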
We requested PTUs under the Provisioned-Managed deployment type, but we are unable to use the additional PTUs with the gpt-4o model; they are only available for gpt-4.
Are there other ways to increase the TPM? Or how can we make the PTUs available for gpt-4o as well?