Azure OpenAI API Caching Issue with Model `gpt-4o-mini-2024-07-18`
I'm running into an issue with the Azure OpenAI service. I'm using model version `gpt-4o-mini-2024-07-18` with Azure API version `2024-10-21`, and according to Azure's documentation both the model and the API version should be eligible for prompt caching.
My setup uses a static system prompt of roughly 2,000 tokens followed by a dynamic user prompt, so the shared prefix should comfortably clear the 1,024-token minimum required for caching.
However, across more than 50 API calls (not all concurrent), the OpenAI API reported roughly 70% cached tokens, while the Azure OpenAI API reported only about 0.1% cached tokens. Is this a known issue, and is anyone else seeing similar results?
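For anyone trying to reproduce the measurement: with the `openai` Python SDK (v1.x), the per-call cached-token count is reported in `response.usage.prompt_tokens_details.cached_tokens`. Below is a minimal sketch of how the aggregate hit rate quoted above can be computed; the `cache_hit_rate` helper and the sample numbers are illustrative, not actual measurements.

```python
def cache_hit_rate(usages):
    """Percentage of prompt tokens served from cache across a batch of calls.

    In practice, each entry would be built from a response's usage object:
    prompt_tokens  -> response.usage.prompt_tokens
    cached_tokens  -> response.usage.prompt_tokens_details.cached_tokens
    """
    total = sum(u["prompt_tokens"] for u in usages)
    cached = sum(u["cached_tokens"] for u in usages)
    return 100.0 * cached / total if total else 0.0

# Hypothetical sample mirroring the setup in the question: a ~2,000-token
# static system prompt plus a small dynamic user prompt. The first call is
# a cold cache; later calls should hit the cached 2,048-token prefix
# (caching operates in 128-token increments on the matching prefix).
sample = [
    {"prompt_tokens": 2050, "cached_tokens": 0},
    {"prompt_tokens": 2050, "cached_tokens": 2048},
    {"prompt_tokens": 2050, "cached_tokens": 2048},
]
print(round(cache_hit_rate(sample), 1))  # roughly 66.6 for this sample
```

Logging these values per call (rather than only the aggregate) makes it easier to see whether Azure is missing the cache on every call or only intermittently.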