How caching works in Azure OpenAI
Hi community,
I am using Azure OpenAI as my LLM provider with GPT-4o version 2024-11-20 (region Sweden Central), but I am having trouble understanding how prompt caching works. I read https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching, and it seems caching should be available automatically for my model and api_version for prompts longer than 1024 tokens. The problem is that I very rarely see a cache hit, even with a completely static prompt. I have posted sample code below; can someone help me understand whether I am missing something or whether this is a provider-side error? Thanks in advance!
import os
from openai import AzureOpenAI
endpoint = os.getenv("ENDPOINT_URL")
deployment = "gpt-4o"
subscription_key = os.getenv("AZURE_OPENAI_API_KEY")
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=subscription_key,
    api_version="2024-10-01-preview",
)
# Long, fully static prompt, well above the 1024-token caching minimum
text = "Testing cache " * 3000
# Prepare the chat request
chat_prompt = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "You are an AI assistant that helps people find information.",
            }
        ],
    },
    {"role": "user", "content": text},
]
messages = chat_prompt
completion = client.chat.completions.create(
    model=deployment,
    messages=messages,
    max_tokens=800,
    temperature=0.7,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
    stream=False,
)
print(completion.usage.to_json())
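For anyone comparing results: as I understand the docs, a cache hit shows up in the usage payload under prompt_tokens_details.cached_tokens (0 on a miss, non-zero when a prefix was served from cache). Here is a small sketch I use to pull that field out of the usage dict after calling to_json(); the helper name and the sample payload values are my own, only the field names come from the API reference:

```python
import json


def cached_prompt_tokens(usage: dict) -> int:
    """Return the number of prompt tokens served from cache (0 on a miss).

    `usage` is the dict form of completion.usage, e.g.
    json.loads(completion.usage.to_json()).
    """
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)


# Illustrative payload shapes (values are made up):
miss = {
    "prompt_tokens": 9100,
    "completion_tokens": 20,
    "prompt_tokens_details": {"cached_tokens": 0},
}
hit = {
    "prompt_tokens": 9100,
    "completion_tokens": 20,
    "prompt_tokens_details": {"cached_tokens": 9088},
}

print(cached_prompt_tokens(miss))  # 0
print(cached_prompt_tokens(hit))   # 9088
```

Running the request twice in a row and comparing cached_prompt_tokens across the two responses is how I have been checking for hits; in my case the second call still reports 0.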