Azure OpenAI embedding API takes 1.2 s. How to reduce this to 200 ms?

crawlinknetworks Test 0 Reputation points
2025-02-03T15:04:22.2666667+00:00

I am using Azure OpenAI Service. When I send embedding requests continuously, each takes around 200 ms. But if no request is sent for one minute, the next request takes about 1.2 s, which is too high. How should I approach this issue?

Azure OpenAI Service
Azure AI services

1 answer

  1. Marcin Policht 33,775 Reputation points MVP
    2025-02-03T15:18:26.8066667+00:00

    Here are a few options:

    1. Keep-Alive Requests (Low-Resource Pings)
      • Instead of sending large embedding requests, send a lightweight request (e.g., a short string) every 30-45 seconds to keep the service active.
    2. Optimize Request Frequency & Batching
      • Instead of sending embeddings one by one, batch multiple requests together where possible. This reduces the number of cold starts.
    3. Use Connection Reuse (Persistent HTTP Connection)
      • If making frequent calls, ensure you are reusing the same HTTP connection instead of opening a new connection for each request.
      • Use an HTTP client that supports persistent connections (e.g., in Python, use requests.Session() or httpx.Client()).
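
    The three options above can be sketched together in Python. This is a minimal sketch, not a definitive implementation: the endpoint, deployment name, environment variables, and the 30-second interval are placeholder assumptions you would adapt to your own resource. It uses the Azure OpenAI REST embeddings endpoint over a reused `requests.Session()`.

    ```python
    # Sketch: connection reuse, request batching, and a keep-warm ping.
    # ENDPOINT and DEPLOYMENT below are placeholders, not real values.
    import os
    import threading

    import requests

    ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "https://YOUR-RESOURCE.openai.azure.com")
    DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT", "my-embedding-deployment")  # hypothetical
    API_VERSION = "2024-02-01"  # pick the version your resource supports

    def make_session(api_key: str) -> requests.Session:
        """Option 3: one Session = one pooled, persistent HTTP connection."""
        s = requests.Session()
        s.headers.update({"api-key": api_key, "Content-Type": "application/json"})
        return s

    def batched(items, size):
        """Option 2: split inputs into chunks so several texts share one request."""
        return [items[i:i + size] for i in range(0, len(items), size)]

    def embed(session: requests.Session, texts):
        """Send one batched embedding request over the reused connection."""
        url = (f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}"
               f"/embeddings?api-version={API_VERSION}")
        resp = session.post(url, json={"input": texts})
        resp.raise_for_status()
        return [d["embedding"] for d in resp.json()["data"]]

    def keep_warm(session: requests.Session, interval: float = 30.0):
        """Option 1: ping with a tiny payload every ~30 s to keep the service active."""
        def ping():
            try:
                embed(session, ["ping"])
            finally:
                threading.Timer(interval, ping).start()  # reschedule the next ping
        threading.Timer(interval, ping).start()
    ```

    With this in place you would call `embed(session, chunk)` for each chunk from `batched(all_texts, 16)` and start `keep_warm(session)` once at startup; the ping thread is a daemonless sketch, so in a real service you would also want a way to stop it on shutdown.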

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

