Azure OpenAI embedding API takes 1.2 sec. How to reduce this to 200 ms?

crawlinknetworks Test 0

I am using the Azure OpenAI service. When I send embedding requests continuously, each one takes around 200 ms. If no request is sent for 1 minute, the next request takes 1.2 seconds, which is too high. How should I approach this issue?

1 answer

Marcin Policht 33,775 Reputation points MVP

2025-02-03T15:18:26.8066667+00:00
Here are a few options:

Keep-Alive Requests (Low-Resource Pings)

Instead of sending large embedding requests, send a lightweight request (e.g., a short string) every 30-45 seconds to keep the service active.
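As a rough sketch of the keep-alive idea, assuming a long-running Node.js process: the hypothetical helper `startKeepWarm` below calls a `ping` function that you supply (for example, one that embeds a short string through your existing client) on a fixed interval. The helper name, the 30-second default, and the skip-if-busy behavior are illustrative assumptions, not part of any Azure SDK.

```typescript
// Hypothetical ping: any cheap call that exercises the same client and connection.
type Ping = () => Promise<unknown>;

// Calls `ping` every `intervalMs` milliseconds and returns a function that stops it.
function startKeepWarm(ping: Ping, intervalMs = 30_000): () => void {
  let inFlight = false;

  const timer = setInterval(async () => {
    if (inFlight) return; // previous ping still running, skip this tick
    inFlight = true;
    try {
      await ping();
    } catch (err) {
      // A failed ping should not crash the application.
      console.warn("keep-warm ping failed:", err);
    } finally {
      inFlight = false;
    }
  }, intervalMs);

  return () => clearInterval(timer);
}
```

Call the returned function on shutdown to stop the timer. Each ping is still a real request, so it counts toward usage and rate limits.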

Optimize Request Frequency & Batching

Instead of sending embeddings one by one, batch multiple requests together where possible. This reduces the number of cold starts.
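A minimal batching sketch, assuming the embeddings endpoint accepts an array of strings in `input` and returns one item per string with an `index` field. `embedBatch`, the `PostEmbeddings` transport type, and the batch size of 16 are hypothetical names and values chosen for illustration; check the limits of your own deployment.

```typescript
// Only the part of the embeddings response that this sketch relies on.
interface EmbeddingResponse {
  data: { index: number; embedding: number[] }[];
}

// Hypothetical transport: sends one request body and resolves with the parsed JSON.
type PostEmbeddings = (body: { input: string[] }) => Promise<EmbeddingResponse>;

// Embeds many strings using as few requests as the batch size allows.
async function embedBatch(
  inputs: string[],
  post: PostEmbeddings,
  maxBatchSize = 16, // assumption: tune to your deployment's limits
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let start = 0; start < inputs.length; start += maxBatchSize) {
    const chunk = inputs.slice(start, start + maxBatchSize);
    const { data } = await post({ input: chunk });
    // Sort by index so the vectors line up with the order of `chunk`.
    const ordered = [...data].sort((a, b) => a.index - b.index);
    for (const item of ordered) vectors.push(item.embedding);
  }
  return vectors;
}
```

With an Axios instance, `post` could be something like `(body) => axiosInstance.post(url, body, { headers }).then((r) => r.data)`, where `url` and `headers` are your own endpoint and API key header.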

Use Connection Reuse (Persistent HTTP Connection)

If making frequent calls, ensure you are reusing the same HTTP connection instead of opening a new connection for each request.

Use an HTTP client that supports persistent connections (e.g., in Python, use requests.Session() or httpx.Client()).

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin
crawlinknetworks Test 0 Reputation points

2025-02-04T06:19:26.27+00:00

import axios from "axios";
import http from "http";
import https from "https";

// Create HTTP and HTTPS agents with keepAlive enabled
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true });

// Create a reusable Axios instance
const axiosInstance = axios.create({
  httpAgent,
  httpsAgent,
  headers: {
    "Content-Type": "application/json",
  },
});

export async function generateAzureOpenaiEmbeddings(input: string, c: Context): Promise<any> {
  const azureApiKey = c.env.AZURE_OPENAI_EMBEDDING_API_KEY;
  const embeddingUrl = c.env.AZURE_OPENAI_EMBEDDING_URL;

  if (!input || typeof input !== "string") {
    console.error("Input text is required and must be a string.");
    return { data: [] };
  }

  try {
    const response = await axiosInstance.post(
      embeddingUrl,
      { input },
      { headers: { "api-key": azureApiKey } },
    );
    return response.data;
  } catch (error) {
    console.error("Error generating embeddings:", error);
    return { data: [] };
  }
}

I am using TypeScript and did as you suggested, but it does not seem to have any effect. Can you please tell me if I am doing it correctly?

santoshkc 12,035 Reputation points Microsoft Vendor

2025-02-04T13:34:00.6366667+00:00

Hi crawlinknetworks Test,

We have noticed that you rated an answer as not helpful. We appreciate your feedback and are committed to improving your experience with the Q&A.

Your TypeScript implementation looks correct, and you're using keepAlive: true for persistent HTTP connections. However, if you still notice delays after idle periods, try sending a lightweight "keep-alive" request every 30-45 seconds to keep the API warm. Ensure that your app reuses the same Axios instance instead of creating a new one for each request. Additionally, check for network latency issues, especially if your API is hosted in a different region from your server. If possible, consider moving your API deployment closer to your application to reduce response time.
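To check whether the delay comes from the first request after an idle period or from every request, one option is to time a few sequential calls and compare them. The sketch below is only illustrative: `timeCalls` is a hypothetical helper, and `call` stands for whatever function sends one embedding request in your application.

```typescript
// Runs `call` sequentially `count` times and returns each duration in milliseconds.
async function timeCalls(call: () => Promise<unknown>, count = 5): Promise<number[]> {
  const durations: number[] = [];
  for (let i = 0; i < count; i++) {
    const start = Date.now();
    await call();
    durations.push(Date.now() - start);
  }
  return durations;
}
```

If only the first duration is high and the rest are close to 200 ms, the cost is likely tied to the idle period (connection setup or service warm-up). If all of them are high, network distance between your server and the deployment region is a more likely cause.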

Thank you.

Could you please retake the survey on the above response?