Here are a few options:
- Keep-Alive Requests (Low-Resource Pings)
  - Instead of relying on large embedding requests to keep the service warm, send a lightweight request (e.g., a short string) every 30-45 seconds so the service stays active.
- Optimize Request Frequency & Batching
  - Instead of sending embeddings one by one, batch multiple inputs into a single request where possible. Fewer calls means fewer chances to hit a cold start.
- Use Connection Reuse (Persistent HTTP Connection)
  - If you make frequent calls, reuse the same HTTP connection instead of opening a new one for each request.
  - Use an HTTP client that supports persistent connections (e.g., in Python, use `requests.Session()` or `httpx.Client()`).
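To make the three options above more concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: the helper names (`batched`, `embed_all`, `start_keep_alive`), the `{"input": [...]}` payload shape, the batch size, and the URL are all placeholders I made up, so adjust them to whatever your embedding service actually expects. The `session` argument can be a `requests.Session()` or an `httpx.Client()`, since both expose a compatible `post()` method.

```python
import threading
from typing import Any, Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(session: Any, url: str, texts: Sequence[str],
              batch_size: int = 16) -> List[Any]:
    """Send `texts` in batches through one shared session.

    `session` is anything with a requests-style post() method, such as
    requests.Session() or httpx.Client(). The {"input": [...]} body is an
    assumed placeholder; use the request format your service documents.
    """
    results = []
    for batch in batched(texts, batch_size):
        # Reusing the same session lets the client keep the connection open.
        response = session.post(url, json={"input": batch}, timeout=30)
        response.raise_for_status()
        results.append(response.json())
    return results


def start_keep_alive(ping: Callable[[], None],
                     interval_seconds: float = 30.0) -> threading.Event:
    """Call `ping` every `interval_seconds` on a background daemon thread.

    Returns an Event; call .set() on it to stop the pings.
    """
    stop = threading.Event()

    def loop() -> None:
        # wait() returns False on timeout (time to ping), True once stopped.
        while not stop.wait(interval_seconds):
            try:
                ping()
            except Exception as exc:  # a failed ping should not end the loop
                print(f"keep-alive ping failed: {exc}")

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Example wiring (needs `pip install requests`; URL is a placeholder):
#   import requests
#   with requests.Session() as session:
#       stop = start_keep_alive(lambda: embed_all(session, URL, ["ping"]))
#       results = embed_all(session, URL, documents, batch_size=16)
#       stop.set()
```

If the pings and the real requests run on different threads, giving the ping its own session avoids any question about sharing one session across threads.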
hth
Marcin