Assistants playground vs APIs

Mohamed Hussein 650 Reputation points
2025-03-06T20:10:29.1933333+00:00

Good Day,

When using the Azure OpenAI Assistants playground in Azure AI Foundry | Azure OpenAI Service, the response is very fast, only 3-5 seconds until the last token.

But when using the API, the response is very slow, taking up to 70 seconds to receive the last token.

Any workarounds?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  Pavankumar Purilla 4,025 Reputation points Microsoft External Staff
    2025-03-06T22:28:20.5166667+00:00

    Hi Mohamed Hussein,
    Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.

    The Assistants Playground in Azure OpenAI Service is faster than the API due to several optimizations. The Playground runs on Azure's internal infrastructure, which prioritizes low-latency responses for an interactive experience. It also streams tokens as they are generated, giving the impression of near-instant responses. Additionally, network latency is minimal since the Playground operates within Azure's environment, unlike API calls that may experience delays due to network travel. The Playground might also use optimized default settings, such as smaller max_tokens or lower temperature, which contribute to faster processing.

    In contrast, API responses can be slower for several reasons. Larger payloads with long prompts or a high max_tokens value take more time to process. If token streaming is disabled, the API waits until the entire response is generated before sending anything back, so the time to the last token is also the time to the first visible output. Concurrency limits, throttling, or regional deployment choices can further impact performance.
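To see why streaming changes perceived latency, it helps to separate time-to-first-token from time-to-last-token. The stdlib-only sketch below uses a simulated token stream (the delays are illustrative placeholders, not Azure measurements); with a real streaming response, the same measurement loop applies:

```python
import time


def simulated_stream(n_tokens: int = 5, delay: float = 0.01):
    """Stand-in for an API token stream; each token arrives after `delay` seconds."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i} "


def measure(stream):
    """Return (time_to_first_token, time_to_last_token) in seconds.

    With streaming enabled, the user starts reading at the first value;
    without it, nothing is visible until the last value has arrived.
    """
    start = time.perf_counter()
    first = None
    for _ in stream:
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return first, total
```

Running `measure(simulated_stream())` shows the first token arriving well before the total completion time, which is the gap streaming exploits.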

    To improve API response times, consider enabling token streaming ("stream": true) to receive tokens as they are generated. Optimize request parameters by reducing max_tokens and adjusting temperature or top_p to simplify processing. Using shorter prompts and deploying in a region closer to your users can also help. If you are being throttled, scaling up the deployment's quota may be necessary. Monitoring Azure's performance metrics can help identify bottlenecks, and caching frequent responses can reduce redundant API calls. Implementing these optimizations can significantly improve API response speed.
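As a sketch of the first two suggestions using the `openai` Python package (v1.x) with the AzureOpenAI client: the deployment name, endpoint, and API version below are placeholders you would replace with your own resource details. The request-building helper keeps max_tokens small and turns streaming on; the streaming loop prints tokens as they arrive:

```python
import os


def build_request(prompt: str, deployment: str = "gpt-4o", max_tokens: int = 256) -> dict:
    """Request parameters tuned for latency: small max_tokens, streaming enabled."""
    return {
        "model": deployment,  # your Azure deployment name, not the base model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # cap the response length to cut total latency
        "stream": True,  # receive tokens incrementally instead of one final payload
    }


def main() -> None:
    # Lazy import so the helper above stays usable without the SDK installed.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",  # placeholder; use a version your resource supports
    )
    stream = client.chat.completions.create(**build_request("Hello"))
    for chunk in stream:
        # Each chunk carries a delta with the newly generated tokens, if any.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)


if __name__ == "__main__":
    main()
```

With streaming on, the first tokens typically appear within a few seconds even when the full response takes much longer, matching the playground's perceived speed.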

    I hope this information helps. Thank you!


0 additional answers
