How to Stream Agent Responses

Warning

The Semantic Kernel Agent Framework is in preview and is subject to change.

What is a Streamed Response?

A streamed response delivers the message content in small, incremental chunks. This approach improves the user experience by letting users view and engage with the message as it unfolds, rather than waiting for the entire response to load. Because users can begin processing the information immediately, the interaction feels more responsive and keeps them engaged throughout the conversation.

Streaming in Semantic Kernel

AI services that support streaming in Semantic Kernel use different content types than those used for fully formed messages; these types are designed to carry the incremental chunks of a streamed response. The Agent Framework uses the same streaming content types, which keeps streaming behavior consistent between the AI services and the agents built on them.
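
For example, a non-streamed chat completion returns a fully formed ChatMessageContent, while a streamed call yields StreamingChatMessageContent chunks that can be concatenated into the same text. The following is a minimal C# sketch, assuming a chat completion service has already been configured:

// Minimal sketch: streamed calls yield StreamingChatMessageContent chunks,
// non-streamed calls yield a fully formed ChatMessageContent.
IChatCompletionService service = ...; // any configured chat completion service

ChatHistory history = [];
history.Add(new ChatMessageContent(AuthorRole.User, "<user input>"));

// Non-streamed: a single complete message
ChatMessageContent complete = await service.GetChatMessageContentAsync(history);

// Streamed: incremental chunks that together form the same message text
StringBuilder buffer = new();
await foreach (StreamingChatMessageContent chunk in service.GetStreamingChatMessageContentsAsync(history))
{
    buffer.Append(chunk.Content);
}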

Agents are currently unavailable in Java.

Streaming Agent Invocation

The Agent Framework supports streamed responses when using Agent Chat or when directly invoking a Chat Completion Agent or Open AI Assistant Agent. In either mode, the framework delivers responses asynchronously as they are streamed. Alongside the streamed response, a consistent, non-streamed history is maintained to track the conversation. This ensures both real-time interaction and a reliable record of the conversation's flow.

Streamed response from Chat Completion Agent

When invoking a streamed response from a Chat Completion Agent, the Chat History is updated after the full response is received. Although the response is streamed incrementally, the history records only the complete message. This ensures that the Chat History reflects fully formed responses for consistency.

// Define agent
ChatCompletionAgent agent = ...;

// Create a ChatHistory object to maintain the conversation state.
ChatHistory chat = [];

// Add a user message to the conversation
chat.Add(new ChatMessageContent(AuthorRole.User, "<user input>"));

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(chat))
{
  // Process streamed response(s)...
}

# Define agent
agent = ChatCompletionAgent(...)

# Create a ChatHistory object to maintain the conversation state.
chat = ChatHistory()

# Add a user message to the conversation
chat.add_message(ChatMessageContent(role=AuthorRole.USER, content="<user input>"))

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(chat):
    # Process streamed response(s)...
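
Because the history is updated only after the full response has been received, the completed message can be read from the Chat History once the streamed iteration has finished. A minimal sketch in C#, reusing the agent and chat variables from the C# sample above and assuming no intermediate tool-call messages were added:

// Once streaming has completed, the latest history entry is the agent's fully formed reply.
ChatMessageContent fullMessage = chat[chat.Count - 1];
Console.WriteLine(fullMessage.Content);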

Agents are currently unavailable in Java.

Streamed response from Open AI Assistant Agent

When invoking a streamed response from an Open AI Assistant Agent, an optional Chat History can be provided to capture the complete messages for further analysis if needed. Since the assistant maintains the conversation state as a remote thread, capturing these messages is not always necessary. The decision to store and analyze the full response depends on the specific requirements of the interaction.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
string threadId = await agent.CreateThreadAsync();

// Add a user message to the conversation
await agent.AddChatMessageAsync(threadId, new ChatMessageContent(AuthorRole.User, "<user input>"));

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(threadId))
{
  // Process streamed response(s)...
}

// Delete the thread when it is no longer needed
await agent.DeleteThreadAsync(threadId);

# Define agent
agent = OpenAIAssistantAgent(...)

# Create a thread for the agent conversation.
thread_id = await agent.create_thread()

# Add user message to the conversation
await agent.add_chat_message(thread_id=thread_id, message=ChatMessageContent(role=AuthorRole.USER, content="<user input>"))

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(thread_id=thread_id):
  # Process streamed response(s)...
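
To capture the fully formed messages while streaming, a Chat History can be supplied to the streaming invocation and read after the run completes. The sketch below (C#) assumes an InvokeStreamingAsync overload with an optional ChatHistory parameter, shown here as messages; the exact parameter name and availability may differ between package versions.

// Hedged sketch: capture complete messages while streaming from the assistant.
// The optional ChatHistory parameter name ("messages") is an assumption.
ChatHistory capturedMessages = [];
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(threadId, messages: capturedMessages))
{
    // Process streamed response(s)...
}

// Once streaming has completed, capturedMessages contains the fully formed messages.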

Agents are currently unavailable in Java.

Streaming with Agent Chat

When using Agent Chat, the full conversation history is always preserved and can be accessed directly through the Agent Chat instance. Therefore, the key difference between streamed and non-streamed invocations lies in the delivery method and the resulting content type. In both cases, users can still access the complete history, but streamed responses provide real-time updates as the conversation progresses. This allows for greater flexibility in handling interactions, depending on the application's needs.

// Define agents
ChatCompletionAgent agent1 = ...;
OpenAIAssistantAgent agent2 = ...;

// Create chat with participating agents.
AgentGroupChat chat =
  new(agent1, agent2)
  {
    // Override default execution settings
    ExecutionSettings =
    {
        TerminationStrategy = { MaximumIterations = 10 }
    }
  };

// Invoke agents
string lastAgent = string.Empty;
await foreach (StreamingChatMessageContent response in chat.InvokeStreamingAsync())
{
    if (!lastAgent.Equals(response.AuthorName, StringComparison.Ordinal))
    {
        // Process beginning of agent response
        lastAgent = response.AuthorName;
    }

    // Process streamed content...
}

# Define agents
agent1 = ChatCompletionAgent(...)
agent2 = OpenAIAssistantAgent(...)

# Create chat with participating agents
chat = AgentGroupChat(
  agents=[agent1, agent2],
  termination_strategy=DefaultTerminationStrategy(maximum_iterations=10),
)

# Invoke agents
last_agent = None
async for response in chat.invoke_stream():
    if response.content is not None:
        if last_agent != response.name:
            # Process beginning of agent response
            last_agent = response.name
        # Process streamed content
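
Since the chat instance maintains the full history, the complete (non-streamed) messages can be read back after the streamed invocation. A minimal C# sketch using the chat from the sample above; GetChatMessagesAsync returns the history in descending order, newest first:

// Read the complete conversation history maintained by the chat.
await foreach (ChatMessageContent message in chat.GetChatMessagesAsync())
{
    // Messages are returned newest-first; process as needed...
}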

Agents are currently unavailable in Java.