How an agent works in the Microsoft 365 Agents SDK (preview)

12/03/2024

[This article is prerelease documentation and is subject to change.]

The Microsoft 365 Agents SDK is a collection of libraries and tools that let you code an agent. The Agents SDK facilitates communication between a client and an agent, and provides an easy path to incorporate Microsoft AI services, such as Graph and Azure OpenAI, as well as non-Microsoft AI services. The SDK works together with the Azure Bot Service to host agents and make them available to channels, such as Microsoft Teams, Facebook, or Slack.

The SDK sends information between the agent and the client that the user is interacting with. Each channel can include additional information in the activities it sends. This information often includes channel-specific data unique to that client. Before creating agents, it's important to understand how an agent built using the SDK uses the activity protocol to communicate with its users.

In the Echobot agent sample, one section handles the conversation update activity that arrives when a party joins the conversation, and sends a welcome message in response.

protected override async Task OnMembersAddedAsync(IList<ChannelAccount> membersAdded, ITurnContext<IConversationUpdateActivity> turnContext, CancellationToken cancellationToken)
{
    IActivity message = MessageFactory.Text("Hello and Welcome!");
    foreach (var member in membersAdded)
    {
        // Greet every new member except the agent itself (the activity's recipient).
        if (member.Id != turnContext.Activity.Recipient.Id)
        {
            await turnContext.SendActivityAsync(message, cancellationToken);
        }
    }
}

For example, when you start a conversation locally (via the emulator or dev tunnels), you might see two conversation update activities: one for the user joining the conversation and one for the agent joining. To distinguish these conversation update activities, check who is included in the membersAdded property of the activity.

The message activity carries conversation information between the parties. In the Echo agent example, the message activities are carrying simple text and the channel renders this text. Alternatively, the message activity might carry text to render into adaptive cards or text to be spoken.

Tip

It's up to the channel or client to interpret and implement the activity protocol. How each channel does this might differ a little, based on what the client supports and how it functions. As a developer, you have to understand how each client works and not assume that they all implement the protocol the same way. For example, some channels send conversation update activities first, and some send conversation update activities after they send the first message activity. A channel might include the agent and user in one conversation update activity, while another might send two conversation update activities.

Support for features varies by channel. You can test your agent using the Bot Framework Emulator, but you should also test all features of your agent on each channel in which you intend to make your agent available.

The Agents SDK

The Agents SDK allows you to build agents that can be hosted on the Azure Bot Service. The service defines a REST API and an activity protocol for how your agent and channels or users can interact. The Agents SDK builds upon this REST API and provides an abstraction of the service working between the client and the agent. This way you can focus on the conversational logic. While you don't need to understand the REST service to use the Agents SDK, you should understand some of its key features.

Agents you build using the Agents SDK typically have a conversational interface. You can use agents to shift simple, repetitive tasks, such as taking a dinner reservation or gathering profile information, to automated systems that might no longer require direct human intervention. Users converse with an agent using text, interactive cards, and speech. An agent interaction can be a quick single-turn question and answer, or it can be a sophisticated multi-turn conversation that intelligently provides access to services.

Interactions involve the exchange of activities, which are handled in turns.

Activities

Every interaction between the user (or a channel) and the agent is represented as an activity. The Bot Framework Activity schema defines the activities that can be exchanged between a user or channel and an agent. Activities can represent human text or speech, app-to-app notifications, reactions to other messages, and so on.
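As a rough illustration, a message activity serialized as JSON might look like the payload in the following sketch. The property names (type, channelId, from, recipient, conversation, text) come from the Activity schema, but every ID, name, and text value is invented for the example. The sketch uses only System.Text.Json rather than the SDK's own activity types, and assumes C# 11 or later for the raw string literal.

```csharp
using System;
using System.Text.Json;

// Hypothetical message activity; all IDs, names, and text are invented for illustration.
string activityJson = """
{
  "type": "message",
  "channelId": "msteams",
  "from": { "id": "user-1", "name": "User" },
  "recipient": { "id": "agent-1", "name": "Echo agent" },
  "conversation": { "id": "conv-1" },
  "text": "Hello"
}
""";

using JsonDocument document = JsonDocument.Parse(activityJson);
JsonElement activity = document.RootElement;

// The type property tells the agent what kind of activity arrived.
var activityType = activity.GetProperty("type").GetString();
var senderId = activity.GetProperty("from").GetProperty("id").GetString();
var text = activity.GetProperty("text").GetString();

// Prints: message from user-1: Hello
Console.WriteLine($"{activityType} from {senderId}: {text}");
```

In an agent built with the SDK, you normally work with a deserialized activity object instead of raw JSON; the sketch is only meant to show the shape of the data.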

Turns

The Agents SDK uses a turn-based conversational model. The SDK uses turns to run a pipeline of operations that starts at the client, where a user asks a question, and passes through an adapter, optionally through middleware, and finally to the agent logic itself.

In a conversation, people often speak one at a time, taking turns. An agent typically responds to user input. Within the Agents SDK, a turn consists of the user's incoming activity to the agent and any activity the agent sends back to the user as an immediate response. Think of a turn as the processing needed when an agent receives an activity and goes through the pipeline for that type of activity, from start to finish.

For example, a user might ask an agent to perform a certain task. The agent might respond with a question to get more information about the task, at which point the turn ends. On the next turn, the agent receives a new message from the user that might contain the answer to the agent's question. Or the user might change the subject or ask the agent to ignore the initial request. Because of this wide range of possibilities, you need to manage multi-turn conversations and store the context and history of the conversation. More often than not, a conversation goes beyond a single turn of question and answer.

Agent application structure

The SDK defines a Bot class that handles the conversational reasoning for the agent app. The bot class:

Recognizes and interprets the user's input.
Reasons about the input and performs relevant tasks.
Generates responses about what the agent does.

The SDK also defines an Adapter class that handles connectivity with the channels. The adapter:

Provides a method for handling requests from the user's channel and methods for generating requests to it.
Includes a middleware pipeline, which performs turn processing outside of your agent's turn handler.
Calls the agent's turn handler and catches errors not otherwise handled in the turn handler.

In addition, agents often need to retrieve and store state each turn. State is handled through storage, bot state, and property accessor classes. The SDK doesn't provide built-in storage, but does provide abstractions for storage and a few implementations of a storage layer.

The SDK doesn't require you to use a specific application layer to send and receive web requests. When you create an agent using the SDK, you provide the code to receive the HTTP traffic and forward it to the adapter. The SDK provides a few templates and samples that you can use to develop your own agents.

Note

The Agents SDK currently supports C# only. Node.js and Python support are coming soon. For new agent building, consider using Microsoft Copilot Studio and read more about choosing the right solution.

Agent logic

The bot object contains the conversational reasoning or logic for a turn and exposes a turn handler. A turn handler is a method that can accept incoming activities from the bot adapter.

The SDK provides a couple of different paradigms to manage your agent logic.

Activity handlers

Activity handlers provide an event-driven model in which the incoming activity types and subtypes are the events. Consider an activity handler for agents that have brief interactions with the user.

Use an activity handler and implement handlers for each activity type or subtype your agent needs to recognize and react to.
Use a Teams activity handler to create agents that can connect to the Teams channel. (The Teams channel requires the agent to handle some channel-specific behavior.)
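The event-driven idea can be sketched without the SDK as a lookup from activity type to handler. The following is a minimal illustration, not the SDK's activity handler class: the OnTurnAsync function, the handlers dictionary, and the replies list are hypothetical names, and only the activity type names (message, conversationUpdate, typing) come from the Activity schema.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Replies "sent" during the sketch; a real agent would send these through the turn context.
var replies = new List<string>();

// One handler per activity type. The handler bodies are invented for illustration.
var handlers = new Dictionary<string, Func<string, Task>>
{
    ["message"] = text =>
    {
        replies.Add($"Echo: {text}");
        return Task.CompletedTask;
    },
    ["conversationUpdate"] = _ =>
    {
        replies.Add("Hello and Welcome!");
        return Task.CompletedTask;
    },
};

// Hypothetical turn handler: route the activity to the handler registered for its type
// and ignore activity types the agent has no handler for.
async Task OnTurnAsync(string activityType, string text)
{
    if (handlers.TryGetValue(activityType, out var handler))
    {
        await handler(text);
    }
}

await OnTurnAsync("conversationUpdate", "");
await OnTurnAsync("message", "Hi there");
await OnTurnAsync("typing", "");   // no handler registered, so nothing happens

// Prints: Hello and Welcome! | Echo: Hi there
Console.WriteLine(string.Join(" | ", replies));
```

With an activity handler from the SDK, you override one method per activity type or subtype (as OnMembersAddedAsync does in the Echobot sample) rather than registering handlers in a dictionary.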

The bot adapter

The adapter has a process activity method for starting a turn. The bot adapter:

Takes the request body (the request payload, translated to an activity) and the request header as arguments.
Checks whether the authentication header is valid.
Creates a context object for the turn. The context object includes information about the activity.
Sends the context object through its middleware pipeline.
Sends the context object to the bot object's turn handler.

The adapter also:

Formats and sends response activities. Responses are usually messages for the user, but can also include information for the user's channel.
Surfaces other methods provided by the Bot Connector REST API, such as update message and delete message.
Catches errors or exceptions not otherwise caught for the turn.
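The adapter's responsibilities can be sketched as a single function. This is a simplified stand-in under stated assumptions, not the SDK's adapter: ProcessActivityAsync, the dictionary used as a context object, the placeholder bearer-token check, and the returned status codes are all hypothetical choices made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-in for the adapter's process activity method.
async Task<int> ProcessActivityAsync(
    string authHeader,
    string requestBody,
    Func<Dictionary<string, object>, Task> turnHandler)
{
    // 1. Check the authentication header. A real adapter validates a signed token;
    //    this placeholder only checks that a bearer token is present.
    if (string.IsNullOrEmpty(authHeader) ||
        !authHeader.StartsWith("Bearer ", StringComparison.Ordinal))
    {
        return 401;
    }

    // 2. Translate the request payload into an activity and create the turn's context object.
    using JsonDocument document = JsonDocument.Parse(requestBody);
    var turnContext = new Dictionary<string, object>
    {
        ["activity"] = document.RootElement.Clone(),
    };

    try
    {
        // 3. A middleware pipeline would run here, ending with the turn handler.
        await turnHandler(turnContext);
        return 200;
    }
    catch (Exception ex)
    {
        // 4. Catch anything the turn handler didn't handle itself.
        Console.WriteLine($"Unhandled turn error: {ex.Message}");
        return 500;
    }
}

string body = "{\"type\":\"message\",\"text\":\"Hi\"}";

int ok = await ProcessActivityAsync("Bearer test-token", body, context =>
{
    var activity = (JsonElement)context["activity"];
    var activityType = activity.GetProperty("type").GetString();
    Console.WriteLine($"Handling {activityType}");
    return Task.CompletedTask;
});
int unauthorized = await ProcessActivityAsync("", body, _ => Task.CompletedTask);
int failed = await ProcessActivityAsync("Bearer test-token", body,
    _ => throw new InvalidOperationException("boom"));

Console.WriteLine($"{ok}, {unauthorized}, {failed}");
```

The point of the sketch is the ordering: authentication comes first, the context object is created once per turn, and the adapter is the last line of defense for exceptions that escape the turn handler.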

The turn context

The turn context object provides information about the activity such as the sender and receiver, the channel, and other data needed to process the activity. It also allows for the addition of information during the turn across various layers of the bot.

The turn context is one of the most important abstractions in the SDK. Not only does it carry the inbound activity to all the middleware components and the application logic but it also provides the mechanism whereby the middleware components and the agent logic can send outbound activities.

Middleware

Middleware is much like any other messaging middleware, comprising a linear set of components that are each executed in order, giving each a chance to operate on the activity. The final stage of the middleware pipeline is a callback to the bot class turn handler registered with the adapter's process activity method. Middleware implements an on turn method which the adapter calls.

The turn handler takes a turn context as its argument. Typically the application logic inside the turn handler function processes the inbound activity's content and generates one or more activities in response. The turn handler then sends these outbound activities using the send activity function on the turn context. Calling send activity on the turn context causes the middleware components to be invoked on the outbound activities. Middleware components execute before and after the agent's turn handler function. The execution is inherently nested and, as such, is sometimes described as being like an onion.
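The nested, onion-like execution order can be shown with plain delegates. In this sketch, the Middleware helper, the pipeline list, and the log are hypothetical names, and the delegate shape is an assumption chosen for brevity rather than the SDK's middleware interface. It covers only the inbound path, not the invocation of middleware on outbound activities.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();

// Hypothetical middleware shape: do some work, call next() to continue the pipeline,
// then do more work once the inner layers have finished.
Func<Func<Task>, Task> Middleware(string name) => async next =>
{
    log.Add($"{name}: before");
    await next();
    log.Add($"{name}: after");
};

var pipeline = new List<Func<Func<Task>, Task>> { Middleware("A"), Middleware("B") };

// The agent's turn handler sits at the center of the onion.
Func<Task> turnHandler = () =>
{
    log.Add("turn handler");
    return Task.CompletedTask;
};

// Wrap the handler from the inside out so the first registered middleware runs first.
Func<Task> run = turnHandler;
for (int i = pipeline.Count - 1; i >= 0; i--)
{
    var middleware = pipeline[i];
    var next = run;
    run = () => middleware(next);
}

await run();

// Prints: A: before, B: before, turn handler, B: after, A: after
Console.WriteLine(string.Join(", ", log));
```

Because each component decides when to call next, a middleware component can also short-circuit the turn by not calling it at all.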

Agent state and storage

As with other web apps, an agent is inherently stateless. State within an agent follows the same paradigms as modern web applications, and the Agents SDK provides a storage layer and state management abstractions to make state management easier.
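As a minimal sketch of the load, update, and save cycle that a stateless agent performs each turn, the following example keeps a per-conversation turn counter. The dictionary is only an in-memory stand-in for a storage layer, and OnTurnAsync and the conversation IDs are hypothetical; it doesn't use the SDK's storage or state classes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// In-memory stand-in for a storage layer, keyed by conversation ID.
// A deployed agent would use durable storage, because memory is lost on restart.
var storage = new Dictionary<string, int>();

// Hypothetical turn handler: load state, update it, save it, then build a reply.
Task<string> OnTurnAsync(string conversationId, string text)
{
    storage.TryGetValue(conversationId, out int turnCount);   // load (0 if nothing is stored)
    turnCount++;
    storage[conversationId] = turnCount;                       // save
    return Task.FromResult($"Turn {turnCount}: you said '{text}'");
}

string first = await OnTurnAsync("conv-1", "hello");
string second = await OnTurnAsync("conv-1", "again");
string other = await OnTurnAsync("conv-2", "hi");

Console.WriteLine(first);    // Turn 1: you said 'hello'
Console.WriteLine(second);   // Turn 2: you said 'again'
Console.WriteLine(other);    // Turn 1: you said 'hi'
```

Keying state by conversation keeps each conversation's context separate, which is what lets the agent pick up a multi-turn exchange where it left off.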

Messaging endpoint and provisioning

Typically, your application needs a REST endpoint at which to receive messages. The application also needs to create resources for your agent in accordance with the platform you decide to use.

HTTP Details

Activities arrive at the agent from the Bot Service via an HTTP POST request. The agent responds to the inbound POST request with a 200 HTTP status code. The agent sends activities to the channel on a separate HTTP POST to the Bot Service. The channel, in turn, acknowledges the activity received with a 200 HTTP status code.

The protocol doesn't specify the order in which these POST requests and their acknowledgments are made. However, to fit with common HTTP service frameworks, typically these requests are nested, meaning that the outbound HTTP request is made from the agent within the scope of the inbound HTTP request. Since there are two distinct HTTP connections back to back, the security model must provide for both.

Note

The agent has 15 seconds to acknowledge the call with a status 200 on most channels. If the agent doesn't respond within 15 seconds, an HTTP GatewayTimeout error (504) occurs.

The activity processing stack

Let’s follow the journey of the arrival of a message activity.

The channel sends the user's message to the Azure AI Bot Service, and the service forwards the message to the agent's messaging endpoint. The agent sends a response to the user within the scope of the turn.

In the Echo agent example, the agent replies to the message activity with another message activity containing the same text. Processing starts with the HTTP POST request, with the activity information carried as a JSON payload, arriving at the web server.

The adapter, an integrated component of the SDK, is the core of the SDK runtime. The activity is carried as JSON in the HTTP POST body. This JSON is deserialized to create the activity object that is then handed to the adapter through its process activity method. On receiving the activity, the adapter creates a turn context and calls the middleware.

As mentioned previously, the turn context provides the mechanism for the agent to send outbound activities, most often in response to an inbound activity. The turn context provides send, update, and delete activity response methods. Each response method runs in an asynchronous process.

Important

The thread handling the primary agent turn deals with disposing of the context object when it's done. Be sure to await any activity calls so the primary thread will wait on the generated activity before finishing its processing and disposing of the turn context. Otherwise, if a response (including its handlers) takes any significant amount of time and tries to act on the context object, it may get a context was disposed error.
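The following sketch reproduces the problem in miniature. StringWriter is used purely as a stand-in for the turn context, because it also throws ObjectDisposedException when it's used after disposal; SendActivityAsync here is a hypothetical local function, not the SDK method, and the 50-millisecond delay is an arbitrary value that simulates send latency.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Stand-in for sending a response through the turn context.
async Task SendActivityAsync(StringWriter turnContext, string text)
{
    await Task.Delay(50);   // simulate the latency of sending a response
    turnContext.Write(text);
}

// Awaited: the turn waits for the send to finish before the context is disposed.
bool awaitedSucceeded = false;
using (var context = new StringWriter())
{
    await SendActivityAsync(context, "Hello");
    awaitedSucceeded = context.ToString() == "Hello";
}

// Not awaited: the turn ends and disposes the context while the send is still pending.
Task pending;
using (var context = new StringWriter())
{
    pending = SendActivityAsync(context, "Hello");
}

bool failedAfterDispose = false;
try
{
    await pending;
}
catch (ObjectDisposedException)
{
    failedAfterDispose = true;
}

Console.WriteLine($"Awaited send succeeded: {awaitedSucceeded}");
Console.WriteLine($"Unawaited send failed after dispose: {failedAfterDispose}");
```

In a real agent the unawaited task is often never observed at all, so the failure can surface only as a missing reply or a logged exception.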

Bot templates

You need to choose the application layer to use for your app. However, the SDK has templates and samples for ASP.NET (C#). The documentation is written assuming you use this platform, but the SDK doesn't require it of you.

An agent is a web application, and templates are provided for each language version of the SDK. All templates provide a default endpoint implementation and adapter.

Additional information

Managing agent resources

You'll need to manage the resources for your agent, such as its app ID and password, and also information for any connected services. When you deploy your agent, it will need secure access to this information. To avoid complexity, most of the Agents SDK articles don't describe how to manage this information.

To manage keys and secrets in Azure, see About Azure Key Vault.
