Chat completion

With chat completion, you can simulate a back-and-forth conversation with an AI agent. This is of course useful for creating chat bots, but it can also be used for creating autonomous agents that can complete business processes, generate code, and more. As the primary model type provided by OpenAI, Google, Mistral, Facebook, and others, chat completion is the most common AI service that you will add to your Semantic Kernel project.

When picking out a chat completion model, you will need to consider the following:

  • What modalities does the model support (e.g., text, image, audio, etc.)?
  • Does it support function calling?
  • How fast does it receive and generate tokens?
  • How much does each token cost?

Important

Of all the above questions, the most important is whether the model supports function calling. If it does not, you will not be able to use the model to call your existing code. Most of the latest models from OpenAI, Google, Mistral, and Amazon all support function calling. Support from small language models, however, is still limited.

Setting up your local environment

Some AI services can be hosted locally and may require setup. Below are instructions for those that support local hosting.

Azure OpenAI requires no local setup.

Installing the necessary packages

Before adding chat completion to your kernel, you will need to install the necessary packages. For the Azure OpenAI connector in .NET, run:

dotnet add package Microsoft.SemanticKernel.Connectors.AzureOpenAI
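
If you're following along in Python, the Azure OpenAI connector ships with the core semantic-kernel package, so a single install should be sufficient:

pip install semantic-kernel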

Creating chat completion services

Now that you've installed the necessary packages, you can create chat completion services. Below are several ways you can create chat completion services using Semantic Kernel.

Adding directly to the kernel

To add a chat completion service, use the following code to register it with the kernel's inner service provider.

using Microsoft.SemanticKernel;

IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureOpenAIChatCompletion(
    deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
    apiKey: "YOUR_API_KEY",
    endpoint: "YOUR_AZURE_ENDPOINT",
    modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
    serviceId: "YOUR_SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
    httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
Kernel kernel = kernelBuilder.Build();

Using dependency injection

If you're using dependency injection, you'll likely want to add your AI services directly to the service provider. This is helpful if you want to create singletons of your AI services and reuse them in transient kernels.

using Microsoft.Extensions.Hosting;
using Microsoft.SemanticKernel;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddAzureOpenAIChatCompletion(
    deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
    apiKey: "YOUR_API_KEY",
    endpoint: "YOUR_AZURE_ENDPOINT",
    modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
    serviceId: "YOUR_SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);

builder.Services.AddTransient((serviceProvider) => {
    return new Kernel(serviceProvider);
});

Creating standalone instances

Lastly, you can create instances of the service directly so that you can either add them to a kernel later or use them directly in your code without ever injecting them into the kernel or a service provider.

using Microsoft.SemanticKernel.Connectors.AzureOpenAI;

AzureOpenAIChatCompletionService chatCompletionService = new (
    deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
    apiKey: "YOUR_API_KEY",
    endpoint: "YOUR_AZURE_ENDPOINT",
    modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
    httpClient: new HttpClient() // Optional; if not provided, a default HttpClient will be used
);

In Python, to create a chat completion service, you need to import the necessary modules and create an instance of the service. The following example uses Azure OpenAI.

Tip

There are three methods to supply the required information to AI services. You may either provide the information directly through the constructor, set the necessary environment variables, or create a .env file within your project directory containing the environment variables. You can visit this page to find all the required environment variables for each AI service provider: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/concepts/setup/ALL_SETTINGS.md
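
For example, a .env file for Azure OpenAI might look like the following. The exact variable names are listed on the page linked above, so treat the ones shown here as illustrative:

AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="my-deployment"
AZURE_OPENAI_ENDPOINT="my-api-endpoint"
AZURE_OPENAI_API_KEY="my-api-key"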

from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

chat_completion_service = AzureChatCompletion(
    deployment_name="my-deployment",  
    api_key="my-api-key",
    endpoint="my-api-endpoint", # Used to point to your service
    service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)

# You can do the following if you have set the necessary environment variables or created a .env file
chat_completion_service = AzureChatCompletion(service_id="my-service-id")

Note

The AzureChatCompletion service also supports Microsoft Entra authentication. If you don't provide an API key, the service will attempt to authenticate using the Entra token.
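
As a minimal sketch, assuming the connector exposes an ad_token_provider parameter and you have the azure-identity package installed, Entra authentication might look like this:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

# Acquire tokens through DefaultAzureCredential instead of an API key
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

chat_completion_service = AzureChatCompletion(
    deployment_name="my-deployment",
    endpoint="my-api-endpoint",
    ad_token_provider=token_provider,  # Assumed parameter name; no api_key is passed
)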

You can start using the completion service right away, or add it to a kernel with the following code.

from semantic_kernel import Kernel

# Initialize the kernel
kernel = Kernel()

# Add the chat completion service created above to the kernel
kernel.add_service(chat_completion_service)

In Java, you can create instances of the chat completion service directly and either add them to a kernel or use them directly in your code without injecting them into the kernel. The following code shows how to create a chat completion service and add it to the kernel.

import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.microsoft.semantickernel.Kernel;
import com.microsoft.semantickernel.aiservices.openai.chatcompletion.OpenAIChatCompletion;
import com.microsoft.semantickernel.services.chatcompletion.ChatCompletionService;

// Create the client
OpenAIAsyncClient client = new OpenAIClientBuilder()
    .credential(azureOpenAIClientCredentials)
    .endpoint(azureOpenAIClientEndpoint)
    .buildAsyncClient();

// Create the chat completion service
ChatCompletionService openAIChatCompletion = OpenAIChatCompletion.builder()
    .withOpenAIAsyncClient(client)
    .withModelId(modelId)
    .build();

// Initialize the kernel
Kernel kernel = Kernel.builder()
    .withAIService(ChatCompletionService.class, openAIChatCompletion)
    .build();

Retrieving chat completion services

Once you've added chat completion services to your kernel, you can retrieve them using the kernel's get-service method. Below are examples of how you can retrieve a chat completion service from the kernel in C#, Python, and Java.

C#

var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

Python

from semantic_kernel.connectors.ai.chat_completion_client_base import ChatCompletionClientBase

# Retrieve the chat completion service by type
chat_completion_service = kernel.get_service(type=ChatCompletionClientBase)

# Retrieve the chat completion service by id
chat_completion_service = kernel.get_service(service_id="my-service-id")

Java

ChatCompletionService chatCompletionService = kernel.getService(ChatCompletionService.class);

Using chat completion services

Now that you have a chat completion service, you can use it to generate responses from an AI agent. There are two main ways to use a chat completion service:

  • Non-streaming: You wait for the service to generate an entire response before returning it to the user.
  • Streaming: Individual chunks of the response are generated and returned to the user as they are created.

Below are the two ways you can use a chat completion service to generate responses.

Non-streaming chat completion

To use non-streaming chat completion, you can use the following code to generate a response from the AI agent.

C#

ChatHistory history = [];
history.AddUserMessage("Hello, how are you?");

var response = await chatCompletionService.GetChatMessageContentAsync(
    history,
    kernel: kernel
);

Python

chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

execution_settings = AzureChatPromptExecutionSettings()  # from semantic_kernel.connectors.ai.open_ai

response = await chat_completion_service.get_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)

Java

ChatHistory history = new ChatHistory();
history.addUserMessage("Hello, how are you?");

InvocationContext optionalInvocationContext = null;

List<ChatMessageContent<?>> response = chatCompletionService.getChatMessageContentsAsync(
    history,
    kernel,
    optionalInvocationContext
).block();

Streaming chat completion

To use streaming chat completion, you can use the following code to generate a response from the AI agent.

C#

ChatHistory history = [];
history.AddUserMessage("Hello, how are you?");

var response = chatCompletionService.GetStreamingChatMessageContentsAsync(
    chatHistory: history,
    kernel: kernel
);

await foreach (var chunk in response)
{
    Console.Write(chunk);
}

Python

chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

execution_settings = AzureChatPromptExecutionSettings()  # from semantic_kernel.connectors.ai.open_ai

response = chat_completion_service.get_streaming_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)

async for chunk in response:
    print(chunk, end="")
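
If you also need the complete response after streaming finishes, one simple pattern in Python is to accumulate the chunks as strings while you print them (plain string accumulation, not a dedicated Semantic Kernel API), replacing the loop above:

full_response = ""
async for chunk in response:
    print(chunk, end="")
    full_response += str(chunk)  # each chunk stringifies to its text content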

Note

Semantic Kernel for Java does not support the streaming response model.

Next steps

Now that you've added chat completion services to your Semantic Kernel project, you can start creating conversations with your AI agent. To learn more about using a chat completion service, check out the following articles: