Artificial intelligence in .NET (Preview)
With a growing variety of artificial intelligence (AI) services available, developers need a way to integrate and interact with these services in their .NET applications. The Microsoft.Extensions.AI
library provides a unified approach for representing generative AI components, which enables seamless integration and interoperability with various AI services. This article introduces the library and provides installation instructions and usage examples to help you get started.
Install the package
To install the 📦 Microsoft.Extensions.AI NuGet package, use the .NET CLI or add a package reference directly to your C# project file:
dotnet add package Microsoft.Extensions.AI --prerelease
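Or, add a PackageReference item to your project file. The following is a minimal sketch; the floating version `*-*` resolves to the latest available version, including prereleases, so pin a specific preview version if you prefer:
<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.AI" Version="*-*" />
</ItemGroup>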
For more information, see dotnet add package or Manage package dependencies in .NET applications.
Usage examples
The IChatClient interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (such as text, images, and audio), either as a complete set or streamed incrementally. Additionally, it provides metadata information about the client and allows retrieving strongly typed services.
Important
For more usage examples and real-world scenarios, see AI for .NET developers.
The IChatClient interface
The following sample implements IChatClient to show the general structure.
using System.Runtime.CompilerServices;
using Microsoft.Extensions.AI;

public sealed class SampleChatClient(Uri endpoint, string modelId) : IChatClient
{
    public ChatClientMetadata Metadata { get; } = new(nameof(SampleChatClient), endpoint, modelId);

    public async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some operation.
        await Task.Delay(300, cancellationToken);

        // Return a sample chat completion response randomly.
        string[] responses =
        [
            "This is the first sample response.",
            "Here is another example of a response message.",
            "This is yet another response message."
        ];

        return new([new ChatMessage()
        {
            Role = ChatRole.Assistant,
            Text = responses[Random.Shared.Next(responses.Length)],
        }]);
    }

    public async IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Simulate streaming by yielding messages one by one.
        string[] words = ["This ", "is ", "the ", "response ", "for ", "the ", "request."];
        foreach (string word in words)
        {
            // Simulate some operation.
            await Task.Delay(100, cancellationToken);

            // Yield the next message in the response.
            yield return new StreamingChatCompletionUpdate
            {
                Role = ChatRole.Assistant,
                Text = word,
            };
        }
    }

    public object? GetService(Type serviceType, object? serviceKey) => this;

    public TService? GetService<TService>(object? key = null)
        where TService : class => this as TService;

    void IDisposable.Dispose() { }
}
You can find other concrete implementations of IChatClient in the following NuGet packages:
- 📦 Microsoft.Extensions.AI.AzureAIInference: Implementation backed by Azure AI Model Inference API.
- 📦 Microsoft.Extensions.AI.Ollama: Implementation backed by Ollama.
- 📦 Microsoft.Extensions.AI.OpenAI: Implementation backed by either OpenAI or OpenAI-compatible endpoints (such as Azure OpenAI).
Request chat completion
To request a completion, call the IChatClient.CompleteAsync method. The request is composed of one or more messages, each of which is composed of one or more pieces of content. Accelerator methods exist to simplify common cases, such as constructing a request for a single piece of text content.
using Microsoft.Extensions.AI;
IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");
var response = await client.CompleteAsync("What is AI?");
Console.WriteLine(response.Message);
The core IChatClient.CompleteAsync method accepts a list of messages. This list represents the history of all messages that are part of the conversation.
using Microsoft.Extensions.AI;
IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

Console.WriteLine(await client.CompleteAsync(
[
    new(ChatRole.System, "You are a helpful AI assistant"),
    new(ChatRole.User, "What is AI?"),
]));
Each message in the history is represented by a ChatMessage object. The ChatMessage class provides a ChatMessage.Role property that indicates the role of the message. By default, ChatRole.User is used. The following roles are available:
- ChatRole.Assistant: Provides responses to system-instructed, user-prompted input.
- ChatRole.System: Instructs or sets the behavior of the assistant.
- ChatRole.Tool: Provides additional information and references for chat completions.
- ChatRole.User: Provides input for chat completions.
Each chat message is instantiated with a role and its content; simple text is stored as a TextContent assigned to the message's Contents property. There are various types of content that can be represented, such as a simple string or a more complex object that represents a multi-modal message with text, images, and audio (a sketch follows this list):
- AudioContent
- DataContent
- FunctionCallContent
- FunctionResultContent
- ImageContent
- TextContent
- UsageContent
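For example, here's a minimal sketch of a multi-modal user message that combines text and image content. The image URI is a placeholder, and the exact ImageContent constructor shape can vary by preview version:
using Microsoft.Extensions.AI;

// A single user message composed of multiple content items.
var message = new ChatMessage(ChatRole.User,
[
    new TextContent("What's shown in this image?"),
    new ImageContent(new Uri("https://example.com/photo.jpg"), "image/jpeg"),
]);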
Request chat completion with streaming
The inputs to IChatClient.CompleteStreamingAsync are identical to those of CompleteAsync. However, rather than returning the complete response as part of a ChatCompletion object, the method returns an IAsyncEnumerable<T> where T is StreamingChatCompletionUpdate, providing a stream of updates that collectively form the single response.
using Microsoft.Extensions.AI;
IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

await foreach (var update in client.CompleteStreamingAsync("What is AI?"))
{
    Console.Write(update);
}
Tip
Streaming APIs are nearly synonymous with AI user experiences. C# enables compelling scenarios with its IAsyncEnumerable<T> support, allowing for a natural and efficient way to stream data.
Tool calling
Some models and services support tool calling, where requests can include tools for the model to invoke functions to gather additional information. Instead of sending a final response, the model requests a function invocation with specific arguments. The client then invokes the function and sends the results back to the model along with the conversation history. The Microsoft.Extensions.AI library includes abstractions for various message content types, including function call requests and results. While consumers can interact with this content directly, Microsoft.Extensions.AI automates these interactions and provides:
- AIFunction: Represents a function that can be described to an AI service and invoked.
- AIFunctionFactory: Provides factory methods for creating commonly used implementations of AIFunction.
- FunctionInvokingChatClient: Wraps an IChatClient to add automatic function invocation capabilities.
Consider the following example that demonstrates a random function invocation:
using System.ComponentModel;
using Microsoft.Extensions.AI;
[Description("Gets the current weather")]
string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5
    ? "It's sunny"
    : "It's raining";

IChatClient client = new ChatClientBuilder(
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseFunctionInvocation()
    .Build();

var response = client.CompleteStreamingAsync(
    "Should I wear a rain coat?",
    new() { Tools = [AIFunctionFactory.Create(GetCurrentWeather)] });

await foreach (var update in response)
{
    Console.Write(update);
}
The preceding example depends on the 📦 Microsoft.Extensions.AI.Ollama NuGet package.
The preceding code:
- Defines a function named GetCurrentWeather that returns a random weather forecast.
  - This function is decorated with a DescriptionAttribute, which is used to provide a description of the function to the AI service.
- Instantiates a ChatClientBuilder with an OllamaChatClient and configures it to use function invocation.
- Calls CompleteStreamingAsync on the client, passing a prompt and a list of tools that includes a function created with Create.
- Iterates over the response, printing each update to the console.
Cache responses
If you're familiar with Caching in .NET, it's good to know that Microsoft.Extensions.AI provides other such delegating IChatClient implementations. The DistributedCachingChatClient is an IChatClient that layers caching around another arbitrary IChatClient instance. When a unique chat history is submitted to the DistributedCachingChatClient, it forwards the history to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted, such that a cached response is found, the DistributedCachingChatClient returns that cached response rather than forwarding the request along the pipeline.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .Build();

string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"];

foreach (var prompt in prompts)
{
    await foreach (var update in client.CompleteStreamingAsync(prompt))
    {
        Console.Write(update);
    }

    Console.WriteLine();
}
The preceding example depends on the 📦 Microsoft.Extensions.Caching.Memory NuGet package. For more information, see Caching in .NET.
Use telemetry
Another example of a delegating chat client is the OpenTelemetryChatClient. This implementation adheres to the OpenTelemetry Semantic Conventions for Generative AI systems. Similar to other IChatClient delegators, it layers metrics and spans around any underlying IChatClient implementation, providing enhanced observability.
using Microsoft.Extensions.AI;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseOpenTelemetry(
        sourceName: sourceName,
        configure: static c => c.EnableSensitiveData = true)
    .Build();

Console.WriteLine((await client.CompleteAsync("What is AI?")).Message);
The preceding example depends on the 📦 OpenTelemetry.Exporter.Console NuGet package.
Provide options
Every call to CompleteAsync or CompleteStreamingAsync can optionally supply a ChatOptions instance containing additional parameters for the operation. The most common parameters among AI models and services show up as strongly typed properties on the type, such as ChatOptions.Temperature. Other parameters can be supplied by name in a weakly typed manner via the ChatOptions.AdditionalProperties dictionary.
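For example, here's a minimal sketch that sets a common strongly typed parameter and a provider-specific value through AdditionalProperties. The "seed" key is purely illustrative; valid keys depend on the provider:
using Microsoft.Extensions.AI;

IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

var response = await client.CompleteAsync(
    "What is AI?",
    new ChatOptions
    {
        // Strongly typed parameter common across providers.
        Temperature = 0.7f,

        // Weakly typed, provider-specific parameter (hypothetical key).
        AdditionalProperties = new() { ["seed"] = 42 },
    });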
You can also specify options when building an IChatClient with the fluent ChatClientBuilder API and chaining a call to the ConfigureOptions extension method. This delegating client wraps another client and invokes the supplied delegate to populate a ChatOptions instance for every call. For example, to ensure that the ChatOptions.ModelId property defaults to a particular model name, you can use code like the following:
using Microsoft.Extensions.AI;
IChatClient client = new ChatClientBuilder(
    new OllamaChatClient(new Uri("http://localhost:11434")))
    .ConfigureOptions(options => options.ModelId ??= "phi3")
    .Build();

// Will request "phi3".
Console.WriteLine(await client.CompleteAsync("What is AI?"));

// Will request "llama3.1".
Console.WriteLine(await client.CompleteAsync(
    "What is AI?", new() { ModelId = "llama3.1" }));
The preceding example depends on the 📦 Microsoft.Extensions.AI.Ollama NuGet package.
Functionality pipelines
IChatClient instances can be layered to create a pipeline of components, each adding specific functionality. These components can come from Microsoft.Extensions.AI, other NuGet packages, or custom implementations. This approach allows you to augment the behavior of the IChatClient in various ways to meet your specific needs. Consider the following example code that layers a distributed cache, function invocation, and OpenTelemetry tracing around a sample chat client:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see
// the impact that has on what gets cached, traced, etc.
IChatClient client = new ChatClientBuilder(
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .UseFunctionInvocation()
    .UseOpenTelemetry(
        sourceName: sourceName,
        configure: static c => c.EnableSensitiveData = true)
    .Build();

ChatOptions options = new()
{
    Tools =
    [
        AIFunctionFactory.Create(
            () => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining",
            name: "GetCurrentWeather",
            description: "Gets the current weather")
    ]
};

for (int i = 0; i < 3; ++i)
{
    List<ChatMessage> history =
    [
        new ChatMessage(ChatRole.System, "You are a helpful AI assistant"),
        new ChatMessage(ChatRole.User, "Do I need an umbrella?")
    ];

    Console.WriteLine(await client.CompleteAsync(history, options));
}
The preceding example depends on the following NuGet packages:
- 📦 Microsoft.Extensions.Caching.Memory
- 📦 Microsoft.Extensions.AI.Ollama
- 📦 OpenTelemetry.Exporter.Console
Custom IChatClient middleware
To layer in additional functionality, you can implement IChatClient directly or use the DelegatingChatClient class. This class serves as a base for creating chat clients that delegate operations to another IChatClient instance. It simplifies chaining multiple clients, allowing calls to pass through to an underlying client.
The DelegatingChatClient class provides default implementations for methods like CompleteAsync, CompleteStreamingAsync, and Dispose, which forward calls to the inner client. You can derive from this class and override only the methods you need to enhance behavior, while delegating other calls to the base implementation. This approach helps create flexible and modular chat clients that are easy to extend and compose.
The following is an example class derived from DelegatingChatClient that provides rate-limiting functionality, utilizing the RateLimiter:
using Microsoft.Extensions.AI;
using System.Runtime.CompilerServices;
using System.Threading.RateLimiting;
public sealed class RateLimitingChatClient(
    IChatClient innerClient, RateLimiter rateLimiter)
    : DelegatingChatClient(innerClient)
{
    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.CompleteAsync(chatMessages, options, cancellationToken)
            .ConfigureAwait(false);
    }

    public override async IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        await foreach (var update in base.CompleteStreamingAsync(chatMessages, options, cancellationToken)
            .ConfigureAwait(false))
        {
            yield return update;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}
The preceding example depends on the 📦 System.Threading.RateLimiting NuGet package. Composition of the RateLimitingChatClient with another client is straightforward:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
var client = new RateLimitingChatClient(
    new SampleChatClient(new Uri("http://localhost"), "test"),
    new ConcurrencyLimiter(new()
    {
        PermitLimit = 1,
        QueueLimit = int.MaxValue
    }));

await client.CompleteAsync("What color is the sky?");
To simplify the composition of such components with others, component authors should create a Use* extension method for registering the component into a pipeline. For example, consider the following extension method:
namespace Example.One;
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, RateLimiter rateLimiter) =>
        builder.Use(innerClient => new RateLimitingChatClient(innerClient, rateLimiter));
}
Such extensions can also query for relevant services from the DI container; the IServiceProvider used by the pipeline is passed in as an optional parameter:
namespace Example.Two;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using System.Threading.RateLimiting;
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, RateLimiter? rateLimiter = null) =>
        builder.Use((innerClient, services) =>
            new RateLimitingChatClient(
                innerClient,
                rateLimiter ?? services.GetRequiredService<RateLimiter>()));
}
The consumer can then easily use this in their pipeline, for example:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddChatClient(services =>
    new SampleChatClient(new Uri("http://localhost"), "test")
        .AsBuilder()
        .UseDistributedCache()
        .UseRateLimiting()
        .UseOpenTelemetry()
        .Build(services));

using var app = builder.Build();

// Elsewhere in the app
var chatClient = app.Services.GetRequiredService<IChatClient>();
Console.WriteLine(await chatClient.CompleteAsync("What is AI?"));

app.Run();
This example demonstrates a hosted scenario, where the consumer relies on dependency injection to provide the RateLimiter instance. The preceding extension methods demonstrate using a Use method on ChatClientBuilder. ChatClientBuilder also provides Use overloads that make it easier to write such delegating handlers:
- Use(IChatClient)
- Use(Func<IChatClient,IChatClient>)
- Use(Func<IServiceProvider,IChatClient,IChatClient>)
For example, in the earlier RateLimitingChatClient example, the overrides of CompleteAsync and CompleteStreamingAsync only need to do work before and after delegating to the next client in the pipeline. To achieve the same thing without writing a custom class, you can use an overload of Use that accepts a delegate that's used for both CompleteAsync and CompleteStreamingAsync, reducing the boilerplate required:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
RateLimiter rateLimiter = new ConcurrencyLimiter(new()
{
    PermitLimit = 1,
    QueueLimit = int.MaxValue
});

var client = new SampleChatClient(new Uri("http://localhost"), "test")
    .AsBuilder()
    .UseDistributedCache()
    .Use(async (chatMessages, options, nextAsync, cancellationToken) =>
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        await nextAsync(chatMessages, options, cancellationToken);
    })
    .UseOpenTelemetry()
    .Build();
// Use client
The preceding overload internally uses an AnonymousDelegatingChatClient, which enables more complicated patterns with only a little additional code. For example, to achieve the same result but with the RateLimiter retrieved from DI:
using System.Threading.RateLimiting;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
var client = new SampleChatClient(new Uri("http://localhost"), "test")
    .AsBuilder()
    .UseDistributedCache()
    .Use(static (innerClient, services) =>
    {
        var rateLimiter = services.GetRequiredService<RateLimiter>();

        return new AnonymousDelegatingChatClient(
            innerClient, async (chatMessages, options, nextAsync, cancellationToken) =>
            {
                using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
                    .ConfigureAwait(false);
                if (!lease.IsAcquired)
                {
                    throw new InvalidOperationException("Unable to acquire lease.");
                }

                await nextAsync(chatMessages, options, cancellationToken);
            });
    })
    .UseOpenTelemetry()
    .Build();
For scenarios where the developer would like to specify delegating implementations of CompleteAsync and CompleteStreamingAsync inline, and where it's important to be able to write a different implementation for each in order to handle their unique return types specially, another overload of Use exists that accepts a delegate for each.
Dependency injection
IChatClient implementations will typically be provided to an application via dependency injection (DI). In this example, an IDistributedCache is added into the DI container, as is an IChatClient. The registration for the IChatClient employs a builder that creates a pipeline containing a caching client (which will then use an IDistributedCache retrieved from DI) and the sample client. The injected IChatClient can be retrieved and used elsewhere in the app.
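The following is a minimal sketch of such a registration, using the SampleChatClient from earlier in this article:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Add an IDistributedCache (in-memory here) for the caching client to use.
builder.Services.AddDistributedMemoryCache();

// Register an IChatClient pipeline: caching layered around the sample client.
builder.Services.AddChatClient(services =>
    new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model")
        .AsBuilder()
        .UseDistributedCache()
        .Build(services));

using var app = builder.Build();

// Elsewhere in the app, retrieve and use the injected IChatClient.
var chatClient = app.Services.GetRequiredService<IChatClient>();
Console.WriteLine(await chatClient.CompleteAsync("What is AI?"));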
The preceding example depends on the following NuGet packages:
- 📦 Microsoft.Extensions.Hosting
- 📦 Microsoft.Extensions.Caching.Memory
What instance and configuration is injected can differ based on the current needs of the application, and multiple pipelines can be injected with different keys.
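For example, here's a sketch of registering a pipeline under a key. This assumes the AddKeyedChatClient extension method; check your installed package version for the exact signature:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register a pipeline under a key so multiple differently configured
// pipelines can coexist in the same container.
builder.Services.AddKeyedChatClient("sample", services =>
    new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model"));

using var app = builder.Build();

// Resolve the pipeline by its key.
var chatClient = app.Services.GetRequiredKeyedService<IChatClient>("sample");
Console.WriteLine(await chatClient.CompleteAsync("What is AI?"));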
The IEmbeddingGenerator interface
The IEmbeddingGenerator<TInput,TEmbedding> interface represents a generic generator of embeddings. Here, TInput is the type of input values being embedded, and TEmbedding is the type of generated embedding, which inherits from the Embedding class.
The Embedding class serves as a base class for embeddings generated by an IEmbeddingGenerator. It's designed to store and manage the metadata and data associated with embeddings. Derived types like Embedding<T> provide the concrete embedding vector data. For instance, an embedding exposes an Embedding<T>.Vector property to access its embedding data.
The IEmbeddingGenerator interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that can be provided by the generator or its underlying services.
Sample implementation
Consider the following sample implementation of an IEmbeddingGenerator, which shows the general structure but just generates random embedding vectors.
using Microsoft.Extensions.AI;
public sealed class SampleEmbeddingGenerator(
    Uri endpoint, string modelId)
    : IEmbeddingGenerator<string, Embedding<float>>
{
    public EmbeddingGeneratorMetadata Metadata { get; } =
        new(nameof(SampleEmbeddingGenerator), endpoint, modelId);

    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some async operation.
        await Task.Delay(100, cancellationToken);

        // Create random embeddings.
        return
        [
            .. from value in values
               select new Embedding<float>(
                   Enumerable.Range(0, 384)
                       .Select(_ => Random.Shared.NextSingle())
                       .ToArray())
        ];
    }

    public object? GetService(Type serviceType, object? serviceKey) => this;

    public TService? GetService<TService>(object? key = null)
        where TService : class => this as TService;

    void IDisposable.Dispose() { }
}
The preceding code:
- Defines a class named SampleEmbeddingGenerator that implements the IEmbeddingGenerator<string, Embedding<float>> interface.
- Has a primary constructor that accepts an endpoint and model ID, which are used to identify the generator.
- Exposes a Metadata property that provides metadata about the generator.
- Implements the GenerateAsync method to generate embeddings for a collection of input values:
  - Simulates an asynchronous operation by delaying for 100 milliseconds.
  - Returns random embeddings for each input value.
You can find actual concrete implementations in the following packages:
- 📦 Microsoft.Extensions.AI.OpenAI
- 📦 Microsoft.Extensions.AI.Ollama
Create embeddings
The primary operation performed with an IEmbeddingGenerator<TInput,TEmbedding> is embedding generation, which is accomplished with its GenerateAsync method.
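The following is a minimal sketch, using the SampleEmbeddingGenerator from earlier in this article:
using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new SampleEmbeddingGenerator(
        new Uri("http://coolsite.ai"), "target-ai-model");

// Generate one embedding per input value.
foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}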
Custom IEmbeddingGenerator middleware
As with IChatClient, IEmbeddingGenerator implementations can be layered. Just as Microsoft.Extensions.AI provides delegating implementations of IChatClient for caching and telemetry, it provides an implementation for IEmbeddingGenerator as well.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see
// the impact that has on what gets cached, traced, etc.
var generator = new EmbeddingGeneratorBuilder<string, Embedding<float>>(
    new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"))
    .UseDistributedCache(
        new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();

var embeddings = await generator.GenerateAsync(
[
    "What is AI?",
    "What is .NET?",
    "What is AI?"
]);

foreach (var embedding in embeddings)
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
The IEmbeddingGenerator enables building custom middleware that extends the functionality of an IEmbeddingGenerator. The DelegatingEmbeddingGenerator<TInput,TEmbedding> class is an implementation of the IEmbeddingGenerator<TInput, TEmbedding> interface that serves as a base class for creating embedding generators that delegate their operations to another IEmbeddingGenerator<TInput, TEmbedding> instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as GenerateAsync and Dispose, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation.
The following is an example implementation of such a delegating embedding generator that rate limits embedding generation requests:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
public class RateLimitingEmbeddingGenerator(
    IEmbeddingGenerator<string, Embedding<float>> innerGenerator, RateLimiter rateLimiter)
    : DelegatingEmbeddingGenerator<string, Embedding<float>>(innerGenerator)
{
    public override async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.GenerateAsync(values, options, cancellationToken);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}
This can then be layered around an arbitrary IEmbeddingGenerator<string, Embedding<float>> to rate limit all embedding generation operations performed.
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
IEmbeddingGenerator<string, Embedding<float>> generator =
    new RateLimitingEmbeddingGenerator(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"),
        new ConcurrencyLimiter(new()
        {
            PermitLimit = 1,
            QueueLimit = int.MaxValue
        }));

foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
In this way, the RateLimitingEmbeddingGenerator can be composed with other IEmbeddingGenerator<string, Embedding<float>> instances to provide rate-limiting functionality.