Artificial intelligence in .NET (Preview)
With a growing variety of artificial intelligence (AI) services available, developers need a way to integrate and interact with these services in their .NET applications. The Microsoft.Extensions.AI
library provides a unified approach for representing generative AI components, which enables seamless integration and interoperability with various AI services. This article introduces the library and provides installation instructions and usage examples to help you get started.
Install the package
To install the 📦 Microsoft.Extensions.AI NuGet package, use the .NET CLI or add a package reference directly to your C# project file:
dotnet add package Microsoft.Extensions.AI --prerelease
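Or, add a PackageReference item to your project file. The following is a minimal sketch; the floating version `*-*` resolves to the latest available version, including prereleases, so pin a specific preview version if you prefer:
<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.AI" Version="*-*" />
</ItemGroup>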
For more information, see dotnet add package or Manage package dependencies in .NET applications.
Usage examples
The IChatClient interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (such as text, images, and audio), either as a complete set or streamed incrementally. Additionally, it provides metadata information about the client and allows retrieving strongly typed services.
Important
For more usage examples and real-world scenarios, see AI for .NET developers.
The IChatClient interface
The following sample implements IChatClient to show the general structure.
using System.Runtime.CompilerServices;
using Microsoft.Extensions.AI;

public sealed class SampleChatClient(Uri endpoint, string modelId) : IChatClient
{
    public ChatClientMetadata Metadata { get; } = new(nameof(SampleChatClient), endpoint, modelId);

    public async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some operation.
        await Task.Delay(300, cancellationToken);

        // Return a sample chat completion response randomly.
        string[] responses =
        [
            "This is the first sample response.",
            "Here is another example of a response message.",
            "This is yet another response message."
        ];

        return new([new ChatMessage()
        {
            Role = ChatRole.Assistant,
            Text = responses[Random.Shared.Next(responses.Length)],
        }]);
    }

    public async IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Simulate streaming by yielding messages one by one.
        string[] words = ["This ", "is ", "the ", "response ", "for ", "the ", "request."];
        foreach (string word in words)
        {
            // Simulate some operation.
            await Task.Delay(100, cancellationToken);

            // Yield the next message in the response.
            yield return new StreamingChatCompletionUpdate
            {
                Role = ChatRole.Assistant,
                Text = word,
            };
        }
    }

    public object? GetService(Type serviceType, object? serviceKey) => this;

    public TService? GetService<TService>(object? key = null)
        where TService : class => this as TService;

    void IDisposable.Dispose() { }
}
You can find other concrete implementations of IChatClient in the following NuGet packages:
- 📦 Microsoft.Extensions.AI.AzureAIInference: Implementation backed by Azure AI Model Inference API.
- 📦 Microsoft.Extensions.AI.Ollama: Implementation backed by Ollama.
- 📦 Microsoft.Extensions.AI.OpenAI: Implementation backed by either OpenAI or OpenAI-compatible endpoints (such as Azure OpenAI).
Request chat completion
To request a completion, call the IChatClient.CompleteAsync method. The request is composed of one or more messages, each of which is composed of one or more pieces of content. Accelerator methods exist to simplify common cases, such as constructing a request for a single piece of text content.
using Microsoft.Extensions.AI;
IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");
var response = await client.CompleteAsync("What is AI?");
Console.WriteLine(response.Message);
The core IChatClient.CompleteAsync method accepts a list of messages. This list represents the history of all messages that are part of the conversation.
using Microsoft.Extensions.AI;
IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

Console.WriteLine(await client.CompleteAsync(
[
    new(ChatRole.System, "You are a helpful AI assistant"),
    new(ChatRole.User, "What is AI?"),
]));
Each message in the history is represented by a ChatMessage object. The ChatMessage class provides a ChatMessage.Role property that indicates the role of the message. By default, ChatRole.User is used. The following roles are available:
- ChatRole.Assistant: Provides responses to system-instructed, user-prompted input.
- ChatRole.System: Instructs or sets the behavior of the assistant.
- ChatRole.Tool: Provides additional information and references for chat completions.
- ChatRole.User: Provides input for chat completions.
Each chat message is instantiated with a role and its content; simple text is stored as a TextContent assigned to the message's Contents property. There are various types of content that can be represented, such as a simple string or a more complex object that represents a multi-modal message with text, images, and audio (a sketch follows this list):
- AudioContent
- DataContent
- FunctionCallContent
- FunctionResultContent
- ImageContent
- TextContent
- UsageContent
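For example, here's a minimal sketch of a multi-modal user message that combines text and image content. The image URI is a placeholder, and the exact ImageContent constructor shape can vary by preview version:
using Microsoft.Extensions.AI;

// A single user message composed of multiple content items.
var message = new ChatMessage(ChatRole.User,
[
    new TextContent("What's shown in this image?"),
    new ImageContent(new Uri("https://example.com/photo.jpg"), "image/jpeg"),
]);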
Request chat completion with streaming
The inputs to IChatClient.CompleteStreamingAsync are identical to those of CompleteAsync. However, rather than returning the complete response as part of a ChatCompletion object, the method returns an IAsyncEnumerable<T> where T is StreamingChatCompletionUpdate, providing a stream of updates that collectively form the single response.
using Microsoft.Extensions.AI;
IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

await foreach (var update in client.CompleteStreamingAsync("What is AI?"))
{
    Console.Write(update);
}
Tip
Streaming APIs are nearly synonymous with AI user experiences. C# enables compelling scenarios with its IAsyncEnumerable<T> support, allowing for a natural and efficient way to stream data.
Tool calling
Some models and services support tool calling, where requests can include tools for the model to invoke functions to gather additional information. Instead of sending a final response, the model requests a function invocation with specific arguments. The client then invokes the function and sends the results back to the model along with the conversation history. The Microsoft.Extensions.AI library includes abstractions for various message content types, including function call requests and results. While consumers can interact with this content directly, Microsoft.Extensions.AI automates these interactions and provides:
- AIFunction: Represents a function that can be described to an AI service and invoked.
- AIFunctionFactory: Provides factory methods for creating commonly used implementations of AIFunction.
- FunctionInvokingChatClient: Wraps an IChatClient to add automatic function invocation capabilities.
Consider the following example that demonstrates a random function invocation:
using System.ComponentModel;
using Microsoft.Extensions.AI;
[Description("Gets the current weather")]
string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5
    ? "It's sunny"
    : "It's raining";

IChatClient client = new ChatClientBuilder(
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseFunctionInvocation()
    .Build();

var response = client.CompleteStreamingAsync(
    "Should I wear a rain coat?",
    new() { Tools = [AIFunctionFactory.Create(GetCurrentWeather)] });

await foreach (var update in response)
{
    Console.Write(update);
}
The preceding example depends on the 📦 Microsoft.Extensions.AI.Ollama NuGet package.
The preceding code:
- Defines a function named GetCurrentWeather that returns a random weather forecast.
  - This function is decorated with a DescriptionAttribute, which is used to provide a description of the function to the AI service.
- Instantiates a ChatClientBuilder with an OllamaChatClient and configures it to use function invocation.
- Calls CompleteStreamingAsync on the client, passing a prompt and a list of tools that includes a function created with Create.
- Iterates over the response, printing each update to the console.
Cache responses
If you're familiar with Caching in .NET, it's good to know that Microsoft.Extensions.AI provides other such delegating IChatClient implementations. The DistributedCachingChatClient is an IChatClient that layers caching around another arbitrary IChatClient instance. When a unique chat history is submitted to the DistributedCachingChatClient, it forwards the history to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted, such that a cached response is found, the DistributedCachingChatClient returns that cached response rather than forwarding the request along the pipeline.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .Build();

string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"];

foreach (var prompt in prompts)
{
    await foreach (var update in client.CompleteStreamingAsync(prompt))
    {
        Console.Write(update);
    }

    Console.WriteLine();
}
The preceding example depends on the 📦 Microsoft.Extensions.Caching.Memory NuGet package. For more information, see Caching in .NET.
Use telemetry
Another example of a delegating chat client is the OpenTelemetryChatClient. This implementation adheres to the OpenTelemetry Semantic Conventions for Generative AI systems. Similar to other IChatClient delegators, it layers metrics and spans around any underlying IChatClient implementation, providing enhanced observability.
using Microsoft.Extensions.AI;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseOpenTelemetry(
        sourceName: sourceName,
        configure: static c => c.EnableSensitiveData = true)
    .Build();

Console.WriteLine((await client.CompleteAsync("What is AI?")).Message);
The preceding example depends on the 📦 OpenTelemetry.Exporter.Console NuGet package.
Provide options
Every call to CompleteAsync or CompleteStreamingAsync can optionally supply a ChatOptions instance containing additional parameters for the operation. The most common parameters among AI models and services show up as strongly typed properties on the type, such as ChatOptions.Temperature. Other parameters can be supplied by name in a weakly typed manner via the ChatOptions.AdditionalProperties dictionary.
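For example, here's a minimal sketch that sets a common strongly typed parameter and a provider-specific value through AdditionalProperties. The "seed" key is purely illustrative; valid keys depend on the provider:
using Microsoft.Extensions.AI;

IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

var response = await client.CompleteAsync(
    "What is AI?",
    new ChatOptions
    {
        // Strongly typed parameter common across providers.
        Temperature = 0.7f,

        // Weakly typed, provider-specific parameter (hypothetical key).
        AdditionalProperties = new() { ["seed"] = 42 },
    });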
You can also specify options when building an IChatClient with the fluent ChatClientBuilder API and chaining a call to the ConfigureOptions extension method. This delegating client wraps another client and invokes the supplied delegate to populate a ChatOptions instance for every call. For example, to ensure that the ChatOptions.ModelId property defaults to a particular model name, you can use code like the following:
using Microsoft.Extensions.AI;
IChatClient client = new ChatClientBuilder(
    new OllamaChatClient(new Uri("http://localhost:11434")))
    .ConfigureOptions(options => options.ModelId ??= "phi3")
    .Build();

// Will request "phi3".
Console.WriteLine(await client.CompleteAsync("What is AI?"));

// Will request "llama3.1".
Console.WriteLine(await client.CompleteAsync(
    "What is AI?", new() { ModelId = "llama3.1" }));
The preceding example depends on the 📦 Microsoft.Extensions.AI.Ollama NuGet package.
Functionality pipelines
IChatClient instances can be layered to create a pipeline of components, each adding specific functionality. These components can come from Microsoft.Extensions.AI, other NuGet packages, or custom implementations. This approach allows you to augment the behavior of the IChatClient in various ways to meet your specific needs. Consider the following example code that layers a distributed cache, function invocation, and OpenTelemetry tracing around a sample chat client:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see
// the impact that has on what gets cached, traced, etc.
IChatClient client = new ChatClientBuilder(
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .UseFunctionInvocation()
    .UseOpenTelemetry(
        sourceName: sourceName,
        configure: static c => c.EnableSensitiveData = true)
    .Build();

ChatOptions options = new()
{
    Tools =
    [
        AIFunctionFactory.Create(
            () => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining",
            name: "GetCurrentWeather",
            description: "Gets the current weather")
    ]
};

for (int i = 0; i < 3; ++i)
{
    List<ChatMessage> history =
    [
        new ChatMessage(ChatRole.System, "You are a helpful AI assistant"),
        new ChatMessage(ChatRole.User, "Do I need an umbrella?")
    ];

    Console.WriteLine(await client.CompleteAsync(history, options));
}
The preceding example depends on the following NuGet packages:
- 📦 Microsoft.Extensions.Caching.Memory
- 📦 Microsoft.Extensions.AI.Ollama
- 📦 OpenTelemetry.Exporter.Console
Custom IChatClient middleware
To layer in additional functionality, you can implement IChatClient directly or use the DelegatingChatClient class. This class serves as a base for creating chat clients that delegate operations to another IChatClient instance. It simplifies chaining multiple clients, allowing calls to pass through to an underlying client.
The DelegatingChatClient class provides default implementations for methods like CompleteAsync, CompleteStreamingAsync, and Dispose, which forward calls to the inner client. You can derive from this class and override only the methods you need to enhance behavior, while delegating other calls to the base implementation. This approach helps create flexible and modular chat clients that are easy to extend and compose.
The following is an example class derived from DelegatingChatClient that provides rate-limiting functionality, utilizing the RateLimiter:
using Microsoft.Extensions.AI;
using System.Runtime.CompilerServices;
using System.Threading.RateLimiting;
public sealed class RateLimitingChatClient(
    IChatClient innerClient, RateLimiter rateLimiter)
    : DelegatingChatClient(innerClient)
{
    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.CompleteAsync(chatMessages, options, cancellationToken)
            .ConfigureAwait(false);
    }

    public override async IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        await foreach (var update in base.CompleteStreamingAsync(chatMessages, options, cancellationToken)
            .ConfigureAwait(false))
        {
            yield return update;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}
The preceding example depends on the 📦 System.Threading.RateLimiting NuGet package. Composition of the RateLimitingChatClient with another client is straightforward:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
var client = new RateLimitingChatClient(
    new SampleChatClient(new Uri("http://localhost"), "test"),
    new ConcurrencyLimiter(new()
    {
        PermitLimit = 1,
        QueueLimit = int.MaxValue
    }));

await client.CompleteAsync("What color is the sky?");
To simplify the composition of such components with others, component authors should create a Use* extension method for registering the component into a pipeline. For example, consider the following extension method:
namespace Example.One;
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, RateLimiter rateLimiter) =>
        builder.Use(innerClient => new RateLimitingChatClient(innerClient, rateLimiter));
}
Such extensions can also query for relevant services from the DI container; the IServiceProvider used by the pipeline is passed in as an optional parameter:
namespace Example.Two;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using System.Threading.RateLimiting;
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, RateLimiter? rateLimiter = null) =>
        builder.Use((innerClient, services) =>
            new RateLimitingChatClient(
                innerClient,
                rateLimiter ?? services.GetRequiredService<RateLimiter>()));
}
The consumer can then easily use this in their pipeline, for example:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddChatClient(services =>
    new SampleChatClient(new Uri("http://localhost"), "test")
        .AsBuilder()
        .UseDistributedCache()
        .UseRateLimiting()
        .UseOpenTelemetry()
        .Build(services));

using var app = builder.Build();

// Elsewhere in the app
var chatClient = app.Services.GetRequiredService<IChatClient>();
Console.WriteLine(await chatClient.CompleteAsync("What is AI?"));

app.Run();
This example demonstrates a hosted scenario, where the consumer relies on dependency injection to provide the RateLimiter instance. The preceding extension methods demonstrate using a Use method on ChatClientBuilder. ChatClientBuilder also provides Use overloads that make it easier to write such delegating handlers:
- Use(IChatClient)
- Use(Func<IChatClient,IChatClient>)
- Use(Func<IServiceProvider,IChatClient,IChatClient>)
For example, in the earlier RateLimitingChatClient example, the overrides of CompleteAsync and CompleteStreamingAsync only need to do work before and after delegating to the next client in the pipeline. To achieve the same thing without writing a custom class, you can use an overload of Use that accepts a delegate that's used for both CompleteAsync and CompleteStreamingAsync, reducing the boilerplate required:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
RateLimiter rateLimiter = new ConcurrencyLimiter(new()
{
    PermitLimit = 1,
    QueueLimit = int.MaxValue
});

var client = new SampleChatClient(new Uri("http://localhost"), "test")
    .AsBuilder()
    .UseDistributedCache()
    .Use(async (chatMessages, options, nextAsync, cancellationToken) =>
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        await nextAsync(chatMessages, options, cancellationToken);
    })
    .UseOpenTelemetry()
    .Build();
// Use client
The preceding overload internally uses an AnonymousDelegatingChatClient, which enables more complicated patterns with only a little additional code. For example, to achieve the same result but with the RateLimiter retrieved from DI:
using System.Threading.RateLimiting;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
var client = new SampleChatClient(new Uri("http://localhost"), "test")
    .AsBuilder()
    .UseDistributedCache()
    .Use(static (innerClient, services) =>
    {
        var rateLimiter = services.GetRequiredService<RateLimiter>();

        return new AnonymousDelegatingChatClient(
            innerClient, async (chatMessages, options, nextAsync, cancellationToken) =>
            {
                using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
                    .ConfigureAwait(false);
                if (!lease.IsAcquired)
                {
                    throw new InvalidOperationException("Unable to acquire lease.");
                }

                await nextAsync(chatMessages, options, cancellationToken);
            });
    })
    .UseOpenTelemetry()
    .Build();
For scenarios where the developer would like to specify delegating implementations of CompleteAsync and CompleteStreamingAsync inline, and where it's important to be able to write a different implementation for each in order to handle their unique return types specially, another overload of Use exists that accepts a delegate for each.
Dependency injection
IChatClient implementations will typically be provided to an application via dependency injection (DI). In this example, an IDistributedCache is added into the DI container, as is an IChatClient. The registration for the IChatClient employs a builder that creates a pipeline containing a caching client (which will then use an IDistributedCache retrieved from DI) and the sample client. The injected IChatClient can be retrieved and used elsewhere in the app.
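The following is a minimal sketch of such a registration, using the SampleChatClient from earlier in this article:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Add an IDistributedCache (in-memory here) for the caching client to use.
builder.Services.AddDistributedMemoryCache();

// Register an IChatClient pipeline: caching layered around the sample client.
builder.Services.AddChatClient(services =>
    new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model")
        .AsBuilder()
        .UseDistributedCache()
        .Build(services));

using var app = builder.Build();

// Elsewhere in the app, retrieve and use the injected IChatClient.
var chatClient = app.Services.GetRequiredService<IChatClient>();
Console.WriteLine(await chatClient.CompleteAsync("What is AI?"));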
The preceding example depends on the following NuGet packages:
- 📦 Microsoft.Extensions.Hosting
- 📦 Microsoft.Extensions.Caching.Memory
What instance and configuration is injected can differ based on the current needs of the application, and multiple pipelines can be injected with different keys.
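For example, here's a sketch of registering a pipeline under a key. This assumes the AddKeyedChatClient extension method; check your installed package version for the exact signature:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register a pipeline under a key so multiple differently configured
// pipelines can coexist in the same container.
builder.Services.AddKeyedChatClient("sample", services =>
    new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model"));

using var app = builder.Build();

// Resolve the pipeline by its key.
var chatClient = app.Services.GetRequiredKeyedService<IChatClient>("sample");
Console.WriteLine(await chatClient.CompleteAsync("What is AI?"));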
The IEmbeddingGenerator interface
The IEmbeddingGenerator<TInput,TEmbedding> interface represents a generic generator of embeddings. Here, TInput is the type of input values being embedded, and TEmbedding is the type of generated embedding, which inherits from the Embedding class.
The Embedding class serves as a base class for embeddings generated by an IEmbeddingGenerator. It's designed to store and manage the metadata and data associated with embeddings. Derived types like Embedding<T> provide the concrete embedding vector data. For instance, an embedding exposes an Embedding<T>.Vector property to access its embedding data.
The IEmbeddingGenerator interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that can be provided by the generator or its underlying services.
Sample implementation
Consider the following sample implementation of an IEmbeddingGenerator, which shows the general structure but just generates random embedding vectors.
using Microsoft.Extensions.AI;
public sealed class SampleEmbeddingGenerator(
    Uri endpoint, string modelId)
    : IEmbeddingGenerator<string, Embedding<float>>
{
    public EmbeddingGeneratorMetadata Metadata { get; } =
        new(nameof(SampleEmbeddingGenerator), endpoint, modelId);

    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some async operation.
        await Task.Delay(100, cancellationToken);

        // Create random embeddings.
        return
        [
            .. from value in values
               select new Embedding<float>(
                   Enumerable.Range(0, 384)
                       .Select(_ => Random.Shared.NextSingle())
                       .ToArray())
        ];
    }

    public object? GetService(Type serviceType, object? serviceKey) => this;

    public TService? GetService<TService>(object? key = null)
        where TService : class => this as TService;

    void IDisposable.Dispose() { }
}
The preceding code:
- Defines a class named SampleEmbeddingGenerator that implements the IEmbeddingGenerator<string, Embedding<float>> interface.
- Has a primary constructor that accepts an endpoint and model ID, which are used to identify the generator.
- Exposes a Metadata property that provides metadata about the generator.
- Implements the GenerateAsync method to generate embeddings for a collection of input values:
  - Simulates an asynchronous operation by delaying for 100 milliseconds.
  - Returns random embeddings for each input value.
You can find actual concrete implementations in the following packages:
- 📦 Microsoft.Extensions.AI.OpenAI
- 📦 Microsoft.Extensions.AI.Ollama
Create embeddings
The primary operation performed with an IEmbeddingGenerator<TInput,TEmbedding> is embedding generation, which is accomplished with its GenerateAsync method.
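The following is a minimal sketch, using the SampleEmbeddingGenerator from earlier in this article:
using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new SampleEmbeddingGenerator(
        new Uri("http://coolsite.ai"), "target-ai-model");

// Generate one embedding per input value.
foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}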
Custom IEmbeddingGenerator middleware
As with IChatClient, IEmbeddingGenerator implementations can be layered. Just as Microsoft.Extensions.AI provides delegating implementations of IChatClient for caching and telemetry, it provides an implementation for IEmbeddingGenerator as well.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see
// the impact that has on what gets cached, traced, etc.
var generator = new EmbeddingGeneratorBuilder<string, Embedding<float>>(
    new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"))
    .UseDistributedCache(
        new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();

var embeddings = await generator.GenerateAsync(
[
    "What is AI?",
    "What is .NET?",
    "What is AI?"
]);

foreach (var embedding in embeddings)
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
The IEmbeddingGenerator enables building custom middleware that extends the functionality of an IEmbeddingGenerator. The DelegatingEmbeddingGenerator<TInput,TEmbedding> class is an implementation of the IEmbeddingGenerator<TInput, TEmbedding> interface that serves as a base class for creating embedding generators that delegate their operations to another IEmbeddingGenerator<TInput, TEmbedding> instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as GenerateAsync and Dispose, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation.
The following is an example implementation of such a delegating embedding generator that rate limits embedding generation requests:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
public class RateLimitingEmbeddingGenerator(
    IEmbeddingGenerator<string, Embedding<float>> innerGenerator, RateLimiter rateLimiter)
    : DelegatingEmbeddingGenerator<string, Embedding<float>>(innerGenerator)
{
    public override async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.GenerateAsync(values, options, cancellationToken);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}
This can then be layered around an arbitrary IEmbeddingGenerator<string, Embedding<float>> to rate limit all embedding generation operations performed.
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
IEmbeddingGenerator<string, Embedding<float>> generator =
    new RateLimitingEmbeddingGenerator(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"),
        new ConcurrencyLimiter(new()
        {
            PermitLimit = 1,
            QueueLimit = int.MaxValue
        }));

foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
In this way, the RateLimitingEmbeddingGenerator can be composed with other IEmbeddingGenerator<string, Embedding<float>> instances to provide rate-limiting functionality.