Use custom and local AI models with the Semantic Kernel SDK
This article demonstrates how to integrate custom and local models into the Semantic Kernel SDK and use them for text generation and chat completion.

You can adapt the steps to work with any model you can access, regardless of where or how it's hosted. For example, you can integrate the codellama model with the Semantic Kernel SDK to enable code generation and discussion.

Custom and local models often expose access through REST APIs; for example, see Ollama OpenAI compatibility. Before you integrate your model, it needs to be hosted and accessible to your .NET application over HTTPS.
Prerequisites

- An Azure account with an active subscription. Create an account for free.
- The .NET SDK
- The `Microsoft.SemanticKernel` NuGet package
- A custom or local model, deployed and accessible to your .NET application
Implement text generation using a local model

The following section shows how to integrate your model with the Semantic Kernel SDK and then use it to generate text completions.
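The examples that follow reference `MyModelRequest` and `MyModelResponse`. These are placeholder types that represent your model's request and response contracts, and their exact shape depends on the API your model exposes. The following is a minimal sketch assuming a simple JSON API that accepts a prompt and returns a list of completion strings; the property names and the `FromPrompt`/`FromChatHistory` helpers are illustrative assumptions, not part of the Semantic Kernel SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical request contract for your model's API.
// Adjust the property names and serialization to match your model's actual schema.
class MyModelRequest
{
    public required string Prompt { get; set; }
    public bool Stream { get; set; }
    public int? MaxTokens { get; set; }

    public static MyModelRequest FromPrompt(string prompt, PromptExecutionSettings? settings)
    {
        var request = new MyModelRequest { Prompt = prompt };

        // Map any execution settings your model understands onto the request
        if (settings?.ExtensionData?.TryGetValue("MaxTokens", out var maxTokens) == true)
        {
            request.MaxTokens = Convert.ToInt32(maxTokens);
        }

        return request;
    }

    public static MyModelRequest FromChatHistory(ChatHistory history, PromptExecutionSettings? settings)
    {
        // Flatten the chat history into a single prompt; many local model APIs
        // accept a structured list of role/content messages instead
        string prompt = string.Join(
            Environment.NewLine,
            history.Select(message => $"{message.Role}: {message.Content}"));

        return FromPrompt(prompt, settings);
    }
}

// Hypothetical response contract: a list of completion strings
class MyModelResponse
{
    public IList<string> Completions { get; set; } = new List<string>();
}
```

If your model follows the OpenAI-compatible schema that Ollama exposes, you would shape these types to match that schema instead.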
Create a service class that implements the `ITextGenerationService` interface. For example:

```csharp
using System.Collections.Immutable;
using System.Net.Http.Json;
using System.Runtime.CompilerServices;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextGeneration;

class MyTextGenerationService : ITextGenerationService
{
    private IReadOnlyDictionary<string, object?>? _attributes;
    public IReadOnlyDictionary<string, object?> Attributes =>
        _attributes ??= new Dictionary<string, object?>();

    public string ModelUrl { get; init; } = "<default url to your model's Chat API>";
    public required string ModelApiKey { get; init; }

    public async IAsyncEnumerable<StreamingTextContent> GetStreamingTextContentsAsync(
        string prompt,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object, specify that streaming is requested
        MyModelRequest request = MyModelRequest.FromPrompt(prompt, executionSettings);
        request.Stream = true;

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Read your model's response as a stream
        using StreamReader reader =
            new(await httpResponse.Content.ReadAsStreamAsync(cancellationToken));

        // Iteratively read a chunk of the response until the end of the stream
        // It is more efficient to use a buffer that is the same size as the internal buffer of the stream
        // If the size of the internal buffer was unspecified when the stream was constructed,
        // its default size is 4 kilobytes (2048 UTF-16 characters)
        char[] buffer = new char[2048];
        while (!reader.EndOfStream)
        {
            // Check the cancellation token with each iteration
            cancellationToken.ThrowIfCancellationRequested();

            // Fill the buffer with the next set of characters, track how many characters were read
            int readCount = reader.Read(buffer, 0, buffer.Length);

            // Convert the character buffer to a string, only include as many characters as were just read
            string chunk = new(buffer, 0, readCount);

            yield return new StreamingTextContent(chunk);
        }
    }

    public async Task<IReadOnlyList<TextContent>> GetTextContentsAsync(
        string prompt,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object
        MyModelRequest request = MyModelRequest.FromPrompt(prompt, executionSettings);

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Deserialize the response body to your model's response object
        // Handle when the deserialization fails and returns null
        MyModelResponse response =
            await httpResponse.Content.ReadFromJsonAsync<MyModelResponse>(cancellationToken)
            ?? throw new Exception("Failed to deserialize response from model");

        // Convert your model's response into a list of TextContent
        return response
            .Completions.Select<string, TextContent>(completion => new(completion))
            .ToImmutableList();
    }
}
```
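Note that `ModelApiKey` is declared but never attached to the outgoing request in this example. How credentials are passed depends entirely on your model's API; as a hedged sketch, assuming a bearer-token scheme, you could set the authorization header before posting:

```csharp
// Requires: using System.Net.Http.Headers;
// Assumption: the model's API expects a bearer token; adjust to match its actual auth scheme
httpClient.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", ModelApiKey);
```

Also, creating a new `HttpClient` per call keeps the sample self-contained; in a production service you would typically reuse a client or inject `IHttpClientFactory`.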
Include the new service class when building the `Kernel`. For example:

```csharp
IKernelBuilder builder = Kernel.CreateBuilder();

// Add your text generation service as a singleton instance
builder.Services.AddKeyedSingleton<ITextGenerationService>(
    "myTextService1",
    new MyTextGenerationService
    {
        // Specify any properties specific to your service, such as the url or API key
        ModelUrl = "https://localhost:38748",
        ModelApiKey = "myApiKey"
    }
);

// Alternatively, add your text generation service as a factory method
builder.Services.AddKeyedSingleton<ITextGenerationService>(
    "myTextService2",
    (_, _) => new MyTextGenerationService
    {
        // Specify any properties specific to your service, such as the url or API key
        ModelUrl = "https://localhost:38748",
        ModelApiKey = "myApiKey"
    }
);

// Add any other Kernel services or configurations
// ...

Kernel kernel = builder.Build();
```
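Because the registrations above are keyed, a specific instance can be resolved by its key later on. A brief example, using the key names from the snippet above:

```csharp
// Resolve the singleton registered under "myTextService1"
ITextGenerationService keyedService =
    kernel.GetRequiredService<ITextGenerationService>("myTextService1");
```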
Send a text generation prompt to your model directly through the `Kernel` or by using the service class. For example:

```csharp
var executionSettings = new PromptExecutionSettings
{
    // Add execution settings, such as the ModelID and ExtensionData
    ModelId = "MyModelId",
    ExtensionData = new Dictionary<string, object> { { "MaxTokens", 500 } }
};

// Send a prompt to your model directly through the Kernel
// The Kernel response will be null if the model can't be reached
string prompt = "Please list three services offered by Azure";
string? response = await kernel.InvokePromptAsync<string>(prompt);
Console.WriteLine($"Output: {response}");

// Alternatively, send a prompt to your model through the text generation service
ITextGenerationService textService = kernel.GetRequiredService<ITextGenerationService>();
TextContent responseContents = await textService.GetTextContentAsync(
    prompt,
    executionSettings
);
Console.WriteLine($"Output: {responseContents.Text}");
```
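The service's streaming implementation can be consumed the same way. A minimal sketch, reusing `textService`, `prompt`, and `executionSettings` from the example above:

```csharp
// Print each chunk as the model streams it back
await foreach (StreamingTextContent chunk in
    textService.GetStreamingTextContentsAsync(prompt, executionSettings))
{
    Console.Write(chunk.Text);
}
```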
Implement chat completion using a local model

The following section shows how to integrate your model with the Semantic Kernel SDK and then use it for chat completion.
Create a service class that implements the `IChatCompletionService` interface. For example:

```csharp
using System.Collections.Immutable;
using System.Net.Http.Json;
using System.Runtime.CompilerServices;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

class MyChatCompletionService : IChatCompletionService
{
    private IReadOnlyDictionary<string, object?>? _attributes;
    public IReadOnlyDictionary<string, object?> Attributes =>
        _attributes ??= new Dictionary<string, object?>();

    public string ModelUrl { get; init; } = "<default url to your model's Chat API>";
    public required string ModelApiKey { get; init; }

    public async Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object
        MyModelRequest request = MyModelRequest.FromChatHistory(chatHistory, executionSettings);

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Deserialize the response body to your model's response object
        // Handle when the deserialization fails and returns null
        MyModelResponse response =
            await httpResponse.Content.ReadFromJsonAsync<MyModelResponse>(cancellationToken)
            ?? throw new Exception("Failed to deserialize response from model");

        // Convert your model's response into a list of ChatMessageContent
        return response
            .Completions.Select<string, ChatMessageContent>(completion =>
                new(AuthorRole.Assistant, completion)
            )
            .ToImmutableList();
    }

    public async IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object, specify that streaming is requested
        MyModelRequest request = MyModelRequest.FromChatHistory(chatHistory, executionSettings);
        request.Stream = true;

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Read your model's response as a stream
        using StreamReader reader =
            new(await httpResponse.Content.ReadAsStreamAsync(cancellationToken));

        // Iteratively read a chunk of the response until the end of the stream
        // It is more efficient to use a buffer that is the same size as the internal buffer of the stream
        // If the size of the internal buffer was unspecified when the stream was constructed,
        // its default size is 4 kilobytes (2048 UTF-16 characters)
        char[] buffer = new char[2048];
        while (!reader.EndOfStream)
        {
            // Check the cancellation token with each iteration
            cancellationToken.ThrowIfCancellationRequested();

            // Fill the buffer with the next set of characters, track how many characters were read
            int readCount = reader.Read(buffer, 0, buffer.Length);

            // Convert the character buffer to a string, only include as many characters as were just read
            string chunk = new(buffer, 0, readCount);

            yield return new StreamingChatMessageContent(AuthorRole.Assistant, chunk);
        }
    }
}
```
Include the new service class when building the `Kernel`. For example:

```csharp
IKernelBuilder builder = Kernel.CreateBuilder();

// Add your chat completion service as a singleton instance
builder.Services.AddKeyedSingleton<IChatCompletionService>(
    "myChatService1",
    new MyChatCompletionService
    {
        // Specify any properties specific to your service, such as the url or API key
        ModelUrl = "https://localhost:38748",
        ModelApiKey = "myApiKey"
    }
);

// Alternatively, add your chat completion service as a factory method
builder.Services.AddKeyedSingleton<IChatCompletionService>(
    "myChatService2",
    (_, _) => new MyChatCompletionService
    {
        // Specify any properties specific to your service, such as the url or API key
        ModelUrl = "https://localhost:38748",
        ModelApiKey = "myApiKey"
    }
);

// Add any other Kernel services or configurations
// ...

Kernel kernel = builder.Build();
```
Send a chat completion prompt to your model directly through the `Kernel` or by using the service class. For example:

```csharp
var executionSettings = new PromptExecutionSettings
{
    // Add execution settings, such as the ModelID and ExtensionData
    ModelId = "MyModelId",
    ExtensionData = new Dictionary<string, object> { { "MaxTokens", 500 } }
};

// Send a string representation of the chat history to your model directly through the Kernel
// This uses a special syntax to denote the role for each message
// For more information on this syntax see:
// https://learn.microsoft.com/en-us/semantic-kernel/prompts/your-first-prompt?tabs=Csharp
string prompt = """
    <message role="system">the initial system message for your chat history</message>
    <message role="user">the user's initial message</message>
    """;

string? response = await kernel.InvokePromptAsync<string>(prompt);
Console.WriteLine($"Output: {response}");

// Alternatively, send a prompt to your model through the chat completion service
// First, initialize a chat history with your initial system message
string systemMessage = "<the initial system message for your chat history>";
Console.WriteLine($"System Prompt: {systemMessage}");
var chatHistory = new ChatHistory(systemMessage);

// Add the user's input to your chat history
string userRequest = "<the user's initial message>";
Console.WriteLine($"User: {userRequest}");
chatHistory.AddUserMessage(userRequest);

// Get the model's response and add it to the chat history
IChatCompletionService service = kernel.GetRequiredService<IChatCompletionService>();
ChatMessageContent responseMessage = await service.GetChatMessageContentAsync(
    chatHistory,
    executionSettings
);
Console.WriteLine($"Assistant: {responseMessage.Content}");
chatHistory.Add(responseMessage);

// Continue sending and receiving messages between the user and model
// ...
```
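As with text generation, the chat service also supports streaming. A minimal sketch continuing from the code above (add `using System.Text;` for `StringBuilder`):

```csharp
// Stream the assistant's reply, then record the full message in the chat history
var streamedReply = new StringBuilder();
await foreach (StreamingChatMessageContent chunk in
    service.GetStreamingChatMessageContentsAsync(chatHistory, executionSettings))
{
    Console.Write(chunk.Content);
    streamedReply.Append(chunk.Content);
}
chatHistory.AddAssistantMessage(streamedReply.ToString());
```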