Legacy input and output agent schema

Note

Databricks has deprecated the legacy AI agent schemas ChatModel, SplitChatMessagesRequest, and StringResponse. Databricks recommends migrating to the ChatAgent schema to author agents. See Author AI agents in code.

AI agents must adhere to specific input and output schema requirements to be compatible with other features on Azure Databricks. This article explains how to use the legacy agent authoring interfaces and schemas: the ChatModel interface, the SplitChatMessagesRequest input schema, and the StringResponse output schema.

Author a legacy ChatModel agent

Important

Databricks recommends the ChatAgent interface for creating agents or gen AI apps. To migrate from ChatModel to ChatAgent, see MLflow documentation - Migrate from ChatModel to ChatAgent.

ChatModel is a legacy agent authoring interface in MLflow that extends OpenAI’s ChatCompletion schema, allowing you to maintain compatibility with platforms supporting the ChatCompletion standard while adding custom functionality. See MLflow: Getting Started with ChatModel for additional details.

Authoring your agent as a subclass of mlflow.pyfunc.ChatModel provides the following benefits:

  • Enables streaming agent output when invoking a served agent (by passing {stream: true} in the request body); see the request sketch after this list.
  • Automatically enables AI gateway inference tables when your agent is served, providing access to enhanced request log metadata, such as the requester name.
  • Allows you to write agent code compatible with the ChatCompletion schema using typed Python classes.
  • MLflow automatically infers a chat completion-compatible signature when logging the agent, even without an input_example. This simplifies the process of registering and deploying the agent. See Infer Model Signature during logging.

The agent code that follows is best run in a Databricks notebook, which provides a convenient environment for developing, testing, and iterating on your agent.

The MyAgent class extends mlflow.pyfunc.ChatModel, implementing the required predict method. This ensures compatibility with Mosaic AI Agent Framework.

The class also includes the optional methods _create_chat_completion_chunk and predict_stream to handle streaming outputs.

import re
from typing import List, Generator
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
    # Non-streaming helper classes
    ChatCompletionRequest,
    ChatCompletionResponse,
    ChatCompletionChunk,
    ChatMessage,
    ChatChoice,
    ChatParams,
    # Helper classes for streaming agent output
    ChatChoiceDelta,
    ChatChunkChoice,
)

class MyAgent(ChatModel):
    """
    Defines a custom agent that processes ChatCompletionRequests
    and returns ChatCompletionResponses.
    """
    def predict(self, context, messages: List[ChatMessage], params: ChatParams) -> ChatCompletionResponse:
        last_user_question_text = messages[-1].content
        response_message = ChatMessage(
            role="assistant",
            content=(
                f"I will always echo back your last question. Your last question was: {last_user_question_text}. "
            )
        )
        return ChatCompletionResponse(
            choices=[ChatChoice(message=response_message)]
        )

    def _create_chat_completion_chunk(self, content) -> ChatCompletionChunk:
        """Helper for constructing a ChatCompletionChunk instance for wrapping streaming agent output"""
        return ChatCompletionChunk(
                choices=[ChatChunkChoice(
                    delta=ChatChoiceDelta(
                        role="assistant",
                        content=content
                    )
                )]
            )

    def predict_stream(
        self, context, messages: List[ChatMessage], params: ChatParams
    ) -> Generator[ChatCompletionChunk, None, None]:
        last_user_question_text = messages[-1].content
        yield self._create_chat_completion_chunk(f"Echoing back your last question, word by word.")
        for word in re.findall(r"\S+\s*", last_user_question_text):
            yield self._create_chat_completion_chunk(word)

agent = MyAgent()
model_input = ChatCompletionRequest(
    messages=[ChatMessage(role="user", content="What is Databricks?")]
)
response = agent.predict(context=None, messages=model_input.messages, params=ChatParams())
print(response)

While the agent class MyAgent is defined in one notebook, you should create a separate driver notebook. The driver notebook logs the agent to Model Registry and deploys the agent using Model Serving.

This separation follows the workflow recommended by Databricks for logging models using MLflow’s Models from Code methodology.
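
The following is a minimal sketch of such a driver notebook. It assumes the agent code lives in a notebook or file named agent in the same directory and that it calls mlflow.models.set_model(agent) so Models from Code can pick up the agent object; the Unity Catalog model name is a placeholder to replace with your own:

import mlflow
from databricks import agents

# Log the agent with Models from Code: python_model points at the file or
# notebook that defines MyAgent and calls mlflow.models.set_model(agent).
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        python_model="agent",  # placeholder path to the agent notebook/file
        artifact_path="agent",
    )

# Register the logged model to Unity Catalog (placeholder three-level name).
mlflow.set_registry_uri("databricks-uc")
uc_model_name = "main.default.my_agent"
registered_model = mlflow.register_model(
    model_uri=logged_agent_info.model_uri,
    name=uc_model_name,
)

# Deploy the registered model version with Model Serving.
agents.deploy(uc_model_name, registered_model.version)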

SplitChatMessagesRequest input schema (deprecated)

SplitChatMessagesRequest allows you to pass the current query and history separately as agent input.

  question = {
      "query": "What is MLflow",
      "history": [
          {
              "role": "user",
              "content": "What is Retrieval-augmented Generation?"
          },
          {
              "role": "assistant",
              "content": "RAG is"
          }
      ]
  }
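
The same input can also be built with the corresponding MLflow dataclasses. This sketch assumes your MLflow version exposes SplitChatMessagesRequest and Message in mlflow.models.rag_signatures:

from mlflow.models.rag_signatures import Message, SplitChatMessagesRequest

# Equivalent to the JSON example above.
request = SplitChatMessagesRequest(
    query="What is MLflow",
    history=[
        Message(role="user", content="What is Retrieval-augmented Generation?"),
        Message(role="assistant", content="RAG is"),
    ],
)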

StringResponse output schema (deprecated)

StringResponse allows you to return the agent’s response as an object with a single string content field:

{"content": "This is an example string response"}