Define an agent’s input and output schema
MLflow Model Signatures define the input and output schema requirements for your AI agent. The Model Signature tells internal and external components how to interact with your agent, and it serves as a validation check that inputs adhere to those schema requirements.
For example, to use the Agent Evaluation review app, your agent must adhere to the Agent Evaluation input schema.
Supported input schemas
Mosaic AI Agent Framework supports the following input schemas.
OpenAI chat completion schema
Note
Databricks recommends the OpenAI chat completion schema because it’s widely used and interoperable with many agent frameworks and applications. If the OpenAI chat completion schema does not meet your needs, you can define your own schema. See Custom agent schemas.
(Recommended) The OpenAI chat completion schema takes an array of objects as its messages parameter. This format is best for RAG applications.
question = {
    "messages": [
        {
            "role": "user",
            "content": "What is Retrieval-Augmented Generation?",
        },
        {
            "role": "assistant",
            "content": "RAG, or Retrieval Augmented Generation, is a generative AI design pattern that combines a large language model (LLM) with external knowledge retrieval. This approach allows for real-time data connection to generative AI applications, improving their accuracy and quality by providing context from your data to the LLM during inference. Databricks offers integrated tools that support various RAG scenarios, such as unstructured data, structured data, tools & function calling, and agents.",
        },
        {
            "role": "user",
            "content": "How to build RAG for unstructured data",
        },
    ]
}
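The shape described above can be checked with a few lines of plain Python. The sketch below is illustrative only: is_chat_completion_request is a hypothetical helper, not part of MLflow or the Agent Framework, and the set of allowed roles is an assumption.

```python
from typing import Any

# Assumed set of roles; adjust to whatever your agent accepts.
ALLOWED_ROLES = {"system", "user", "assistant"}


def is_chat_completion_request(payload: Any) -> bool:
    """Return True if payload has a non-empty `messages` list of role/content objects."""
    if not isinstance(payload, dict):
        return False
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get("role") not in ALLOWED_ROLES:
            return False
        if not isinstance(message.get("content"), str):
            return False
    return True
```

A check like this can be useful in unit tests for an agent, before the request ever reaches the model.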
SplitChatMessagesRequest
SplitChatMessagesRequest is recommended for multi-turn chat applications, especially when you want to manage the current query and history separately.
question = {
    "query": "What is MLflow",
    "history": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?"
        },
        {
            "role": "assistant",
            "content": "RAG is"
        }
    ]
}
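The relationship between the two input formats can be sketched as a small conversion. This is a minimal illustration, assuming that the last message in an OpenAI-style list is the current user query; split_messages is a hypothetical helper, not an MLflow or Agent Framework function.

```python
from typing import Dict, List

Message = Dict[str, str]


def split_messages(messages: List[Message]) -> Dict[str, object]:
    """Split an OpenAI-style message list into the latest query and the earlier history."""
    if not messages or messages[-1].get("role") != "user":
        raise ValueError("the last message must come from the user")
    return {
        "query": messages[-1]["content"],  # current user turn
        "history": messages[:-1],          # everything before it
    }


conversation = [
    {"role": "user", "content": "What is Retrieval-augmented Generation?"},
    {"role": "assistant", "content": "RAG is"},
    {"role": "user", "content": "What is MLflow"},
]
question = split_messages(conversation)
```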
LangChain Expression Language
If your agent uses LangChain, you can write your chain in LangChain Expression Language. In your chain definition code, you can use an itemgetter to get the messages or query or history objects depending on your input format.
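As a rough sketch of what itemgetter does with each input format, the example below calls it directly on plain dictionaries so that it runs without LangChain installed. Inside a chain, the same itemgetter objects would be piped into the next step instead. The bodies of extract_user_query_string and extract_chat_history are assumptions made for illustration.

```python
from operator import itemgetter


def extract_user_query_string(messages):
    """Return the content of the most recent message (assumed to be the user's)."""
    return messages[-1]["content"]


def extract_chat_history(messages):
    """Return every message before the most recent one."""
    return messages[:-1]


# OpenAI chat completion format: pull out the messages list, then split it.
chat_request = {
    "messages": [
        {"role": "user", "content": "What is RAG?"},
        {"role": "assistant", "content": "RAG is"},
        {"role": "user", "content": "What is MLflow?"},
    ]
}
messages = itemgetter("messages")(chat_request)
user_query = extract_user_query_string(messages)
chat_history = extract_chat_history(messages)

# Split format: query and history are already separate keys.
split_request = {"query": "What is MLflow?", "history": []}
query, history = itemgetter("query", "history")(split_request)
```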
Supported output schemas
Mosaic AI Agent Framework supports the following output schemas.
ChatCompletionResponse
(Recommended) Databricks recommends ChatCompletionResponse for customers who need interoperability with the OpenAI response format.
LangChain - ChatCompletionsOutputParser
If your agent uses LangChain, use ChatCompletionsOutputParser() from MLflow as your final chain step. This formats the LangChain AI message into an agent-compatible format.
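from operator import itemgetter

from langchain_core.runnables import RunnableLambda
from mlflow.langchain.output_parsers import ChatCompletionsOutputParser

# extract_user_query_string, extract_chat_history, and DatabricksChat are defined elsewhere in the agent code.
chain = (
    {
        "user_query": itemgetter("messages")
        | RunnableLambda(extract_user_query_string),
        "chat_history": itemgetter("messages") | RunnableLambda(extract_chat_history),
    }
    | RunnableLambda(DatabricksChat)
    # Final step: format the LangChain AI message into an agent-compatible response.
    | ChatCompletionsOutputParser()
)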
PyFunc - annotate input and output classes
If you are using PyFunc, Databricks recommends using type hints to annotate the predict() function with input and output data classes that are subclasses of classes defined in mlflow.models.rag_signatures.
You can construct an output object from the data class inside predict(). The returned object must be transformed into a dictionary representation to ensure it can be serialized.
from dataclasses import asdict

from mlflow.models.rag_signatures import ChatCompletionRequest, ChatCompletionResponse, ChainCompletionChoice, Message
from mlflow.pyfunc import PythonModel


class RAGModel(PythonModel):
    ...

    def predict(self, context, model_input: ChatCompletionRequest) -> ChatCompletionResponse:
        ...
        # Convert the dataclass to a dictionary so the response can be serialized.
        return asdict(ChatCompletionResponse(
            choices=[ChainCompletionChoice(message=Message(content=text))]
        ))
Explicit and inferred signatures
MLflow can infer the input and output schema of your agent at runtime and create a signature automatically. If you use supported input and output schemas, the inferred signatures are compatible with the Agent Framework. For information about supported schemas, see Supported input schemas.
However, if you use a custom agent schema, you must explicitly define your Model Signature according to the instructions in Custom agent schemas.
Custom agent schemas
You can customize an agent’s schema to pass and return additional fields to and from the agent by creating a subclass of a supported input/output schema. Then, add the extra keys custom_inputs and custom_outputs to contain the additional fields. See code examples for PyFunc and LangChain and a UI-based method for using custom inputs.
To use the databricks-agents SDK, Databricks client UIs such as the AI Playground and the Review App, and other Mosaic AI Agent Framework features, your agent’s schema must fulfill the following requirements:
- The agent must use mlflow version 2.17.1 or above.
- In the agent notebook, mark additional fields added in your subclass as Optional and assign default values.
- In the driver notebook, construct a ModelSignature using infer_signature with instances of your subclasses.
- In the driver notebook, construct an input example by calling asdict on your subclass.
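The subclassing and input-example steps can be sketched with standard-library dataclasses. To keep the example runnable without MLflow, Message and ChatCompletionRequest below are simplified stand-ins for the classes of the same name in mlflow.models.rag_signatures; CustomChatCompletionRequest and CustomInputs are hypothetical names. The infer_signature step is not shown because it requires MLflow.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """Simplified stand-in for the MLflow Message class (assumption)."""
    role: str = "user"
    content: str = ""


@dataclass
class ChatCompletionRequest:
    """Simplified stand-in for the MLflow ChatCompletionRequest class (assumption)."""
    messages: List[Message] = field(default_factory=lambda: [Message()])


@dataclass
class CustomInputs:
    id: int = 0
    user: str = "default"


@dataclass
class CustomChatCompletionRequest(ChatCompletionRequest):
    # The extra field is Optional and has a default value.
    custom_inputs: Optional[CustomInputs] = field(default_factory=CustomInputs)


# Driver-notebook step: build the input example by calling asdict on an instance.
input_example = asdict(
    CustomChatCompletionRequest(
        messages=[Message(role="user", content="What is MLflow?")],
        custom_inputs=CustomInputs(id=123, user="dev_test"),
    )
)
```

Because asdict recurses into nested dataclasses and lists, input_example is a plain dictionary that can be serialized as JSON.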
PyFunc custom schemas
In addition to the requirements above, PyFunc-based agents must also meet the following requirements to interact with Mosaic AI agent features.
PyFunc custom schema requirements
In the agent notebook, the predict and predict_stream functions must meet the following requirements:
- Have type hints for your input subclass.
- Use dot notation to access dataclass fields (for example, use model_input.custom_inputs.id instead of model_input["custom_inputs"]).
- Return a dictionary. You can call asdict on an instance of your subclass to format the return as a dictionary.
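A minimal sketch of a predict function that follows these three requirements is shown below. It uses stand-in dataclasses so that it runs without MLflow: CustomRequest, CustomResponse, and SketchAgent are hypothetical names, and a real agent would subclass mlflow.pyfunc.PythonModel and the supported schema classes instead.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CustomInputs:
    id: int = 0


@dataclass
class CustomRequest:
    """Stand-in for a subclass of a supported input schema (assumption)."""
    query: str = ""
    custom_inputs: Optional[CustomInputs] = field(default_factory=CustomInputs)


@dataclass
class CustomResponse:
    """Stand-in for a subclass of a supported output schema (assumption)."""
    content: str = ""
    custom_outputs: Optional[dict] = None


class SketchAgent:
    # A real agent would subclass mlflow.pyfunc.PythonModel.
    def predict(self, context, model_input: CustomRequest) -> dict:
        # Dot notation on the dataclass, not dictionary indexing.
        request_id = model_input.custom_inputs.id
        response = CustomResponse(
            content=f"echo: {model_input.query}",
            custom_outputs={"id": request_id},
        )
        # Return a dictionary so the result can be serialized.
        return asdict(response)


output = SketchAgent().predict(None, CustomRequest(query="hi", custom_inputs=CustomInputs(id=7)))
```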
The following notebooks show a custom schema example using PyFunc.
PyFunc custom schema agent notebook
PyFunc custom schema driver notebook
LangChain custom schemas
The following notebooks show a custom schema example using LangChain. You can modify the wrap_output function in the notebooks to parse and extract information from the message stream.
Langchain custom schema agent notebook
Langchain custom schema driver notebook
Provide custom_inputs in the AI Playground and agent review app
If you define a custom agent schema with additional inputs using the custom_inputs field, you can manually provide these inputs in both the AI Playground and the agent review app. If no custom inputs are provided, the agent uses the default values specified in your schema.
1. In either the AI Playground or the Agent Review App, select the gear icon.
2. Enable custom_inputs.
3. Provide a JSON object that matches your agent’s defined input schema.
The JSON object must match the agent’s input schema. For example, if you have a custom_inputs dataclass defined as follows:
@dataclass
class CustomInputs():
    id: int = 0
    user: str = "default"
Then the JSON string that you enter in the custom_inputs field must provide values for id and user, as shown in the following example:
{
    "id": 123,
    "user": "dev_test"
}
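One way to check such a JSON string before pasting it into the UI is to parse it into the dataclass locally. This is a minimal sketch using only the standard library; parse_custom_inputs is a hypothetical helper, and rejecting unknown keys is an assumption about what "matching the schema" should mean, not documented Agent Framework behavior.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CustomInputs:
    id: int = 0
    user: str = "default"


def parse_custom_inputs(raw: str) -> CustomInputs:
    """Parse a JSON object into CustomInputs; omitted keys fall back to defaults."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("custom_inputs must be a JSON object")
    allowed = {f.name for f in fields(CustomInputs)}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    return CustomInputs(**data)


custom_inputs = parse_custom_inputs('{"id": 123, "user": "dev_test"}')
```

Malformed JSON, for example a missing or trailing comma, fails at json.loads, so this check catches syntax errors as well as unexpected keys.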