Get started with Databricks AI agents

Article
03/11/2025

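This tutorial walks you through building an AI agent that uses retrieval and tools together. The dataset is a subset of the Databricks documentation that has already been split into chunks. In this tutorial, you build an agent that retrieves documents based on keywords.

For simplicity, this tutorial uses a simple in-memory approach based on TF-IDF for keyword extraction and keyword search over documents. The example notebook includes all of the code used in the tutorial. For a more realistic example that uses Mosaic AI Vector Search to index and search documents at scale, see ChatAgent examples.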
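To make the in-memory TF-IDF idea concrete, here is a minimal sketch in plain Python. It is not the tutorial's implementation (which uses scikit-learn's TfidfVectorizer); the toy corpus, the helper names `tokenize`, `build_index`, and `search`, and the omission of stop-word removal and vector normalization are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); the tutorial itself uses chunked Databricks docs.
DOCS = [
    "delta tables store data in parquet files",
    "unity catalog governs tables and functions",
    "model serving deploys agents behind an endpoint",
]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (no stop-word removal here)."""
    return text.lower().split()


def build_index(docs: list[str]):
    """Return per-document TF-IDF weights and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so that rare terms weigh more than common ones.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * idf[t] for t in tf})
    return weights, idf


def search(query: str, weights, idf, top_n: int = 2) -> list[tuple[int, float]]:
    """Score each document by its dot product with the query vector."""
    q_tf = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: q_tf[t] * idf[t] for t in q_tf}
    scores = [
        (i, sum(w.get(t, 0.0) * qw for t, qw in q_vec.items()))
        for i, w in enumerate(weights)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


weights, idf = build_index(DOCS)
results = search("unity catalog functions", weights, idf)
print(results[0][0])  # Index of the best-matching document
```

The same two steps, building weights once and scoring a query against them, are what the keyword tool and the retriever later in the tutorial do with scikit-learn.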

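This tutorial covers some of the core challenges of building generative AI applications:

Streamlining the development experience for common tasks such as creating tools and debugging agent execution.
Operational challenges such as:
- Tracking agent configuration
- Defining inputs and outputs in a predictable way
- Managing dependency versions
- Version control and deployment
Measuring and improving the quality and reliability of an agent.

For a walkthrough of the overall development process for building an agent, see Navigate the generative AI app guide.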

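Example notebook

This standalone notebook is designed to get you working quickly with Mosaic AI agents using a sample document corpus. It is ready to run, with no setup or data required.

Mosaic AI agent demo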


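Create an agent and tools

Mosaic AI Agent Framework supports many different authoring frameworks. This example uses LangGraph to illustrate concepts, but it is not a LangGraph tutorial.

For examples of other supported frameworks, see ChatAgent examples.

The first step is to create an agent. You must specify an LLM client and a list of tools. The databricks-langchain Python package includes LangChain and LangGraph compatible clients for both Databricks LLMs and tools registered in Unity Catalog.

The endpoint must be a function-calling Foundation Model API or an External Model using AI Gateway. See Supported models.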

from databricks_langchain import ChatDatabricks
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")

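The following code defines a function that creates an agent from the model and some tools. Discussing the internals of this agent code is outside the scope of this guide. For more information about how to build a LangGraph agent, see the LangGraph documentation.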

from typing import Optional, Sequence, Union

from langchain_core.language_models import LanguageModelLike
from langchain_core.runnables import RunnableConfig, RunnableLambda
from langchain_core.tools import BaseTool
from langgraph.graph import END, StateGraph
from langgraph.graph.graph import CompiledGraph
from langgraph.prebuilt.tool_executor import ToolExecutor
from mlflow.langchain.chat_agent_langgraph import ChatAgentState, ChatAgentToolNode


def create_tool_calling_agent(
  model: LanguageModelLike,
  tools: Union[ToolExecutor, Sequence[BaseTool]],
  agent_prompt: Optional[str] = None,
) -> CompiledGraph:
  model = model.bind_tools(tools)

  def routing_logic(state: ChatAgentState):
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
      return "continue"
    else:
      return "end"

  if agent_prompt:
    system_message = {"role": "system", "content": agent_prompt}
    preprocessor = RunnableLambda(
      lambda state: [system_message] + state["messages"]
    )
  else:
    preprocessor = RunnableLambda(lambda state: state["messages"])
  model_runnable = preprocessor | model

  def call_model(
    state: ChatAgentState,
    config: RunnableConfig,
  ):
    response = model_runnable.invoke(state, config)

    return {"messages": [response]}

  workflow = StateGraph(ChatAgentState)

  workflow.add_node("agent", RunnableLambda(call_model))
  workflow.add_node("tools", ChatAgentToolNode(tools))

  workflow.set_entry_point("agent")
  workflow.add_conditional_edges(
    "agent",
    routing_logic,
    {
      "continue": "tools",
      "end": END,
    },
  )
  workflow.add_edge("tools", "agent")

  return workflow.compile()
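The control flow of the graph above (call the model, run any requested tools, return to the model, stop when no tool call remains) can be sketched without LangGraph. The following is an illustration only: `fake_model`, the `word_count` tool, and the message dictionaries are hypothetical stand-ins, not part of LangGraph, MLflow, or the Databricks packages.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def fake_model(messages: list[dict]) -> dict:
    """Stub LLM: requests a tool once, then answers using the tool result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"The text has {last['content']} words."}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "word_count", "arguments": {"text": last["content"]}}],
    }


def run_agent(messages: list[dict], max_steps: int = 5) -> list[dict]:
    """Alternate between the model and the tools until no tool call remains."""
    messages = list(messages)
    for _ in range(max_steps):
        response = fake_model(messages)      # plays the role of the "agent" node
        messages.append(response)
        if not response.get("tool_calls"):   # same test as routing_logic: "end"
            break
        for call in response["tool_calls"]:  # plays the role of the "tools" node
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})
    return messages


history = run_agent([{"role": "user", "content": "the quick brown fox"}])
print(history[-1]["content"])
```

The `max_steps` bound is a design choice of this sketch: it keeps a misbehaving model from looping between the two nodes indefinitely.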

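Define agent tools

Tools are a fundamental concept for building agents. They provide the ability to integrate LLMs with human-defined code. When given a prompt and a list of tools, a tool-calling LLM generates the arguments to invoke the tool. For more information on tools and using them with Mosaic AI agents, see AI agent tools.

The first step is to create a keyword extraction tool based on TF-IDF. This example uses scikit-learn and a Unity Catalog tool.

The databricks-langchain package provides a convenient way to work with Unity Catalog tools. The following code illustrates how to implement and register a keyword-extractor tool.

Note

The Databricks workspace has a built-in tool, system.ai.python_exec, that you can use to extend agents with the ability to execute Python scripts in a sandboxed execution environment. Other useful built-in tools include external connections and AI functions.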

from databricks_langchain.uc_ai import (
  DatabricksFunctionClient,
  UCFunctionToolkit,
  set_uc_function_client,
)

uc_client = DatabricksFunctionClient()
set_uc_function_client(uc_client)

# Change this to your catalog and schema
CATALOG = "main"
SCHEMA = "my_schema"


def tfidf_keywords(text: str) -> list[str]:
  """
  Extracts keywords from the provided text using TF-IDF.

  Args:
    text (string): Input text.
  Returns:
    list[str]: List of extracted keywords in ascending order of importance.
  """
  from sklearn.feature_extraction.text import TfidfVectorizer

  def keywords(text, top_n=5):
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform([text])  # Convert text to TF-IDF matrix
    indices = tfidf.toarray().argsort()[0, -top_n:]  # Get indices of top N words
    return [vec.get_feature_names_out()[i] for i in indices]

  return keywords(text)


# Create the function in the Unity Catalog catalog and schema specified
# When you use `.create_python_function`, the provided function’s metadata
# (docstring, parameters, return type) are used to create a tool in the specified catalog and schema.
function_info = uc_client.create_python_function(
  func=tfidf_keywords,
  catalog=CATALOG,
  schema=SCHEMA,
  replace=True,  # Set to True to overwrite if the function already exists
)

print(function_info)

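Here is an explanation of the code above:

Creates a client that uses Unity Catalog in the Databricks workspace as a registry to create and discover tools.
Defines a Python function that performs TF-IDF keyword extraction.
Registers the Python function as a Unity Catalog function.

This workflow solves several common problems. You now have a central registry for tools that can be governed like other objects in Unity Catalog. For example, if a company has a standard way to calculate internal rate of return, you can define it as a function in Unity Catalog and grant access to all users or agents with the FinancialAnalyst role.

To make this tool usable by a LangChain agent, use UCFunctionToolkit, which creates a collection of tools to give to the LLM for selection: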
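The registration step relies on the function's docstring, parameters, and return type, which is why the type hints and docstring matter. The sketch below uses only Python's standard `inspect` module to show what that metadata looks like; `describe_function` is a hypothetical helper, and this is not how `DatabricksFunctionClient` is implemented.

```python
import inspect


def tfidf_keywords(text: str) -> list[str]:
    """Extracts keywords from the provided text using TF-IDF."""
    return []  # Body omitted; only the metadata matters here.


def describe_function(func) -> dict:
    """Collect the name, docstring, parameter types, and return type."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": {
            name: getattr(p.annotation, "__name__", str(p.annotation))
            for name, p in sig.parameters.items()
        },
        "returns": str(sig.return_annotation),
    }


spec = describe_function(tfidf_keywords)
print(spec)
```

A function without annotations or a docstring would yield an empty description and untyped parameters, which gives an LLM little to go on when deciding whether and how to call the tool.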

# Use ".*" here to specify all the tools in the schema, or
# explicitly list functions by name
# uc_tool_names = [f"{CATALOG}.{SCHEMA}.*"]
uc_tool_names = [f"{CATALOG}.{SCHEMA}.tfidf_keywords"]
uc_toolkit = UCFunctionToolkit(function_names=uc_tool_names)

The following code shows how to test the tool:

uc_toolkit.tools[0].invoke({ "text": "The quick brown fox jumped over the lazy brown dog." })

The following code builds an agent that uses the keyword extraction tool.

import mlflow
mlflow.langchain.autolog()

agent = create_tool_calling_agent(llm, tools=[*uc_toolkit.tools])

agent.invoke({"messages": [{"role": "user", "content":"What are the keywords for the sentence: 'the quick brown fox jumped over the lazy brown dog'?"}]})

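In the resulting trace, you can see that the LLM selected the tool.

MLflow trace output in the notebook showing tool selection.

Use traces to debug agents

MLflow Tracing is a powerful tool for debugging and observing generative AI applications, including agents. It captures detailed execution information through spans, which encapsulate specific code segments and record inputs, outputs, and timing data.

For popular libraries like LangChain, enable automatic tracing with mlflow.langchain.autolog(). You can also use mlflow.start_span() to customize a trace. For example, you can add custom fields or tags for observability. The code that runs in the context of that span is associated with the fields you define. In the in-memory TF-IDF example that follows, the span is given a name and a span type.

To learn more about tracing, see Agent observability with MLflow Tracing.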
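To illustrate what a span records, here is a small stand-alone sketch of a span as a context manager that captures inputs, outputs, attributes, and timing. The `Span` class, `start_span` function, and `TRACE` list are hypothetical and deliberately simplified; they imitate the shape of the idea and are not MLflow's classes or API.

```python
import time
from contextlib import contextmanager

TRACE: list[dict] = []  # Collected spans, in completion order.


class Span:
    """Hypothetical, simplified span; not MLflow's Span class."""

    def __init__(self, name: str, span_type: str):
        self.record = {"name": name, "span_type": span_type, "attributes": {}}

    def set_inputs(self, inputs: dict) -> None:
        self.record["inputs"] = inputs

    def set_outputs(self, outputs) -> None:
        self.record["outputs"] = outputs

    def set_attributes(self, attributes: dict) -> None:
        self.record["attributes"].update(attributes)


@contextmanager
def start_span(name: str, span_type: str = "UNKNOWN"):
    """Time the wrapped block and append the finished span to TRACE."""
    span = Span(name, span_type)
    started = time.perf_counter()
    try:
        yield span
    finally:
        # Record the duration even if the wrapped code raises.
        span.record["duration_s"] = time.perf_counter() - started
        TRACE.append(span.record)


with start_span("LittleIndex", span_type="RETRIEVER") as span:
    span.set_inputs({"query": "llm judges"})
    span.set_attributes({"top_n": 5})
    span.set_outputs(["doc-1", "doc-2"])

print(TRACE[0]["name"], TRACE[0]["span_type"])
```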

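The following example creates a retriever tool using a simple in-memory TF-IDF index. It demonstrates both autologging of tool executions and custom span tracing for additional observability: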

from sklearn.feature_extraction.text import TfidfVectorizer
import mlflow
from langchain_core.tools import tool


# parsed_docs_df is the DataFrame of chunked docs loaded in "Create an evaluation set"
documents = parsed_docs_df
doc_vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = doc_vectorizer.fit_transform(documents["content"])


@tool
def find_relevant_documents(query, top_n=5):
  """gets relevant documents for the query"""
  with mlflow.start_span(name="LittleIndex", span_type="RETRIEVER") as retriever_span:
    retriever_span.set_inputs({"query": query})
    retriever_span.set_attributes({"top_n": top_n})

    query_tfidf = doc_vectorizer.transform([query])
    similarities = (tfidf_matrix @ query_tfidf.T).toarray().flatten()
    ranked_docs = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)

    result = []
    for idx, score in ranked_docs[:top_n]:
      row = documents.iloc[idx]
      content = row["content"]
      doc_entry = {
        "page_content": content,
        "metadata": {
          "doc_uri": row["doc_uri"],
          "score": score,
        },
      }
      result.append(doc_entry)

    retriever_span.set_outputs(result)
    return result

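This code uses a special span type, RETRIEVER, which is reserved for retriever tools. Other Mosaic AI agent features (such as the AI Playground, review UI, and evaluation) use the RETRIEVER span type to display retrieval results.

Retriever tools require that you specify their schema to ensure compatibility with downstream Databricks features. For more information on mlflow.models.set_retriever_schema, see Specify custom retriever schemas.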

import mlflow
from mlflow.models import set_retriever_schema

uc_toolkit = UCFunctionToolkit(function_names=[f"{CATALOG}.{SCHEMA}.*"])

graph = create_tool_calling_agent(llm, tools=[*uc_toolkit.tools, find_relevant_documents])

mlflow.langchain.autolog()
set_retriever_schema(
  primary_key="chunk_id",
  text_column="chunk_text",
  doc_uri="doc_uri",
  other_columns=["title"],
)

graph.invoke(input = {"messages": [("user", "How do the docs say I use llm judges on databricks?")]})

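Retrieval results showing metadata.

Define the agent

The next step is to evaluate the agent and prepare it for deployment. At a high level, this involves the following:

Define a predictable API for the agent using a signature.
Add model configuration, which makes it easy to configure parameters.
Log the model with dependencies that give it a reproducible environment and let you configure its authentication to other services.

The MLflow ChatAgent interface simplifies defining agent inputs and outputs. To use it, define your agent as a subclass of ChatAgent, implementing non-streaming inference with the predict function and streaming inference with the predict_stream function.

ChatAgent is agnostic to your choice of agent authoring framework, so you can easily test and use different frameworks and agent implementations. The only requirement is to implement the predict and predict_stream interfaces.

Authoring your agent using ChatAgent provides a number of benefits, including:

Streaming output support
Comprehensive tool-calling message history: Return multiple messages, including intermediate tool-calling messages, for improved quality and conversation management.
Multi-agent system support
Databricks feature integration: Out-of-the-box compatibility with AI Playground, Agent Evaluation, and Agent Monitoring.
Typed authoring interfaces: Write agent code using typed Python classes, benefiting from IDE and notebook autocomplete.

For more information on authoring a ChatAgent, see Use ChatAgent to author agents.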

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import (
  ChatAgentChunk,
  ChatAgentMessage,
  ChatAgentResponse,
  ChatContext,
)
from typing import Any, Optional


class DocsAgent(ChatAgent):
  def __init__(self, agent):
    self.agent = agent
    set_retriever_schema(
      primary_key="chunk_id",
      text_column="chunk_text",
      doc_uri="doc_uri",
      other_columns=["title"],
    )

  def predict(
    self,
    messages: list[ChatAgentMessage],
    context: Optional[ChatContext] = None,
    custom_inputs: Optional[dict[str, Any]] = None,
  ) -> ChatAgentResponse:
    # ChatAgent has a built-in helper method to help convert framework-specific messages, like langchain BaseMessage to a python dictionary
    request = {"messages": self._convert_messages_to_dict(messages)}

    # Use the agent stored on this instance rather than a notebook-level global
    output = self.agent.invoke(request)
    # Here 'output' is already a ChatAgentResponse, but to make the ChatAgent signature explicit for this demonstration, the code returns a new instance
    return ChatAgentResponse(**output)

The following code shows how to use the ChatAgent.

AGENT = DocsAgent(agent=agent)
AGENT.predict(
  {
    "messages": [
      {"role": "user", "content": "What is DLT in Databricks?"},
    ]
  }
)
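The DocsAgent above implements only predict. As a rough illustration of how predict and predict_stream can relate, the following sketch uses plain Python classes. `Message` and `EchoAgent` are hypothetical stand-ins that do not depend on MLflow; a real agent would subclass ChatAgent and use its message types.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Message:  # Hypothetical stand-in for a chat message type.
    role: str
    content: str


class EchoAgent:
    """Toy agent exposing the two entry points the interface asks for."""

    def predict_stream(self, messages: list[Message]) -> Iterator[Message]:
        # Streaming inference: emit one chunk per word of the last message.
        for word in messages[-1].content.split():
            yield Message(role="assistant", content=word)

    def predict(self, messages: list[Message]) -> list[Message]:
        # Non-streaming inference: drain the stream and join the chunks.
        chunks = list(self.predict_stream(messages))
        text = " ".join(chunk.content for chunk in chunks)
        return [Message(role="assistant", content=text)]


echo_agent = EchoAgent()
request = [Message(role="user", content="What is DLT?")]
print([m.content for m in echo_agent.predict_stream(request)])
print(echo_agent.predict(request)[0].content)
```

Building predict on top of predict_stream, as done here, is one way to keep the two code paths consistent; the reverse arrangement is equally possible.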

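Configure an agent with parameters

The Agent Framework lets you control agent execution with parameters. This means you can quickly test different agent configurations, such as switching LLM endpoints or trying different tools, without changing the underlying code.

The following code creates a config dictionary that sets agent parameters when initializing the model.

For more details about parameterizing agents, see Parametrize agent code for deployment across environments.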


from mlflow.models import ModelConfig

baseline_config = {
  "endpoint_name": "databricks-meta-llama-3-1-70b-instruct",
  "temperature": 0.01,
  "max_tokens": 1000,
  "system_prompt": """You are a helpful assistant that answers questions about Databricks. Questions unrelated to Databricks are irrelevant.


  You answer questions using a set of tools. If needed, you ask the user follow-up questions to clarify their request.
  """,
  "tool_list": ["catalog.schema.*"],
}


class DocsAgent(ChatAgent):
  def __init__(self):
    self.config = ModelConfig(development_config=baseline_config)
    self.agent = self._build_agent_from_config()

  def _build_agent_from_config(self):
    # Read the configuration as a plain dict so that missing keys fall back to defaults
    config = self.config.to_dict()
    temperature = config.get("temperature", 0.01)
    max_tokens = config.get("max_tokens", 1000)
    system_prompt = config.get("system_prompt", """You are a helpful assistant.
      You answer questions using a set of tools. If needed you ask the user follow-up questions to clarify their request.""")
    llm_endpoint_name = config.get("endpoint_name", "databricks-meta-llama-3-3-70b-instruct")
    tool_list = config.get("tool_list", [])

    llm = ChatDatabricks(endpoint=llm_endpoint_name, temperature=temperature, max_tokens=max_tokens)
    toolkit = UCFunctionToolkit(function_names=tool_list)
    # create_tool_calling_agent (defined earlier) takes the system prompt as agent_prompt
    agent = create_tool_calling_agent(llm, tools=[*toolkit.tools, find_relevant_documents], agent_prompt=system_prompt)

    return agent
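Because the agent reads its settings from a dictionary, trying a different setup amounts to building a modified copy of that dictionary. The sketch below shows one way to derive variants from a baseline. The helper `make_variant` and the placeholder endpoint name `candidate-endpoint` are hypothetical and not part of MLflow or Databricks.

```python
# Hypothetical baseline; the keys mirror the tutorial's baseline_config.
baseline = {
    "endpoint_name": "databricks-meta-llama-3-3-70b-instruct",
    "temperature": 0.01,
    "max_tokens": 1000,
}


def make_variant(base: dict, **overrides) -> dict:
    """Return a copy of base with overrides applied, rejecting unknown keys."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**base, **overrides}


# "candidate-endpoint" is a placeholder, not a real endpoint name.
variants = {
    "baseline": baseline,
    "warmer": make_variant(baseline, temperature=0.7),
    "other_llm": make_variant(baseline, endpoint_name="candidate-endpoint"),
}

for name, cfg in variants.items():
    print(name, cfg["endpoint_name"], cfg["temperature"])
```

Rejecting unknown keys is a choice made for this sketch: it catches a misspelled parameter early instead of letting the agent silently fall back to a default.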

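Log the agent

After you define the agent, it is ready to be logged. In MLflow, logging an agent means saving the agent's configuration (including dependencies) so that it can be used for evaluation and deployment.

Note

When you develop agents in a notebook, MLflow infers the agent's dependencies from the notebook environment.

To log an agent from a notebook, you can write all of the code that defines the model in a single cell, then use the %%writefile magic command to save the agent's definition to a file: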

%%writefile agent.py
...
<Code that defines the agent>

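If the agent requires access to external resources, such as Unity Catalog to execute the keyword extraction tool, you must configure authentication for the agent so that it can access those resources when it is deployed.

To simplify authentication for Databricks resources, enable automatic authentication passthrough: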

from mlflow.models.resources import DatabricksFunction, DatabricksServingEndpoint


resources = [
  DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME),
  DatabricksFunction(function_name=tool.uc_function_name),
]


with mlflow.start_run():
  # Named model_info because the evaluation and deployment code below reads model_info.model_uri
  model_info = mlflow.pyfunc.log_model(
    artifact_path="agent",
    python_model="agent.py",
    pip_requirements=[
      "mlflow",
      "langchain",
      "langgraph",
      "databricks-langchain",
      "unitycatalog-langchain[databricks]",
      "pydantic",
    ],
    resources=resources,
  )

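To learn more about logging agents, see Code-based logging.

Evaluate the agent

The next step is to evaluate the agent to see how it performs. Agent evaluation is challenging and raises many questions, such as the following:

What are the right metrics to evaluate quality? How do I trust the outputs of these metrics?
I need to evaluate many ideas. How do I:
- run evaluations quickly so that most of my time isn't spent waiting?
- quickly compare the different versions of my agent on quality, cost, and latency?
How do I quickly identify the root cause of any quality problems?

As a data scientist or developer, you might not be the actual subject matter expert. The rest of this section describes Agent Evaluation tools that can help you define good output.

Create an evaluation set

To define what quality means for an agent, you use metrics to measure the agent's performance on an evaluation set. See Define "quality": Evaluation sets.

Using Agent Evaluation, you can create synthetic evaluation sets and measure quality by running evaluations. The idea is to start from the facts, such as a set of documents, and "work backwards" by using those facts to generate a set of questions. You can steer the generated questions by providing some guidelines: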

from databricks.agents.evals import generate_evals_df
import pandas as pd


databricks_docs_url = "https://raw.githubusercontent.com/databricks/genai-cookbook/refs/heads/main/quick_start_demo/chunked_databricks_docs_filtered.jsonl"
parsed_docs_df = pd.read_json(databricks_docs_url, lines=True)


agent_description = f"""
The agent is a RAG chatbot that answers questions about Databricks. Questions unrelated to Databricks are irrelevant.
"""
question_guidelines = f"""
# User personas
- A developer who is new to the Databricks platform
- An experienced, highly technical Data Scientist or Data Engineer


# Example questions
- what API lets me parallelize operations over rows of a delta table?
- Which cluster settings will give me the best performance when using Spark?


# Additional Guidelines
- Questions should be succinct, and human-like
"""


num_evals = 25
evals = generate_evals_df(
  docs=parsed_docs_df[
    :500
  ],  # Pass your docs. They should be in a Pandas or Spark DataFrame with columns `content STRING` and `doc_uri STRING`.
  num_evals=num_evals,  # How many synthetic evaluations to generate
  agent_description=agent_description,
  question_guidelines=question_guidelines,
)

The generated evaluations include the following:

A request field that looks like the ChatAgentRequest mentioned earlier:

{"messages":[{"content":"What command must be run at the start of your workload to explicitly target the Workspace Model Registry if your workspace default catalog is in Unity Catalog and you use Databricks Runtime 13.3 LTS or above?","role":"user"}]}

A list of "expected retrieved content". The retriever schema was defined with content and doc_uri fields.

[{"content":"If your workspace’s [default catalog](https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html#view-the-current-default-catalog) is in Unity Catalog (rather than `hive_metastore`) and you are running a cluster using Databricks Runtime 13.3 LTS or above, models are automatically created in and loaded from the workspace default catalog, with no configuration required. To use the Workspace Model Registry in this case, you must explicitly target it by running `import mlflow; mlflow.set_registry_uri(\"databricks\")` at the start of your workload.","doc_uri":"https://docs.databricks.com/machine-learning/manage-model-lifecycle/workspace-model-registry.html"}]

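A list of expected facts. When you compare two responses, it can be difficult to find small differences between them. Expected facts distill what separates a correct answer from a partially correct answer and from an incorrect answer, which improves both the quality of the AI judges and the experience of the people working on the agent: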
```
["The command must import the MLflow module.","The command must set the registry URI to \"databricks\"."]
```
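A source_id field, which here is SYNTHETIC_FROM_DOC. As you build more complete evaluation sets, the samples will come from a variety of sources, so this field distinguishes them.

To learn more about creating evaluation sets, see Synthesize evaluation sets.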
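Putting the fields together, a single evaluation row might be represented as the dictionary below, along with a small sanity check. This is a sketch under assumptions: the key names `expected_retrieved_context` and `expected_facts` are guesses based on the descriptions above, the question and content strings are abbreviated, and `check_row` is a hypothetical helper, not part of Agent Evaluation.

```python
# One evaluation row assembled from the fragments shown above, with the text
# abbreviated. The key names "expected_retrieved_context" and "expected_facts"
# are assumptions made for this sketch.
row = {
    "request": {
        "messages": [
            {"role": "user", "content": "What command targets the Workspace Model Registry?"}
        ]
    },
    "expected_retrieved_context": [
        {
            "content": "To use the Workspace Model Registry, you must explicitly target it.",
            "doc_uri": "https://docs.databricks.com/machine-learning/manage-model-lifecycle/workspace-model-registry.html",
        }
    ],
    "expected_facts": [
        "The command must import the MLflow module.",
        'The command must set the registry URI to "databricks".',
    ],
    "source_id": "SYNTHETIC_FROM_DOC",
}


def check_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row looks well formed."""
    problems = []
    messages = row.get("request", {}).get("messages", [])
    if not messages or messages[-1].get("role") != "user":
        problems.append("request must end with a user message")
    for doc in row.get("expected_retrieved_context", []):
        if not {"content", "doc_uri"} <= set(doc):
            problems.append("retrieved context needs content and doc_uri")
    if not row.get("expected_facts"):
        problems.append("expected_facts is empty")
    return problems


print(check_row(row))
```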

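Evaluate the agent using LLM judges

Manually evaluating an agent's performance on so many generated examples does not scale. At scale, using LLMs as judges is a much more reasonable solution. To use the built-in judges that are available with Agent Evaluation, use the following code: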

with mlflow.start_run(run_name="my_agent"):
  eval_results = mlflow.evaluate(
    data=evals,  # Your evaluation set
    model=model_info.model_uri,  # Logged agent from above
    model_type="databricks-agent",  # activate Mosaic AI Agent Evaluation
  )

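MLflow Experiment: evaluation results.

The simple agent scored 68% overall. Your results may differ depending on the configuration you use. Running an experiment to compare three different LLMs for cost and quality is as simple as changing the configuration and re-evaluating.

Consider changing the model configuration to use a different LLM, system prompt, or temperature setting.

These judges can be customized to follow the same guidelines that human experts would use to evaluate a response. For more information on LLM judges, see Built-in AI judges.

With Agent Evaluation, you can use custom metrics to define how the quality of a particular agent is measured. You can think of evaluation as an integration test and individual metrics as unit tests. The following example uses a boolean metric to check whether the agent used both the keyword extraction tool and the retriever for a given request: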

from databricks.agents.evals import metric

@metric
def uses_keywords_and_retriever(request, trace):
  retriever_spans = trace.search_spans(span_type="RETRIEVER")
  keyword_tool_spans = trace.search_spans(name=f"{CATALOG}__{SCHEMA}__tfidf_keywords")
  return len(keyword_tool_spans) > 0 and len(retriever_spans) > 0


# same evaluate as above, with the addition of 'extra_metrics'
with mlflow.start_run(run_name="my_agent"):
  eval_results = mlflow.evaluate(
    data=evals,  # Your evaluation set
    model=model_info.model_uri,  # Logged agent from above
    model_type="databricks-agent",  # activate Mosaic AI Agent Evaluation,
    extra_metrics=[uses_keywords_and_retriever],
  )

Note that the agent never uses the keyword extraction tool. How could you fix this problem?

Evaluation results showing custom metric output.
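The logic of that boolean metric can be exercised without running an evaluation. The sketch below represents a trace as a plain list of span dictionaries, which is a simplification and not MLflow's Trace object. The `search_spans` helper is a hypothetical stand-in for the method of the same name, and the tool span name assumes the catalog `main` and schema `my_schema` used earlier.

```python
from typing import Optional


def search_spans(
    trace: list[dict],
    span_type: Optional[str] = None,
    name: Optional[str] = None,
) -> list[dict]:
    """Return the spans that match every filter that was provided."""
    return [
        span
        for span in trace
        if (span_type is None or span["span_type"] == span_type)
        and (name is None or span["name"] == name)
    ]


def uses_keywords_and_retriever(trace: list[dict]) -> bool:
    """True only if the trace contains a retriever span and a keyword-tool span."""
    retriever_spans = search_spans(trace, span_type="RETRIEVER")
    keyword_spans = search_spans(trace, name="main__my_schema__tfidf_keywords")
    return bool(retriever_spans) and bool(keyword_spans)


retriever_only = [{"name": "LittleIndex", "span_type": "RETRIEVER"}]
both = retriever_only + [
    {"name": "main__my_schema__tfidf_keywords", "span_type": "TOOL"}
]

print(uses_keywords_and_retriever(retriever_only))  # False
print(uses_keywords_and_retriever(both))            # True
```

A trace like `retriever_only` corresponds to the situation described above, where the agent retrieves documents but never calls the keyword extraction tool.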

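Deploy and monitor the agent

When you're ready to begin testing your agent with real users, Agent Framework provides a production-ready solution for serving the agent on Mosaic AI Model Serving.

Deploying agents to Model Serving provides the following benefits:

Model Serving manages autoscaling, logging, version control, and access control, so you can focus on developing quality agents.
Subject matter experts can use the Review App to interact with the agent and provide feedback that can be incorporated into your monitoring and evaluations.
You can monitor the agent by running evaluations on live traffic. Although user traffic does not include ground truth, LLM judges (and the custom metric you created) perform an unsupervised evaluation.

The following code deploys the agent to a serving endpoint. For more information, see Deploy an agent for generative AI applications.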

from databricks import agents
import mlflow

# Connect to the Unity Catalog model registry
mlflow.set_registry_uri("databricks-uc")

# Configure UC model location
UC_MODEL_NAME = f"{CATALOG}.{SCHEMA}.getting_started_agent"
# REPLACE WITH UC CATALOG/SCHEMA THAT YOU HAVE `CREATE MODEL` permissions in

# Register to Unity Catalog
uc_registered_model_info = mlflow.register_model(
  model_uri=model_info.model_uri, name=UC_MODEL_NAME
)
# Deploy to enable the review app and create an API endpoint
deployment_info = agents.deploy(
  model_name=UC_MODEL_NAME, model_version=uc_registered_model_info.version
)
