MLflow for gen AI agent and ML model lifecycle

This article describes how MLflow on Databricks is used to develop high-quality generative AI agents and machine learning models.

Note

If you’re just getting started with Azure Databricks, consider trying MLflow on Databricks Community Edition.

What is MLflow?

MLflow is an open source platform for developing models and generative AI applications. It has the following primary components:

  • Tracking: Allows you to track experiments to record and compare parameters and results.
  • Models: Allows you to manage and deploy models from various ML libraries to various model serving and inference platforms.
  • Model Registry: Allows you to manage the model deployment process from staging to production, with model versioning and annotation capabilities.
  • AI agent evaluation and tracing: Allows you to develop high-quality AI agents by helping you compare, evaluate, and troubleshoot agents.

MLflow supports Java, Python, R, and REST APIs.

Databricks-managed MLflow

Databricks provides a fully managed and hosted version of MLflow, building on the open source experience to make it more robust and scalable for enterprise use.

The following diagram shows how Databricks integrates with MLflow to train and deploy machine learning models.

MLflow integrates with Databricks to manage the ML lifecycle.

Databricks-managed MLflow is built on Unity Catalog and the Cloud Data Lake to unify all your data and AI assets in the ML lifecycle:

  1. Feature store: Databricks automated feature lookups simplify integration and reduce mistakes.
  2. Train models: Use Mosaic AI to train models or fine-tune foundation models.
  3. Tracking: MLflow tracks training by logging parameters, metrics, and artifacts to evaluate and compare model performance.
  4. Model Registry: MLflow Model Registry, integrated with Unity Catalog, centralizes AI models and artifacts.
  5. Model Serving: Mosaic AI Model Serving deploys models to a REST API endpoint.
  6. Monitoring: Mosaic AI Model Serving automatically captures requests and responses to monitor and debug models. MLflow augments this data with trace data for each request.

Model training

MLflow Models are at the core of AI and ML development on Databricks. MLflow Models are a standardized format for packaging machine learning models and generative AI agents. The standardized format ensures that models and agents can be used by downstream tools and workflows on Databricks.

  • MLflow documentation - Models.

Databricks provides features to help you train different kinds of ML models.
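
For example, a model trained with a supported library can be logged in the MLflow Model format and loaded back through the generic pyfunc interface that downstream tools rely on. The following is a minimal sketch, assuming scikit-learn is available; the dataset, model, and artifact name are illustrative.

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

with mlflow.start_run() as run:
    model = RandomForestClassifier(n_estimators=50).fit(X, y)
    # Package the trained model in the standardized MLflow Model format.
    mlflow.sklearn.log_model(model, "model")

# Load the same model back through the generic pyfunc interface,
# which downstream tools such as batch inference and serving rely on.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```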

Experiment tracking

Databricks uses MLflow experiments as organizational units to track your work while developing models.

Experiment tracking lets you log and manage parameters, metrics, artifacts, and code versions during machine learning training and agent development. Organizing logs into experiments and runs allows you to compare models, analyze performance, and iterate more easily.
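
The following is a minimal sketch of the tracking API, assuming it runs in a notebook or job attached to a Databricks workspace; the experiment path, parameters, and metric values are illustrative.

```python
import mlflow

# On Databricks, experiments are identified by a workspace path (illustrative).
mlflow.set_experiment("/Users/someone@example.com/churn-experiment")

with mlflow.start_run(run_name="baseline"):
    # Parameters: inputs you want to compare across runs.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("max_depth", 6)

    # Metrics: results you want to compare across runs.
    mlflow.log_metric("rmse", 0.87)

    # Artifacts: arbitrary files; log_dict writes a small JSON artifact.
    mlflow.log_dict({"notes": "baseline run with default features"}, "context.json")
```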

Model Registry with Unity Catalog

MLflow Model Registry is a centralized model repository, UI, and set of APIs for managing the model deployment process.

Databricks integrates Model Registry with Unity Catalog to provide centralized governance for models. Unity Catalog integration allows you to access models across workspaces, track model lineage, and discover models for reuse.
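
For example, you can point the MLflow client at the Unity Catalog registry and register a logged model under a three-level catalog.schema.model name. This is a minimal sketch; the catalog, schema, model name, and run ID are illustrative placeholders.

```python
import mlflow

# Use Unity Catalog as the model registry instead of the legacy workspace registry.
mlflow.set_registry_uri("databricks-uc")

# Register a previously logged model under a three-level name:
# <catalog>.<schema>.<model> (names below are illustrative).
run_id = "<run_id_from_a_training_run>"
mlflow.register_model(f"runs:/{run_id}/model", "main.ml_team.churn_model")
```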

Model Serving

Databricks Model Serving is tightly integrated with MLflow Model Registry and provides a unified, scalable interface for deploying, governing, and querying AI models. Each model you serve is available as a REST API that you can integrate into web or client applications.

Although they are distinct components, Model Serving relies heavily on MLflow Model Registry to handle model versioning, dependency management, validation, and governance.
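
Once deployed, an endpoint can be queried from Python as well as over raw REST. The following is a minimal sketch using the MLflow deployments client, assuming a serving endpoint named churn-endpoint already exists; the endpoint name and input schema are illustrative.

```python
from mlflow.deployments import get_deploy_client

# The "databricks" target uses your workspace credentials to reach Model Serving.
client = get_deploy_client("databricks")

# Query an existing serving endpoint (endpoint name and input columns are illustrative).
response = client.predict(
    endpoint="churn-endpoint",
    inputs={"dataframe_records": [{"feature_a": 1.0, "feature_b": 2.0}]},
)
print(response)
```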

AI agent development and evaluation

For AI agent development, Databricks integrates with MLflow much as it does for ML model development. However, there are a few key differences:

  • To create AI agents on Databricks, use Mosaic AI Agent Framework, which relies on MLflow to track agent code, performance metrics, and agent traces.
  • To evaluate agents on Databricks, use Mosaic AI Agent Evaluation, which relies on MLflow to track evaluation results.
  • MLflow tracking for agents also includes MLflow Tracing. MLflow Tracing allows you to see detailed information about the execution of your agent’s services. Tracing records the inputs, outputs, and metadata associated with each intermediate step of a request, letting you quickly find the source of unexpected behavior in agents.
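
For example, decorating agent code with mlflow.trace records each function call as a span with its inputs, outputs, and latency; nested calls appear as nested spans in the trace. This is a minimal sketch with a placeholder retrieval step instead of a real LLM or retriever call.

```python
import mlflow

@mlflow.trace(span_type="RETRIEVER")
def retrieve_docs(question: str) -> list[str]:
    # Placeholder retrieval step; its inputs and outputs are captured in the trace.
    return ["doc snippet 1", "doc snippet 2"]

@mlflow.trace(span_type="AGENT")
def answer(question: str) -> str:
    docs = retrieve_docs(question)  # Recorded as a nested span.
    # A real agent would call an LLM here; the placeholder keeps the sketch self-contained.
    return f"Answer based on {len(docs)} retrieved documents."

answer("What is MLflow Tracing?")
```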

The following diagram shows how Databricks integrates with MLflow to create and deploy AI agents.

MLflow integrates with Databricks to manage the genAI app lifecycle.

Databricks-managed MLflow is built on Unity Catalog and the Cloud Data Lake to unify all your data and AI assets in the genAI app lifecycle:

  1. Vector & feature store: Databricks automated vector and feature lookups simplify integration and reduce mistakes.
  2. Create and evaluate AI agents: Mosaic AI Agent Framework and Agent Evaluation help you create agents and evaluate their output.
  3. Tracking & tracing: MLflow tracing captures detailed agent execution information for enhanced genAI observability.
  4. Model Registry: MLflow Model Registry, integrated with Unity Catalog, centralizes AI models and artifacts.
  5. Model Serving: Mosaic AI Model Serving deploys models to a REST API endpoint.
  6. Monitoring: MLflow automatically captures requests and responses to monitor and debug models.

Open source vs. Databricks-managed MLflow features

For general MLflow concepts, APIs, and features shared between open source and Databricks-managed versions, refer to MLflow documentation. For features exclusive to Databricks-managed MLflow, see Databricks documentation.

The following table highlights the key differences between open source MLflow and Databricks-managed MLflow and provides documentation links to help you learn more:

Feature | Availability on open source MLflow | Availability on Databricks-managed MLflow
--- | --- | ---
Security | User must provide their own security governance layer | Databricks enterprise-grade security
Disaster recovery | Unavailable | Databricks disaster recovery
Experiment tracking | MLflow Tracking API | MLflow Tracking API integrated with Databricks advanced experiment tracking
Model Registry | MLflow Model Registry | MLflow Model Registry integrated with Databricks Unity Catalog
Unity Catalog integration | Open source integration with Unity Catalog | Databricks Unity Catalog
Model deployment | User-configured integrations with external serving solutions (SageMaker, Kubernetes, container services, and so on) | Databricks Model Serving and external serving solutions
AI agents | MLflow LLM development | MLflow LLM development integrated with Mosaic AI Agent Framework and Agent Evaluation
Encryption | Unavailable | Encryption using customer-managed keys