Get started with MLflow experiments
This article gives an overview of how to use MLflow in Azure Databricks to automatically log training runs and track parameters, metrics, and models. For more details about using MLflow to track model development, see Track ML and deep learning training runs.
MLflow is an open source platform for managing the end-to-end machine learning lifecycle. MLflow provides simple APIs for logging metrics (for example, model loss), parameters (for example, learning rate), and fitted models, making it easy to analyze training results or deploy models later on.
Install MLflow
If you’re using Databricks Runtime for Machine Learning, MLflow is already installed. Otherwise, install the MLflow package from PyPI.
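For example, in a notebook you can install it with the %pip magic command (a minimal sketch; mlflow is the package name on PyPI):
%pip install mlflow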
Automatically log training runs to MLflow
With Databricks Runtime 10.4 LTS ML and above, Databricks Autologging is enabled by default and automatically captures model parameters, metrics, files, and lineage information when you train models from a variety of popular machine learning libraries.
With Databricks Runtime 9.1 LTS ML, MLflow provides mlflow.<framework>.autolog() APIs to automatically log training code written in many ML frameworks. Call this API before running training code to log model-specific metrics, parameters, and model artifacts.
TensorFlow
Note
Keras models are also supported in mlflow.tensorflow.autolog().
# Also autoinstruments tf.keras
import mlflow.tensorflow
mlflow.tensorflow.autolog()
XGBoost
import mlflow.xgboost
mlflow.xgboost.autolog()
LightGBM
import mlflow.lightgbm
mlflow.lightgbm.autolog()
scikit-learn
import mlflow.sklearn
mlflow.sklearn.autolog()
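To illustrate what autologging captures, here is a minimal sketch of a scikit-learn training run with autolog enabled; the dataset and model below are illustrative choices, not part of this quickstart:
# Illustrative example: with autolog enabled, fitting the model
# records its parameters, training metrics, and the fitted model.
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

mlflow.sklearn.autolog()

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, max_depth=6)
model.fit(X, y)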
PySpark
If you perform hyperparameter tuning with pyspark.ml, metrics and models are automatically logged to MLflow, as shown in the sketch below.
See Apache Spark MLlib and automated MLflow tracking.
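As a hedged sketch of what that looks like, the CrossValidator run below (with toy, made-up data) has its tuning parameters, metrics, and best model logged automatically:
# Illustrative sketch only: toy data and a small parameter grid.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

train_df = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)),
     (1.0, Vectors.dense(2.0, 1.0)),
     (0.0, Vectors.dense(0.1, 1.3)),
     (1.0, Vectors.dense(1.9, 0.8)),
     (0.0, Vectors.dense(0.2, 1.2)),
     (1.0, Vectors.dense(2.1, 0.9)),
     (0.0, Vectors.dense(0.3, 1.0)),
     (1.0, Vectors.dense(1.8, 1.1))],
    ["label", "features"],
)

lr = LogisticRegression(maxIter=10)
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(
    estimator=lr,
    estimatorParamMaps=grid,
    evaluator=BinaryClassificationEvaluator(),
    numFolds=2,
)
cv_model = cv.fit(train_df)  # tuning metrics and models are logged to MLflow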
View results
After executing your machine learning code, you can view results using the Experiment Runs sidebar. See View notebook experiment for instructions on how to view the experiment, run, and notebook revision used in the quickstart.
Track additional metrics, parameters, and models
You can log additional information by directly invoking the MLflow Tracking logging APIs.
Numerical metrics
import mlflow
mlflow.log_metric("accuracy", 0.9)
Training parameters
import mlflow
mlflow.log_param("learning_rate", 0.001)
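In a Databricks notebook, these calls attach to the notebook's active run automatically; to group values under an explicit run, you can wrap them in mlflow.start_run(). A minimal sketch:
import mlflow

# Group related parameters and metrics under one explicit run
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_metric("accuracy", 0.9)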
Models
scikit-learn
import mlflow.sklearn
mlflow.sklearn.log_model(model, "myModel")
PySpark
import mlflow.spark
mlflow.spark.log_model(model, "myModel")
XGBoost
import mlflow.xgboost
mlflow.xgboost.log_model(model, "myModel")
TensorFlow
import mlflow.tensorflow
mlflow.tensorflow.log_model(model, "myModel")
Keras
import mlflow.keras
mlflow.keras.log_model(model, "myModel")
PyTorch
import mlflow.pytorch
mlflow.pytorch.log_model(model, "myModel")
spaCy
import mlflow.spacy
mlflow.spacy.log_model(model, "myModel")
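A model logged this way can be loaded back later by run ID and artifact path, for example for batch scoring. A minimal sketch for the scikit-learn case; the run ID below is a placeholder:
import mlflow.sklearn

run_id = "<your-run-id>"  # placeholder: ID of the run that logged the model
model = mlflow.sklearn.load_model(f"runs:/{run_id}/myModel")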
Other artifacts (files)
import mlflow
mlflow.log_artifact("/tmp/my-file", "myArtifactPath")
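For example, you can write any file locally first and then log it; a minimal sketch, with illustrative file contents:
import mlflow

# Write a local file, then log it under the artifact path "myArtifactPath"
with open("/tmp/notes.txt", "w") as f:
    f.write("training notes")
mlflow.log_artifact("/tmp/notes.txt", "myArtifactPath")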
Example notebooks
Note
With Databricks Runtime 10.4 LTS ML and above, Databricks Autologging is enabled by default, and the code in these example notebooks is not required. The example notebooks in this section are designed for use with Databricks Runtime 9.1 LTS ML.
The recommended way to get started using MLflow tracking with Python is to use the MLflow autolog() API. With MLflow’s autologging capabilities, a single line of code automatically logs the resulting model, the parameters used to create the model, and a model score. The following notebook shows you how to set up a run using autologging.
MLflow autologging quickstart Python notebook
If you need more control over the metrics logged for each training run, or want to log additional artifacts such as tables or plots, you can use the MLflow logging API functions demonstrated in the following notebook.
MLflow logging API quickstart Python notebook