Deploy models for batch inference and prediction

This article describes what Databricks recommends for batch and streaming inference.

For real-time model serving on Azure Databricks, see Model serving with Azure Databricks.

Use ai_query for batch inference

Important

This feature is in Public Preview.

Databricks recommends using ai_query with Model Serving for batch inference. ai_query is a built-in Databricks SQL function that lets you query existing model serving endpoints using SQL. It has been verified to reliably and consistently process datasets in the range of billions of tokens. See ai_query function for more details about this AI function.
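The following is a minimal sketch of what an ai_query call looks like in SQL. The endpoint name and prompt are placeholders, not names from this article; substitute an existing model serving endpoint from your workspace.

```sql
-- Minimal sketch: query an existing model serving endpoint from SQL.
-- 'my-serving-endpoint' is a placeholder; use an endpoint that exists in your workspace.
SELECT ai_query(
  'my-serving-endpoint',
  'Summarize the key points of this text: Databricks supports batch inference with SQL.'
) AS response;
```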

For quick experimentation, you can use ai_query with pay-per-token endpoints, since these endpoints are pre-configured on your workspace.
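As a sketch of that experimentation workflow, the query below runs ai_query over a small sample of rows against a pay-per-token foundation model endpoint. The endpoint name and source table are assumptions for illustration; check the Serving page in your workspace for the pay-per-token endpoints available to you.

```sql
-- Sketch: quick experimentation on a small sample against a pay-per-token endpoint.
-- The endpoint name and table below are illustrative assumptions.
SELECT
  review_text,
  ai_query(
    'databricks-meta-llama-3-1-70b-instruct',  -- pay-per-token endpoint (name may differ in your workspace)
    CONCAT('Classify the sentiment of this review as positive, negative, or neutral: ', review_text)
  ) AS sentiment
FROM main.default.product_reviews  -- hypothetical table
LIMIT 10;
```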

When you are ready to run batch inference on large or production data, Databricks recommends using provisioned throughput endpoints for faster performance. See Perform batch inference using ai_query.
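For the production path, a batch job typically scans a full table and writes the model output to a new table. The sketch below assumes a provisioned throughput endpoint named my-provisioned-endpoint and a source table catalog.schema.reviews; both are illustrative, not names from this article.

```sql
-- Sketch: batch inference over a full table, writing results to a new table.
-- Endpoint and table names are illustrative assumptions.
CREATE OR REPLACE TABLE catalog.schema.reviews_with_summary AS
SELECT
  review_id,
  review_text,
  ai_query(
    'my-provisioned-endpoint',  -- provisioned throughput model serving endpoint (assumed)
    CONCAT('Summarize this review in one sentence: ', review_text)
  ) AS summary
FROM catalog.schema.reviews;
```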