Batch inference using Foundation Model APIs provisioned throughput

This article provides an example notebook that performs batch inference on a provisioned throughput endpoint using Foundation Model APIs and the ai_query function.

Requirements

  • A workspace in a Foundation Model APIs supported region.
  • One of the following:
    • All-purpose compute with an instance size of i3.2xlarge or larger, running Databricks Runtime 15.4 LTS ML or above, with at least two workers.
    • A SQL warehouse of size Medium or larger.

Run batch inference

Generally, setting up batch inference involves two steps:

  1. Creating the endpoint to be used for batch inference, as sketched below.
  2. Constructing the batch requests and sending those requests to the batch inference endpoint using ai_query, as shown in the example further below.
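For step 1, the following is a minimal sketch of creating a provisioned throughput endpoint, assuming the Databricks Python SDK (databricks-sdk). The endpoint name, registered model name, and throughput values are illustrative assumptions, not the notebook's exact settings; adjust them to your workspace and check the throughput increments supported for your model.

```python
# Minimal sketch: create a provisioned throughput serving endpoint.
# All names and throughput values below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()

endpoint = w.serving_endpoints.create_and_wait(
    name="llama-3-1-70b-batch",  # hypothetical endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                # Unity Catalog model to serve; verify the exact name in your workspace
                entity_name="system.ai.meta_llama_v3_1_70b_instruct",
                entity_version="1",
                # Provisioned throughput band (tokens/sec); use values the model supports
                min_provisioned_throughput=950,
                max_provisioned_throughput=1900,
            )
        ]
    ),
)
print(endpoint.state)
```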

The example notebook covers these steps and demonstrates batch inference using the Meta Llama 3.1 70B model.
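For step 2, a batch request is a query that calls ai_query once per row of an input table and points at the endpoint from step 1. The table, column, endpoint name, and prompt below are placeholder assumptions for illustration; the example notebook shows the full workflow.

```python
# Hedged sketch: send batch requests to the endpoint with ai_query.
# `spark` is the SparkSession available in a Databricks notebook.
results = spark.sql(
    """
    SELECT
      id,
      text,
      ai_query(
        'llama-3-1-70b-batch',                         -- endpoint created in step 1
        CONCAT('Summarize the following text: ', text) -- per-row prompt
      ) AS summary
    FROM main.default.articles
    """
)

# Persist the responses for downstream use (or inspect with display(results)).
results.write.mode("overwrite").saveAsTable("main.default.article_summaries")
```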

Batch inference with a provisioned throughput endpoint notebook

Get notebook
