Batch inference using Foundation Model APIs provisioned throughput
This article provides an example notebook that performs batch inference on a provisioned throughput endpoint using Foundation Model APIs and ai_query.
Requirements
- A workspace in a Foundation Model APIs supported region.
- One of the following:
  - All-purpose compute with compute size i3.2xlarge or larger, running Databricks Runtime 15.4 ML LTS or above, with at least two workers.
  - SQL warehouse medium or larger.
Run batch inference
Generally, setting up batch inference involves two steps:
- Creating the endpoint to be used for batch inference.
- Constructing the batch requests and sending those requests to the batch inference endpoint using ai_query (hedged sketches of both steps appear below).
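
For orientation, here is a minimal sketch of step 1: creating a provisioned throughput endpoint through the Databricks serving-endpoints REST API. The workspace URL, token, endpoint name, Unity Catalog entity name and version, and throughput values are all illustrative assumptions; consult the provisioned throughput documentation for the model names and throughput increments supported in your workspace.

```python
# Sketch of step 1: create a provisioned throughput endpoint via the
# Databricks serving-endpoints REST API. All names and values below are
# illustrative assumptions, not a definitive configuration.
import requests

DATABRICKS_HOST = "https://<your-workspace-url>"   # assumption: your workspace URL
DATABRICKS_TOKEN = "<your-access-token>"           # assumption: a valid access token

payload = {
    "name": "llama-3-1-70b-batch",  # hypothetical endpoint name
    "config": {
        "served_entities": [
            {
                # Hypothetical Unity Catalog path for the Meta Llama 3.1 70B Instruct model
                "entity_name": "system.ai.meta_llama_v3_1_70b_instruct",
                "entity_version": "1",
                "min_provisioned_throughput": 0,
                # Illustrative value; use a throughput increment supported for the model
                "max_provisioned_throughput": 9500,
            }
        ]
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())
```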
The example notebook covers these steps and demonstrates batch inference using the Meta Llama 3.1 70B model.
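
For step 2, the following sketch shows roughly how an ai_query batch request might be issued from a Python notebook cell. The endpoint name, source table, column, and prompt are placeholders, and it assumes a Databricks notebook where spark and display are predefined.

```python
# Sketch of step 2: build batch requests with ai_query from a notebook cell.
# ai_query sends one request per row to the provisioned throughput endpoint.
# The endpoint name, table, column, and prompt below are placeholders.
result_df = spark.sql(
    """
    SELECT
      text,
      ai_query(
        'llama-3-1-70b-batch',                        -- hypothetical endpoint name
        CONCAT('Summarize the following text: ', text)
      ) AS summary
    FROM catalog.schema.my_documents                  -- hypothetical source table
    """
)
display(result_df)
```

Because ai_query is applied per row, the same query scales from a handful of records to a full table without changing the request logic.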