Manage model serving endpoints
This article describes how to manage model serving endpoints using the Serving UI and REST API. See Serving endpoints in the REST API reference.
To create model serving endpoints use one of the following:
Get the status of the model endpoint
In the Serving UI, you can check the status of an endpoint from the Serving endpoint state indicator at the top of your endpoint’s details page.
Check the status and details of an endpoint programmatically using the REST API or the MLflow Deployments SDK:
REST API
GET /api/2.0/serving-endpoints/{name}
The following example shows the details of an endpoint that serves the first version of the my-ads-model model that is registered in the Unity Catalog model registry. You must provide the full model name, including the parent catalog and schema, such as catalog.schema.example-model.
In the following example response, the state.ready field is "READY", which means the endpoint is ready to receive traffic. The state.update_state field is NOT_UPDATING and pending_config is no longer returned because the update finished successfully.
{
  "name": "unity-model-endpoint",
  "creator": "customer@example.com",
  "creation_timestamp": 1666829055000,
  "last_updated_timestamp": 1666829055000,
  "state": {
    "ready": "READY",
    "update_state": "NOT_UPDATING"
  },
  "config": {
    "served_entities": [
      {
        "name": "my-ads-model",
        "entity_name": "myCatalog.mySchema.my-ads-model",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": false,
        "state": {
          "deployment": "DEPLOYMENT_READY",
          "deployment_state_message": ""
        },
        "creator": "customer@example.com",
        "creation_timestamp": 1666829055000
      }
    ],
    "traffic_config": {
      "routes": [
        {
          "served_model_name": "my-ads-model",
          "traffic_percentage": 100
        }
      ]
    },
    "config_version": 1
  },
  "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "permission_level": "CAN_MANAGE"
}
MLflow Deployments SDK
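from mlflow.deployments import get_deploy_client

# Create a client for the Databricks deployment target.
client = get_deploy_client("databricks")

# Fetch the details of the endpoint named "chat".
endpoint = client.get_endpoint(endpoint="chat")

# The returned object has the following shape (nested values elided):
# {
#     "name": "chat",
#     "creator": "alice@company.com",
#     "creation_timestamp": 0,
#     "last_updated_timestamp": 0,
#     "state": {...},
#     "config": {...},
#     "tags": [...],
#     "id": "88fd3f75a0d24b0380ddc40484d7a31b",
# }
print(endpoint["name"], endpoint["state"])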
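The state fields described in this section can also be checked in code. The following is a minimal sketch, assuming the endpoint details have already been retrieved as a Python dict (for example, from the GET response or from client.get_endpoint). The helper name is_endpoint_ready is hypothetical and is not part of any Databricks or MLflow API.

```python
def is_endpoint_ready(endpoint: dict) -> bool:
    """Return True when the endpoint can serve traffic and no update is in progress."""
    state = endpoint.get("state", {})
    return (
        state.get("ready") == "READY"
        and state.get("update_state") == "NOT_UPDATING"
        # Per the text above, pending_config is not returned once an update has finished.
        and "pending_config" not in endpoint
    )


# Example using the state fields from the sample REST response.
sample = {
    "name": "unity-model-endpoint",
    "state": {"ready": "READY", "update_state": "NOT_UPDATING"},
}
print(is_endpoint_ready(sample))
```

A check like this could be called in a polling loop after an update, with a delay between calls, until it returns True.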
Stop a model serving endpoint
You can temporarily stop a model serving endpoint and start it later. When an endpoint is stopped, the resources provisioned for it are shut down, and the endpoint is not able to serve queries until it is started again. Only endpoints that serve custom models, are not route-optimized, and have no in-progress updates can be stopped. Stopped endpoints do not count against the resource quota. Queries sent to a stopped endpoint return a 400 error.
You can stop an endpoint from the endpoint’s details page in the Serving UI.
- Click the endpoint you want to stop.
- Click Stop in the upper-right corner.
Alternatively, you can stop a serving endpoint programmatically using the REST API as follows:
POST /api/2.0/serving-endpoints/{name}/config:stop
When you are ready to start a stopped model serving endpoint, you can do so from the endpoint’s details page in the Serving UI.
- Click the endpoint you want to start.
- Click Start in the upper-right corner.
Alternatively, you can start a stopped serving endpoint programmatically using the REST API as follows:
POST /api/2.0/serving-endpoints/{name}/config:start
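The stop and start calls above differ only in the final path segment, so a single helper can cover both. The following is a rough sketch using only the Python standard library. The workspace URL, the token placeholder, and the helper name build_state_change_request are assumptions for illustration. The sketch builds the request without sending it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_state_change_request(host: str, token: str, name: str, action: str) -> Request:
    """Build (but do not send) the POST request that stops or starts an endpoint."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    # Percent-encode the endpoint name so it is safe to place in the URL path.
    endpoint = quote(name, safe="")
    url = f"{host.rstrip('/')}/api/2.0/serving-endpoints/{endpoint}/config:{action}"
    return Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})


# Hypothetical workspace URL, token, and endpoint name, for illustration only.
req = build_state_change_request(
    "https://example.databricks.com", "<token>", "my-endpoint", "stop"
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen. Because queries to a stopped endpoint return a 400 error, callers may want to check the endpoint state before querying it.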
Delete a model serving endpoint
To disable serving for a model, you can delete the endpoint it’s served on.
You can delete an endpoint from the endpoint’s details page in the Serving UI.
- Click Serving on the sidebar.
- Click the endpoint you want to delete.
- Click the kebab menu at the top and select Delete.
Alternatively, you can delete a serving endpoint programmatically using the REST API or the MLflow Deployments SDK:
REST API
DELETE /api/2.0/serving-endpoints/{name}
MLflow Deployments SDK
from mlflow.deployments import get_deploy_client
client = get_deploy_client("databricks")
client.delete_endpoint(endpoint="chat")
Debug your model serving endpoint
To debug any issues with the endpoint, you can fetch:
- Model server container build logs
- Model server logs
These logs are also accessible from the Endpoints UI in the Logs tab.
To get the build logs for a served model, use the following request. See Debugging guide for Model Serving for more information.
GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/build-logs
{
  "config_version": 1 // optional
}
To get the model server logs for a served model, use the following request:
GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/logs
{
  "config_version": 1 // optional
}
Manage permissions on your model serving endpoint
You must have at least the CAN MANAGE permission on a serving endpoint to modify permissions. For more information on the permission levels, see Serving endpoint ACLs.
Get the list of permissions on the serving endpoint.
databricks permissions get servingendpoints <endpoint-id>
Grant user jsmith@example.com the CAN QUERY permission on the serving endpoint.
databricks permissions update servingendpoints <endpoint-id> --json '{
"access_control_list": [
{
"user_name": "jsmith@example.com",
"permission_level": "CAN_QUERY"
}
]
}'
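When several users need access, writing the JSON body by hand becomes error-prone. The following is a small sketch that generates the same access_control_list structure shown above from a Python mapping. The helper name build_acl_payload is hypothetical. The only permission level used is the one that appears in this article.

```python
import json


def build_acl_payload(grants: dict[str, str]) -> str:
    """Serialize {user_name: permission_level} pairs into the JSON body shown above."""
    acl = [
        {"user_name": user, "permission_level": level}
        for user, level in grants.items()
    ]
    return json.dumps({"access_control_list": acl})


# Reproduces the body from the CLI example for a single user.
payload = build_acl_payload({"jsmith@example.com": "CAN_QUERY"})
print(payload)
```

The resulting string can be passed as the value of the --json option in the databricks permissions update command.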
You can also modify serving endpoint permissions using the Permissions API.
Add a budget policy for a model serving endpoint
Important
This feature is in Public Preview and is not available for serving endpoints that serve External models or Foundation Model APIs pay-per-token workloads.
Budget policies allow your organization to apply custom tags on serverless usage for granular billing attribution. If your workspace uses budget policies to attribute serverless usage, you can add a budget policy to your model serving endpoints. See Attribute serverless usage with budget policies.
During model serving endpoint creation, you can select your endpoint’s budget policy from the Budget policy menu in the Serving UI. If you have a budget policy assigned to you, all endpoints that you create are assigned that budget policy, even if you do not select a policy from the Budget policy menu.
If you have MANAGE permissions for an existing endpoint, you can edit and add a budget policy to that endpoint from the Endpoint details page in the UI.
Note
If you’ve been assigned a budget policy, your existing endpoints are not automatically tagged with your policy. You must manually update existing endpoints if you want to attach a budget policy to them.
Get a model serving endpoint schema
Important
Support for serving endpoint query schemas is in Public Preview. This functionality is available in Model Serving regions.
A serving endpoint query schema is a formal description of the serving endpoint using the standard OpenAPI specification in JSON format. It contains information about the endpoint including the endpoint path, details for querying the endpoint like the request and response body format, and data type for each field. This information can be helpful for reproducibility scenarios or when you need information about the endpoint, but you are not the original endpoint creator or owner.
To get the model serving endpoint schema, the served model must have a model signature logged and the endpoint must be in a READY state.
The following example demonstrates how to programmatically get the model serving endpoint schema using the REST API. For feature serving endpoint schemas, see What is Databricks Feature Serving?.
The schema returned by the API is in the format of a JSON object that follows the OpenAPI specification.
ACCESS_TOKEN="<endpoint-token>"
ENDPOINT_NAME="<endpoint name>"
curl "https://example.databricks.com/api/2.0/serving-endpoints/$ENDPOINT_NAME/openapi" -H "Authorization: Bearer $ACCESS_TOKEN" -H "Content-Type: application/json"
Schema response details
The response is an OpenAPI specification in JSON format, typically including fields like openapi, info, servers, and paths. Since the schema response is a JSON object, you can parse it using common programming languages, and generate client code from the specification using third-party tools.
You can also visualize the OpenAPI specification using third-party tools like Swagger Editor.
The main fields of the response include:
- The info.title field shows the name of the serving endpoint.
- The servers field always contains one object, typically the url field, which is the base URL of the endpoint.
- The paths object in the response contains all supported paths for an endpoint. The keys in the object are the path URLs. Each path can support multiple formats of inputs. These inputs are listed in the oneOf field.
The following is an example endpoint schema response:
{
"openapi": "3.1.0",
"info": {
"title": "example-endpoint",
"version": "2"
},
"servers": [{ "url": "https://example.databricks.com/serving-endpoints/example-endpoint"}],
"paths": {
"/served-models/vanilla_simple_model-2/invocations": {
"post": {
"requestBody": {
"content": {
"application/json": {
"schema": {
"oneOf": [
{
"type": "object",
"properties": {
"dataframe_split": {
"type": "object",
"properties": {
"columns": {
"description": "required fields: int_col",
"type": "array",
"items": {
"type": "string",
"enum": [
"int_col",
"float_col",
"string_col"
]
}
},
"data": {
"type": "array",
"items": {
"type": "array",
"prefixItems": [
{
"type": "integer",
"format": "int64"
},
{
"type": "number",
"format": "double"
},
{
"type": "string"
}
]
}
}
}
},
"params": {
"type": "object",
"properties": {
"sentiment": {
"type": "number",
"format": "double",
"default": "0.5"
}
}
}
},
"examples": [
{
"columns": [
"int_col",
"float_col",
"string_col"
],
"data": [
[
3,
10.4,
"abc"
],
[
2,
20.4,
"xyz"
]
]
}
]
},
{
"type": "object",
"properties": {
"dataframe_records": {
"type": "array",
"items": {
"required": [
"int_col",
"float_col",
"string_col"
],
"type": "object",
"properties": {
"int_col": {
"type": "integer",
"format": "int64"
},
"float_col": {
"type": "number",
"format": "double"
},
"string_col": {
"type": "string"
},
"becx_col": {
"type": "object",
"format": "unknown"
}
}
}
},
"params": {
"type": "object",
"properties": {
"sentiment": {
"type": "number",
"format": "double",
"default": "0.5"
}
}
}
}
}
]
}
}
}
},
"responses": {
"200": {
"description": "Successful operation",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"predictions": {
"type": "array",
"items": {
"type": "number",
"format": "double"
}
}
}
}
}
}
}
}
}
}
}
}
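Because the schema is plain JSON, the fields described above can be inspected with a few lines of code. The following is a minimal sketch, assuming the schema has already been loaded into a Python dict (for example, with json.loads on the API response). It lists the input formats each path accepts by reading the oneOf entries. The helper name list_input_formats is hypothetical, and the sample dict is a trimmed-down version of the example response.

```python
def list_input_formats(schema: dict) -> dict[str, list[str]]:
    """Map each POST path in an endpoint schema to its accepted input formats."""
    formats = {}
    for path, operations in schema.get("paths", {}).items():
        body_schema = (
            operations.get("post", {})
            .get("requestBody", {})
            .get("content", {})
            .get("application/json", {})
            .get("schema", {})
        )
        names = []
        for option in body_schema.get("oneOf", []):
            # Each option is an object; keys other than "params" name the input format.
            names.extend(k for k in option.get("properties", {}) if k != "params")
        formats[path] = names
    return formats


# Trimmed-down version of the example response, keeping only the fields used here.
one_of = [
    {"type": "object", "properties": {"dataframe_split": {}, "params": {}}},
    {"type": "object", "properties": {"dataframe_records": {}, "params": {}}},
]
schema = {
    "paths": {
        "/served-models/vanilla_simple_model-2/invocations": {
            "post": {
                "requestBody": {
                    "content": {"application/json": {"schema": {"oneOf": one_of}}}
                }
            }
        }
    }
}
print(list_input_formats(schema))
```

For the example response, this reports dataframe_split and dataframe_records as the accepted input formats for the invocations path.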