Debugging guide for Model Serving

This article walks through debugging steps for common issues you might encounter with model serving endpoints, such as endpoints that fail to initialize or start, container build failures, and errors during model operation on the endpoint.

Access and review logs

Databricks recommends reviewing build logs for debugging and troubleshooting errors in your model serving workloads. See Monitor model quality and endpoint health for information about logs and how to view them.

Check the event logs for the model in the workspace UI and check for a successful container build message. If you do not see a build message after an hour, reach out to Databricks support for assistance.
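If you want to retrieve build logs programmatically rather than through the UI, you can call the Databricks REST API. The following is a minimal sketch, assuming the GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/build-logs route; the host, token, endpoint name, and served model name are placeholders you supply for your workspace.

```python
# Sketch: fetch build logs for a served model via the Databricks REST API.
# Assumes the .../served-models/{name}/build-logs route; host, token, and
# names below are placeholders for your workspace values.
import urllib.request


def build_logs_url(host: str, endpoint_name: str, served_model_name: str) -> str:
    """Construct the build-logs URL for a served model."""
    return (f"{host}/api/2.0/serving-endpoints/{endpoint_name}"
            f"/served-models/{served_model_name}/build-logs")


def fetch_build_logs(host: str, token: str,
                     endpoint_name: str, served_model_name: str) -> str:
    """Issue an authenticated GET request and return the response body."""
    req = urllib.request.Request(
        build_logs_url(host, endpoint_name, served_model_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

For example, fetch_build_logs("https://<workspace-host>", "<token>", "my-endpoint", "my-model-1") would return the raw build log text for that served model.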

If your build is successful but you encounter other errors, see Debugging after container build succeeds. If your build fails, see Debugging after container build failure.

Installed library package versions

In your build logs you can confirm the package versions that are installed.

  • For MLflow, if you do not specify a version, Model Serving uses the latest version.
  • For custom GPU serving, Model Serving installs the recommended versions of CUDA and cuDNN according to public PyTorch and TensorFlow documentation.
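Because unpinned packages default to the latest version, it can help to pin the versions you actually tested with when logging the model. The helper below is a hypothetical sketch: it builds pip-style pins from your current environment, which you could then pass as pip_requirements= to mlflow.pyfunc.log_model.

```python
# Hypothetical helper: build 'name==version' pins for packages installed in
# the current environment, so the serving container matches what you tested.
import importlib.metadata


def pinned_requirements(packages):
    """Return pip-style 'name==version' pins for the given installed packages."""
    return [f"{pkg}=={importlib.metadata.version(pkg)}" for pkg in packages]


# Example: pins = pinned_requirements(["mlflow", "torch"]) and then
# mlflow.pyfunc.log_model(..., pip_requirements=pins)
print(pinned_requirements(["pip"]))
```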

Before model deployment validation checks

Databricks recommends applying the guidance in this section before you serve your model. The following checks can catch issues early, before you wait on the endpoint. See Validate the model input before deployment to validate your model input before deploying your model.

Test predictions before deployment

Before deploying your model to the serving endpoint, test offline predictions with a virtual environment using mlflow.models.predict and input examples. See MLflow documentation for testing predictions for more detailed guidance.


import mlflow

input_example = {
    "messages": [
        {"content": "How many categories of products do we have? Name them.", "role": "user"}
    ]
}

mlflow.models.predict(
    model_uri=logged_chain_info.model_uri,
    input_data=input_example,
)

Validate the model input before deployment

Model serving endpoints expect a specific JSON input format. To validate that your model input works on a serving endpoint before deployment, you can use validate_serving_input in MLflow.

The following is an example of the auto-generated code in the run’s artifacts tab if your model is logged with a valid input example.

from mlflow.models import validate_serving_input

model_uri = 'runs:/<run_id>/<artifact_path>'

serving_payload = """{
 "messages": [
   {
     "content": "How many product categories are there?",
     "role": "user"
   }
 ]
}
"""

# Validate the serving payload works on the model
validate_serving_input(model_uri, serving_payload)

You can also test any input example against the logged model by using the convert_input_example_to_serving_input API to generate a valid JSON serving input.

from mlflow.models import validate_serving_input
from mlflow.models import convert_input_example_to_serving_input

model_uri = 'runs:/<run_id>/<artifact_path>'

# Define INPUT_EXAMPLE with your own input example to the model
# A valid input example is a data instance suitable for pyfunc prediction

serving_payload = convert_input_example_to_serving_input(INPUT_EXAMPLE)

# Validate the serving payload works on the model
validate_serving_input(model_uri, serving_payload)

Debugging after container build succeeds

Even if the container builds successfully, there might be issues when you run the model or during the operation of the endpoint itself. The following subsections detail common issues and how to troubleshoot and debug them.

Missing dependency

You might get an error like An error occurred while loading the model. No module named <module-name>. This error might indicate that a dependency is missing from the container. Verify that you properly specified all the dependencies that should be included in the container build. Pay special attention to custom libraries and ensure that the .whl files are included as artifacts.
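One quick way to catch this before deployment is to verify, in the environment you log the model from, that every module the model needs actually imports. The helper below is a hypothetical sketch, not a Model Serving API; it mirrors the "No module named ..." failure you would otherwise hit at load time.

```python
# Hypothetical helper: report which of the model's required modules fail to
# import in the current environment.
import importlib


def check_dependencies(module_names):
    """Return the subset of module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "json" imports fine; the second name stands in for a missing custom library.
print(check_dependencies(["json", "some_missing_custom_lib"]))
```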

Service logs looping

If your model fails to load, check the service logs to see if they loop when the endpoint tries to load the model. If you see this behavior, try the following steps:

  1. Open a notebook and attach to an All-Purpose cluster that uses a Databricks Runtime version, not Databricks Runtime for Machine Learning.
  2. Load the model using MLflow and try debugging from there.

You can also load the model on your local machine and debug from there. Load your model locally using the following:

import os
import mlflow

os.environ["MLFLOW_TRACKING_URI"] = "databricks://PROFILE"

ARTIFACT_URI = "model_uri"
if '.' in ARTIFACT_URI:
    mlflow.set_registry_uri('databricks-uc')
local_path = mlflow.artifacts.download_artifacts(ARTIFACT_URI)
print(local_path)

Then, in a shell, create and activate a virtual environment from the downloaded conda.yaml:

conda env create -f local_path/artifact_path/conda.yaml
conda activate mlflow-env

Finally, load the model:

mlflow.pyfunc.load_model(f"{local_path}/artifact_path")

Model fails when requests are sent to the endpoint

When predict() is called on your model, you might receive an error like: Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference.

This error usually indicates a code issue in the predict() function. Databricks recommends that you load the model from MLflow in a notebook and call it there. Doing so highlights issues in the predict() function, and you can see where the failure happens within the method.
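When reproducing the failure in a notebook, it can help to wrap the call so the offending input is surfaced alongside the exception. The debug_predict wrapper below is a hypothetical illustration, not an MLflow API; the stand-in predict function simulates a model that rejects incompatible input.

```python
# Hypothetical wrapper: call a model's predict function and attach input
# details to any error, so you can see which input type/shape failed.
def debug_predict(predict_fn, model_input):
    try:
        return predict_fn(model_input)
    except Exception as exc:
        raise RuntimeError(
            f"predict() failed on input of type {type(model_input).__name__}: {exc}"
        ) from exc


# Stand-in for a loaded model's predict that only accepts dict input.
def fake_predict(x):
    if not isinstance(x, dict):
        raise ValueError("expected a dict")
    return {"ok": True}


print(debug_predict(fake_predict, {"messages": []}))
```

Calling debug_predict(fake_predict, [1, 2]) would raise a RuntimeError naming the input type (list) along with the original error, which narrows down where predict() breaks.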

Workspace exceeds provisioned concurrency

You might receive a Workspace exceeded provisioned concurrency quota error.

You can increase concurrency depending on region availability. Reach out to your Databricks account team and provide your workspace ID to request a concurrency increase.

Debugging after container build failure

This section details issues that might occur when your build fails.

OSError: [Errno 28] No space left on device

The No space left on device error can occur when too many large artifacts are logged alongside the model unnecessarily. Check in MLflow that extraneous artifacts are not logged alongside the model, and try to redeploy the slimmed-down package.
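To find the offenders, you can list the run's artifacts and sort by size; MlflowClient().list_artifacts(run_id) returns FileInfo entries with path and file_size fields you could feed in. The filtering helper below is a hypothetical sketch with example data.

```python
# Hypothetical helper: flag logged artifacts larger than a size threshold,
# largest first. Feed it (path, file_size) pairs, e.g. derived from
# MlflowClient().list_artifacts(run_id).
def large_artifacts(artifacts, threshold_bytes=100 * 1024 * 1024):
    """Return (path, size) pairs over the threshold, sorted largest first."""
    over = [(path, size) for path, size in artifacts
            if size is not None and size > threshold_bytes]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Example data: a large dataset accidentally logged next to the model.
sample = [("model/model.pkl", 5_000_000), ("data/train.parquet", 900_000_000)]
print(large_artifacts(sample))
```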

Azure Firewall issues with serving models from Unity Catalog

You might see the error: Build could not start due to an internal error. If you are serving a model from UC and Azure Firewall is enabled, this is not supported by default.

Reach out to your Databricks account team to help resolve.

Build failure due to lack of GPU availability

You might see the error: Build could not start due to an internal error - please contact your Databricks representative.

Reach out to your Databricks account team to help resolve.