Track and export serving endpoint health metrics to Prometheus and Datadog
This article provides an overview of serving endpoint health metrics and shows how to use the metrics export API to export endpoint metrics to Prometheus and Datadog.
Endpoint health metrics measure infrastructure behavior, such as latency, request rate, error rate, CPU usage, and memory usage, telling you how your serving infrastructure is performing.
Requirements
Read access to the desired endpoint and a personal access token (PAT), which can be generated in Settings in the Databricks Mosaic AI UI, to access the endpoint.
An existing model serving endpoint. You can validate this by checking the endpoint health with the following:
```bash
curl -n -X GET -H "Authorization: Bearer [PAT]" https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]
```
Validate the export metrics API:
```bash
curl -n -X GET -H "Authorization: Bearer [PAT]" https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics
```
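If you prefer to validate the export API programmatically, the following is a minimal Python sketch that fetches the endpoint's metrics and parses the returned OpenMetrics text with the `prometheus_client` library. The `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and `ENDPOINT_NAME` environment variables are assumptions for this example, not part of the API.

```python
import os

import requests
from prometheus_client.parser import text_string_to_metric_families

# Assumed environment variables for this sketch.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]  # a PAT with read access to the endpoint
endpoint = os.environ["ENDPOINT_NAME"]

resp = requests.get(
    f"https://{host}/api/2.0/serving-endpoints/{endpoint}/metrics",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# The response body is Prometheus/OpenMetrics exposition text;
# list every sample in each metric family.
for family in text_string_to_metric_families(resp.text):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```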
Serving endpoint metrics definitions
| Metric | Description |
|---|---|
| Latency (ms) | Captures the median (P50) and 99th percentile (P99) round-trip latency times within Azure Databricks. This does not include additional Databricks-related latencies like authentication and rate limiting. |
| Request rate (per second) | Measures the number of requests processed per second. This rate is calculated by totaling the number of requests within a minute and then dividing by 60 (the number of seconds in a minute). |
| Request error rate (per second) | Tracks the rate of 4xx and 5xx HTTP error responses per second. Like the request rate, it is computed by aggregating the total number of unsuccessful requests within a minute and dividing by 60. |
| CPU usage (%) | Shows the average CPU utilization percentage across all server replicas. In the context of Databricks infrastructure, a replica refers to virtual machine nodes. Depending on your configured concurrency settings, Databricks creates multiple replicas to manage model traffic efficiently. |
| Memory usage (%) | Shows the average memory utilization percentage across all server replicas. |
| Provisioned concurrency | The maximum number of parallel requests that the system can handle. Provisioned concurrency dynamically adjusts within the minimum and maximum limits of the compute scale-out range, varying in response to incoming traffic. |
| GPU usage (%) | Represents the average GPU utilization, as reported by the NVIDIA DCGM exporter. If the instance type has multiple GPUs, each is tracked separately (such as `gpu0`, `gpu1`, …, `gpuN`). The utilization is averaged across all server replicas and sampled once a minute. Note: the infrequent sampling means this metric is most accurate under a constant load. |
| GPU memory usage (%) | Indicates the average percentage of utilized frame buffer memory on each GPU, based on NVIDIA DCGM exporter data. As with GPU usage, this metric is averaged across replicas and sampled every minute. It is most reliable under consistent load conditions. |
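To make the rate definitions above concrete, here is a minimal Python sketch of how a per-second rate can be derived from a cumulative counter like `request_count_total` exposed by the metrics endpoint. The two sample values are hypothetical, and the one-minute window mirrors the aggregation described in the table.

```python
# Two hypothetical samples of the cumulative counter request_count_total,
# taken one minute apart (values are made up for illustration).
count_at_t0 = 1_200
count_at_t1 = 1_500

# Per the table: total requests within the minute, divided by 60 seconds.
requests_per_second = (count_at_t1 - count_at_t0) / 60
print(f"request rate: {requests_per_second:.1f} req/s")  # 5.0 req/s
```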
Prometheus integration
Note
Regardless of which type of deployment you have in your production environment, the scraping configuration should be similar.
The guidance in this section follows the Prometheus documentation to start a Prometheus service locally using Docker.
Write a YAML config file and name it `prometheus.yml`. The following is an example:

```yaml
global:
  scrape_interval: 1m
  scrape_timeout: 10s

scrape_configs:
  - job_name: "prometheus"
    metrics_path: "/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics"
    scheme: "https"
    authorization:
      type: "Bearer"
      credentials: "[PAT_TOKEN]"
    static_configs:
      - targets: ["dbc-741cfa95-12d1.dev.databricks.com"]
```
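A one-minute `scrape_interval` is a reasonable default here: as noted in the metrics table, several of these metrics (such as GPU usage) are themselves sampled once a minute, so scraping more frequently does not yield finer-grained data.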
Start Prometheus locally with the following command:

```bash
docker run \
  -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
```
Navigate to http://localhost:9090 to check that your local Prometheus service is up and running.

Check the Prometheus scraper status and debug errors from `http://localhost:9090/targets?search=`.
Once the target is fully up and running, you can query the provided metrics, like `cpu_usage_percentage` or `mem_usage_percentage`, in the UI.
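You can also query the scraped metrics programmatically through Prometheus's HTTP API rather than the UI. The following is a minimal Python sketch against the standard `/api/v1/query` endpoint of the local service started above; the metric name is one of those exported by the serving endpoint.

```python
import requests

# Instant query against the local Prometheus service started above.
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "cpu_usage_percentage"},
)
resp.raise_for_status()

data = resp.json()
for result in data["data"]["result"]:
    # Each result carries the metric's labels and a [timestamp, value] pair.
    print(result["metric"], result["value"])
```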
Datadog integration
Note
The preliminary setup for this example is based on the free edition.
Datadog has a variety of agents that can be deployed in different environments. For demonstration purposes, the following launches a macOS agent locally that scrapes the metrics endpoint in your Databricks host. The configuration for other agents follows a similar pattern.
Register a Datadog account.
Install the OpenMetrics integration in your account dashboard so Datadog can accept and process OpenMetrics data.
Follow the Datadog documentation to get your Datadog agent up and running. For this example, use the DMG package option to have everything installed, including `launchctl` and `datadog-agent`.

Locate your OpenMetrics configuration. For this example, the configuration is at `~/.datadog-agent/conf.d/openmetrics.d/conf.yaml.default`. The following is an example configuration YAML file:

```yaml
instances:
  - openmetrics_endpoint: https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics
    metrics:
      - cpu_usage_percentage:
          name: cpu_usage_percentage
          type: gauge
      - mem_usage_percentage:
          name: mem_usage_percentage
          type: gauge
      - provisioned_concurrent_requests_total:
          name: provisioned_concurrent_requests_total
          type: gauge
      - request_4xx_count_total:
          name: request_4xx_count_total
          type: gauge
      - request_5xx_count_total:
          name: request_5xx_count_total
          type: gauge
      - request_count_total:
          name: request_count_total
          type: gauge
      - request_latency_ms:
          name: request_latency_ms
          type: histogram
    tag_by_endpoint: false
    send_distribution_buckets: true
    headers:
      Authorization: Bearer [PAT]
      Content-Type: application/openmetrics-text
```
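One detail worth noting in this configuration: `request_latency_ms` is declared as a histogram, and `send_distribution_buckets: true` has the agent submit its buckets as Datadog distribution metrics, which is what allows latency percentiles (such as P99) to be computed on the Datadog side.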
Start the Datadog agent using `launchctl start com.datadoghq.agent`.

Every time you make changes to your config, restart the agent to pick up the change:

```bash
launchctl stop com.datadoghq.agent
launchctl start com.datadoghq.agent
```
Check the agent health with `datadog-agent health`.

Check agent status with `datadog-agent status`. You should see a response like the following. If not, debug using the error message; potential issues include an expired PAT or an incorrect URL.

```
openmetrics (2.2.2)
-------------------
  Instance ID: openmetrics: xxxxxxxxxxxxxxxx [OK]
  Configuration Source: file:/opt/datadog-agent/etc/conf.d/openmetrics.d/conf.yaml.default
  Total Runs: 1
  Metric Samples: Last Run: 2, Total: 2
  Events: Last Run: 0, Total: 0
  Service Checks: Last Run: 1, Total: 1
  Average Execution Time : 274ms
  Last Execution Date : 2022-09-21 23:00:41 PDT / 2022-09-22 06:00:41 UTC (xxxxxxxx)
  Last Successful Execution Date : 2022-09-21 23:00:41 PDT / 2022-09-22 06:00:41 UTC (xxxxxxx)
```
Agent status can also be seen from the UI at http://127.0.0.1:5002/.
If your agent is fully up and running, you can navigate back to your Datadog dashboard to query the metrics. You can also create a monitor or alert based on the metric data: https://app.datadoghq.com/monitors/create/metric.
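If you'd rather automate the alerting step, monitors can also be created through Datadog's Monitors API. The following is a minimal Python sketch, assuming `DD_API_KEY` and `DD_APP_KEY` environment variables and that the metric arrived in Datadog under the name `cpu_usage_percentage` (the exact name can vary with your OpenMetrics namespace settings); the 90% threshold is an arbitrary illustration.

```python
import os

import requests

# Assumed environment variables for this sketch.
api_key = os.environ["DD_API_KEY"]
app_key = os.environ["DD_APP_KEY"]

monitor = {
    "name": "Serving endpoint CPU usage high",
    "type": "metric alert",
    # Alert when average CPU usage over the last 5 minutes exceeds 90%.
    # The metric name assumes the OpenMetrics config shown earlier.
    "query": "avg(last_5m):avg:cpu_usage_percentage{*} > 90",
    "message": "CPU usage on the serving endpoint is above 90%.",
    "options": {"thresholds": {"critical": 90}},
}

resp = requests.post(
    "https://api.datadoghq.com/api/v1/monitor",
    headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
    json=monitor,
)
resp.raise_for_status()
print("Created monitor:", resp.json()["id"])
```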