Mosaic AI Gateway
Important
This feature is in Public Preview.
What is Mosaic AI Gateway?
Mosaic AI Gateway is designed to streamline the usage and management of generative AI models within an organization. It is a centralized service that brings governance, monitoring, and production readiness to model serving endpoints. It also allows you to run, secure, and govern AI traffic to democratize and accelerate AI adoption for your organization.
All data is logged into Delta tables in Unity Catalog.
To start visualizing insights from your AI Gateway data, download the example AI Gateway dashboard from GitHub. This dashboard leverages the data from the usage tracking and payload logging inference tables.
After you download the JSON file, import the dashboard into your workspace. For instructions on importing dashboards, see Import a dashboard file.
Supported features
The following table defines the available AI Gateway features and which model serving endpoint types support them.
Feature | Definition | External model endpoint | Foundation Model APIs provisioned throughput endpoint |
---|---|---|---|
Permission and rate limiting | Control who has access and how much access. | ✓ | ✓ |
Payload logging | Monitor and audit data being sent to model APIs using inference tables. | ✓ | ✓ |
Usage tracking | Monitor operational usage on endpoints and associated costs using system tables. | ✓ | ✓ |
AI Guardrails | Prevent unwanted data and unsafe data in requests and responses. See AI Guardrails. | ✓ | ✓ |
Traffic routing | Minimize production outages during and after deployment. | ✓ | ✓ |
Mosaic AI Gateway incurs charges on an enabled feature basis. During preview these paid features include AI Guardrails, payload logging and usage tracking. Features such as query permissions, rate limiting, and traffic routing are free of charge. Any new features are subject to charge.
The following table reflects the Databricks units (DBUs) per million (M) tokens rate for the paid AI Gateway features. Charges are listed under the Serverless Real-time Inference
SKU.
Feature | DBU rate |
---|---|
AI Guardrails | 21.429 DBUs per M tokens |
Payload logging | 2.857 DBUs per M tokens |
Usage tracking | 0.571 DBUs per M tokens |
AI Guardrails
AI Guardrails allow users to configure and enforce data compliance at the model serving endpoint level and to reduce harmful content on any requests sent to the underlying model. Bad requests and responses are blocked and a default message is returned to the user. See how to configure guardrails on a model serving endpoint.
Important
AI Guardrails are only available in regions that support Foundation Model APIs pay-per-token.
The following table summarizes the configurable guardrails. See Limitations.
Guardrail | Definition |
---|---|
Safety filtering | Safety filtering prevents your model from interacting with unsafe and harmful content, like violent crime, self-harm, and hate speech. AI Gateway safety filter is built with Meta Llama 3. Databricks uses Llama Guard 2-8b as the safety filter. To learn more about the Llama Guard safety filter and what topics apply to the safety filter, see the Meta Llama Guard 2 8B model card. Meta Llama 3 is licensed under the LLAMA 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring compliance with applicable model licenses. |
Personally identifiable information (PII) detection | Customers can detect any sensitive information such as names, addresses, credit card numbers for users. For this feature, AI Gateway uses Presidio to detect the following U.S. categories of PII: credit card numbers, email addresses, phone numbers, bank account numbers, and social security numbers. The PII classifier can help identify sensitive information or PII in structured and unstructured data. However, because it is using automated detection mechanisms, there is no guarantee that the service will find all sensitive information. Consequently, additional systems and protections should be employed. These classification methods are primarily scoped to U.S. categories of PII, such as U.S. phone numbers, and social security numbers. |
Topic moderation | Capability to list a set of allowed topics. Given a chat request, this guardrail flags the request if its topic is not in the allowed topics. |
Keyword filtering | Customers can specify different sets of invalid keywords for both the input and the output. One potential use case for keyword filtering is so the model does not talk about competitors. This guardrail uses keyword or string matching to decide if the keyword exists in the request or response content. |
Use AI Gateway
You can configure AI Gateway features on your model serving endpoints using the Serving UI. See Configure AI Gateway on model serving endpoints.
Limitations
The following are limitations during the preview:
- AI Gateway is only supported for:
- Foundation Model APIs provisioned throughput model serving endpoints.
- Model serving endpoints that serve external models.
- When AI guardrails are used, the request batch size, that is an embeddings batch size, completions batch size, or the
n
parameter of chat requests, can not exceed 16. - For provisioned throughput workloads, only rate limiting and payload logging using AI Gateway-enabled inference tables are supported.
- See AI Gateway-enabled inference table limitations.
- If you use function calling and specify AI guardrails, those guardrails are not applied to the requests and intermediate responses of the function. However, guardrails are applied to the final output response.