Content safety for models curated by Azure AI in the model catalog

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, learn about content safety capabilities for models from the model catalog deployed using serverless APIs.

Content filter defaults

For models deployed via serverless APIs, Azure AI uses a default configuration of Azure AI Content Safety content filters to detect harmful content across four categories: hate and fairness, self-harm, sexual, and violence. To learn more about content filtering (preview), see Understand harm categories.

The default content filtering configuration for text models filters at the medium severity threshold, blocking any detected content at that level or higher. For image models, the default configuration filters at the low severity threshold, again blocking content at that level or higher. For models deployed through the Azure AI model inference service, you can create configurable filters by selecting the Content filters tab on the Safety + security page of the Azure AI Foundry portal.
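When the default filter blocks a request to a serverless API deployment, the service returns an error rather than a completion. The following is a minimal sketch of handling that case with the azure-ai-inference Python package; the endpoint URL, the environment variable names, and the error-matching logic are illustrative assumptions, not guaranteed behavior for every model.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

# Endpoint and key for a serverless API deployment; the environment
# variable names here are placeholders.
client = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many feet are in a mile?"),
        ]
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    # A filtered prompt or completion typically surfaces as an HTTP 400
    # error whose payload names a content_filter error code.
    if ex.status_code == 400 and "content_filter" in str(ex):
        print("The request was blocked by the content filter.")
    else:
        raise
```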

Tip

Content filtering (preview) isn't available for certain model types that are deployed via serverless APIs. These model types include embedding models and time series models.

Content filtering (preview) occurs synchronously as the service processes prompts to generate content. You might be billed separately according to Azure AI Content Safety pricing for such use. You can disable content filtering (preview) for individual serverless endpoints either:

  • When you first deploy a language model
  • Later, by selecting the content filtering toggle on the deployment details page

If you use an API other than the Azure AI Model Inference API to work with a model that's deployed via a serverless API, content filtering (preview) isn't enabled unless you implement it separately by using Azure AI Content Safety. To get started with Azure AI Content Safety, see Quickstart: Analyze text content. You run a higher risk of exposing users to harmful content if you don't use content filtering (preview) with models deployed via serverless APIs.
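For that scenario, here's a minimal sketch of screening text with the standalone azure-ai-contentsafety Python package; the resource endpoint and key variable names are assumptions you'd replace with your own, and the blocking policy is left to you.

```python
import os

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Endpoint and key for an Azure AI Content Safety resource; the
# environment variable names here are placeholders.
client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

# Analyze a prompt before sending it to the model; apply the same
# check to completions the model returns.
result = client.analyze_text(
    AnalyzeTextOptions(text="Text to screen for harmful content.")
)

for item in result.categories_analysis:
    # Each entry reports a harm category (Hate, Sexual, Violence,
    # SelfHarm) and a severity score; block or allow against your
    # own thresholds.
    print(f"{item.category}: severity {item.severity}")
```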

Understand harm categories

Harm categories

Category: Hate and Fairness (API term: Hate)
Hate and fairness harms refer to any content that attacks or uses discriminatory language with reference to a person or identity group based on certain differentiating attributes of these groups.

This includes, but isn't limited to:
  • Race, ethnicity, nationality
  • Gender identity groups and expression
  • Sexual orientation
  • Religion
  • Personal appearance and body size
  • Disability status
  • Harassment and bullying

Category: Sexual (API term: Sexual)
Sexual describes language related to anatomical organs and genitals, romantic relationships, and sexual acts, including acts portrayed in erotic or affectionate terms and those portrayed as an assault or a forced sexual violent act against one's will.

This includes, but isn't limited to:
  • Vulgar content
  • Prostitution
  • Nudity and pornography
  • Abuse
  • Child exploitation, child abuse, child grooming

Category: Violence (API term: Violence)
Violence describes language related to physical actions intended to hurt, injure, damage, or kill someone or something, and describes weapons, guns, and related entities.

This includes, but isn't limited to:
  • Weapons
  • Bullying and intimidation
  • Terrorist and violent extremism
  • Stalking

Category: Self-Harm (API term: SelfHarm)
Self-harm describes language related to physical actions intended to purposely hurt, injure, or damage one's body, or to kill oneself.

This includes, but isn't limited to:
  • Eating disorders
  • Bullying and intimidation

Severity levels

  • Safe: Content might be related to violence, self-harm, sexual, or hate categories. However, the terms are used in general, journalistic, scientific, medical, and similar professional contexts, which are appropriate for most audiences.
  • Low: Content that expresses prejudiced, judgmental, or opinionated views; includes offensive use of language, stereotyping, use cases exploring a fictional world (for example, gaming or literature), and depictions at low intensity.
  • Medium: Content that uses offensive, insulting, mocking, intimidating, or demeaning language toward specific identity groups; includes depictions of seeking and executing harmful instructions, fantasies, glorification, or promotion of harm at medium intensity.
  • High: Content that displays explicit and severe harmful instructions, actions, damage, or abuse; includes endorsement, glorification, or promotion of severe harmful acts, extreme or illegal forms of harm, radicalization, or nonconsensual power exchange or abuse.
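As a worked illustration of how these levels drive filtering, the sketch below assumes the four-level numeric severity scale (0 = Safe, 2 = Low, 4 = Medium, 6 = High) that the Azure AI Content Safety text API returns by default, and applies the medium threshold that serverless text deployments use; the helper function and threshold constant are hypothetical names for illustration.

```python
# Map the numeric severity scores to the levels described above.
# The four-level scale assumed here is the text API's default.
SEVERITY_LEVELS = {0: "Safe", 2: "Low", 4: "Medium", 6: "High"}

# Default threshold for text models deployed via serverless APIs:
# filter anything at Medium severity or higher.
MEDIUM_THRESHOLD = 4


def should_block(severity: int, threshold: int = MEDIUM_THRESHOLD) -> bool:
    """Return True when detected content is at the threshold level or higher."""
    return severity >= threshold


for severity in (0, 2, 4, 6):
    level = SEVERITY_LEVELS[severity]
    print(f"{level} (severity {severity}): blocked={should_block(severity)}")
```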

How charges are calculated

Pricing details are available at Azure AI Content Safety pricing. Charges are incurred when Azure AI Content Safety evaluates your prompts and completions. If Azure AI Content Safety blocks a prompt or completion, you're charged for both the evaluation of the content and the inference calls.