Mitigate false results in Azure AI Content Safety
This guide provides a step-by-step process for handling false positives and false negatives from Azure AI Content Safety models.
A false positive occurs when the system incorrectly flags non-harmful content as harmful; a false negative occurs when harmful content isn't flagged as harmful. Address these instances to ensure the integrity and reliability of your content moderation process, including responsible generative AI deployment.
Prerequisites
- An Azure subscription - Create one for free
- Once you have your Azure subscription, create a Content Safety resource in the Azure portal to get your key and endpoint. Enter a unique name for your resource, select your subscription, and select a resource group, supported region (see Region availability), and supported pricing tier. Then select Create.
Review and verification
Conduct an initial assessment to confirm that the flagged content really is a false positive or false negative. This can involve:
- Checking the context of the flagged content.
- Comparing the flagged content against the content safety risk categories and severity definitions:
- If you're using content safety in Azure OpenAI, see the Azure OpenAI content filtering doc.
- If you're using the Azure AI Content Safety standalone API, see the Harm categories doc or the Prompt Shields doc, depending on which API you're using. (A sketch of re-checking a flagged sample against the API output follows this list.)
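If you're using the standalone API, one quick way to verify a suspected false result is to re-submit the content and inspect the category and severity the service actually returns, then compare them against the documented definitions. The following is a minimal sketch, assuming the azure-ai-contentsafety Python SDK; the environment variable names are placeholders, not required names.

```python
# Re-check a flagged sample against the text analyze API to confirm the
# category and severity it actually returns. Assumes the azure-ai-contentsafety
# Python SDK; the environment variable names below are placeholders.
import os

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

# Replace with the content you believe was flagged (or missed) incorrectly.
flagged_sample = "Text that you believe was classified incorrectly."
response = client.analyze_text(AnalyzeTextOptions(text=flagged_sample))

# Compare each returned severity against the documented severity definitions
# for the harm categories before deciding whether this is a false result.
for item in response.categories_analysis:
    print(f"{item.category}: severity {item.severity}")
```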
Customize your severity settings
If your assessment confirms that you found a false positive or false negative, you can try customizing your severity settings to mitigate the issue. The settings depend on which platform you're using.
If you're using the Azure AI Content Safety standalone API directly, experiment with setting the severity threshold at different levels for each harm category based on the API output. Alternatively, if you prefer a no-code approach, you can try out those settings in Content Safety Studio or Azure AI Foundry's Content Safety page. Instructions can be found here.
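Because the standalone API returns a severity score per harm category rather than a single block decision, you apply the thresholds in your own code. The sketch below shows one possible approach, again assuming the azure-ai-contentsafety Python SDK; the threshold values are illustrative only and should come from your own policy and testing.

```python
# Apply per-category severity thresholds to the analyze-text output.
# The threshold values are illustrative, not recommendations: raise a
# threshold to reduce false positives, lower it to reduce false negatives.
import os

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

# (category, minimum severity that should be blocked) - tune per category.
SEVERITY_THRESHOLDS = [
    (TextCategory.HATE, 4),
    (TextCategory.SELF_HARM, 2),
    (TextCategory.SEXUAL, 4),
    (TextCategory.VIOLENCE, 4),
]

def is_blocked(text: str) -> bool:
    """Return True if any category meets or exceeds its configured threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=text))
    for category, threshold in SEVERITY_THRESHOLDS:
        item = next((i for i in result.categories_analysis if i.category == category), None)
        if item is not None and item.severity is not None and item.severity >= threshold:
            return True
    return False

print(is_blocked("Sample text to moderate."))
```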
In addition to adjusting severity levels, you can also use blocklists to address false negatives. For more information, see Use blocklists for text moderation.
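To catch specific terms the model keeps missing, you can maintain a blocklist and reference it at analysis time. The following is a minimal sketch assuming the azure-ai-contentsafety Python SDK; the blocklist name and items are placeholders.

```python
# Create (or update) a blocklist, add items, and apply it during text analysis.
# The blocklist name and items below are placeholders.
import os

from azure.ai.contentsafety import BlocklistClient, ContentSafetyClient
from azure.ai.contentsafety.models import (
    AddOrUpdateTextBlocklistItemsOptions,
    AnalyzeTextOptions,
    TextBlocklist,
    TextBlocklistItem,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["CONTENT_SAFETY_ENDPOINT"]
credential = AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"])

blocklist_client = BlocklistClient(endpoint=endpoint, credential=credential)
blocklist_name = "MyFalseNegativeTerms"  # placeholder

# Create or update the blocklist, then add the terms the model is missing.
blocklist_client.create_or_update_text_blocklist(
    blocklist_name=blocklist_name,
    options=TextBlocklist(blocklist_name=blocklist_name, description="Terms missed by the model"),
)
blocklist_client.add_or_update_blocklist_items(
    blocklist_name=blocklist_name,
    options=AddOrUpdateTextBlocklistItemsOptions(
        blocklist_items=[TextBlocklistItem(text="example blocked phrase")]
    ),
)

# Reference the blocklist when analyzing text; matches are returned alongside
# the harm-category results.
content_client = ContentSafetyClient(endpoint=endpoint, credential=credential)
result = content_client.analyze_text(
    AnalyzeTextOptions(text="Sample text to moderate.", blocklist_names=[blocklist_name])
)
for match in result.blocklist_match or []:
    print(f"Blocklist hit: {match.blocklist_item_text} (list: {match.blocklist_name})")
```

Note that newly added blocklist items can take a short time before they're reflected in analysis results.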
Create a custom category based on your own RAI policy
Sometimes you might need to create a custom category to ensure the filtering aligns with your specific Responsible AI policy, because the prebuilt categories or content filtering might not be sufficient.
Refer to the Custom categories documentation to build your own categories with the Azure AI Content Safety API.
Document issues and send feedback to Azure
If, after you’ve tried all the steps mentioned above, Azure AI Content Safety still can't resolve the false positives or negatives, there is likely a policy definition or model issue that needs further attention.
Document the details of the false positives and/or false negatives by providing the following information to the Content Safety support team (an example report template follows the list):
- Description of the flagged content.
- Context in which the content was posted.
- Reason given by Azure AI Content Safety for the flagging (for false positives).
- Explanation of why the content is a false positive or negative.
- Any mitigations already attempted, such as adjusting severity settings or using custom categories.
- Screenshots or logs of the flagged content and system responses.
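It can help to capture this information in a consistent, structured form when you collect it. The snippet below is purely an illustrative template; the field names are placeholders, not an Azure schema.

```python
# Illustrative template for recording a false positive/negative report.
# Field names are placeholders, not an Azure schema.
false_result_report = {
    "content_description": "Short description of the flagged (or missed) content",
    "context": "Where and how the content was posted",
    "service_flag_reason": "Category and severity returned by Azure AI Content Safety (for false positives)",
    "why_false_result": "Explanation of why you consider this a false positive or negative",
    "mitigations_attempted": ["Adjusted severity thresholds", "Added custom category"],
    "evidence": ["screenshot.png", "request_response.log"],
}
```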
This documentation helps in escalating the issue to the appropriate teams for resolution.