Common usage scenarios for sensitive information types
This article describes how to implement some common sensitive information type (SIT) use case scenarios. You can use these procedures as examples and adapt them to your specific needs.
Tip
If you're not an E5 customer, use the 90-day Microsoft Purview solutions trial to explore how additional Purview capabilities can help your organization manage data security and compliance needs. Start now at the Microsoft Purview trials hub. Learn details about signing up and trial terms.
Protect credit card numbers
Contoso Bank needs to classify the credit card numbers that they issue as sensitive. Their credit cards start with a set of six-digit patterns. They would like to customize the out-of-the-box credit card definition to detect only those credit card numbers starting with their six-digit patterns.
Suggested solution
- Create a copy of the credit card SIT. Use the steps to copy and modify an existing SIT to copy the credit card SIT.
- Edit the high confidence pattern. Follow the steps in Edit or delete a SIT pattern.
- Add the starts with check and add the list of pattern digits (formatted and unformatted). For example, to ensure that the SIT treats only credit card numbers starting with 411111 or 433512 as valid, add the following to the list: 4111 11, 4111-11, 411111, 4335 12, 4335-12, 433512.
- Repeat steps 2 and 3 for the low confidence pattern.
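The list of formatted and unformatted prefixes in step 3 can be generated rather than typed by hand. The following is a minimal Python sketch, not part of Microsoft Purview: the helper names `prefix_variants`, `starts_with_list`, and `passes_starts_with` are hypothetical, and the sketch assumes the starts with check is a plain string-prefix comparison against each list entry.

```python
def prefix_variants(prefix: str) -> list:
    """Expand a six-digit prefix into space-delimited, hyphen-delimited, and plain forms."""
    if len(prefix) != 6 or not prefix.isdigit():
        raise ValueError("expected a six-digit prefix")
    head, tail = prefix[:4], prefix[4:]
    return [f"{head} {tail}", f"{head}-{tail}", prefix]


def starts_with_list(prefixes) -> list:
    """Build the full list of entries to paste into the starts with check."""
    entries = []
    for prefix in prefixes:
        entries.extend(prefix_variants(prefix))
    return entries


def passes_starts_with(candidate: str, entries) -> bool:
    """Assumption: the check is a literal prefix comparison against each entry."""
    return candidate.startswith(tuple(entries))


entries = starts_with_list(["411111", "433512"])
print(entries)
# ['4111 11', '4111-11', '411111', '4335 12', '4335-12', '433512']
print(passes_starts_with("4111-1111-1111-1111", entries))  # True
print(passes_starts_with("5500 0000 0000 0004", entries))  # False
```

Listing each delimiter variant explicitly mirrors the example in step 3, where the same six digits appear once per supported format.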
Test numbers similar to Social Security numbers
Contoso has identified a few nine-digit test numbers that trigger false positive matches in the Social Security Number (SSN) Microsoft Purview Data Loss Prevention (DLP) policy. They would like to exclude these numbers from the list of valid SSN matches.
Suggested solution
- Create a copy of the SSN SIT. Use the steps to copy and modify an existing SIT to copy the SSN SIT.
- Edit the high confidence pattern. Follow the steps in Edit or delete a SIT pattern.
- Add the numbers you want to exclude in the exclude specific values check. For example, to exclude both 239-23-532 and 23923532, adding 23923532 alone is sufficient.
- Repeat steps 2 and 3 for the other confidence patterns as well.
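One way to picture why a single unformatted entry covers the formatted variant is to assume that delimiters are stripped before the comparison. The Python sketch below illustrates that assumption only. It does not describe how Purview implements the check, and `EXCLUDED_VALUES` and `is_excluded` are hypothetical names.

```python
import re

# Unformatted value taken from the example in the steps above.
EXCLUDED_VALUES = {"23923532"}


def is_excluded(candidate: str, excluded=EXCLUDED_VALUES) -> bool:
    """Strip every non-digit character, then compare against the excluded set."""
    digits = re.sub(r"\D", "", candidate)
    return digits in excluded


print(is_excluded("239-23-532"))   # True
print(is_excluded("23923532"))     # True
print(is_excluded("123-45-6789"))  # False
```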
Phone numbers in signature trigger match
Australia-based Contoso finds that phone numbers in email signatures are triggering a match for their Australia company number DLP policy.
Suggested solution
Add a 'NOT' group in supporting elements using a keyword list that contains keywords commonly used in email signatures, such as Phone, Mobile, email, and Thanks and regards. Set the proximity of this keyword list to a small value (for instance, 50 characters) for better accuracy. For more information, see Get started with custom sensitive information types.
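The effect of a 'NOT' group with a small proximity can be sketched in Python. This is an illustration under stated assumptions, not Purview's matching engine: `CANDIDATE` is a simplified stand-in for a nine-digit number pattern, `SIGNATURE_KEYWORDS` is a hypothetical keyword list, and proximity is modeled as a character window on both sides of the match.

```python
import re

# Simplified stand-in for a nine-digit candidate, optionally grouped in threes.
CANDIDATE = re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b")

# Hypothetical signature keywords, based on the examples in the text above.
SIGNATURE_KEYWORDS = ["phone", "mobile", "email", "thanks and regards"]


def matches_outside_signatures(text: str, window: int = 50) -> list:
    """Keep candidates that have no signature keyword within `window` characters."""
    kept = []
    for m in CANDIDATE.finditer(text):
        lo = max(0, m.start() - window)
        hi = min(len(text), m.end() + window)
        context = text[lo:hi].lower()
        if not any(keyword in context for keyword in SIGNATURE_KEYWORDS):
            kept.append(m.group())
    return kept


body = "Company registration 004 085 616 was filed last year."
signature = "Thanks and regards, Jo. Phone: 412 345 678"
print(matches_outside_signatures(body))       # ['004 085 616']
print(matches_outside_signatures(signature))  # []
```

A small window keeps the exclusion local, so a signature at the bottom of a long message does not suppress a genuine match near the top.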
Unable to trigger ABA routing policy
A DLP policy fails to detect ABA routing numbers in large Excel files because the required keyword isn't within 300 characters of the number.
Suggested solution
Create a copy of the built-in SIT and edit it to change the proximity of the keyword list from 300 characters to Anywhere in the document.
Tip
You may edit the keyword list to include/exclude keywords that are relevant to your organization.
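The difference between a 300-character proximity and Anywhere in the document can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the nine-digit `CANDIDATE` pattern, the `KEYWORDS` list, and the modeling of proximity as a character window around the match.

```python
import re

# Simplified stand-in for a routing-number candidate: any nine-digit run.
CANDIDATE = re.compile(r"\b\d{9}\b")

# Hypothetical supporting keywords.
KEYWORDS = ["routing number", "aba"]


def supported_matches(text: str, proximity=300) -> list:
    """Return candidates that have a supporting keyword nearby.

    proximity=None models the "Anywhere in the document" setting.
    """
    lowered = text.lower()
    found = []
    for m in CANDIDATE.finditer(text):
        if proximity is None:
            context = lowered
        else:
            context = lowered[max(0, m.start() - proximity): m.end() + proximity]
        if any(keyword in context for keyword in KEYWORDS):
            found.append(m.group())
    return found


# Spreadsheet-like text in which the column header is far from the value.
sheet = "Routing number" + "," * 1000 + "021000021"
print(supported_matches(sheet))                  # []
print(supported_matches(sheet, proximity=None))  # ['021000021']
```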
Unable to detect credit card numbers with unusual delimiters
Contoso Bank has noticed some of their employees share credit card numbers with ‘/’ as a delimiter, for example, 4111/1111/1111/1111, which the out-of-the-box credit card definition doesn't detect. Contoso would like to define their own regex and validate it using LuhnCheck.
Suggested solution
- Create a copy of the credit card SIT using the steps in Customize a built-in sensitive information type.
- Add a new pattern.
- In the primary element, select regular expression.
- Define a regular expression that includes ‘/’ as a delimiter. Then choose validator and select luhncheck or func_credit_card so that each regex match must also pass the Luhn check.
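The combination of a custom regular expression and a Luhn validation can be prototyped outside Purview before you configure the SIT. The Python sketch below uses the standard Luhn algorithm. The pattern `SLASH_CARD` is a hypothetical example that only covers four groups of four digits, and the sketch makes no claim about how the luhncheck or func_credit_card validators are implemented internally.

```python
import re

# Hypothetical pattern: four groups of four digits separated by '/'.
SLASH_CARD = re.compile(r"\b\d{4}/\d{4}/\d{4}/\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum over the digits in `number` (delimiters are ignored)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_cards(text: str) -> list:
    """Return regex matches that also pass the Luhn check."""
    return [m.group() for m in SLASH_CARD.finditer(text) if luhn_valid(m.group())]


sample = "Valid: 4111/1111/1111/1111, invalid: 4111/1111/1111/1112"
print(find_cards(sample))  # ['4111/1111/1111/1111']
```

Running the regex first and the checksum second mirrors the order in the steps above: the regular expression finds candidates, and the validator discards those that are not plausible card numbers.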
Ignore a disclaimer notice
Many organizations add legal disclaimers, disclosure statements, signatures, or other information to the top or bottom of email messages that enter or leave their organizations. In some cases, emails sent within an organization itself can contain such text. For example, employees might add signatures with motivational quotes, social messages, and so on. A disclaimer or signature can contain the terms that are present in the lexicon of a CC and might generate many false positives.
For example, a typical disclaimer might contain words like sensitive or confidential. Policies that look for sensitive information detect such disclaimers as incidents, leading to many false positives. Giving customers an option to ignore disclaimers can therefore reduce the number of false positives and increase the efficiency of the compliance team.
Example of disclaimer
Consider the following disclaimer:
"IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Contoso may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient. If you have received this message in error, please forward it to the sender and delete it completely from your computer system."
If the SIT is configured to detect confidential as a keyword, the pattern produces a match every time an email includes the disclaimer, leading to a considerable number of false positives.
Ignore disclaimer using prefix and suffix in SIT
One way to ignore the instances of keywords in the disclaimer is by excluding the instances of keywords that are preceded by a prefix and followed by a suffix.
Consider this disclaimer:
"IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Contoso may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient. If you have received this message in error, please forward it to the sender and delete it completely from your computer system."
This disclaimer contains two instances of the keyword confidential. If we configure the SIT to ignore instances of this keyword that are preceded by specific prefixes (for example, contain information that is) and followed by specific suffixes (for example, and legally privileged), we can ignore disclaimers in most cases.
To ignore the disclaimer using prefix and suffix:
- Add additional checks to the current SIT to exclude the prefix and suffix text around the keyword that you want to ignore in the disclaimer.
- Choose to exclude the prefix and in the Prefixes text box enter contain information that is.
- Choose to exclude the suffix and in the Suffixes text box enter and legally privileged.
- Repeat this process for the other instances of the keyword in the disclaimer.
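The prefix and suffix exclusion described in these steps can be sketched in Python. This is an illustration only: it assumes that a keyword instance is ignored when it is both immediately preceded by an excluded prefix and immediately followed by the paired suffix. The first pair in `EXCLUSIONS` comes from the steps above. The second pair is a hypothetical choice for the other instance in the disclaimer.

```python
import re

KEYWORD = "confidential"

EXCLUSIONS = [
    # Prefix and suffix from the steps above.
    ("contain information that is", "and legally privileged"),
    # Hypothetical pair for the first instance in the disclaimer.
    ("entitled to receive the", "information it may contain"),
]


def keyword_hits(text: str, keyword: str = KEYWORD, exclusions=EXCLUSIONS) -> list:
    """Return start offsets of keyword matches not wrapped by an excluded prefix/suffix pair."""
    lowered = text.lower()
    hits = []
    for m in re.finditer(re.escape(keyword.lower()), lowered):
        before = lowered[:m.start()].rstrip()
        after = lowered[m.end():].lstrip()
        wrapped = any(
            before.endswith(prefix) and after.startswith(suffix)
            for prefix, suffix in exclusions
        )
        if not wrapped:
            hits.append(m.start())
    return hits


disclaimer = (
    "This e-mail message is intended to be received only by persons entitled "
    "to receive the confidential information it may contain. E-mail messages "
    "to clients of Contoso may contain information that is confidential and "
    "legally privileged."
)
print(keyword_hits(disclaimer))                          # []
print(len(keyword_hits("This report is confidential.")))  # 1
```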
Ignore disclaimer by excluding secondary elements
Another way to ignore the disclaimer is to add the keyword instances that appear in it as a list of supporting (secondary) elements, and then exclude matches that occur near those elements.
Consider this disclaimer:
"IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Contoso may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient. If you have received this message in error, please forward it to the sender and delete it completely from your computer system."
This example contains two instances of the keyword “confidential”. If we configure the SIT to ignore the instances of this keyword that appear in the disclaimer, we can ignore disclaimers in most cases.
To ignore the disclaimer using secondary elements:
- Select Not any of these group in the supporting elements.
- Add the disclaimer text instances that you want to ignore as a keyword list or dictionary.
- Add each keyword or phrase that you want to ignore on a new line. Each entry can't be longer than 50 characters.
- Set the proximity of this element to be within 50-60 characters of the primary element.
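The steps above can be sketched in Python to check a phrase list before entering it. The sketch rests on stated assumptions: `DISCLAIMER_PHRASES` is a hypothetical keyword list cut from the example disclaimer, the 50-character limit is enforced by a hypothetical `validate` helper, and proximity is modeled as a 60-character window on both sides of the primary match.

```python
import re

MAX_ENTRY_LEN = 50  # per-entry limit stated in the steps above
PROXIMITY = 60      # upper end of the 50-60 character range in the steps above

# Hypothetical entries taken from the example disclaimer.
DISCLAIMER_PHRASES = [
    "receive the confidential information",
    "that is confidential and legally privileged",
]


def validate(phrases) -> list:
    """Reject entries longer than the limit and return them lowercased."""
    too_long = [p for p in phrases if len(p) > MAX_ENTRY_LEN]
    if too_long:
        raise ValueError(f"entries longer than {MAX_ENTRY_LEN} characters: {too_long}")
    return [p.lower() for p in phrases]


def keyword_hits(text: str, keyword: str, phrases=DISCLAIMER_PHRASES,
                 proximity: int = PROXIMITY) -> list:
    """Return start offsets of keyword matches with no excluded phrase nearby."""
    checked = validate(phrases)
    lowered = text.lower()
    hits = []
    for m in re.finditer(re.escape(keyword.lower()), lowered):
        context = lowered[max(0, m.start() - proximity): m.end() + proximity]
        if not any(phrase in context for phrase in checked):
            hits.append(m.start())
    return hits


disclaimer = (
    "This e-mail message is intended to be received only by persons entitled "
    "to receive the confidential information it may contain. E-mail messages "
    "to clients of Contoso may contain information that is confidential and "
    "legally privileged."
)
print(keyword_hits(disclaimer, "confidential"))  # []
print(len(keyword_hits("The attached budget is confidential.", "confidential")))  # 1
```

In this model, a genuine keyword match that sits within the proximity window of a disclaimer phrase is also suppressed, which is one reason to keep the proximity small.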