Microsoft Purview Custom SIT - Date of Birth

Brendan 0 Reputation points
2025-01-31T02:58:55.7666667+00:00

Hi

I am trying to develop a custom sensitive information type (SIT) for Microsoft Purview to detect dates of birth in emails and attachments.

I used AI to write the following regex.

1. Core DOB Regex:

This regex aims to capture various date formats, including:

  • MM/DD/YYYY
  • MM-DD-YYYY
  • YYYY/MM/DD
  • YYYY-MM-DD
  • DD/MM/YYYY
  • DD-MM-YYYY

\b(?:(?:0|1)\/-\/-\d{2}|(?:19|20)\d{2}\/-\/-|(?:0|\d|3)\/-\/-\d{2})\b

2. Contextual Keywords:

\b(?:DOB|Date of Birth|Born on)\s*:\s*(?:(?:0|1)\/-\/-\d{2}|(?:19|20)\d{2}\/-\/-|(?:0|\d|3)\/-\/-\d{2})\b

After creating the custom SIT, it does not appear to work with DLP detection.

Any help would be appreciated.


Accepted answer
  1. phemanth 13,620 Reputation points Microsoft Vendor
    2025-01-31T07:10:19.9233333+00:00

    @Brendan

    Welcome to the Microsoft Q&A forum.

    Creating a custom Sensitive Information Type (SIT) for detecting dates of birth in Microsoft Purview can be tricky. Let's refine your regex and ensure it aligns with the requirements for DLP detection.

    Core DOB Regex

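    As posted, your current regex has a problem: the separators appear as the literal sequence \/-\/- rather than as a character class such as [\/-], and the day, month, and year digits are mostly missing, so it cannot match an ordinary date like 01/31/1990. Here's a refined version that should capture the various date formats more accurately: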

    \b(?:\d{2}[\/-]\d{2}[\/-]\d{4}|\d{4}[\/-]\d{2}[\/-]\d{2})\b
    

    This regex captures:

    • MM/DD/YYYY or MM-DD-YYYY
    • YYYY/MM/DD or YYYY-MM-DD
    • DD/MM/YYYY or DD-MM-YYYY
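
A quick local check of the pattern above can be sketched in Python. This is only an approximation: Python's `re` module is assumed here as a stand-in, and Purview's own regex engine may differ in details. The names `DOB_PATTERN` and `find_dates` are illustrative and not part of any Purview API.

```python
import re

# Core pattern from the answer: NN/NN/NNNN or NNNN/NN/NN,
# with "/" or "-" as the separator.
DOB_PATTERN = re.compile(
    r"\b(?:\d{2}[\/-]\d{2}[\/-]\d{4}|\d{4}[\/-]\d{2}[\/-]\d{2})\b"
)


def find_dates(text: str) -> list[str]:
    """Return every substring of `text` that matches the core date pattern."""
    return DOB_PATTERN.findall(text)


if __name__ == "__main__":
    samples = [
        "DOB: 01/31/1990",            # two-digit month and day, four-digit year
        "1990-01-31 and 31-01-1990",  # year-first and day-first forms
        "1/5/1990",                   # single-digit month/day: not matched
        "99/99/9999",                 # matched, although it is not a real date
    ]
    for sample in samples:
        print(f"{sample!r} -> {find_dates(sample)}")
```

Two things stand out when running this. The pattern only checks digit counts, so an impossible value such as 99/99/9999 still matches, and MM/DD/YYYY cannot be told apart from DD/MM/YYYY. Dates written with single-digit months or days are not matched at all.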

    Contextual Keywords

    For the contextual keywords, you can use the following regex to match phrases like "DOB", "Date of Birth", or "Born on" followed by a date:

    \b(?:DOB|Date of Birth|Born on)\s*:\s*(?:\d{2}[\/-]\d{2}[\/-]\d{4}|\d{4}[\/-]\d{2}[\/-]\d{2})\b
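
The keyword pattern can be exercised locally the same way. This is a minimal sketch that assumes Python's `re` module behaves closely enough to Purview's engine for this pattern; `KEYWORD_DOB_PATTERN` and `has_labelled_dob` are hypothetical names used only for illustration.

```python
import re

# Keyword pattern from the answer: a label, an optional-space colon,
# then a date in one of the supported layouts.
KEYWORD_DOB_PATTERN = re.compile(
    r"\b(?:DOB|Date of Birth|Born on)\s*:\s*"
    r"(?:\d{2}[\/-]\d{2}[\/-]\d{4}|\d{4}[\/-]\d{2}[\/-]\d{2})\b"
)


def has_labelled_dob(text: str) -> bool:
    """Return True if `text` contains a keyword, a colon, and a date."""
    return KEYWORD_DOB_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in (
        "DOB: 01/31/1990",
        "Date of Birth : 1990-01-31",
        "Born on 01/31/1990",  # no colon after the keyword
        "dob: 01/31/1990",     # lower-case keyword
    ):
        print(f"{sample!r} -> {has_labelled_dob(sample)}")
```

As written, the pattern requires a colon after the keyword, so "Born on 01/31/1990" is not matched. In Python's default case-sensitive mode, "dob:" is not matched either, so it is worth testing lower-case and colon-free variants in Purview as well.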
    
    
    1. Testing: Ensure you test your regex thoroughly with various date formats and contexts to confirm it works as expected.
    2. Validators: Use the date validator in Microsoft Purview to ensure the dates match the expected formats.
    3. Documentation: Refer to the Microsoft documentation on creating custom SITs and using regex in DLP policies for additional guidance.
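
To see why a date validator matters on top of the regex, the candidates can be checked against the calendar locally. The sketch below is not Purview's date validator; it uses Python's `datetime.strptime` as a stand-in, and the helper name `is_plausible_date` and the list of accepted layouts are assumptions for illustration.

```python
from datetime import datetime

# Layouts accepted by the regex, after normalizing "-" to "/".
DATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y", "%Y/%m/%d")


def is_plausible_date(candidate: str) -> bool:
    """Return True if `candidate` parses as a real calendar date
    in at least one of the supported layouts."""
    normalized = candidate.replace("-", "/")
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(normalized, fmt)
            return True
        except ValueError:
            continue  # try the next layout
    return False


if __name__ == "__main__":
    for candidate in ("01/31/1990", "31-01-1990", "1990-01-31",
                      "99/99/9999", "02/30/1990"):
        print(f"{candidate!r} -> {is_plausible_date(candidate)}")
```

Strings such as 99/99/9999 or 02/30/1990 pass the regex but fail this check, which is the kind of false positive a validator is meant to filter out.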

    1: Sensitive information type REGEX validators and additional checks

    2: Create custom sensitive information types

    3: Learn about using regular expressions (regex) in DLP policies

    I hope the above steps resolve the issue. Please let us know if the issue persists. Thank you.
