Outlook attachments are not being scanned for SIT data

Brendan 0 Reputation points
2025-01-09T02:51:59.2766667+00:00

We are currently evaluating our Purview Information Protection implementation to ensure all outgoing emails containing Sensitive Information Types (SITs) and/or Personally Identifiable Information (PII) are properly identified and protected. This includes scanning both email content and any associated attachments.

While Purview successfully identifies and encrypts emails with SIT data in the message body, we have encountered an issue where attachments are not being scanned for the same sensitive information. This occurs despite the attachments being in common formats such as .docx and .pdf, which we understand are compatible with Purview's scanning capabilities.

We are investigating whether the lack of Optical Character Recognition (OCR) functionality within our Purview configuration is contributing to this issue. Could you please clarify if OCR is required for the comprehensive scanning of email attachments?

Any assistance in resolving this matter would be greatly appreciated.

Microsoft Purview

3 answers

  1. Ganesh Gurram 3,600 Reputation points Microsoft Vendor
    2025-01-15T04:51:06.1166667+00:00

    @Brendan Short - I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Ask: Outlook attachments are not being scanned for SIT data

    Solution: The issue occurred because attachments sent externally were not being scanned for Sensitive Information Types (SIT).

    This happened because the "Auto-labeling for files and emails" functionality within the sensitivity label policy was used to automatically apply a sensitivity label to the email after detecting particular SITs, which then triggered encryption for outgoing emails. However, this method of applying encryption only scans the body of the email for SITs and does not scan the attachments.

    To resolve the issue, the Data Loss Prevention (DLP) functionality within Purview was used to create the required DLP policies. With this approach, M365 attachments in emails were scanned correctly for SITs, as per the Purview documentation.

    OCR is only required when scanning image files.
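The difference described above can be illustrated with a toy model. This is only a sketch of the reported behaviour, not Purview's implementation or API: the `TOY_SIT` pattern, the `Email` and `Attachment` classes, and both scan functions are hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for a Sensitive Information Type (SIT): a toy
# pattern for a 16-digit card-like number. Real SITs are far richer.
TOY_SIT = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


@dataclass
class Attachment:
    filename: str
    # Extracted text; None models an image file with no text layer,
    # which would need OCR before any pattern could match.
    text: Optional[str]


@dataclass
class Email:
    body: str
    attachments: List[Attachment] = field(default_factory=list)


def body_only_scan(email: Email) -> bool:
    """Models the behaviour reported for auto-labeling: only the body is checked."""
    return bool(TOY_SIT.search(email.body))


def body_and_attachment_scan(email: Email) -> bool:
    """Models the behaviour reported for DLP: body and attachment text are checked."""
    if TOY_SIT.search(email.body):
        return True
    return any(
        att.text is not None and TOY_SIT.search(att.text) is not None
        for att in email.attachments
    )


# A clean body with the sensitive value only inside a .docx attachment.
message = Email(
    body="Please see the attached invoice.",
    attachments=[Attachment("invoice.docx", "Card: 4111 1111 1111 1111")],
)
print(body_only_scan(message))            # False: the attachment is never read
print(body_and_attachment_scan(message))  # True: the attachment text is matched
```

In this model, an attachment whose `text` is `None` never matches, which mirrors the point that OCR only matters for image files.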

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information. 

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue. 

     

    Please don't forget to click Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

    1 person found this answer helpful.

  2. Ganesh Gurram 3,600 Reputation points Microsoft Vendor
    2025-01-09T16:34:43.3866667+00:00

    Hi @Brendan,
    Thanks for the question and for using the MS Q&A platform.

    Optical Character Recognition (OCR) is indeed required for scanning images and certain types of content within attachments in Microsoft Purview. While Purview can identify sensitive information in the email body, attachments in formats like .docx and .pdf may not be scanned for sensitive information unless OCR is enabled.

    OCR functionality allows Purview to scan images for sensitive information, which is essential for identifying content that may not be in a text format. If OCR is not configured, it could lead to the issue you're experiencing where attachments are not being scanned for Sensitive Information Types (SIT) or Personally Identifiable Information (PII).

    To ensure comprehensive scanning of email attachments, you should verify that OCR is enabled in your Purview configuration.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "was this answer helpful". If you have any further queries, do let us know.


  3. Brendan Short 0 Reputation points
    2025-01-15T06:30:28.6+00:00

    To clarify for others, the problem was that attachments on emails sent outside the network were not being scanned for SITs.

    The reason is as follows.

    I was using the "Auto-labeling for files and emails" functionality within the sensitivity label policy to automatically apply a sensitivity label to the email after detecting particular SITs, which then triggered encryption for outgoing emails. The problem is that this method of applying encryption does not scan attachments within the emails; only the body of the email is scanned for SITs.

    To resolve the issue, I used the Data Loss Prevention (DLP) functionality of Purview to create the required DLP policies. M365 attachments in emails were then scanned correctly for SITs, as per the Purview documentation.

    OCR is only required when you need to scan image files.

    @Ganesh Gurram - I am happy for you to repost this solution so I can accept the answer.
