Implement data classification in Microsoft 365

Data classification in Microsoft 365 employs a process called zero change management. This process scans an organization's sensitive content and labeled content before the organization creates any policies. It enables an organization to see the effect that all the retention and sensitivity labels have in its environment. It also empowers the organization to start assessing its protection and governance policy needs.

The data classification features in Microsoft 365 enable an organization to gain insight into the data it classifies. With this knowledge, the organization can establish the policies it needs to protect and govern its sensitive data. As a Microsoft 365 Global administrator or Compliance administrator, you can evaluate and then tag content in your organization. As such, you can:

Control where it goes.
Protect it no matter where it is.
Preserve and delete the data according to your organization's needs.

Organizations evaluate and tag content by applying sensitivity labels, retention labels, and trainable classifiers. There are various ways to do the discovery, evaluation, and tagging. The result is that an organization might tag and classify large numbers of documents and emails with one or more of these labels. Once an organization applies its retention and sensitivity labels, its next step is to see how the labels are being used across its environment and what is happening to the labeled items.

Data classification in the Microsoft Purview compliance portal

In Microsoft 365, the Data classification page in the Microsoft Purview compliance portal provides visibility into tagged content, including:

The number of items classified as a sensitive information type and what those classifications are.
The top applied sensitivity labels in both Microsoft 365 and Azure Information Protection.
The top applied retention labels.
A summary of activities that users are taking on your sensitive content.
The locations of your sensitive and retained data.

Note

Azure Active Directory (Azure AD) is now Microsoft Entra ID.

You can also manage the following compliance features on the Data classification page:

Trainable classifiers
Sensitive information types
Exact data match-based sensitive information types
Content explorer
Activity explorer

Once an organization develops its data classification framework, its next step is implementation. The Microsoft Purview compliance portal enables administrators to discover, classify, review, and monitor their data in accordance with their data classification framework. Organizations can use sensitivity labels to help protect their data. The labels do so by enforcing various protections, such as encryption and content marking. You can apply labels to data in multiple ways:

Manually
By default based on policy settings
Automatically, as the result of a condition such as personal user information

For smaller organizations or organizations with a relatively streamlined data classification framework, creating a single sensitivity label for each of their data classification levels may suffice. The following example shows a one-to-one data classification level to sensitivity label mapping:

Classification label	Sensitivity label	Label settings	Published to
Unrestricted	Unrestricted	Apply 'Unrestricted' footer	All users
General	General	Apply 'General' footer	All users

Tip

During an information protection pilot that Microsoft conducted with its internal users, participants expressed confusion about the intent of the 'Personal' label. They didn't know whether it meant personal information or merely something related to a personal matter. As a result, Microsoft changed the label to 'Non-business', hoping that the new name would be clearer and eliminate the confusion. This example shows that a taxonomy doesn't need to be perfect from the start. Start with what you think is right, pilot it, and if needed, adjust the labels based on feedback.

Larger organizations often have a global reach and more complex information security needs. They may find it challenging to have a one-to-one relationship between the number of classification levels in their policies and the number of sensitivity labels in their Microsoft 365 environment. This challenge is especially true in global organizations where a given data classification level such as 'Restricted' may have a different definition or a different set of controls depending on region.
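
One way to reason about this challenge is to treat the mapping as a lookup keyed by both classification level and region, with an organization-wide label as the fallback. The following Python sketch is illustrative only. The label names, the region codes, and the resolve_label helper are hypothetical and aren't part of Microsoft Purview.

```python
from typing import Optional

# Hypothetical mapping: (classification level, region) -> sensitivity label.
# A region of None marks the organization-wide default for that level.
LABEL_MAP: dict[tuple[str, Optional[str]], str] = {
    ("Unrestricted", None): "Unrestricted",
    ("General", None): "General",
    ("Restricted", None): "Restricted",
    ("Restricted", "EU"): "Restricted - EU",
    ("Restricted", "US"): "Restricted - US",
}


def resolve_label(level: str, region: Optional[str] = None) -> str:
    """Return the regional label if one exists, else the global default."""
    regional = LABEL_MAP.get((level, region))
    if regional is not None:
        return regional
    try:
        return LABEL_MAP[(level, None)]
    except KeyError:
        raise ValueError(f"No sensitivity label defined for level {level!r}") from None
```

In this sketch, levels that mean the same thing everywhere keep a single label, and only the levels whose definition or controls differ by region get extra entries.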

Microsoft Purview Data classification

The Microsoft Purview compliance portal includes the Microsoft 365 Data classification feature. The Data classification group on the compliance portal's navigation pane contains the features designed to help organizations discover, classify, review, and monitor sensitive and business-critical data.

The following sections outline the features under the Data classification group.

Overview

This page displays information that shows how an organization uses sensitive information and labels. You can view insights like:

Top sensitive info types.
How many items contain each type.
An overview of the top classification activities detected across Microsoft 365 locations.
The top sensitivity and retention labels applied to content.

Classifiers

An organization can use this page to refine its classification strategy by defining new categories of sensitive information and content. A tab appears at the top of the page for each classifier type:

Trainable classifiers. This classifier type uses machine learning to help identify categories of content in an organization. Microsoft provides built-in classifiers for identifying common content types like Resume and Source Code, and offensive language like profanity and threats. These classifiers are ready to use in compliance solutions like retention labels, sensitivity labels, and communication compliance. If the built-in classifiers don't meet an organization's needs, it can create custom ones to identify specific types of content, such as company contracts. After an organization creates a custom classifier, it must train the classifier to improve its accuracy. Training should continue until the classifier is ready for publishing and use in Microsoft Purview.
Sensitive info types. An organization can also use sensitive information types to help refine its classification strategy. Sensitive information types help identify sensitive items. This classifier helps prevent sensitive items from being inadvertently or inappropriately shared. It can also help locate relevant data in eDiscovery and apply governance actions to certain types of information. For example, organizations can use the sensitive info type classifier to protect information such as credit card and bank account numbers. Microsoft provides many built-in types for detecting sensitive info spanning regions around the globe. Alternatively, an organization can also create custom types tailored to its sensitive information. You define a custom sensitive information type (SIT) based on:
- Patterns
- Keyword evidence such as employee number, social security number, or ID
- Character proximity to evidence in a particular pattern
- Confidence levels
Exact data match (EDM) classifiers. Suppose an organization wants a custom sensitive information type that uses exact or nearly exact data values, instead of one that finds matches based on generic patterns. Exact Data Match (EDM)-based classification enables organizations to create custom sensitive information types that refer to exact values in a database of sensitive information. The database can contain up to 100 million rows of data, and the system can refresh it daily. As employees, patients, and clients come and go, and as records change, an organization's custom sensitive information types remain current and applicable. An organization can also use EDM-based classification with policies, such as Microsoft Purview Data Loss Prevention policies or Microsoft Defender for Cloud Apps (formerly Microsoft Cloud App Security) file policies. With EDM-based classification, an organization can create a custom sensitive information type designed to:
- Be dynamic and easily refreshed.
- Be more scalable.
- Result in fewer false positives.
- Work with structured sensitive data.
- Handle sensitive information more securely, not sharing it with anyone, including Microsoft.
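
The difference between a pattern-based sensitive information type and an EDM-based one can be sketched in a few lines of Python. This sketch is a simplified illustration under stated assumptions, not how Microsoft Purview implements either classifier. The nine-digit employee number pattern, the keyword list, the 30-character proximity window, the two confidence values, and all function names are hypothetical.

```python
import hashlib
import re

# Hypothetical pattern: a nine-digit employee number.
EMPLOYEE_ID = re.compile(r"\b\d{9}\b")
# Hypothetical supporting keywords (the "evidence").
KEYWORDS = ("employee number", "employee id")
# Hypothetical proximity: characters searched on each side of a match.
PROXIMITY = 30


def pattern_matches(text: str) -> list[tuple[str, str]]:
    """Pattern-based detection: return (value, confidence) pairs.

    A pattern match with a keyword nearby is reported as "high" confidence.
    A pattern match with no keyword nearby is reported as "low" confidence.
    """
    results = []
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        window = text[start:match.end() + PROXIMITY].lower()
        has_evidence = any(keyword in window for keyword in KEYWORDS)
        results.append((match.group(), "high" if has_evidence else "low"))
    return results


def fingerprint(value: str) -> str:
    """Hash a value so the lookup index never stores it in clear text."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_index(records: list[str]) -> set[str]:
    """Build the exact-match index from the organization's own records."""
    return {fingerprint(record) for record in records}


def exact_matches(text: str, index: set[str]) -> list[str]:
    """EDM-style detection: keep only candidates whose hash is in the index."""
    return [
        match.group()
        for match in EMPLOYEE_ID.finditer(text)
        if fingerprint(match.group()) in index
    ]


if __name__ == "__main__":
    sample = "Employee number 123456789 was updated. Ticket 987654321 is closed."
    print(pattern_matches(sample))
    print(exact_matches(sample, build_index(["123456789"])))
```

The pattern-based function flags every nine-digit number and relies on nearby keywords to raise its confidence. The exact-match function reports only values that appear in the organization's own records, which is why this approach tends to produce fewer false positives.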

Content explorer

An organization can use Content explorer to review:

How many items contain sensitive information types.
How many items have sensitivity labels or retention labels applied.

To review specific items, drill down by selecting the information type or label, and then open the location that stores the items. Locations can include Exchange, SharePoint, and OneDrive. Use the Search feature to quickly find specific locations, and keep drilling down until you see the list of emails and documents with that information type or label. Use Search again to find items based on their name, file type, and more. The Search feature also displays quick insights into the sensitive information types within an item, such as how many instances there are and their confidence level.

An organization can go deeper by opening the source content. Doing so enables it to review the sensitive information and context, and the file's metadata. Organizations can then export the results to an Excel spreadsheet containing details like file names and their sensitive info types and applied labels.

Screenshot showing the Content explorer page in the Microsoft Purview compliance portal.

Activity explorer

Organizations can use Activity explorer to review classification activity across locations, including Exchange, SharePoint, OneDrive, and endpoint devices. Classification activities include label changes and file updates. You can filter activity by date range, activity type, location, user, sensitivity label, and more by using the Filter menu.
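
Conceptually, each filter narrows the activity list, and an activity appears only if it satisfies every filter that is set. The following Python sketch models that behavior on in-memory records. It doesn't call any Microsoft Purview API. The Activity record, its field names, and the filter_activities function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Activity:
    """One classification activity. Field names are hypothetical."""
    day: date
    activity_type: str
    location: str
    user: str
    sensitivity_label: Optional[str] = None


def filter_activities(
    activities: list[Activity],
    start: Optional[date] = None,
    end: Optional[date] = None,
    activity_type: Optional[str] = None,
    location: Optional[str] = None,
    user: Optional[str] = None,
    sensitivity_label: Optional[str] = None,
) -> list[Activity]:
    """Return the activities that match every filter that was supplied.

    A filter left as None is ignored, so calling the function with no
    filters returns the full list.
    """
    def keep(a: Activity) -> bool:
        return (
            (start is None or a.day >= start)
            and (end is None or a.day <= end)
            and (activity_type is None or a.activity_type == activity_type)
            and (location is None or a.location == location)
            and (user is None or a.user == user)
            and (sensitivity_label is None
                 or a.sensitivity_label == sensitivity_label)
        )

    return [a for a in activities if keep(a)]
```

For example, passing both a date range and a user returns only that user's activities within the range.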

The historical chart organizes recent activity by color. Hover over a column to see how many instances occurred that day. Scroll through the complete list of each activity logged across Microsoft 365 locations and devices. Then select an activity to review more detail. An organization can also customize the list to show the data that best suits its administrative needs.

It's common for users to create and share sensitive data every day. As such, discovering, classifying, and reviewing it should be an ongoing process. Organizations should monitor the Overview page to discover new insights to help ensure the system is detecting, protecting, and governing sensitive data appropriately.