Prepare your data for searches in Microsoft 365 Copilot

To prepare for Microsoft 365 Copilot, you must get your information ready for search. For example, if your organization already established the right information access controls and policies, then your users only have access to the information that they need and nothing else as they search in places like SharePoint. So if your organization already implemented these types of controls, then you're one step ahead.

If not, the good news is there are tools and controls that you can use to get visibility into how your organization shares information. You can put automated controls in place to ensure the right level of access and stop oversharing before you roll out Microsoft 365 Copilot. So just as you would prepare the information in your Microsoft 365 tenant for search, the same principles apply for Microsoft 365 Copilot. Why? Because Microsoft 365 Copilot only retrieves information each user explicitly has access to.

Diagram showing a robot looking at servers filled with data.

Data preparation tips

Organizations that improve their Microsoft 365 data quality and organization enable Microsoft 365 Copilot to generate more accurate, relevant suggestions tailored to their business needs. Administrators should consider implementing the following best practices to improve their organization's data quality:

  • Clean out redundant, outdated, and trivial (ROT) content. Perform an extensive audit of all organizational content, including documents, emails, chats, and wikis. Remove any outdated materials that are no longer accurate or relevant. For example, delete product spec sheets that are more than five years old, promotional emails for expired campaigns, and resolved IT ticket conversations. Enable archive and deletion policies on collaborative platforms to automate removal of stale content. Ongoing removal of obsolete, irrelevant content focuses Microsoft 365 Copilot on current, high-value information. The use of retention policies and retention labels can help organizations comply proactively with industry regulations and internal policies, reduce risk if a litigation event or a security breach occurs, and ensure users only work with content that's current and relevant to them.

    Besides removing ROT content, organizations should also identify and clean up inactive SharePoint sites. Inactive sites often contain outdated content, which clutters Copilot’s data source and leads to less accurate responses. Removing these sites helps Copilot focus on current information for better results. Organizations can identify inactive sites by running the Inactive SharePoint Sites policy, which is a SharePoint Advanced Management tool. This policy combats content sprawl and reduces ROT content by automatically identifying and managing inactive SharePoint sites. It enables you to define inactivity criteria, such as lack of updates or user activity over a set period. Once you define these criteria, site owners receive email notifications asking them to confirm whether the site is still active.
  • Organize content into logical folders and sites. Structure your files, emails, and pages thoughtfully. Design a detailed taxonomy for categorizing and organizing documents, emails, and pages. For example, within document libraries, place financial reports under the "Finance" folder and marketing assets under "Marketing." Within SharePoint, create sites for "HR Policies," "IT Resources," "Product Documentation," and so on. Add metadata like client names and project codes on files to further identify them. When organizations design a logical information architecture in this manner, it enables Microsoft 365 Copilot to infer relationships and relevance.
  • Tag files with keywords. Make extensive use of labels, hashtags, and metadata tags on all documents, emails, and pages to describe their characteristics. For example, label customer support tickets with #refund, #payment-issue, or other issue tags. Add product attributes like model number, market, and manufacturing year as metadata tags on images and spec sheets. Thorough labeling and tagging allows Microsoft 365 Copilot to rapidly categorize, search, and recommend content.
  • Standardize file names. Mandate consistent file naming conventions like "Q3 2023 Earnings Report" instead of abbreviation-filled names. Set up recommended templates for documents and presentations. Organizations that use consistent, descriptive names rather than abbreviations enable Microsoft 365 Copilot to better grasp content.
  • Consolidate multiple versions. Wherever feasible, retain only the final version of documents, presentations, and so on. Find old iterations of files and consolidate to only retain the most current version. The final version should clearly indicate it's the latest. Eliminating redundant drafts and outdated versions reduces confusion and contradictions for Microsoft 365 Copilot.
  • Promote data hygiene habits. Implement organization-wide training and change management to promote good data hygiene habits among employees. Provide guidelines on effectively naming files, tagging content, retaining only current versions, deleting stale emails and content, and other practices. Consider gamification by recognizing top contributors to data hygiene. Organizations should also monitor data usage consent. If any data sets include personal information, ensure employees provide proper consent for use in Microsoft 365 Copilot. Build data quality expectations into employee goal setting and reviews. Organizations that establish a culture focused on maintaining clean, well-organized data ensure high quality over time, which maximizes Microsoft 365 Copilot's effectiveness.
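
As a rough illustration of the clean-up and naming tips above, the following Python sketch flags files that look stale or poorly named in an exported file inventory. It isn't a Microsoft 365 API. The names FileRecord, STALE_AFTER, is_descriptive, and review_candidates are hypothetical. The five-year threshold is borrowed from the spec sheet example in the list, and the "at least three words" naming rule is an assumption chosen only for the example.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileRecord:
    """One row of a hypothetical file inventory export."""
    name: str
    last_modified: date


# Assumption: content untouched for roughly five years counts as stale.
STALE_AFTER = timedelta(days=5 * 365)


def is_descriptive(name: str) -> bool:
    """Assumed rule: a descriptive name has at least three words or numbers."""
    stem = name.rsplit(".", 1)[0]          # drop the file extension, if any
    words = re.findall(r"[A-Za-z0-9]+", stem)
    return len(words) >= 3


def review_candidates(files, today):
    """Return (stale, poorly_named) lists of file names for human review."""
    stale = [f.name for f in files if today - f.last_modified > STALE_AFTER]
    poorly_named = [f.name for f in files if not is_descriptive(f.name)]
    return stale, poorly_named
```

A script like this only produces a review list. Deciding what to delete, archive, or rename remains a task for content owners and for retention policies.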

Data governance considerations

When administrators prepare data for Microsoft 365 Copilot, they should also consider the following data governance factors:

  • Implement Microsoft SharePoint Advanced Management tools. Implementing Microsoft SharePoint Advanced Management (SAM) is a strategic move for administrators aiming to enhance their organization’s data governance, especially when preparing to integrate Microsoft 365 Copilot. SharePoint Advanced Management offers robust tools for data classification, retention, and compliance, ensuring that sensitive information is properly managed and protected. These tools are crucial for maintaining data integrity and meeting regulatory requirements, which is important as organizations increasingly rely on AI-driven tools like Copilot. Administrators who apply these advanced management features can establish a solid foundation of trust and security, which is essential for the successful deployment and operation of Microsoft 365 Copilot. Moreover, SAM provides comprehensive auditing and reporting capabilities that give administrators deep insights into data usage and access patterns. This transparency helps in identifying potential risks and ensuring that data governance policies are being followed. With these insights, organizations can make informed decisions about data management and quickly address any compliance issues that arise. As Microsoft 365 Copilot integrates with various data sources and workflows, having a well-governed data environment ensures the AI tool can function optimally, delivering accurate and reliable insights while maintaining the highest standards of data security and compliance.

  • Assign a data steward to oversee preparation and continue maintaining quality. Organizations have sensitive information under their control such as financial data, proprietary data, credit card numbers, health records, or social security numbers. As such, they should consider designating an experienced data governance expert or a team of governance experts as the official data steward for Microsoft 365. The data steward should be responsible for auditing data, establishing access rules, training users on hygiene, continuously monitoring how Microsoft 365 and Microsoft 365 Copilot utilize organizational data, and enacting improvements. Having an accountable data steward helps ingrain excellent data habits and promotes accountability across the AI lifecycle. To help protect their sensitive data and reduce the risk from oversharing, organizations must prevent their users from inappropriately sharing sensitive data with people who shouldn't have it. They can accomplish this goal by implementing sensitivity labels and data loss prevention (DLP) policies. For more information, see Learn about data loss prevention and Learn about sensitivity labels.

  • Document your data policies and practices related to Microsoft 365 and Microsoft 365 Copilot utilization. As a best practice, organizations should formally outline their data management policies, access rules, use cases, and procedures related to data security and governance in Microsoft 365. As previously stated, the powerful security tools within the Microsoft 365 and Azure ecosystems can help organizations tighten permissions and implement "just enough access." The policies and settings that administrators define in these tools are used not only by Microsoft 365 to prevent data oversharing, but also by Microsoft 365 Copilot. Organizations that don't document their data governance policies should consider doing so before implementing Microsoft 365 Copilot. A comprehensive data governance policy should codify rules for:

    • Restricted data
    • Anonymization procedures
    • Stewardship roles
    • Employee training requirements
    • Access authorization procedures
    • Monitoring practices
    • Other enforceable policies

    An organization should share its data governance policy across the entire company and regularly update it. For example, it can create a data governance policy that defines the confidential data restricted from user access, requires anonymization of certain datasets, and designates a data governance expert or team to continually oversee its Microsoft 365 data practices. Formalizing governance requirements in this manner helps create accountability across an organization.

Robust governance is crucial to ensure that both Microsoft 365 and Microsoft 365 Copilot comply with legal and ethical data standards. As a best practice, organizations should appoint cross-functional data, security, and compliance teams to enact data restrictions, anonymization, stewardship, SharePoint Advanced Management policies, and training.

Organizations should also keep in mind that while initial audits, access restrictions, and governance policies are crucial when first deploying Microsoft 365 and Microsoft 365 Copilot, they should view data governance as an iterative, continuous process. Why? Because data assets and usage patterns inevitably evolve over time. For example:

  • Organizations add new data repositories as business needs change. New repositories require auditing and proper access controls.
  • Users and permissions change. Employees join, leave, and change roles, so administrators should grant or revoke access accordingly.
  • Regulations and compliance requirements change. Data policies must reflect any new restrictions.

To keep pace, organizations should perform regular reviews and updates, such as:

  • Monthly audits of new data sources that can require access changes.
  • Quarterly scans of permissions and external sharing to identify any new overexposure.
  • Annual policy reviews to update for new regulations and refresh employee training.
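
To make the cadence above concrete, here is a minimal Python sketch that works out which recurring reviews are overdue. The task names, the REVIEW_CADENCE_MONTHS table, and the add_months, next_due, and overdue helpers are all hypothetical and exist only for this example. The monthly, quarterly, and annual intervals mirror the list above.

```python
from datetime import date

# Assumption: review intervals in months, taken from the list above.
REVIEW_CADENCE_MONTHS = {
    "audit new data sources": 1,
    "scan permissions and external sharing": 3,
    "review policies and training": 12,
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so it stays valid."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(d.day, 28))


def next_due(last_done: dict) -> dict:
    """Map each review task to the date it is next due."""
    return {
        task: add_months(last_done[task], months)
        for task, months in REVIEW_CADENCE_MONTHS.items()
    }


def overdue(last_done: dict, today: date) -> list:
    """Return the tasks whose due date has already passed, sorted by name."""
    return sorted(task for task, due in next_due(last_done).items() if due < today)
```

A governance team might feed a helper like this from whatever tracker records when each review was last completed.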

Appointing a data governance expert or governance team to oversee this continuous process helps ingrain it as a living, evolving set of data policies and access controls. This process also enables adapting Microsoft 365 and Microsoft 365 Copilot governance to changes over time.

Allow or prevent a site's content from being searchable

When users search on a site, results can come from many places such as columns, libraries, and pages. A site owner can change search settings on a site to decide whether the site's content can appear in search results. The search settings define what content is included in the search index. Adding content to the search index is known as indexing, and only content from indexed sites appears in search results. Site owners can use these settings to allow or prevent the indexing of content from their sites.

Permissions on content also affect whether users are allowed to see the content in search results. A good understanding of how permissions and search settings work can help you ensure that users can see the right documents and sites in the search results.

Note

Search results are always security trimmed, so users only see content they have permission to see. The search settings only define what content is included in the search index.

There are specific scenarios where users have permissions to see the content but are still unable to find it in the search results. For more details, see Search results don't appear for group owners after creating a new Office 365 group.

Content is stored in many places including sites, lists, libraries, Web Parts, and columns. By default, most content contained in a site, list, library, Web Part page, or column is crawled and added to the search index. What's in the search index decides what content can appear in search results both in the classic and modern search experiences. The permissions that are set on items, lists, libraries, sites, and so forth, also affect whether users can see the content in search results.
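
The interplay between the search index and permissions can be pictured with a small Python model. This is a conceptual sketch only, not how SharePoint search is implemented. The Item class, its indexed and allowed_users fields, and the search function are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    indexed: bool                                     # is the item in the search index?
    allowed_users: set = field(default_factory=set)   # who has permission to see it


def search(items, user, query):
    """Return titles that match the query, are indexed, and are visible to the user."""
    q = query.lower()
    return [
        item.title
        for item in items
        if item.indexed                    # search settings: only indexed content
        and user in item.allowed_users     # security trimming: only permitted content
        and q in item.title.lower()
    ]
```

In this model, an item that a user can open but that isn't indexed never shows up in results, and an indexed item stays hidden from anyone who lacks permission to it.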

By default, the content of a site can appear in search results. If a site owner or site collection administrator specifies that the content from a particular site can't appear in search results, then the other search results settings such as those for lists, libraries, ASPX pages, and columns set on that site wouldn't have any effect. Similarly, if a site owner or site collection administrator prevents list or library content from appearing in search results, then excluding columns wouldn't have any effect. It's important to know what settings are inherited from higher levels in order to plan search effectively.
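
The precedence described in the previous paragraph can be summarized in a few lines of Python. This is a simplified sketch under the assumption that each level has a single yes/no setting. The function and parameter names are hypothetical and don't correspond to real SharePoint properties.

```python
def can_appear_in_search(site_allowed: bool,
                         list_allowed: bool,
                         column_excluded: bool) -> bool:
    """Resolve whether a column's content can appear in search results."""
    if not site_allowed:
        # A site-level "No" makes list, library, page, and column settings irrelevant.
        return False
    if not list_allowed:
        # A list-level or library-level "No" makes column exclusions irrelevant.
        return False
    # Only when both higher levels allow search does the column setting matter.
    return not column_excluded
```

Reading the function from top to bottom mirrors how to plan search settings: start at the site, then the list or library, and only then consider individual columns.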

Perform the following steps to either allow or prevent the contents of a SharePoint site from appearing in search results:

Note

To change this setting, you must have the Manage Permissions permission level. This permission level is included in the "Site Name" Owner group.

  1. On the SharePoint site, select the gear (Settings) icon in the upper-right corner of the screen.
  2. In the menu that appears, select Site settings. If you don't see Site settings, select Site information, and then select View all site settings.
  3. Under Search, select Search and offline availability.
  4. In the Indexing Site Content section, under Allow this site to appear in Search results, select Yes to allow the content of the site to appear in search results. To prevent the content from appearing in search results, select No.

Configure Microsoft Search in Bing tenant-level setting

Microsoft Search in Bing brings together the capabilities of Microsoft Search and Bing web search. It provides a familiar search experience that helps users find relevant results from your organization and the web. To help keep your users and your data private and secure, users must sign in to their work or school account on Bing before they can find internal results. Users already signed in to a Microsoft app, including Microsoft Edge, Outlook, and SharePoint, are automatically signed in when they go to Bing.

For most organizations, including enterprise and education, Microsoft Search in Bing is on by default. Perform the following steps in the Microsoft 365 admin center to manage the Microsoft Search in Bing setting:

  1. In the Microsoft 365 admin center, select Show all in the navigation pane.
  2. Select Settings in the navigation pane, and in the Settings group, select Search & intelligence.
  3. On the Search and intelligence page, the Overview tab is displayed by default. Select the Configurations tab.
  4. On the Configurations tab, under the section titled Microsoft Search in Bing setting, select the Change button.
  5. On the Microsoft Search in Bing detail pane that appears, verify that the Enable Microsoft Search in Bing for your organization check box is selected. If the check box is cleared and you want to enable the feature, select it.

Note

It takes up to 24 hours for this change to take effect.

If this setting is off, users won't get internal results when they search on Bing, Windows Search, or Microsoft Edge. They also won't be able to access Microsoft 365 Chat. Turning off Microsoft Search in Bing doesn't stop or prevent internal content from being added to your search index. It only disables Bing entry points to Microsoft Search. To find answers and internal results, users must use other entry points, such as SharePoint Online or an Office 365 app.