Evaluate and refine search results in eDiscovery (preview)
Evaluating and refining your search results is one of the most important steps in your eDiscovery investigation work. The search query you configure and the results that are returned help you determine if you've discovered items and information applicable to your investigation or if you need to modify your search to try to discover additional pertinent items. This initial search of items and initial review of information helps you determine what actions are required after you finalize your search parameters.
Tip
Get started with Microsoft Security Copilot to explore new ways to work smarter and faster using the power of AI. Learn more about Microsoft Security Copilot in Microsoft Purview.
Evaluate search results
After you create a search and run it, the next step is to view the search statistics. The statistics help you verify whether relevant content is being found and identify the content locations with the most hits. You can also review a sample of the search results to further help you determine if the content is within scope of your investigation.
Statistics dashboard
If you selected Statistics as the initial result type for your search, you're automatically redirected to this dashboard when the search results are completed. If you're already familiar with previous versions of eDiscovery, information on the Statistics tab is similar to collection estimates. The search results for the Statistics dashboard are included in the following sections:
- Summary: This section shows the number of search hits, locations, data sources, and the total file size of partially indexed items.
- Search hits: Displays the total search hit count and volume from all items matching the query criteria from locations searched.
- Locations: Displays the fraction of locations with hits out of all locations searched. The numerator shows the locations with hits and the denominator shows the number of locations searched. Locations with errors are shown in red. To view full details on all the locations and associated hits and errors, select Download report to download the full .csv report.
- Data sources: Displays the fraction of data sources with hits out of all data sources searched. The numerator shows the data sources with hits and the denominator shows the number of data sources included in the search. These data sources are consistent with the data sources in the search design flow, and the count should match the number of people or groups included in the search. A tenant-wide data source of All people and all groups counts as a single data source.
- Partially indexed items or "Advanced indexed items hits": Displays the count and volume of partially indexed and unindexed items returned as part of the search. This card displays partially indexed item information if you chose to include partially indexed or unindexed items as part of the search configuration. If you chose to include partially indexed and unindexed items and enabled advanced indexing options, this card displays the additional hits you get from advanced indexed items. The advanced indexed hit count is based on a statistical sample of the partially indexed items. Actual hits may be higher and should be confirmed using the add to a review set and export search results actions.
- Search hit trends: This section shows the following search result cards. The charts are interactive: hover over a chart to display section names, percentages, and item numbers. Select View top 100 for more information about items included in each trend and to download the results to a .csv file:
- Top data sources: Displays the top five data sources that make up the most search hits matching your query. The names of these data sources (names of users, groups, or organization-wide locations) are listed with the hit count. These data sources should match what you selected in the data sources workflow when building the search query.
- Top sensitive information types (SITs): Displays the top five sensitive information types (SITs) in SharePoint files that are most often included in the search hits matching your query. Adding up the counts for each SIT doesn't necessarily equal the total hit count because a single item or document may contain more than one SIT. For example, a document that contains both a password and a social security number (SSN) is counted twice, once for each SIT. We recommend selecting View top 100 to get a deeper understanding of the locations of these SIT counts and to verify whether they overlap.
- Top keywords: Query keywords that resulted in the most search hits matching your query.
- Top item types: Most frequent item types within search hits matching your query. This count is determined by itemClass for Exchange content and ContentType for SharePoint content.
- Indexing status: Breakdown of unindexed (including partially indexed) and fully indexed data items.
- Top communication participants: Senders or recipients for emails, Microsoft Teams chats, and calendar invites in Exchange locations.
- Top location type: Hit count by location type (mailbox versus site).
Select Regenerate view to rerun the query and to review the most current results. Select Download report to combine all Statistics results into a single .csv file. When viewing the top 100 results for any trend area, select Download report for a .csv file of the top 100 results of the selected hit trend.
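The SIT overlap described for the Top sensitive information types card can be illustrated with a small worked example. The following is a minimal Python sketch, not part of eDiscovery itself. The item names and SIT labels are hypothetical and exist only to show why per-SIT counts can add up to more than the number of items.

```python
from collections import Counter

# Hypothetical search hits: each item maps to the set of SITs detected in it.
# Item names and SIT labels are invented for illustration only.
hits = {
    "contract.docx": {"Password", "Social security number"},
    "notes.txt": {"Password"},
    "payroll.xlsx": {"Social security number"},
}

# Count how many items contain each SIT (an item counts once per SIT it contains).
sit_counts = Counter(sit for sits in hits.values() for sit in sits)

total_items = len(hits)
sum_of_sit_counts = sum(sit_counts.values())

# Items containing more than one SIT are the source of the overlap.
overlapping = sorted(name for name, sits in hits.items() if len(sits) > 1)

print(f"Items: {total_items}, sum of SIT counts: {sum_of_sit_counts}")
print(f"Items counted under more than one SIT: {overlapping}")
```

In this sketch the three items produce four SIT counts because one document is counted under both SITs.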
Understanding statistics and search results
Depending on when you run a search in eDiscovery (preview), the statistics for the search can contain different results. For example, if you run two searches with the exact same conditions but at different times, you'll likely have different statistics results. These differences may occur for the following reasons:
- Your organization is active: Because you have active users in a production environment, data in your organization is constantly moved, added, deleted, and retired. The same search conditions run against the same locations are likely to return different search results because the data in those locations has changed between the times the searches were run.
- Transient errors: When you run a search (or export or add to a review set), transient processing errors may occur, especially for large sets of data. These errors are often due to processing time-outs and can be mitigated by breaking up searches into smaller date ranges and exporting the data in parallel. Always try to break up searches into smaller sizes with more specific search conditions and more targeted locations. This helps processes run more efficiently with less chance of errors.
- Location access: There are scenarios where locations included in a search are invalid, aren't accessible, or time out during processing. When comparing the results between two searches with the same conditions, ensure that the successfully searched locations match. For example, a search against 1,000 locations may have one failed location in the first run and no failed locations in the second run. This means the first run searched only 999 locations successfully and the second run searched 1,000 locations. The difference of one location is the reason why the search results differ between the two runs. Use the locations.csv report for search, export, and add to review set processes to view a comprehensive report on which locations were successful and which locations failed. Rerun searches for any locations that failed.
- User running the search: Depending on the user starting the search process, a compliance boundary or search permission filter may or may not be applied. This filter either filters locations based on mailbox properties or filters content based on content path (SharePoint sites). The results for the user may be limited if a compliance boundary or search permission filter is applied. For example, one user doesn't have a compliance boundary applied, but a second user has a compliance boundary applied that restricts this user to user mailboxes and OneDrive sites in a specific region. A search by the first user returns all mailbox and OneDrive matches for the search conditions for all regions, and a search by the second user returns matches only for mailboxes and OneDrive sites in the allowed region.
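Two of the mitigations above, splitting a search into smaller date ranges and comparing failed locations between runs, can be sketched in a few lines of Python. This is an illustrative sketch only. The helper names `split_date_range` and `failed_locations` are hypothetical, and the `Location` and `Status` column names and the `Success` value are assumptions, not the documented layout of the locations.csv report. Check the headers in your own report before adapting it.

```python
import csv
import io
from datetime import date, timedelta


def split_date_range(start: date, end: date, days: int) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


def failed_locations(report_csv: str) -> set[str]:
    """Return locations whose status isn't 'Success'.

    ASSUMPTION: the 'Location' and 'Status' columns are hypothetical names.
    """
    reader = csv.DictReader(io.StringIO(report_csv))
    return {row["Location"] for row in reader if row["Status"] != "Success"}


# Hypothetical report contents for two runs of the same search.
run1 = "Location,Status\nuser1@contoso.com,Success\nuser2@contoso.com,Failed\n"
run2 = "Location,Status\nuser1@contoso.com,Success\nuser2@contoso.com,Success\n"

# Locations that failed in the first run but succeeded in the second.
only_failed_in_run1 = failed_locations(run1) - failed_locations(run2)

# Break a three-month search into windows of at most 31 days each.
windows = split_date_range(date(2024, 1, 1), date(2024, 3, 31), 31)

print(only_failed_in_run1)
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Each window can then be used as the date condition of a separate, smaller search, and any location that failed in only one run is a candidate explanation for a difference in hit counts.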
Sample dashboard
If you selected Sample as the initial result type for your search, you're automatically redirected to this dashboard when the search results are completed. The search results for the Sample dashboard columns contain the following information for each item:
- Subject/Title: The subject or title of the items included in the sample.
- Date: The date the item was created or sent.
- Sender/Author: The sender or author of the item.
Samples let you inspect a representative subset of individual items and the details for each item returned by the search. The number of samples per location and the number of sample locations defined in the search determine the number of sample items and how locations are represented in the sample.
Select a sample item to view the Source information for the item. If available for the item, this view displays a rich view of a selected item so that you can evaluate the relevancy of the item as it relates to the defined search data source and conditions.
Select Regenerate view to rerun the query and to review the most current results. Select Download reports to combine all Sample results into a single .csv file. Select View settings to view the settings applied to the sample view generation.
Refine search results
Based on the estimates and statistics returned by the search, you can edit and refine the search by changing the data sources that are searched and the search query to expand or narrow the search. You can update and rerun the search until you're confident that the search results contain the content that's most relevant to your case.
After you're satisfied with the search results, you can take the following actions: