SharePoint 2016: retrieving crawl results using PowerShell
Introduction
A crawl of a site collection revealed a significant number of warnings and errors needing the site collection administrator's attention. To facilitate review and analysis, the crawl results were exported to a spreadsheet via PowerShell and provided to the administrator in that format. This tip shows how to do this. Note the following presumptions:
- The farm has a single search service application deployed
- The site collection is hosted in a dedicated web application
- Farm search content sources are individualized by web application
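The commands below are intended to be run on a farm server in the SharePoint 2016 Management Shell. If a regular PowerShell console is used instead, the SharePoint snap-in must be loaded first; a minimal sketch:
# Load the SharePoint cmdlets if they are not already available in this session
if ($null -eq (Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue)) {
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}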
Get content source crawl log references
These first statements get references to the target content source and that content source's crawl log. These references are needed to retrieve key information associated with the crawl log.
# Load the search assembly that contains the CrawlLog class
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server.Search")
# Get references to the search service application and the target content source
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "[search service application name]"
$ContentSource = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa | ? { $_.Name -eq "[content source name]" }
# Create a CrawlLog object bound to the search service application
$log = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa
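Optionally, verify that the content source reference resolved; the pipeline above returns $null when no content source matches the supplied name:
# Stop early if the content source name did not match anything
if ($null -eq $ContentSource) { throw "Content source '[content source name]' was not found." }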
Get content source crawl status
This statement gets a listing of content sources and the status of their crawls:
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa
Use this statement to get the same information for just the single content source identified earlier:
$ContentSource
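To show just the crawl-related details for that content source, the output can be narrowed with Select-Object (this assumes the standard ContentSource properties CrawlStatus, CrawlStarted, and CrawlCompleted):
$ContentSource | Select-Object Name, Id, CrawlStatus, CrawlStarted, CrawlCompleted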
Get content source crawl statistics
This statement gets a list of the total number of crawl successes, warnings, errors, etc. for a specific content source. It returns the same information seen when navigating to the search service application's Crawl Log - Content Source page, but for that content source only.
$log.GetCrawlStatisticsByHost("[content source name]")
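If these per-host totals also need to go to the site collection administrator, they can be written to a spreadsheet with the same Export-CSV pattern used below (this assumes GetCrawlStatisticsByHost returns tabular rows, as GetCrawledUrls does):
$log.GetCrawlStatisticsByHost("[content source name]") | Export-CSV -Path "D:\Temp\CrawlStatistics.csv" -NoTypeInformation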
Export content source crawl log results
This statement exports all of the crawl results for a single content source to a CSV file:
$log.GetCrawledUrls($false, 1000000, $null, $false, $ContentSource.Id, -1, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue) | Export-CSV -Path "D:\Temp\CrawlResults.csv" -NoTypeInformation
To export a listing of just those successful crawl results, use this:
$log.GetCrawledUrls($false, 1000000, $null, $false, $ContentSource.Id, 0, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue) | Export-CSV -Path "D:\Temp\CrawlResults.csv" -NoTypeInformation
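The sixth argument is the error-level filter, so the same call can be narrowed to warnings or errors only; the values below assume the usual mapping of 1 = warnings and 2 = errors (the statements above already use -1 for all results and 0 for successes):
# Warnings only (assumes error level 1 = warnings)
$log.GetCrawledUrls($false, 1000000, $null, $false, $ContentSource.Id, 1, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue) | Export-CSV -Path "D:\Temp\CrawlWarnings.csv" -NoTypeInformation
# Errors only (assumes error level 2 = errors)
$log.GetCrawledUrls($false, 1000000, $null, $false, $ContentSource.Id, 2, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue) | Export-CSV -Path "D:\Temp\CrawlErrors.csv" -NoTypeInformation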
To export a listing of all crawl results for all content sources provisioned for a search service application, use this:
$log.GetCrawledUrls($false, 1000000, $null, $false, -1, -1, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue) | Export-CSV -Path "D:\Temp\CrawlResults.csv" -NoTypeInformation
Additional details on useful filtering parameters can be found in the references.
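As a quick reference, here is the export statement again with each positional argument annotated with its parameter name from the documented CrawlLog.GetCrawledUrls signature:
# $log.GetCrawledUrls(
#     $false,                        # getCountOnly    - $true returns only the number of matching rows
#     1000000,                       # maxRows         - maximum number of rows to return
#     $null,                         # urlQueryString  - URL filter ($null = no URL filter)
#     $false,                        # isLike          - $true treats urlQueryString as a "starts with" match
#     $ContentSource.Id,             # contentSourceID - content source to filter on (-1 = all content sources)
#     -1,                            # errorLevel      - -1 = all results, 0 = successes only
#     -1,                            # errorID         - specific error id (-1 = all)
#     [System.DateTime]::MinValue,   # startDateTime   - earliest crawl time to include
#     [System.DateTime]::MaxValue)   # endDateTime     - latest crawl time to include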
References
Notes
- tbd