FS4SP SCOM Management Pack Best Practices
Administrators who want to monitor FS4SP and receive alerts for it through SCOM can add the FAST Search Server 2010 for SharePoint Management Pack. Running SCOM 2007 or SCOM 2007 R2 is recommended. Below are in-depth descriptions of the most critical counters, along with best practices for monitoring them.
Install:
Install the SCOM 2007 agent on all search nodes and make them Agent Managed servers. Then obtain and import the System Center Operations Manager Management Pack for FAST Search Server 2010 for SharePoint.
Time Since Last Index:
Performance Counter Category - "FAST Search Indexer Status" -> Time since last index
Default Warning at 5 min, Alert at 15 min
TSLI is a good measure of overall indexing health. The value is reset to 0 every time a partition completes indexing. In real-world situations this alert fires any time crawling is suspended, or when a crawl completes and there is a gap before the next one starts (no crawling, no feeding, nothing to index). The best approach is to keep the default settings, review your crawler logs for recurring gaps with no crawling, and schedule a task to cover them. For example, if your incremental crawls start every morning at 01:00 and complete at 20:00, you have a five-hour gap each night; a PowerShell script can put this monitoring object into maintenance mode for that window.
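$Now = Get-Date
# Look up the monitoring object to silence; adjust the display name to match your environment.
$CrawlerGAPTSLI = Get-MonitoringObject | where {$_.DisplayName -eq 'Status Code Check'}
New-MaintenanceWindow -StartTime $Now -EndTime ($Now.AddMinutes(300)) -Comment "Crawler Gaps" -MonitoringObject $CrawlerGAPTSLI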
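The 300-minute window follows from the example crawl schedule. As a minimal sketch, the gap can be computed from the crawl end time and the next crawl start time, so the window length does not have to be hard-coded. The function name Get-CrawlGapMinutes is hypothetical and not part of SCOM or FS4SP; it assumes both times are times of day and that the gap is shorter than 24 hours.

```powershell
# Hypothetical helper: minutes between the end of one crawl and the start of the next.
# Assumes both values are times of day and the gap is shorter than 24 hours.
function Get-CrawlGapMinutes {
    param(
        [TimeSpan]$CrawlEnd,
        [TimeSpan]$NextCrawlStart
    )
    $gap = $NextCrawlStart - $CrawlEnd
    # A negative result means the next crawl starts after midnight, so add one day.
    if ($gap -lt [TimeSpan]::Zero) {
        $gap = $gap + [TimeSpan]::FromDays(1)
    }
    return [int]$gap.TotalMinutes
}

# The schedule from the example: crawl ends at 20:00, next crawl starts at 01:00.
Get-CrawlGapMinutes -CrawlEnd '20:00' -NextCrawlStart '01:00'   # 300
```

The result can be passed to AddMinutes when building the end time of the maintenance window.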
Doc API Queue:
Performance Counter Category - "FAST Search Indexer" -> API queue load
Warning at 75%, Alert at 90%
The document API queue is where the indexers store new API operations in memory before they are persisted to disk. If the queue grows large, it may indicate a feeding bottleneck on the indexers. If the queue fills up completely, the indexer will stop indexing and manual intervention may be required.
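For a quick spot check outside SCOM, the counter can be read with Get-Counter and compared against the same thresholds. The sketch below makes several assumptions: the counter path is built from the category and counter names above with a wildcard instance, the value is reported as a percentage, and the helper names Get-QueueState and Test-ApiQueueLoad are hypothetical. Verify the exact path in Performance Monitor on an indexer node before relying on it.

```powershell
# Hypothetical helper: maps a queue load value to a state using the thresholds above.
function Get-QueueState {
    param(
        [double]$Value,
        [double]$Warning = 75,
        [double]$Alert = 90
    )
    if ($Value -ge $Alert)   { return 'Alert' }
    if ($Value -ge $Warning) { return 'Warning' }
    return 'OK'
}

# Assumption: the path below is derived from the names listed above and uses a
# wildcard for the instance; it may differ on your nodes.
function Test-ApiQueueLoad {
    param([string]$CounterPath = '\FAST Search Indexer(*)\API queue load')
    (Get-Counter -Counter $CounterPath).CounterSamples | ForEach-Object {
        New-Object PSObject -Property @{
            Instance = $_.InstanceName
            Load     = $_.CookedValue
            State    = Get-QueueState -Value $_.CookedValue
        }
    }
}
```

Running Test-ApiQueueLoad on an indexer node would list one row per counter instance with its current state.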
Documents in Indexer:
Performance Counter Category - "FAST Search Indexer" -> Documents in indexer
Alert when the value is between 0 and the minimum expected number of documents
This is the number of valid documents on each indexer. We want to generate an alert if it drops below a threshold that is considered normal. In a two-row farm, for example, note how many documents you see per indexer and set the alert at two thirds of that number. This leaves plenty of room for deletes while still alerting on an indexer crash or worse. During growth periods, raise the threshold to keep pace with the growth. Also note that an indexer restart or rebuild may cause this count to drop until the indexer can re-initialize.
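The two-thirds rule is simple arithmetic, sketched below so it can be re-run as the index grows. The function name Get-MinimumDocumentThreshold and the baseline of 3,000,000 documents are illustrative assumptions, not values taken from a real farm.

```powershell
# Hypothetical helper: suggests an alert threshold at two thirds of the
# document count observed per indexer, rounded down.
function Get-MinimumDocumentThreshold {
    param([long]$BaselineDocuments)
    return [long][math]::Floor([double]$BaselineDocuments * 2 / 3)
}

# Illustrative baseline of 3,000,000 documents per indexer.
Get-MinimumDocumentThreshold -BaselineDocuments 3000000   # 2000000
```

Re-running the calculation with a fresh baseline after each growth period keeps the threshold in step with the index size.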
Search Latency:
Performance Counter Category - "FAST Search QRServer" -> Average latency - ms
The alert should be set based on benchmarking. 500 ms is a good baseline.
This is the average time it takes for a search request to be answered and written back to the requesting client.
Failed Queries:
Performance Counter Category – "FAST Search QRServer" -> # of Fail user queries/sec
Alert should be set at >0, assuming all queries are correctly formatted.
These are queries that fail at the QRServer. A quick note: grep your query logs for failures. Some users run expensive wildcard searches, which SharePoint does not support by default, and these can trigger an alert when the threshold is this strict.
Content Distributor - document processors count:
Performance Counter Category - "FAST Search Content Distributor" -> Document processors
Alert = 0
The number of document processors currently registered with the content distributor. This number can fluctuate widely depending on the number of documents being processed; it is normal for the document processors to soft reset during heavy feeding.