Search Performance: Troubleshooting with the Crawl Load reports
Last year, I dove into a case of AV slowing down crawls, and within that post I provided guidance on leveraging the Crawl Load report to troubleshoot crawl performance. Since then, I've copied that section into countless emails, so I wanted to carve it out into its own blog post...
--I hope this helps...
------------------------------------
Follow the crawl flow…
As we dive in, I will refer to the following high-level flow when describing a particular point in a crawl:
[Crawl flow illustration: steps 1 through 6b, from the Crawl Component (gatherer) through the Content Plugin and into content processing and the index]
The Crawl Load report (found in Central Admin -> Crawl Health Reports -> Crawl Latency tab) illustrates the number of items being processed by the system at any given moment in time, whereas the Crawl Latency chart (in the same tab, but presented under the Crawl Load chart) shows you the latency for the various stages. At Ignite (around the 47m 30s mark), I talked about the Crawl Load report and noted that I use it as my starting point when troubleshooting any crawl performance issue.
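It can help to see how these two charts relate numerically: load (items in flight) and latency together imply throughput, per Little's Law. Here is a back-of-the-envelope sketch in Python; all the numbers are invented purely for illustration and are not from any real Crawl Health report:

```python
# Back-of-the-envelope: Little's Law relates the quantities you can read
# off these two charts. All numbers below are made up for illustration.

load_in_cts = 400        # items shown as "Submitted to CTS" (Crawl Load chart)
latency_cts_sec = 2.0    # average per-item latency for that stage (Crawl Latency chart)

# Little's Law: L = throughput * W  =>  throughput = L / W
throughput = load_in_cts / latency_cts_sec
print(f"Implied throughput: {throughput:.0f} items/sec")  # ~200 items/sec

# The same load with 10x the latency means 10x less throughput -- which is
# why a tall bar on the Crawl Load chart is not by itself good or bad; you
# need the latency chart to know whether items are moving or just sitting.
```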
For the Crawl Load, I use the analogy of cars on the road. If you take an aerial photo of rush hour traffic, you would see a lot of cars on the road, but have no real measure of how fast each car is actually moving. That being said, if in that photo you could see lots of cars packing in between mile markers 10 and 11, then there was likely something between mile markers 11 and 12 causing a slow-down (e.g. all the cars on the road are backed up by some bottleneck further down the road).
This Crawl Load chart is similar in that we can see where the items are in the system as they are being processed. Although overly simplified, it may also help to think of this chart as an indicator of where the Crawl Component's [gatherer, aka "robot"] threads are currently working (e.g. each thread is working on some batch of items at any given time, which is why the chart shows the number of items on the Y axis instead of threads). To take the previous analogy further, each car [thread] is moving a batch of people [items] down the road [crawl flow].
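To make the "items, not threads" point concrete, here is a toy Python model of the gatherer. The thread count, batch size, and stage names are assumptions chosen for illustration; this is not how the Crawl Component is actually implemented:

```python
import random

# Toy model: each crawler thread works on a *batch* of items, which is
# why the Crawl Load chart counts items rather than threads.

THREADS = 8
BATCH_SIZE = 50
stages = ["In Crawler Queue", "Waiting in Content Plugin", "Submitted to CTS"]

# Randomly place each thread (and the batch it carries) at some stage.
thread_stage = [random.choice(stages) for _ in range(THREADS)]

# The "load" shown for a stage is the items carried by the threads there.
load = {s: thread_stage.count(s) * BATCH_SIZE for s in stages}
for stage, items in load.items():
    print(f"{stage}: {items} items")
```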
When a crawl is flowing well, you'd typically expect to see a roughly even split of items between "In Crawler Queue" and "Submitted to CTS". In other words, about half of the threads are busy gathering new items to process and the other half are waiting for callbacks on items being processed.
However…
…here we see the threads are mostly pooling in "Waiting in Content Plugin" (e.g. between steps 2b and 3 in the flow illustration above) and in "Submitted to CTS" (which means the threads are waiting on a callback until the processing in steps 3 to 6b completes). Going back to the traffic analogy, we would expect to see corresponding bottlenecks somewhere down the road, which we can confirm using the "Content Processing Activity" report.
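The traffic-jam effect is easy to demonstrate with a toy two-stage pipeline. The sketch below is generic queueing code with invented timings, assuming only that the downstream stage is slower than the upstream one; it is not SharePoint code, but it shows why items pool upstream of a slow stage:

```python
import queue
import threading
import time

# Toy two-stage pipeline: a fast producer feeding a slow consumer.
gathered = queue.Queue()   # stands in for "Waiting in Content Plugin"

def gatherer():
    for item in range(200):
        gathered.put(item)          # gathering is fast...
        time.sleep(0.001)

def content_processor():
    while True:
        gathered.get()
        time.sleep(0.01)            # ...processing is 10x slower (the bottleneck)
        gathered.task_done()

threading.Thread(target=gatherer, daemon=True).start()
threading.Thread(target=content_processor, daemon=True).start()

# Watch items accumulate upstream of the slow stage, just like the
# cars packing in behind the slow-down in the analogy above.
for _ in range(5):
    time.sleep(0.2)
    print(f"Items pooled upstream of the slow stage: {gathered.qsize()}")
```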
Comments
- Anonymous
April 29, 2017
This is great information! However, the interpretation is missing. Can I assume that if I had a lot of activity in the "waiting to commit" stage, the SQL server seems to be the bottleneck? In your case, "waiting in content plugin" - what can it mean? A slow search server in terms of CPU/RAM? Network latency while gathering? Interpretation would be a great addition to this blog post.