Estimate performance and capacity requirements for SharePoint Server 2010 Search
Applies to: SharePoint Server 2010
Summary: This article provides capacity planning information for different deployments of Microsoft SharePoint Server 2010 search, including small, medium, and large SharePoint Server 2010 farms.
This article provides capacity planning information for collaboration environment deployments of SharePoint Server 2010 search. It includes the following information for three sample search farm configurations:
Test environment specifications, such as hardware, farm topology, and configuration
The workload used for data generation, including the number and class of users and farm usage characteristics
Test farm dataset, including database contents and sizes
Health and performance data specific to the tested environment
This article also contains common test data and recommendations for how to determine the hardware, topology, and configuration you need to deploy a similar environment, and how to optimize the environment for appropriate capacity and performance characteristics.
SharePoint Server 2010 search contains a richer set of features and a more flexible topology model than earlier versions. Before you employ this architecture to deliver more powerful features and functionality to your users, you must carefully consider the effect on the farm’s capacity and performance.
After you read this document, you will understand how to do the following:
Define performance and capacity targets for the environment.
Plan the hardware required to support the number and type of users, and the features you intend to deploy.
Design the physical and logical topology for optimum reliability and efficiency.
Test, validate, and scale the environment to achieve performance and capacity targets.
Monitor the environment for key indicators.
Before you read this article, you should be familiar with the following:
SharePoint Server 2010 capacity management: Software boundaries and limits
Availability
Redundancy
Database-specific content
Planning overview
The scenarios in this article describe small, medium, and large test farms, with assumptions that can help you start planning for the correct capacity for the farm. These farm sizes are approximations based on the following assumptions:
The repositories crawled are primarily SharePoint Server content.
The vast majority of the user queries can be found in the same 33 percent of the index. This means that most users query for the same terms.
The default metadata settings are used, ensuring that the property database does not grow at a large rate.
In medium and large farms, dedicated crawl targets (front-end Web servers) exist that can serve content to the crawl components of these search farms.
Measurements taken on these farms can vary due to network and environmental conditions, and have up to a 10 percent margin of error.
Choosing a scenario
To choose the right scenario, consider the following questions:
**Corpus size** How much content has to be searchable? The total number of items should include all objects: documents, Web pages, list items, and people.
**Availability** What are the availability requirements? Do customers need a search solution that can survive the failure of a particular server?
**Content freshness** How fresh do you want the search results to be? How long after a user changes the content should searches return the updated content in the results? How often do you expect the content to change?
**Throughput** How many people will be searching the content simultaneously? This includes people typing in a query box, hidden queries such as Web Parts that automatically search for data, and Microsoft Outlook 2010 Social Connectors requesting activity feeds that contain URLs requiring security trimming by the search system.
Based on the answers to these questions, choose from one of the following scenarios:
**Small farm** Includes a single Search service application that shares resources with the rest of a single SharePoint Server farm. Typical for small deployments; the amount of content to search is limited (fewer than 10 million items). Depending on the desired freshness goals, incremental crawls might occur during business hours.
**Medium farm** Includes one or more Search service applications in a dedicated farm that provide search services to other farms. The amount of content to search is moderate (up to 40 million items), and to meet freshness goals, incremental crawls are likely to occur during business hours.
**Large farm** Includes one or more Search service applications in a dedicated farm that provide search services to other farms. The amount of content to search is large (up to 100 million items), and to meet freshness goals, incremental crawls are likely to occur during business hours.
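As a rough planning aid, the corpus-size thresholds above can be expressed in a few lines of code. This is a minimal sketch only: the function name is illustrative, the behavior beyond 100 million items is an assumption, and availability, freshness, and throughput requirements still have to be weighed separately.

```python
def choose_scenario(total_items: int) -> str:
    """Map corpus size (total searchable items) to the sample farm
    scenarios described in this article."""
    if total_items < 10_000_000:
        return "small farm"       # shared resources on one SharePoint farm
    if total_items <= 40_000_000:
        return "medium farm"      # dedicated search farm
    if total_items <= 100_000_000:
        return "large farm"       # dedicated search farm, scaled out
    # Assumption: beyond the documented limits, content must be split up.
    raise ValueError("Beyond 100 million items, consider multiple "
                     "Search service applications")

print(choose_scenario(9_000_000))   # the small test farm's 9 million items
```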
Search life cycle
These three scenarios enable you to estimate capacity at an early stage of the farm. Farms move through the following stages of the search life cycle as content is crawled:
**Index acquisition** This is the first stage of data population. The duration of this stage depends on the size of the content sources. It is characterized by the following:
Full crawls (possibly concurrent) of content.
Close monitoring of the crawl system to ensure that the hosts that are being crawled are not a bottleneck for the crawl.
Frequent master merges that, for each query component, are triggered when a certain amount of the index has changed.
**Index maintenance** This is the most common stage of a farm. It is characterized by the following:
Incremental crawls of all content.
For SharePoint Server content crawls, a majority of the changes encountered during the crawl are security changes.
Infrequent master merges that, for each query component, are triggered when a certain amount of the index has changed.
**Index cleanup** This stage occurs when a content change moves the farm out of the index maintenance stage — for example, when a content database or site is moved from one Search service application to another. This stage is triggered when either of the following occurs:
A host that is supplying content is not found by the search crawler for an extended period of time.
A content source or start address is deleted from a Search service application.
Scenarios
This section describes the configurations that were used for the small, medium, and large farm scenarios. It also includes the workload, dataset, performance data, and test data for each environment.
Small farm
Because the farm is small, some servers must perform multiple roles. We recommend combining a query component with a front-end Web server to avoid putting crawl components and query components on the same server. This configuration uses three application servers and one database server, as follows:
Because redundant query servers are generally suggested for enterprise search, two application servers were used for query components, configured as follows:
One application server also hosts the Search Center. This server can be omitted if the small farm is used as a service farm and the Search Centers are created on child content farms that consume the Search service application from this parent service farm.
The preferred query configuration for fewer than 10 million items is one index partition. Each server then has one primary query component from the index partition. This active/active query component setup tolerates the failure of one query component while the remaining query component continues to serve queries. If a query component fails, search sends queries (round-robin) to the next available query component.
One application server is used for crawling and administration. This means that Central Administration, the search administration component, and a crawl component are created on this server.
A single database server to support the farm. The database server should have a dedicated number of input/output operations per second (IOPS) for crawl, property, and administration databases (for example, use different storage arrays).
Specifications
This section provides detailed information about the hardware, software, topology, and configuration of the test environment.
Topology
Hardware
**Note:** Because the test farm was running pre-release versions of SharePoint Server 2010 and the team wanted to avoid potential problems, the hardware that was used for the servers had more capacity than is typically required.
Web servers
| Web server | Front-end Web server/Query (1) |
|---|---|
| Processor | 1px4c@3 GHz |
| RAM | 16 GB |
| Operating system | Windows Server 2008 R2, 64-bit |
| Storage | 2x450GB 15K SAS: RAID1: OS; 2x450GB 15K SAS: RAID1: DATA1; 2x450GB 15K SAS: RAID1: DATA2 |
| Number of network interface cards (NICs) | 2 |
| NIC speed | 1 gigabit |
| Authentication | NTLM |
| Load balancer type | None |
| Software version | SharePoint Server 2010 (pre-release version) |
| Services running locally | All services (including Search Query and Site Settings Service) |
Application servers
| Server | Query (1) | Crawl/Administration (1) |
|---|---|---|
| Processor | 1px4c@3 GHz | 1px4c@3 GHz |
| RAM | 16 GB | 16 GB |
| Operating system | Windows Server 2008 R2, 64-bit | Windows Server 2008 R2, 64-bit |
| Storage | 2x450GB 15K SAS: RAID1: OS; 2x450GB 15K SAS: RAID1: DATA1; 2x450GB 15K SAS: RAID1: DATA2 | 2x450GB 15K SAS: RAID1: OS; 2x450GB 15K SAS: RAID1: TEMP; 2x450GB 15K SAS: RAID0: DATA |
| Number of NICs | 1 | 1 |
| NIC speed | 1 gigabit | 1 gigabit |
| Authentication | NTLM | NTLM |
| Load balancer type | None | None |
| Software version | SharePoint Server 2010 (pre-release version) | SharePoint Server 2010 (pre-release version) |
| Services running locally | SharePoint Server Search; Search Query and Site Settings Service | SharePoint Server Search |
Database servers
| Database | Shared (1) |
|---|---|
| Processor | 2px4c@3 GHz |
| RAM | 16 GB |
| Operating system | Windows Server 2008 R2, 64-bit |
| Storage | 2x450GB 15K SAS: RAID1: OS; 2x450GB 15K SAS: RAID0: DATA; 2x450GB 15K SAS: RAID0: LOGS |
| Number of NICs | 2 |
| NIC speed | 1 gigabit |
| Authentication | NTLM |
| Software version | Microsoft SQL Server 2008 Enterprise |

**Note:** Because of the reduced number of drives, the best practice of segregating databases on different I/O channels was not applicable.
Workload
This section describes the workload that was used for data generation, including the number of users and farm usage characteristics.
| Workload characteristics | Value |
|---|---|
| High-level description of workload | Search farms |
| Average queries per minute | 6 |
| Average concurrent users | 1 |
| Total number of queries per day | 8,640 |
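The daily total above follows directly from the average query rate; a quick sanity check:

```python
# The test workload's daily query volume is the per-minute rate
# sustained over a full day.
queries_per_minute = 6
minutes_per_day = 24 * 60                    # 1,440 minutes in a day
print(queries_per_minute * minutes_per_day)  # 8640 queries per day
```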
Dataset
This section describes the test farm dataset, including database contents and sizes, search indexes, and external data sources.
| Object | Value |
|---|---|
| Search index size (number of items) | 9 million |
| Size of crawl database | 52 GB |
| Size of crawl database log file | 11 GB |
| Size of property database | 68 GB |
| Size of property database log file | 1 GB |
| Size of search administration database | 2 GB |
| Size of active index partitions | 38.4 GB (76.8 GB total, because the index is mirrored) |
| Total number of search databases | 3 |
| Other databases | SharePoint_Config; SharePoint_AdminContent; State_Service; Bdc_Service_db; WSS_UsageApplication; WSS_Content |
Health and performance data
This section provides health and performance data that is specific to the test environment.
Query performance data
The following measurements were taken with 9 million items in the index. The columns give the measurements that were taken during the specific test, and the results are at the bottom of the following table. The measurements are described as follows:
**Query latency** These measurements were taken during a query latency test, where a test tool submitted a set of standard queries, as one user, and then measured the resulting latency. No crawls were under way during the test.
**Query throughput** These measurements were taken during a query throughput test, where a test tool submitted a standard set of queries against the farm, as an increasing number of concurrent users (up to 80), and then measured the resulting latency and throughput. No crawls were under way during the test.
| Scorecard metric | Query latency | Query throughput |
|---|---|---|
| **CPU metrics** | | |
| Average SQL Server CPU | 3.4% | 12% |
| Average front-end Web server CPU | 8% | 51% |
| Average query server CPU | 13.3% | 95% |
| **Reliability** | | |
| Failure rate | 0 | 0 |
| Front-end Web server crashes | 0 | 0 |
| Application server crashes | 0 | 0 |
| **SQL Server** | | |
| Cache hit ratio (SQL Server) | 99.97% | 100% |
| SQL Server locks: average wait time (ms) | 0.071 | 0.038 |
| SQL Server locks: lock wait time (ms) | 0.035 | 0.019 |
| SQL Server locks: deadlocks/sec | 0 | 0 |
| SQL Server latches: average wait time (ms) | 31 | 0.017 |
| SQL Server compilations/sec | 14.9 | 10.2 |
| SQL Server statistics: SQL Server re-compilations/sec | 0.087 | 0.05 |
| Average disk queue length (SQL Server) | 0.011 | 0.01 |
| Disk queue length: writes (SQL Server) | 0.01 | 0.008 |
| Disk reads/sec (SQL Server) | 0.894 | 0.05 |
| Disk writes/sec (SQL Server) | 45 | 106 |
| **Application server** | | |
| Average disk queue length (query server) | 0.013 | 0.001 |
| Disk queue length: writes (query server) | 0 | 0.001 |
| Disk reads/sec (query server) | 11.75 | 0.06 |
| Disk writes/sec (query server) | 4 | 5.714 |
| Average memory used (query server) | 8.73% | 9% |
| Maximum memory used (query server) | 8.75% | 9% |
| **Front-end Web server** | | |
| ASP.NET requests queued (average of all front-end Web servers) | 0 | 0 |
| Average memory used (front-end Web server) | 9.67% | 95% |
| Maximum memory used (front-end Web server) | 9.74% | 100% |
| **Test results** | | |
| Number of successes | 1,757 | 13,608 |
| Number of errors | 0 | 0 |
| Query UI latency (75th percentile) | 0.331 sec | 3.68 sec |
| Query UI latency (95th percentile) | 0.424 sec | 3.93 sec |
| Query throughput | 3.29 requests/sec | 22.4 requests/sec |
Crawl performance data
The following measurements were taken during initial, sequential full crawls of the given content source. The content source size is given in millions of items. The columns give the measurements that were taken during the specific crawl, and the results are at the bottom of the table.
| Scorecard metric | SharePoint content (4M) | File share (1M) | HTTP (non-SharePoint) (1M) |
|---|---|---|---|
| **CPU metrics** | | | |
| Average SQL Server CPU | 5.4% | 11.7% | 23% |
| Average indexer CPU | 41.6% | 69% | 71% |
| **Reliability** | | | |
| Failure rate | 0 | 0 | 0 |
| Front-end Web server crashes | 0 | 0 | 0 |
| Application server crashes | 0 | 0 | 0 |
| **SQL Server** | | | |
| Cache hit ratio (SQL Server) | n/a | n/a | n/a |
| SQL Server locks: average wait time (ms) | 436 | 51.76 | 84.73 |
| SQL Server locks: lock wait time (ms) | n/a | n/a | n/a |
| SQL Server locks: deadlocks/sec | n/a | n/a | n/a |
| SQL Server latches: average wait time (ms) | 1.05 | 1.64 | 3.53 |
| SQL Server compilations/sec | n/a | n/a | n/a |
| SQL Server statistics: SQL Server re-compilations/sec | n/a | n/a | n/a |
| Average disk queue length (SQL Server) | 27.124 | 6.85 | 45 |
| Disk queue length: writes (SQL Server) | 17.6 | 6.7 | 21.57 |
| **Application server** | | | |
| Average disk queue length (crawl server) | 0.008 | 0.032 | 0.02 |
| Disk queue length: writes (crawl server) | 0.006 | 0.025 | 0.012 |
| Average memory used (crawl server) | 14.16% | 10.4% | 11.5% |
| Maximum memory used (crawl server) | 19.2% | 11.13% | 12.78% |
| **Front-end Web server** | | | |
| ASP.NET requests queued (average of all front-end Web servers) | 0 | 0 | 0 |
| Average memory used (front-end Web server) | n/a | n/a | n/a |
| Maximum memory used (front-end Web server) | n/a | n/a | n/a |
| **Test results** | | | |
| Number of successes | 3,934,881 | 1,247,838 | 996,982 |
| Number of errors | 9,645 | 302 | 2 |
| Portal crawl speed (items/sec) | 46.32 | 120.436 | 138.316 |
| Anchor crawl speed (items/sec) | 5,197 | 3,466.219 | 2,192.982 |
| Total crawl speed (items/sec) | 45.91 | 116.392 | 130.086 |
Test data
This section provides test data illustrating how the farm performed. Refer to the Optimizations section later in this article to understand how to improve farm performance.
Query latency
The following graph displays query latency percentiles for this farm as user load increases (gathered during the query throughput test). A query percentile of 95 percent means that 95 percent of the query latencies that were measured were below that value.
From this graph you can see that with a smaller index, this farm can maintain sub-second query latency even with up to 20 concurrent users performing queries on this farm.
Query throughput
The following graph displays the query throughput for this farm as user load increases (gathered during the query throughput test).
Taking into account the previous two graphs, you can see that if you add user load beyond about 20 concurrent users, this farm will get no additional query throughput, and latency will increase.
Crawl rate
The following graph displays the crawl rate for this farm during the index acquisition stage of the search life cycle. The values represent a full crawl, in items crawled per second.
The extra overhead involved in performing a full crawl of a SharePoint site content source results in a lower overall crawl rate for this farm.
Overall takeaway
This farm was near capacity on RAM for the query servers. Because front-end Web server processes (which also consume RAM) ran on one of the query servers, latency was affected for queries served by that server.
The next steps for improving performance for this farm would be to do the following:
Move front-end Web server processes to their own front-end Web server (that is, add another front-end Web server for redundancy).
Add more RAM to both query servers. We recommend enough RAM on the query server for 33 percent of the active query component’s index partition plus 3 GB for the operating system and other processes.
Add storage arrays so that you can segregate databases on the database server.
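The RAM recommendation above can be sketched as a small helper. This is an illustrative calculation only; the function name is hypothetical, and the 33 percent and 3 GB figures come from the recommendation in the text.

```python
def min_query_server_ram_gb(active_partition_gb: float,
                            os_and_other_gb: float = 3.0) -> float:
    """Recommended minimum query-server RAM: 33 percent of the active
    query component's index partition, plus ~3 GB for the operating
    system and other processes."""
    return 0.33 * active_partition_gb + os_and_other_gb

# This farm's active index partition is 38.4 GB:
print(round(min_query_server_ram_gb(38.4), 1))  # 15.7 -> close to the 16 GB installed
```

The result helps explain the takeaway above: the recommended minimum is already near the 16 GB these query servers have, so any extra load (such as co-located front-end Web server processes) pushes them over capacity.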
Medium farm
The medium farm configuration uses one Web server, six application servers, and two database servers, as follows:
One Web server was used in this test configuration to host the Search Center. This Web server can be omitted if searches are always performed from a child farm by using a Search service application proxy (installed on the child farm). Otherwise, you would probably add another Web server to this farm for redundancy.
Two application servers are used for crawling and administration. This means the following:
Central Administration and the search administration component are created on one of the application servers.
Each server has two crawl components. Each crawl component is attached to a different crawl database for redundancy.
The remaining four application servers are used for query. For up to 40 million items, the standard configuration is four index partitions. Redundant query functionality is achieved by arranging query components so that each server has one active query component from one index partition and a failover query component from a different index partition. However, this example farm shows what to do if you plan to grow beyond 40 million items: you start with eight index partitions (each with its own active and failover query components) on the four application servers, to minimize later index repartitioning. We assume that each server meets the following guidelines to allow four components on the same application server:
Enough RAM and IOPS are available.
Each server has more than six CPU cores to support processing as follows:
Four CPU cores for the two active query components.
Two CPU cores for the two failover query components.
Two database servers support the farm. One database server is used for the two crawl databases. The other server is used for the property and search administration databases in addition to being used for the other SharePoint databases. The database servers should have a dedicated number of IOPS for each crawl, property, and search administration database (for example, use different storage arrays).
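The CPU-core guideline for the query servers in this configuration (from the list above: four cores for two active query components, two cores for two failover components) can be sketched as follows. The helper function is hypothetical and only restates that per-component ratio.

```python
def query_server_cores_needed(active_components: int,
                              failover_components: int) -> int:
    """Cores needed on one application server: two CPU cores for each
    active query component, one core for each failover component."""
    return 2 * active_components + failover_components

# Each of the four query servers hosts two active and two failover components:
print(query_server_cores_needed(2, 2))  # 6 -> hence servers with more than six cores
```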
Specifications
This section provides detailed information about the hardware, software, topology, and configuration of the test environment.
Topology
Hardware
**Note:** Because the test farm was running pre-release versions of SharePoint Server 2010 and the team wanted to avoid potential problems, the hardware that was used for the servers had more capacity than is typically required.
Web server
| Web server | Front-end Web server (1) |
|---|---|
| Processor | 2px4c@2.33 GHz |
| RAM | 8 GB |
| Operating system | Windows Server 2008 R2, 64-bit |
| Storage | 2x148GB 15K SAS: RAID1: OS |
| Number of NICs | 2 |
| NIC speed | 1 gigabit |
| Authentication | NTLM |
| Load balancer type | None |
| Software version | Microsoft SharePoint Server (pre-release version) |
| Services running locally | All services |
Application servers
There are six application servers in the farm. Four servers are used for serving queries and two servers are used for crawling.
| Server (count) | Query (4) | Crawl (1), Crawl/Admin (1) |
|---|---|---|
| Processor | 2px4c@2.33 GHz | 2px4c@2.33 GHz |
| RAM | 32 GB | 8 GB |
| Operating system | Windows Server 2008 R2, 64-bit | Windows Server 2008 R2, 64-bit |
| Storage | 2x148GB 15K SAS: RAID1: OS; 4x300GB 15K SAS: RAID10: Data | 2x148GB 15K SAS: RAID1: OS/Data |
| Number of NICs | 2 | 2 |
| NIC speed | 1 gigabit | 1 gigabit |
| Authentication | NTLM | NTLM |
| Load balancer type | None | None |
| Software version | SharePoint Server 2010 (pre-release version) | SharePoint Server 2010 (pre-release version) |
| Services running locally | SharePoint Server Search; Search Query and Site Settings Service | SharePoint Server Search |
Database servers
There are two database servers. The first server contains the search administration, property, and other SharePoint Server databases. The second server contains the two crawl databases. Note that the storage volumes that were created optimized the existing hardware that was available for the test.
| Database server | Search Administration; Property; SharePoint databases | Crawl databases |
|---|---|---|
| Processor | 2px4c@3.2 GHz | 4px2c@2.19 GHz |
| RAM | 32 GB | 16 GB |
| Operating system | Windows Server 2008 R2, 64-bit | Windows Server 2008 R2, 64-bit |
| Storage | 2x148GB 15K SAS: RAID1: OS; 2x148GB 15K SAS: RAID1: TEMP Log; 2x450GB 15K SAS: RAID1: TEMP DB; 6x450GB 15K SAS: RAID10: Property DB; 2x450GB 15K SAS: RAID1: Search Admin and SharePoint DBs; 2x450GB 15K SAS: RAID1: Logs | 2x148GB 15K SAS: RAID1: OS; 2x148GB 15K SAS: RAID1: TEMP Log; 2x300GB 15K SAS: RAID1: TEMP DB; 6x146GB 15K SAS: RAID10: Crawl DB1; 6x146GB 15K SAS: RAID10: Crawl DB2; 2x300GB 15K SAS: RAID1: Crawl DB Log1; 2x300GB 15K SAS: RAID1: Crawl DB Log2 |
| Number of NICs | 2 | 2 |
| NIC speed | 1 gigabit | 1 gigabit |
| Authentication | NTLM | NTLM |
| Software version | SQL Server 2008 Enterprise | SQL Server 2008 Enterprise |
Workload
This section describes the workload that was used for data generation, including the number of users and the farm usage characteristics.
| Workload characteristics | Value |
|---|---|
| High-level description of workload | Search farms |
| Average queries per minute | 12 |
| Average concurrent users | 1 |
| Total number of queries per day | 17,280 |
| Timer jobs | Search Health Monitoring – Trace Events; Crawl Log Report; Health Analysis Job; Search and Process |
Dataset
This section describes the test farm dataset, including database contents and sizes, search indexes, and external data sources.
| Object | Value |
|---|---|
| Search index size (number of items) | 46 million |
| Size of crawl database | 356 GB |
| Size of crawl database log file | 85 GB |
| Size of property database | 304 GB |
| Size of property database log file | 9 GB |
| Size of search administration database | 5 GB |
| Size of active index partitions | 316 GB (79 GB per active component) |
| Total number of databases | 4 |
| Other databases | SharePoint_Config; SharePoint_AdminContent; State_Service; Bdc_Service_db; WSS_UsageApplication; WSS_Content |
Health and performance data
This section provides health and performance data that is specific to the test environment.
Query performance data
The following measurements were taken with 46 million items in the search index. The columns give the measurements that were taken during the specific test, and the results are at the bottom of the table. The measurements are described as follows:
**Query latency** These measurements were taken during a query latency test in which a test tool submitted a standard set of queries, as one user, and then measured the resulting latency. No crawls were under way during the test.
**Query throughput** These measurements were taken during a query throughput test in which a test tool submitted a standard set of queries against the farm, as an increasing number of concurrent users (up to 80), and then measured the resulting latency and throughput. No crawls were under way during the test.
| Scorecard metric | Query latency | Query throughput |
|---|---|---|
| **CPU metrics** | | |
| Average SQL Server CPU (property database server) | 5% | 98% |
| Average front-end Web server CPU | 3% | 33% |
| Average query server CPU | 3% | 47% |
| **Reliability** | | |
| Failure rate | 0.07% | 0% |
| Front-end Web server crashes | 0 | 0 |
| Application server crashes | 0 | 0 |
| **SQL Server (property database server)** | | |
| Cache hit ratio (SQL Server) | 100% | 99.9% |
| SQL Server locks: average wait time (ms) | 0.000 | 0.159 |
| SQL Server locks: lock wait time (ms) | 0.000 | 0.080 |
| SQL Server locks: deadlocks/sec | 0 | 0 |
| SQL Server latches: average wait time (ms) | 0.041 | 1.626 |
| SQL Server compilations/sec | 9.776 | 93.361 |
| SQL Server statistics: SQL Server re-compilations/sec | 0.059 | 0.071 |
| Read/write ratio (IO per database) | 0.01 | 0.81 |
| Average disk queue length (SQL Server) | 0.001 | 0.037 |
| Disk queue length: writes (SQL Server) | 0.000 | 0.003 |
| Disk reads/sec (SQL Server) | 0.057 | 14.139 |
| Disk writes/sec (SQL Server) | 4.554 | 17.515 |
| **Application server** | | |
| Average disk queue length (query server) | 0.000 | 0.001 |
| Disk queue length: writes (query server) | 0.000 | 0.001 |
| Disk reads/sec (query server) | 0.043 | 0.266 |
| Disk writes/sec (query server) | 4.132 | 5.564 |
| Average memory used (query server) | 9% | 10% |
| Maximum memory used (query server) | 9% | 10% |
| **Front-end Web server** | | |
| ASP.NET requests queued (average of all front-end Web servers) | 0 | 0 |
| Average memory used (front-end Web server) | 47% | 48% |
| Maximum memory used (front-end Web server) | 47% | 49% |
| **Test results** | | |
| Number of successes | 1,398 | 14,406 |
| Number of errors | 1 | 0 |
| Query UI latency (75th percentile) | 0.47 sec | 2.57 sec |
| Query UI latency (95th percentile) | 0.65 sec | 3.85 sec |
| Query throughput | 2.38 requests/sec | 27.05 requests/sec |
Crawl performance data
The following measurements were taken during initial, full crawls of the given content source. The content source size is given in millions of items. The columns give the measurements that were taken during the specific crawl, and the results are at the bottom of the table.
| Scorecard metric | SharePoint content (3.5 million) | File share (1 million) | HTTP (non-SharePoint) (1 million) |
|---|---|---|---|
| **CPU metrics** | | | |
| Average SQL Server CPU (crawl database server, property database server) | 11%, 19% | 22%, 7% | 23%, 5% |
| Maximum SQL Server CPU (crawl database server, property database server) | 96%, 100% | 86%, 45% | 79%, 28% |
| Average indexer CPU | 41.6% | 69% | 71% |
| **Reliability** | | | |
| Failure rate | 0.2% | 0.02% | 0% |
| Front-end Web server crashes | 0 | 0 | 0 |
| Application server crashes | 0 | 0 | 0 |
| **SQL Server (crawl database server, property database server)** | | | |
| Cache hit ratio (SQL Server) | 99.5%, 99.8% | Not collected | 99.9%, 99.3% |
| SQL Server locks: average wait time (ms) | 1,881.749, 1,106.314 | 1,617.980, 2.882 | 983.137, 0.904 |
| SQL Server locks: maximum wait time (ms) | 69,919.500, 1,081,703 | 55,412.000, 304.500 | 24,000.500, 47 |
| SQL Server locks: average lock wait time (ms) | 339.658, 10,147.012 | Not collected | 739.232, 0.136 |
| SQL Server locks: maximum lock wait time (ms) | 598,106.544, 234,708,784 | Not collected | 52,711.592, 23.511 |
| SQL Server locks: deadlocks/sec | 0.001, 0 | Not collected | 0.008, 0 |
| SQL Server latches: average wait time (ms) | 2.288, 13.684 | 3.042, 13.516 | 2.469, 20.093 |
| SQL Server latches: maximum wait time (ms) | 2,636, 1,809 | 928, 858.5 | 242.929, 938.706 |
| SQL Server compilations/sec: average | 20.384, 5.449 | Not collected | 76.157, 6.510 |
| SQL Server compilations/sec: maximum | 332.975, 88.992 | Not collected | 295.076, 42.999 |
| SQL Server statistics: SQL Server re-compilations/sec: average | 0.560, 0.081 | Not collected | 0.229, 0.125 |
| SQL Server statistics: SQL Server re-compilations/sec: maximum | 22.999, 88.492 | Not collected | 17.999, 15.5 |
| Read/write ratio (IO per database): maximum | 2.15, 1.25 | Not collected | 1.45, 0.364 |
| Average disk queue length (SQL Server) | 66.765, 27.314 | 129.032, 20.665 | 182.110, 11.816 |
| Maximum disk queue length (SQL Server) | 4,201.185, 5,497.980 | 3,050.015, 762.542 | 1,833.765, 775.7 |
| Disk queue length: writes (SQL Server): average | 58.023, 13.532 | 114.197, 19.9 | 175.621, 10.417 |
| Disk queue length: writes (SQL Server): maximum | 1,005.691, 881.892 | 1,551.437, 761.891 | 1,018.642, 768.289 |
| Disk reads/sec (SQL Server): average | 245.945, 94.131 | Not collected | 137.435, 154.103 |
| Disk reads/sec (SQL Server): maximum | 6,420.412, 6,450.870 | Not collected | 3,863.283, 1,494.805 |
| Disk writes/sec (SQL Server): average | 458.144, 286.884 | Not collected | 984.668, 278.175 |
| Disk writes/sec (SQL Server): maximum | 2,990.779, 5,164.949 | Not collected | 2,666.285, 4,105.897 |
| **Application server** | | | |
| Average disk queue length (crawl server) | 0.052 | 0.043 | 0.030 |
| Disk queue length: writes (crawl server) | 0.029 | 0.031 | 0.026 |
| Disk reads/sec (crawl server) | 5.405 | Not collected | 0.798 |
| Disk writes/sec (crawl server) | 48.052 | Not collected | 102.235 |
| Average memory used (crawl server) | 68% | 45% | 52% |
| Maximum memory used (crawl server) | 76% | 47% | 59% |
| **Front-end Web server** | | | |
| ASP.NET requests queued (average of all front-end Web servers) | 0 | 0 | 0 |
| Average memory used (front-end Web server) | n/a | n/a | n/a |
| Maximum memory used (front-end Web server) | n/a | n/a | n/a |
| **Test results** | | | |
| Number of successes | 3,631,080 | 1,247,838 | 200,000 |
| Number of errors | 7,930 | 304 | 0 |
| Portal crawl speed (items/sec) | 82 | 148 | 81 |
| Anchor crawl speed (items/sec) | 1,573 | 1,580 | 1,149 |
| Total crawl speed (items/sec) | 79 | 136 | 76 |
Test data
This section provides test data that illustrates how the farm performed under load.
Query latency
The following graph shows the query latency percentiles for this farm as user load increases (gathered during the query throughput test). A query percentile of 95 percent means that 95 percent of the query latencies that were measured were below that value.
From this graph you can see that with a smaller index, this farm can maintain sub-second query latency at the 95th percentile even with up to 22 concurrent users performing queries on this farm.
Query throughput
The following graph displays the query throughput for this farm as user load increases (gathered during the query throughput test).
Taking into account both this graph and the previous graph, you can see that, at 33 million items in the index, the farm can maintain sub-second latency at the 75th percentile with about 30 concurrent users. Additional concurrent user load can still be accommodated, but query latency will increase beyond the sub-second mark.
However, at 46 million items in the index, no additional concurrent user load can be accommodated, and query latency will increase.
Crawl rate
The following graph displays the crawl rate for this farm during the index acquisition stage of the search life cycle. The values represent a full crawl, in items crawled per second.
The extra overhead involved in effectively crawling a SharePoint site content source results in a lower overall crawl rate for this farm.
Overall takeaway
This farm was near capacity on RAM for the query servers.
The next steps for improving the performance of this farm would be to do the following:
Add more RAM to both query servers. We recommend enough RAM on the query server for 33 percent of the active query component’s index partition + 3 GB for the operating system and other processes.
Add more RAM to the database server that is hosting the property database. In this configuration, the key tables were about 92 GB in size (including indices), which suggests a 30 GB RAM requirement. However, the database server had only 32 GB RAM to serve the property database, search administration database, and the other SharePoint Server databases.
Add storage arrays so you can segregate databases on the database server.
Scale out to increase throughput or reduce query latency or both.
Although crawl speed is high on this farm with two crawl databases and four crawl components, keeping certain parts of the index fresher can be an important goal; that is, certain content might need to be crawled very frequently. Adding another crawl database that is dedicated to hosts in the desired content source (by using host distribution rules), and associating two additional crawl components with that database, would support this index freshness goal.
Large farm
The expected configuration uses one Web server, 13 application servers, and four database servers, as follows:
One Web server is used to host a Search Center. This Web server can be omitted if searches are always performed from a content farm by using a Search service application proxy (installed on the content farm).
Three application servers are used for crawling and administration. This means that:
Central Administration and the search administration component are created on one of the application servers.
Each server has two crawl components. Each crawl component is attached to a separate crawl database.
The remaining ten application servers are used for query. The preferred configuration is to have ten index partitions. Each server then has one primary query component from one of the index partitions, in addition to a failover query component from a different index partition.
Four database servers support the farm. One server is used for the property and search administration databases. A second server is used for a property database. A third server is used for two crawl databases. The fourth server is used for one crawl database and the other SharePoint databases. The database servers should have a dedicated number of IOPS for each crawl, property, and search administration database (for example, use different storage arrays).
Specifications
This section provides detailed information about the hardware, software, topology, and configuration of the test environment.
Topology
This section describes the topology of the test environment.
Hardware
This section describes the hardware that was used for testing.
Note
Because the test farm was running pre-release versions of SharePoint Server 2010 and the team wanted to avoid potential problems, the hardware that was used for the servers had more capacity than is typically required.
Web servers
Web server | Front-end Web server (1) |
---|---|
Processor |
2px4c@2.33 GHz |
RAM |
8 GB |
Operating system |
Windows Server 2008 R2, 64-bit |
Storage |
2x148GB 15K SAS: RAID1: OS |
Number of NICs |
2 |
NIC speed |
1 gigabit |
Authentication |
NTLM |
Load balancer type |
none |
Software version |
SharePoint Server 2010 (pre-release version) |
Services running locally |
All services |
Application servers
There are 13 application servers in the farm: ten serve queries and three perform crawling.
Server (count) | Query (10) | Crawl (2), Crawl/Administration (1) |
---|---|---|
Processor |
2px4c@2.5 GHz |
2px4c@2.5 GHz |
RAM |
32 GB |
32 GB |
Operating system |
Windows Server 2008 R2, 64-bit |
Windows Server 2008 R2, 64-bit |
Storage |
2x148GB 15K SAS: RAID1: OS 4x300GB 15K SAS:RAID10:Data |
2x148GB 15K SAS:RAID1: OS/Data |
Number of NICs |
2 |
2 |
NIC speed |
1 gigabit |
1 gigabit |
Authentication |
NTLM |
NTLM |
Load balancer type |
None |
None |
Software version |
SharePoint Server 2010 (pre-release version) |
SharePoint Server 2010 (pre-release version) |
Services running locally |
SharePoint Server Search; Search Query and Site Settings Service |
SharePoint Server Search |
Database servers
There are four database servers. The first server contains the search administration and property databases; the second contains a property database; the third contains two crawl databases; and the fourth contains a crawl database and a SharePoint database. Note that the storage volumes were created to make optimal use of the hardware that was available for the test.
Database server | Search Administration, Property, and SharePoint databases | Crawl databases |
---|---|---|
Processor |
2px4c@3.2 GHz |
4px2c@2.19 GHz |
RAM |
32 GB |
16 GB |
Operating system |
Windows Server 2008 R2, 64-bit |
Windows Server 2008 R2, 64-bit |
Storage |
2x148GB 15K SAS: RAID1: OS 2x148GB 15K SAS: RAID1: TEMP Log 2x450GB 15K SAS: RAID1: TEMP DB 6x450GB 15K SAS: RAID10: Property DB 2x450GB 15K SAS: RAID1: Search Admin, SharePoint DBs 2x450GB 15K SAS: RAID1: Logs |
2x148GB 15K SAS: RAID1: OS 2x148GB 15K SAS: RAID1: TEMP Log 2x300GB 15K SAS: RAID1: TEMP DB 6x146GB 15K SAS: RAID10: Crawl DB1 6x146GB 15K SAS: RAID10: Crawl DB2 2x300GB 15K SAS: RAID1: Crawl DB Log1 2x300GB 15K SAS: RAID1: Crawl DB Log2 |
Number of NICs |
2 |
2 |
NIC speed |
1 gigabit |
1 gigabit |
Authentication |
NTLM |
NTLM |
Software version |
SQL Server 2008 Enterprise |
SQL Server 2008 Enterprise |
Query performance data
The following measurements were taken with 103 million items in the index. The columns give the measurements taken during the specific test, and the results are at the bottom of the table. The measurements taken are described as follows:
Query Latency These measurements were taken during a query latency test in which a test tool submitted a standard set of queries as one user and measured the resulting latency. No crawls were under way during the test.
Query Throughput These measurements were taken during a query throughput test in which a test tool submitted a standard set of queries against the farm as an increasing number of concurrent users (up to 120) and measured the resulting latency and throughput. No crawls were under way during the test.
Scorecard metric | Query throughput | |
---|---|---|
CPU metrics |
Average SQL Server CPU (property database server) |
34% |
Average front-end Web server CPU |
45% |
|
Average query server CPU |
20% |
|
Reliability |
Failure rate |
0% |
Front-end Web server crashes |
0 |
|
Application server crashes |
0 |
|
SQL Server (property database server) |
Cache hit ratio (SQL Server) |
100% |
SQL Server locks: average wait time (ms) |
0 |
|
SQL Server locks: lock wait time (ms) |
0 |
|
SQL Server locks: deadlocks/s |
0 |
|
SQL Server latches: average wait time (ms) |
1.401 |
|
SQL Server compilations/sec |
73.349 |
|
SQL Server statistics: SQL Server re-compilations/s |
0.006 |
|
Read/write ratio (IO per database) |
0.81 |
|
Average disk queue length (SQL Server) |
0.037 |
|
Disk queue length: writes (SQL Server) |
0.003 |
|
|
Disk reads/sec (SQL Server) |
9.88 |
Disk writes/sec (SQL Server) |
354.1 |
|
Application server |
Average disk queue length (query server) |
0.002 |
Disk queue length: writes (query server) |
0.002 |
|
Disk reads/sec (query server) |
0.035 |
|
Disk writes/sec (query server) |
6.575 |
|
Average memory used (query server) |
6.548% |
|
Maximum memory used (query server) |
6.601% |
|
Front-end Web server |
ASP.NET requests queued (average of all front-end Web servers) |
0 |
Average memory used (front-end Web server) |
18.081% |
|
Maximum memory used (front-end Web server) |
19.983% |
|
Test results |
Number of successes |
10,925 |
Number of errors |
0 |
|
Query UI latency (75th percentile) |
3.431 sec |
|
Query UI latency (95th percentile) |
3.512 sec |
|
Query throughput |
36.42 request/sec |
Crawl performance data
The following measurements were taken during initial, sequential full crawls of the given content source. The content size is given in millions of items. The columns give the measurements that were taken during the specific crawl, and the results are at the bottom of the table.
Scorecard metric | SharePoint content (3.5 million) | File share (1 million) | |
---|---|---|---|
CPU metrics |
Average SQL Server CPU (crawl database server, property database server) |
15.74%, N/A |
24%, 6.6% |
Maximum SQL Server CPU (crawl database server, property database server) |
100%, N/A |
100%, 45% |
|
Average indexer CPU |
44% |
49% |
|
Reliability |
Failure rate |
0.0% |
0.00% |
Front-end Web server crashes |
0 |
0 |
|
Application server crashes |
0 |
0 |
|
SQL Server (crawl database server, property database server) |
Cache hit ratio (SQL Server) |
99.8%, N/A |
99.797%, 99.49% |
SQL Server locks: average wait time [ms] |
734.916, N/A |
1,165, 5.866 |
|
SQL Server locks: maximum wait time [ms] |
15,335, N/A |
28,683, 210.5 |
|
SQL Server locks: average lock wait time [ms] |
108.98, N/A |
847.72, 5.325 |
|
SQL Server locks: maximum lock wait time [ms] |
17,236.96, N/A |
124,353, 12,920 |
|
SQL Server locks: deadlocks/s |
0, N/A |
0.012, 0 |
|
SQL Server latches: average wait time [ms] |
1.4, N/A |
2.233, 40.6 |
|
SQL Server latches: maximum wait time [ms] |
1,606, N/A |
917.8, 1,895 |
|
SQL Server compilations/sec: average |
24.28, N/A |
72.7, 11.39 |
|
SQL Server compilations/sec : maximum |
416, N/A |
460, 76.62 |
|
SQL Server statistics: SQL Server re-compilations/sec: average |
0.560, N/A |
0.295, 0.099 |
|
SQL Server statistics: SQL Server re-compilations/sec: maximum |
0.24, N/A |
17.50, 17.393 |
|
Read/write ratio (IO per database): maximum |
20.3, N/A |
1.18, 0.214 |
|
Average disk queue length (SQL Server) |
90.113, N/A |
138.64, 27.478 |
|
Maximum disk queue length (SQL Server) |
3,179, N/A |
2,783.543, 847.574 |
|
Disk queue length: writes (SQL Server): average |
86.93, N/A |
130.853, 26.086 |
|
Disk queue length: writes (SQL Server): maximum |
1,882, N/A |
2,781.197, 884.801 |
|
|
Disk reads/sec (SQL Server): average |
99, N/A |
147.462, 159.159 |
Disk reads/sec (SQL Server): maximum |
3,772, N/A |
2,403.336, 896.462 |
|
Disk writes/sec (SQL Server): average |
373, N/A |
475.886, 539.497 |
|
Disk writes/sec (SQL Server): maximum |
18,522, N/A |
2,031.888, 4,174.271 |
|
Application server |
Average disk queue length (crawl server) |
0.075 |
0.063 |
Disk queue length: writes (crawl server) |
0.046 |
0.053 |
|
Disk reads/sec (crawl server) |
1.958 |
1.693 |
|
Disk writes/sec (crawl server) |
62.33 |
101.093 |
|
Average memory used (crawl server) |
59% |
56.38% |
|
Maximum memory used (crawl server) |
70% |
58.93% |
|
Front-end Web server |
ASP.NET requests queued (Average of all front-end Web servers) |
N/A |
N/A |
Average memory used (front-end Web server) |
N/A |
N/A |
|
Maximum memory used (front-end Web server) |
N/A |
N/A |
|
Test results |
Number of successes |
1,909,739 |
1,247,838 |
Number of errors |
9,361 |
331 |
|
Portal crawl speed (items/sec) |
70.3 |
131.44 |
|
Anchor crawl speed (items/sec) |
764 |
525.84 |
|
Total crawl speed (items/sec) |
64 |
105 |
Recommendations and troubleshooting
The following sections provide recommendations for how to determine the hardware, topology, and configuration that you need to deploy environments that are similar to these scenarios and how to optimize the environment for appropriate capacity and performance characteristics.
Recommendations
This section describes specific actions you can take to optimize the environment for appropriate capacity and performance characteristics.
Hardware recommendations
For specific information about overall minimum and recommended system requirements, see Hardware and software requirements (SharePoint Server 2010). Note that requirements for servers that were used for search supersede the overall system requirements. Use the following recommended guidelines for RAM, processor, and IOPS to meet performance goals.
Search sizing
This section explains the search system, including sizing requirements and guidelines, per component.
SharePoint Server 2010 can be deployed and configured in a wide variety of ways. As a result, there is no simple way to estimate how many users or items can be supported by a given number of servers. Therefore, make sure that you conduct testing in your own environment before you deploy SharePoint Server 2010 in a production environment.
Search query system
This section shows the components of the search query system for a given Search service application. The sizing requirements for each component are listed in the Scaling details table after the following diagram.
Object descriptions
The following list defines the search query system objects that are in the previous diagram:
Search proxy This is the Search service application proxy that is installed on any farm that consumes search from this Search service application. It runs in the context of the Web applications that are associated with the Search service application proxy.
Search Query and Site Settings Service This is also known as the query processor. After receiving a query from a Search service application proxy connection, the query processor does the following:
Sends the query to one active query component for each index partition (or to the property database or both, depending on the query)
Retrieves Best Bets and removes duplicates to get the results set
Security trims the results based on security descriptors in the search administration database
Retrieves the metadata of the final results set from the property database
Sends the query results back to the proxy
Index partition This is a logical group of query components, which represents a subset of the full-text index. The sum of index partitions composes the full-text index. However, note that query components contain the actual subset of the index. An index partition is associated with one property database.
Search query component A query component contains all or part of the full-text index. When queried by a query processor, the query component determines the best results from its index and returns those items. A query component can be created as either of the following:
Active, which means that it will respond to queries by default. Adding multiple active query components for the same index partition will increase throughput.
Failover, which means that it will only respond to queries if all active components for the same index partition have failed.
Search administration database Created at the same time as the Search service application, the search administration database contains the Search service application-wide data used for queries like Best Bets and security descriptors in addition to application settings that are used for administration.
Property database A property database contains the metadata (title, author, related fields) for the items in the index. The property database is used for property-based queries in addition to retrieving metadata that is needed for displaying the final results. If multiple index partitions exist, the index partitions can be mapped to different property databases.
Scaling details
Object | Scale considerations | RAM | IOPS (read/write) |
---|---|---|---|
Search proxy |
This scales with the front-end Web servers with which it is associated. |
N/A |
N/A |
Search Query and Site Settings Service |
This service, which is installed in the Services on Server page in Central Administration, should be started on each server with a query component. It can be moved to a separate server (or pair, for high availability), to avoid using RAM on the servers that contain the query components. Also, if a custom security trimmer is used, it can affect CPU and RAM resources. |
This uses RAM (process cache) for caching security descriptors for the index. |
N/A |
Index partition |
Increasing the number of index partitions decreases the number of items in the index partition, and this reduces the RAM and disk space that is needed on the query server that hosts the query component assigned to the index partition. |
N/A |
N/A |
Query component |
Each active query component on a server consumes memory when it is serving queries. Each active query component is created or modified as part of the topology of a Search service application. Both active and failover components consume IO when crawling is occurring. Servers can be dedicated to query components (for example, two active and two failover on the same server), assuming that RAM and IO requirements have been met. When possible, dedicate at least two CPU cores per active component per server, and at least one CPU core per failover component per server. |
For each active query component on an application server, 33% of its index should be in RAM (operating system cache). |
2 K needed per pair (active/failover) of query components on a given server. The query component needs IO for: loading the index into RAM for queries; writing index fragments that are received from each crawl component; and merging index fragments into its index, such as during a master merge. |
Search administration database |
For each query, Best Bets and security descriptors are loaded from the search administration database. Ensure that the database server has enough RAM to serve this from cache. When possible, avoid placing this on a server with a crawl database, because the crawl database tends to reset the cache of its database server. |
Ensure that the database server has enough RAM to keep the critical table (MSSSecurityDescriptors) in RAM. |
700 |
Property database |
For each query, metadata is retrieved from the property database for the document results, so you can add RAM to the database server to improve performance. If multiple index partitions exist, you can partition the property database and move to a different database server to decrease RAM and IO requirements. |
Ensure that the database server has enough RAM to keep 33% of the critical tables (MSSDocSDIDs + MSSDocProps + MSSDocresults) in cache. |
2 K (30% read, 70% write) |
Search crawl system
This section shows the components of the search crawl system. The sizing requirements of each component appear in the table that follows the diagram.
Object descriptions
This section defines the search crawl system objects in the previous diagram:
Administration component An administration component is used when a crawl is started, in addition to when an administration task is performed on the crawl system.
Crawl component A crawl component processes crawls, propagates the resulting index fragment files to query components, and adds information about the location and crawl schedule for content sources to its associated crawl database.
Search administration database The search administration database, which is created at the same time as the Search service application, stores the security descriptors that are discovered during the crawl, in addition to the application settings that are used for administration.
Crawl database A crawl database contains data that is related to the location of content sources, crawl schedules, and other information that is specific to crawl operations. They can be dedicated to specific hosts by creating host distribution rules. A crawl database only stores data. The crawl component that is associated with the given crawl database does the crawling.
Scaling details
Object | Scale considerations | RAM | IOPS (Optionally, % read/write) |
---|---|---|---|
Administration component |
The single administration component is not scalable. By default, it is located on a server that is hosting a crawl component (and Central Administration, on smaller farms). |
Minimal |
Minimal |
Crawl component |
Crawl components aggressively use CPU bandwidth. Optimally, a given crawl component can utilize four CPU cores. RAM is not as critical. In larger farms, dedicating servers to host crawl components minimizes the crawler effect on other components (especially using crawl components associated with different crawl databases, if redundancy is desired). |
Moderate. Note that when crawling East Asian documents, RAM requirements will increase due to the word breakers. |
300-400 |
Search administration database |
See Search query system above. When possible, avoid locating this on a server with a crawl database, because the crawl database tends to reset the cache of its database server. |
See Search query system above. |
700 |
Crawl database |
Crawl databases aggressively use IO bandwidth. RAM is not as critical. A crawl database needs 3.5 K IOPS for crawling activities; it will consume as much as 6 K IOPS, based on the available bandwidth. |
Moderate |
3.5–7 K (73% read, 27% write) |
Calculate storage sizing
Calculate the following factors to help estimate storage requirements. The sizing factors are based on an internal pre-deployment system with an index that contains primarily SharePoint content (the size of the content databases is 13.3 terabytes). Overall, SharePoint search required approximately 20 percent of the content database disk space. As stated previously, make sure that you conduct testing in your own environment before you deploy SharePoint Server 2010 in a production environment.
Caveats:
The corpus that was used to derive these numbers was primarily (English) SharePoint content, so if your content differs (for example, it consists mostly of file shares or non-SharePoint HTTP sites), you will need to allow for more variation.
Even if the content is primarily SharePoint content, you can still vary the coefficients in the following circumstances:
If you have large document repositories, the coefficients will be significantly larger.
If the content is primarily images, you might be able to reduce the coefficients.
Content in a different language will probably affect the coefficients.
1. Calculate content database sizing factor (ContentDBSum)
Determine the sum of the SharePoint content databases that will be crawled. This is the ContentDBSum value that will be used as the correlation in the next storage computations.
2. Calculate index-related sizes (TotalIndexSize and QueryComponentIndexSize)
Determine the size of the total index (which is located on the query components and is used for full text queries):
Multiply ContentDBSum by .035. This is the TotalIndexSize before you partition and reserve space for merges and repartitioning.
Next, determine the number of index partitions you will have based on your scenario. A general guideline is that an index partition should have between 5 million and 10 million items. When you have determined the number of index partitions, you can calculate the size of the query component storage.
Divide TotalIndexSize by (number of index partitions). This is the QueryComponentIndexSize. It is used to calculate the following sizes:
For RAM, multiply QueryComponentIndexSize by 0.33. This is the minimum of RAM required for this query component, if it is active.
If the component is a failover component, it does not require the RAM until it becomes active.
For a given server, having multiple active query components on the same server means that you need to sum each active query component’s RAM to arrive at the RAM needs for the server.
For disk storage, use QueryComponentIndexSize in the following ways to estimate disk requirements, depending on whether you will ever repartition the index (meaning you expect the index to grow greater than the 10 million per partition boundary):
Multiply QueryComponentIndexSize by 3 to calculate disk storage for a single query component to allow room for index merging.
Multiply QueryComponentIndexSize by 4 to calculate disk storage for a single query component to allow room for index repartitioning.
For a given server, having multiple query components on the same server means that you have to arrange for storage for each of the query components given the IOPS requirements in the "Scaling Details" section of the "Search Query System" section earlier in this article.
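The index-sizing arithmetic in step 2 can be sketched as follows. The 5 TB corpus and two-partition layout are hypothetical example inputs, not values from this article; only the coefficients (.035, 0.33, 3x/4x) come from the steps above.

```python
def index_sizing(content_db_sum_gb, index_partitions, repartition=True):
    """Estimate index sizes per step 2, using the article's coefficients."""
    total_index_size = content_db_sum_gb * 0.035         # TotalIndexSize
    per_component = total_index_size / index_partitions  # QueryComponentIndexSize
    ram_active_gb = per_component * 0.33                 # RAM for an active component
    # Reserve 4x for index repartitioning, or 3x for index merging only.
    disk_gb = per_component * (4 if repartition else 3)
    return total_index_size, per_component, ram_active_gb, disk_gb

# Hypothetical example: 5 TB (5,120 GB) of content databases, two partitions.
total, per_comp, ram, disk = index_sizing(5120, 2)
```

A failover component needs the same disk reservation but does not need the RAM until it becomes active.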
3. Calculate property database sizes
Determine the size of the property databases in the following way:
Multiply ContentDBSum by .015. This is the TotalPropertyDBSize before partitioning.
Multiply ContentDBSum by .0031. This is the TotalPropertyDBLogSize before partitioning. This assumes that you use the simple recovery model for SQL Server databases.
Multiply ContentDBSum by .00034. This is the property database TempDBSize. Because we recommend having 33 percent of the key tables in the property database in RAM, use of the temporary database is not heavy.
Next, determine the number of property databases you will have, based on your scenario. A general guideline is that a property database should contain up to 50 million items, assuming that there are no query performance issues and that you have a limited number of managed properties (the standard configuration).
Divide TotalPropertyDBSize by (number of property databases). This is the PropertyDatabaseSize.
Divide TotalPropertyDBLogSize by (number of property databases). This is the PropertyDatabaseLogSize.
For RAM, multiply PropertyDatabaseSize by 0.33. This is the minimum amount of RAM that is recommended for this property database.
For a given database server, having multiple property databases on the same server means that you have to arrange for storage and RAM for each of the property databases, given the IOPS and RAM requirements in the "Scaling Details" section of the "Search Query System" section, earlier in this article.
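The property database calculation in step 3 reduces to a few multiplications. The 20 TB corpus and two-database split below are hypothetical inputs for illustration.

```python
def property_db_sizing(content_db_sum_gb, property_db_count):
    """Estimate property database sizes per step 3 (simple recovery model)."""
    total_size = content_db_sum_gb * 0.015      # TotalPropertyDBSize
    total_log = content_db_sum_gb * 0.0031      # TotalPropertyDBLogSize
    temp_db = content_db_sum_gb * 0.00034       # property database TempDBSize
    per_db = total_size / property_db_count     # PropertyDatabaseSize
    per_db_log = total_log / property_db_count  # PropertyDatabaseLogSize
    ram_per_db = per_db * 0.33                  # keep 33% of key tables in RAM
    return per_db, per_db_log, temp_db, ram_per_db

# Hypothetical example: 20 TB (20,480 GB) of content, two property databases.
per_db, per_log, temp, ram = property_db_sizing(20480, 2)
```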
4. Calculate crawl database sizes
Next, determine the size that is needed for the crawl database in the following way:
Multiply ContentDBSum by .046. This is the TotalCrawlDBSize before partitioning.
Multiply ContentDBSum by .011. This is the TotalCrawlDBLogSize before partitioning. This assumes that you use the simple recovery model for SQL Server databases.
Multiply ContentDBSum by .0011. This is the crawl database TempDBSize. Because the search crawl system affects the performance of the temporary database, we do not recommend locating other databases on servers that are hosting the crawl database or databases that would be affected by this usage.
Next, determine the number of crawl databases you will have, based on your scenario. A general guideline is that a crawl database should contain up to 25 million items, assuming there are no crawl performance issues.
Divide TotalCrawlDBSize by (number of crawl databases). This is the CrawlDatabaseSize.
Divide TotalCrawlDBLogSize by (number of crawl databases). This is the CrawlDatabaseLogSize.
For a given database server, having multiple crawl databases on the same server means that you need to arrange for storage for each of the crawl databases, given the IOPS requirements in the "Scaling Details" section of the "Search Crawl System" section earlier in this article. For RAM, we recommend at least 16 GB on database servers that are dedicated to crawl databases.
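The crawl database calculation in step 4 can be sketched the same way. The 20 TB corpus and four-database split are hypothetical inputs that follow the 25-million-items-per-database guideline.

```python
def crawl_db_sizing(content_db_sum_gb, crawl_db_count):
    """Estimate crawl database sizes per step 4 (simple recovery model)."""
    total_size = content_db_sum_gb * 0.046   # TotalCrawlDBSize
    total_log = content_db_sum_gb * 0.011    # TotalCrawlDBLogSize
    temp_db = content_db_sum_gb * 0.0011     # crawl database TempDBSize
    per_db = total_size / crawl_db_count     # CrawlDatabaseSize
    per_db_log = total_log / crawl_db_count  # CrawlDatabaseLogSize
    return per_db, per_db_log, temp_db

# Hypothetical example: 20 TB (20,480 GB) of content, four crawl databases.
per_db, per_log, temp = crawl_db_sizing(20480, 4)
```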
5. Calculate search administration database size
Determine the size of the search administration database (assuming Windows authentication) in the following way:
Multiply number of items in the index (in millions) by 0.3. This is the SearchAdminDBSize.
For RAM, multiply SearchAdminDBSize by 0.33. This is the minimum amount of RAM that is recommended for this search administration database.
For a given database server, having multiple databases on the same server means that you have to arrange for storage and RAM for each of the databases, given the IOPS and RAM requirements in the "Scaling Details" section of the "Search Query System" section earlier in this article.
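Step 5 is a single pair of multiplications; a minimal sketch, with a hypothetical 100-million-item index as input:

```python
def search_admin_db_sizing(items_in_index_millions):
    """Estimate search administration database size per step 5 (Windows auth)."""
    admin_db_size_gb = items_in_index_millions * 0.3  # SearchAdminDBSize
    ram_gb = admin_db_size_gb * 0.33                  # recommended minimum RAM
    return admin_db_size_gb, ram_gb

# Hypothetical example: an index of 100 million items.
size, ram = search_admin_db_sizing(100)
```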
Optional: Calculate backup size
To determine the disk space that is required for backing up one Search service application, perform the following calculation:
- TotalCrawlDBSize + TotalPropertyDBSize + TotalIndexSize + SearchAdminDBSize = the basic backup size.
This basic backup size is a starting point. It will also be affected by the following:
Additional index size that is included in the TotalIndexSize for any crawling that has occurred since the last master merge.
Growth over time due to additional items, queries, and security descriptors.
In addition, you will probably want to retain multiple backups from different times, in addition to reserving space for the next backup.
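The basic backup size is simply the sum of the four totals from the earlier steps. The input values here are hypothetical totals (in GB) derived from a 20 TB corpus and a 100-million-item index, not figures from the article.

```python
# Hypothetical totals from the earlier sizing steps, in GB.
total_crawl_db = 20480 * 0.046      # TotalCrawlDBSize
total_property_db = 20480 * 0.015   # TotalPropertyDBSize
total_index = 20480 * 0.035         # TotalIndexSize
admin_db = 100 * 0.3                # SearchAdminDBSize (100 million items)

# Basic backup size: a starting point before growth and retention overhead.
basic_backup_gb = total_crawl_db + total_property_db + total_index + admin_db
```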
Sizing exercise
Using the sizing factors previously mentioned, the following is a sizing exercise for a 100 million item farm that will serve queries primarily over SharePoint content. Using the large farm scenario, you would assume the following:
Ten logical index partitions are needed to accommodate the 100 million items.
To serve queries, you need 10 active query components, one per index partition.
Query redundancy is important, so you have 10 failover query components, one per index partition (located on a different server than the active component).
To determine storage and RAM needs, perform the following steps:
In a SharePoint content farm with multiple content databases, add together the content databases you want to crawl to get 20 terabytes.
Using the index coefficient above, multiply 20 terabytes by 0.035 (Index Coefficient) to get 716.8 GB. This is the TotalIndexSize. If you had only one partition, this would be the size of the index, at rest.
Divide TotalIndexSize by the number of partitions: 716.8 GB / 10 = 71.68 GB. This is the index size that is required per query component (QueryComponentIndexSize) for each index partition. The size is the same for either active or failover query components.
Multiply QueryComponentIndexSize by 4 if you plan to repartition; otherwise, multiply by 3 to support master merges. 71.68 GB * 4 = 286.72 GB. You should have 286.72 GB available on the query server’s disk to support one query component. If you have two query components on the same application server (as in the active/failover topology that we recommended in the large farm scenario), you would have a disk drive layout as follows:
Operating system drive (standard size).
Extra storage system 1: Query Component1_Share (size= at least 300 GB), used for active query component from Index partition 1.
Extra storage system 2: Query Component2_Share (size = at least 300 GB), used for failover (mirror) query component from Index partition 2.
Note
On this application server, with one active query component, you would want a minimum of 71.68 GB * 0.33 = 23.65 GB of RAM, plus 3 GB for the operating system (we used 32 GB), to cache most of the queries.
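The exercise arithmetic above can be checked with a short sketch (a verification of the worked numbers, using 1 TB = 1,024 GB as the exercise does):

```python
# Sizing exercise check; all sizes in GB.
content_db_sum = 20 * 1024                 # 20 terabytes of content databases
total_index = content_db_sum * 0.035       # TotalIndexSize
per_component = total_index / 10           # QueryComponentIndexSize, 10 partitions
disk_per_component = per_component * 4     # allows room for index repartitioning
ram_per_component = per_component * 0.33   # plus 3 GB for the operating system
```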
Software limits
The following table gives software boundaries that are imposed to support an acceptable search experience.
Object | Limit | Additional notes |
---|---|---|
SharePoint Search service application | Recommended maximum of 20 per farm. Absolute maximum of 256 total service applications. | You can deploy multiple Search service applications on the same farm, because you can assign search components and databases to separate servers. |
Indexed documents | Overall recommended maximum of 10 million items per index partition and 100 million items per Search service application. | SharePoint search supports index partitions, each of which contains a subset of the entire search index. The recommended maximum is 10 million items for a given partition. The overall recommended maximum number of items, including people, list items, documents, and Web pages, is 100 million. |
Index partitions | Recommended maximum of 20 per Search service application. | An index partition is a logical subset of the index of the Search service application. Increasing the number of index partitions decreases the number of items in each partition, which reduces the RAM and disk space that is needed on the query server that hosts the query component assigned to the partition. However, this can affect relevance, because the number of items in the index partition is decreased. The hard limit is 128 index partitions. |
Property database | Recommended limit of 10 per Search service application. | The property database stores the metadata for items in each index partition that is associated with it. An index partition can be associated with only one property store. The recommended limit is 10 per Search service application, with a hard limit of 255 (same as index partitions). |
Crawl databases | Limit of 32 crawl databases per Search service application. | The crawl database stores the crawl data (including time and status) about all items that were crawled. The recommended limit is 25 million items per crawl database, or four total databases for a Search service application. |
Crawl components | Recommended limit per application of 16 total crawl components, with two per crawl database, and two per server, assuming that the server has at least eight processors (cores). | The total number of crawl components per server must be fewer than 128/(total query components) to minimize propagation I/O degradation. Exceeding the recommended limit might not increase crawl performance; in fact, crawl performance might decrease based on available resources on the crawl server, database, and content host. |
Query components | Recommended limit per application of 128, with 64/(total crawl components) per server. | The total number of query components is limited by the crawl components' ability to copy files. The maximum number of query components per server is limited by the query components' ability to absorb files propagated from crawl components. |
Concurrent crawls | Recommended limit of 20 crawls per Search service application. | This is the number of crawls under way at the same time. Crawls are extremely expensive search tasks that can affect database load in addition to other application load. Exceeding 20 simultaneous crawls can cause the overall crawl rate to degrade. |
Content sources | Recommended limit of 50 content sources per Search service application. | The recommended limit can be exceeded up to the hard limit of 500 per Search service application. However, fewer start addresses should then be used, and the concurrent crawl limit must be followed. |
Start addresses | Recommended limit of 100 start addresses per content source. | The recommended limit can be exceeded up to the hard limit of 500 per content source. However, fewer content sources should then be used. A better approach when you have many start addresses is to put them on an HTML page as links, and then have the HTTP crawler crawl the page, following the links. |
Crawl rules | Recommended limit of 100 per Search service application. | The recommended limit can be exceeded; however, display of the crawl rules in search administration is degraded. |
Crawl logs | Recommended limit of 100 million log entries per Search service application. | This is the number of individual entries in the crawl log. It follows the indexed documents limit. |
Metadata properties recognized per item | Hard limit of 10,000. | This is the number of metadata properties that can be determined (and potentially mapped and used for queries) when an item is crawled. |
Crawled properties | 500,000 per Search service application. | These are properties that are discovered during a crawl. |
Managed properties | 100,000 per Search service application. | These are properties that the search system uses in queries. Crawled properties are mapped to managed properties. We recommend a maximum of 100 mappings per managed property. Exceeding this limit might degrade crawl speed and query performance. |
Scopes | Recommended limit of 200 per site. | Exceeding this limit might degrade crawl efficiency and affect end-user browser latency if the scopes are added to the display group. Also, the display of the scopes in search administration degrades as the number of scopes increases past the recommended limit. |
Display groups | 25 per site. | Display groups are used for a grouped display of scopes through the user interface. Exceeding this limit starts degrading the search administration scope experience. |
Scope rules | Recommended limit of 100 scope rules per scope, and 600 total per Search service application. | Exceeding this limit degrades freshness and delays potential results from scoped queries. |
Keywords | Recommended limit of 200 per site collection. | The recommended limit can be exceeded up to the maximum (ASP.NET-imposed) limit of 5,000 per site collection with five Best Bets per keyword. Display of keywords in the site administration user interface will degrade. The ASP.NET-imposed limit can be modified by editing the Web.config and Client.config files (MaxItemsInObjectGraph). |
Authoritative pages | Recommended limit of one top-level authoritative page, and as few second-level and third-level pages as possible, while achieving the desired relevance. | The hard limit is 200 per relevance level per Search service application, but adding pages might not achieve the desired relevance. Add the key site to the first relevance level. Add subsequent key sites as either second or third relevance levels, one at a time, evaluating relevance after each addition to ensure that the desired effect is achieved. |
Alerts | Recommended limit of 1,000,000 per Search service application. | This is the tested limit. |
Results removal | 100 URLs in one operation. | This is the maximum recommended number of URLs that should be removed from the system in one operation. |
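The start-addresses guidance above recommends consolidating many addresses onto one HTML page of links and letting the HTTP crawler follow them. A minimal sketch of generating such a page is shown below; the file layout and example addresses are illustrative, not from the article:

```python
# Sketch of the "links page" approach: instead of registering hundreds of
# start addresses on a content source, emit a single HTML page of links and
# point the crawler at that page. The example URLs are hypothetical.
from html import escape

def build_links_page(start_addresses):
    """Render a plain HTML page with one link per start address."""
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>'
        for url in start_addresses
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"

page = build_links_page([
    "http://fileserver1/share1",
    "http://fileserver2/share2",
])
print(page)
```

You would publish the generated page on a Web server that the crawl account can reach, and use that page's URL as the single start address for the content source.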
Optimizations
The following sections discuss methods for improving farm performance.
Many factors can affect performance. These factors include the number of users; the type, complexity, and frequency of user operations; the number of post-backs in an operation; and the performance of data connections. Each of these factors can have a major effect on farm throughput. You should carefully consider each of these factors when you plan the deployment.
Capacity and performance of a search system is highly dependent on its topology. You can either scale up by increasing the capacity of the existing server computers or scale out by adding servers to the topology.
Search query system optimizations
In general, search query optimizations follow one of the following scenarios:
Users are complaining about query latency, so I have to decrease query latency.
Many more search requests than planned are occurring, and performance has started to degrade, so I have to increase query throughput.
Scaling out or scaling up the query subsystem always involves creating more query components. If you have excess capacity (RAM, IO, and CPU) on an existing query server, you can choose to scale up by creating more query components on that server, increasing RAM, CPU, or IO if you hit a bottleneck. Otherwise, you can choose to create more query components (or move the existing components) to a new server to scale out.
The following section shows various ways of adding query resources to the search query system.
How to reduce query latency
Adding query components to reduce latency
The following graph illustrates the effect of adding active query components on different servers without changing index size.
Add more active query components to retain sub-second query latency as the user load on the system (measured in simultaneous user queries) increases.
Adding query processors (Query and Site Settings Service) to reduce latency
The following graph illustrates the effect of adding active query processor services on different servers without changing any other parts of the query system.
Start other active instances of the Query and Site Settings Service on different servers to retain sub-second query latency as the user load on the system (measured in simultaneous user queries) increases.
Scale out to increase query throughput
Adding query components to increase query throughput
The following graph illustrates the effect of adding active query components on different servers without changing index size.
Add more active query components to increase query throughput as the user load on the system (measured in simultaneous user queries) increases.
Adding query processors (Query and Site Settings Service) to increase query throughput
The following graph illustrates the effect of adding active query processor services on different servers without changing any other parts of the query system.
Takeaway: Start other active instances of the Query and Site Settings Service on different servers to increase the throughput as the user load on the system (measured in simultaneous user queries) increases.
Search crawl system optimizations
In general, you perform search crawl optimizations when users complain about results that should be there but are not, or are there but are stale.
When you try to crawl the content source start address within freshness goals, you can run into the following crawl performance issues:
Crawl rate is low due to IOPS bottlenecks in the search crawl subsystem.
Crawl rate is low due to a lack of CPU threads in the search crawl subsystem.
Crawl rate is low due to slow repository responsiveness.
Each of these issues assumes that the crawl rate is low. See Use search administration reports (SharePoint Server 2010) to establish a baseline for the typical crawl rate for the system over time (given the software life cycle phases). When this baseline regresses, the following subsections show various ways of addressing these crawl performance issues.
Crawl IOPS bottleneck
After determining that a crawl database or property database is a bottleneck, you have to scale up or scale out the crawl system to address it by using the appropriate resolutions. The following table shows how adding IOPS (another crawl database) yields an improved crawl rate (until adding more components makes it the bottleneck again).
Takeaway: Always check the crawl database to make sure it is not the bottleneck. If crawl database IOPS are already bottlenecked, adding crawl components or increasing the number of threads does not help.
Topology (crawl components / crawl databases) | CPU percent | RAM: buffer cache hit ratio (%) | Read latency | Write latency | Crawl speed (docs/sec) |
---|---|---|---|---|---|
2 / 1 | 19.5 | 99.6 | 142 ms | 73 ms | 50 |
4 / 2 | 8.502 | 99.55 | 45 ms | 75 ms | ~75 |
6 / 2 | 22 | 99.92 | 55 ms | 1,050 ms | ~75 |
Crawl CPU thread bottleneck
If you have a large number of hosts and no other crawl bottlenecks, you have to scale up or scale out the crawl system by using the appropriate resolutions. The crawler can accommodate a maximum of 256 threads per Search service application. We recommend a quad-core processor to realize the full benefit of the maximum number of threads. When it is conclusively determined that the repository is serving data fast enough (see the "Crawl bottleneck on repository" section later in this article), you can increase crawl throughput by requesting data from the repository faster, that is, by increasing the number of crawler threads. This can be achieved in the following three ways:
Change the indexer performance level to Partially Reduced or Maximum by using the following Windows PowerShell cmdlet:
Get-SPEnterpriseSearchService | Set-SPEnterpriseSearchService -PerformanceLevel "Maximum"
The Maximum value is used if you are using a processor with fewer than four cores.
Use crawler impact rules to increase the number of threads per host. This should take into consideration that a maximum of 256 threads is supported, and assigning a large number of threads to a few hosts might result in slower data retrieval from other repositories.
If there are a large number of hosts, the ideal solution is to add another crawl component on a separate indexer to crawl the hosts that are to be indexed faster.
The ideal way to seamlessly increase crawl throughput is to add another indexer if the search subsystem is not bottlenecked on IOPS and the repository is serving content fast.
Crawl bottleneck on repository
Sometimes, when a SharePoint Web application with many nested site collections or remote file shares is being crawled, the search crawler might be bottlenecked on the repository. A repository bottleneck can be identified if the following two conditions are true:
There is a low (less than 20 percent) CPU utilization on the crawl servers.
There is a large number of threads (almost all in the worst case) waiting on the network.
The bottleneck is identified by looking at the OSS Search Gatherer/Threads Accessing Network performance counter.
This situation means that the threads are blocked while waiting for data from the repository. In an environment with multiple content sources, you can identify the host whose responsiveness is slow by pausing all other crawls, and then performing a crawl by using the content source that has the suspected host as one of its start addresses.
When a problematic host has been identified, you need to investigate the cause of the slow response times. For SharePoint content in particular, refer to the Capacity management and sizing for SharePoint Server 2010 article.
The crawl throughput can be significantly improved by performance tuning the crawled data repositories.
Troubleshooting performance and scale issues
Understanding the load on the farm is critical before troubleshooting performance. The following section uses data from a live farm that contains 60 million items to show various system metrics at different phases in the search life cycle.
Performance samples during search life cycle
Metric | Index acquisition | Index maintenance | Index cleanup |
---|---|---|---|
SQL Server CPU (%), property database / crawl database | 14.8 / 19.13 | 35 / 55 | 11 / 63 |
SQL Server page life expectancy, property database / crawl database | 60,548 / 5,913 | 83,366 / 5,373 | 33,927 / 2,806 |
SQL Server average disk sec/write, property database / crawl database | 9 ms / 48 ms (max: 466 ms / 1,277 ms) | 12 ms / 28 ms | 20 ms / 49 ms (max: 253 ms / 1,156 ms) |
SQL Server average disk sec/read, property database / crawl database | 8 ms / 43 ms (max: 1,362 ms / 2,688 ms) | 11 ms / 24 ms | 24 ms / 29 ms (max: 2,039 ms / 2,142 ms) |
Crawler CPU (%), index server 1 (2 crawl components) / index server 2 (2 crawl components) | 18 / 11 | 25.76 / 21.62 | 8.34 / 7.49 (maximum peaks to 99%) |
Disk writes/sec, index server 1 / index server 2 | 64.32 / 42.35 | 93.31 / 92.45 | 99.81 / 110.98 |
Disk reads/sec, index server 1 / index server 2 | 0.23 / 0.20 | 4.92 / 2.03 | 1.38 / 1.97 |
Average disk sec/write, index server 1 / index server 2 | 11 ms / 11 ms | 1 ms / 2 ms | 5 ms / 4 ms (max: 1,962 ms / 3,235 ms) |
Average disk sec/read, index server 1 / index server 2 | 1 ms / 2 ms | 12 ms / 11 ms | 10 ms / 16 ms (max: 2,366 ms / 5,206 ms) |
Troubleshooting query performance issues
SharePoint search has an instrumented query pipeline and associated administration reports that can help you troubleshoot server-based query performance issues. For more information, see Use search administration reports (SharePoint Server 2010). This section shows reports and then uses them to help understand how to troubleshoot issues on the server. In addition, this section also contains tools and guidance that is available to assist in addressing client-based (browser) performance issues.
Server-based query issues
Server-based query performance issues can be segregated into the following two levels:
Search front-end performance issues
Search back-end performance issues
The following two subsections give the details for troubleshooting each of them. Note that these are high-level guidelines.
Front-end performance issues
The first step in troubleshooting front-end performance should be reviewing the Overall Query Latency search administration report. The following is an example report:
In this report, front-end performance is represented by the following data series:
Server Rendering: This value represents, for the given minute, the average time spent per query in the various search Web Parts in the front-end Web server.
Object Model: This value represents, for the given minute, the average time spent in communication between the front-end Web server and the search back-end.
Troubleshooting server rendering issues
Server rendering issues can be affected by anything that occurs on the front-end Web server that is serving the search results page. In general, you want to understand how much time is being spent in retrieving the various Web Parts to find where the extra latency is being added. Enable the Developer Dashboard on the search results page for detailed latency reporting. Common issues that manifest as excess server rendering latency include the following:
Platform issues such as the following:
Slow Active Directory lookups
Slow SQL Server times
Slow requests to User Profile Application in people queries in SharePoint Server 2010 or all queries in FAST Search Server 2010 for SharePoint
Slow requests for fetching the user preferences
Slow calls to get the user’s token from secure token service
Code-behind issues such as modified search results pages (such as results.aspx and peopleresults.aspx) that are checked in but not published.
Troubleshooting object model issues
Object model issues can be affected by the following:
Issues with the Windows Communication Foundation (WCF) layer such as the following:
Timeouts and ThreadAbortException errors in WCF calls in the deployment.
Slow communication between the front-end Web server and application server. This can be due to IPsec issues or slow network connections.
Issues with communication between the content and service farms (if configured).
Back-end performance issues
The first step in troubleshooting back-end query performance should be reviewing the SharePoint Backend Query Latency search administration report. The following is an example report:
In this report, back-end performance is represented by the following data series (each is the average time spent per query, in the given minute), grouped by functional component:
Query Component:
- Full-text Query: The average time spent querying the full-text index for results.
Property database:
Multiple Results Retrieval: The average time spent retrieving document metadata, such as title or author, that is to appear in the query results.
Property Store Query: The average time spent querying the property database for property-based queries.
Search Administration database:
Best Bets: The average time spent determining whether there are Best Bets available for the query terms.
High Confidence Results: The average time spent retrieving high confidence results for queries.
Query Processor:
Security Trimming: The average time spent removing items the user does not have access to.
Duplicate Removal: The average time spent removing duplicates.
Results Population: The average time spent creating the in-memory table to be passed back to the object model.
Troubleshooting query component performance issues
Query components are resource intensive, especially when the component is active — that is, responding to query requests. Troubleshooting query component performance is one of the more complicated search areas. The following are general areas to consider:
The most resource intensive query component event is the master merge, where shadow indexes are merged with the master index. This event occurs independently for each query component. An example of the effect can be seen in the SharePoint Backend Query Latency report mentioned earlier in this article, at times prior to 1:30 P.M. If this event is affecting query latency, it is possible to define blackout periods during which a master merge event is avoided unless the percentage of change exceeds the defined limit.
Sustained high values for the environment indicate that you should probably do the following:
Examine the index size for each component on the server. Ensure that enough RAM exists on the server to allow approximately 33 percent of the sum of index sizes to be cached.
Examine the query component IO channel on the server. Ensure that you are not experiencing an IO bottleneck.
If IO and RAM are not the source of the performance issue, you should repartition the query components (adding index partitions), scaling out the additional query components to new servers.
Troubleshooting property database issues
Examine SQL Server health by using concepts in Storage and SQL Server capacity planning and configuration (SharePoint Server 2010). If you are executing custom queries, you might have to use hints to guide the query optimizer to the correct query plan.
Troubleshooting search administration database issues
Examine SQL Server health by using concepts in Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
Troubleshooting query processor issues
Troubleshooting query processor issues depends on which of the following areas of the query processor is affecting the query latency:
Security trimming:
For Windows claims, examine the Active Directory connection from the server that is hosting the query processor.
For all cases, the cache size that is used by the query processor can be increased if a SQL Server Profiler trace shows a large number of SQL Server round trips. No more than 25 percent of queries should need a SQL Server call to retrieve security descriptors from the search administration database; if more do, adjust the query processor cache size.
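The 25-percent guideline can be checked with simple arithmetic. In this hypothetical sketch, both input counts would come from your own measurements (for example, a SQL Server Profiler trace over a representative period); the numbers shown are illustrative:

```python
# Apply the 25-percent guideline: compare queries that required a SQL Server
# round trip for security descriptors against the total query count.
# Inputs are hypothetical measurements, not values from the article.

def needs_larger_qp_cache(total_queries, queries_with_sql_roundtrip,
                          threshold=0.25):
    """True if more than `threshold` of queries hit SQL Server for descriptors."""
    if total_queries == 0:
        return False
    return queries_with_sql_roundtrip / total_queries > threshold

print(needs_larger_qp_cache(10_000, 1_800))  # 18% of queries -> False, cache is adequate
print(needs_larger_qp_cache(10_000, 3_100))  # 31% of queries -> True, increase cache size
```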
Duplicate removal:
- Look at whether you are crawling the same content in multiple places. Disable duplicate detection in the Search Center.
Multiple results retrieval:
- Examine SQL Server health by using concepts in Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
Browser-based query issues
Users can be either delighted or exasperated by the speed of search results. Page-load time is one of the important factors in users' satisfaction with the search experience. Most of the focus around page-load time is on the server side, specifically the time it takes the server to return results. Client-side rendering, however, can make up a significant portion of page-load time and is important to consider.
The search user experience is designed to provide sub-second responses for total page-load time. Out of that time, client rendering typically takes less than 280 milliseconds, depending upon the browser and rendering measurement. This experience delights users with very fast results.
Customizations to the results experience can easily degrade rendering performance. Search administrators and developers must be vigilant in measuring the rendering time after each modification to ensure that performance has not regressed significantly. Every addition to the page, from a new Web Part to a new Cascading Style Sheet style, will increase rendering time on the browser and delay results for your users. The amount of delay, however, can vary greatly based on whether you follow best practices when you customize the page.
The following are some general guidelines:
Basic branding and style customizations to the page should not add more than approximately 25 ms to page-load time. Measure the page-load time before and after you implement customizations to observe the change.
Users typically notice a change (faster or slower) in an increment of 20 percent. Keep this in mind when making changes. 20 percent of the standard rendering time is only 50 ms. (Source: Designing and Engineering Time)
Cascading style sheets and JScript are the most common and largest causes of slow rendering. If you must have customized cascading style sheets and JScript, ensure that they are minimized to one file each.
JScript can load on-demand after the page renders to provide the user with visible results sooner. Details of how to do this are discussed in the performance considerations article.
The more customizations that are added to the page, the slower it will load. Consider whether the added functionality and style is worth the extra delay of results for your users.
In addition to these guidelines, there is a great deal of information on the Internet about how to reduce page-load time and about the effect of slow pages on the user experience.
Troubleshooting crawl performance issues
SharePoint search can experience bottlenecks in the crawl sub-system as the system moves through the index acquisition, maintenance, and deletion phases. To effectively troubleshoot crawl performance issues, the Search Health Monitoring Reports should be used with the "Common bottlenecks and their causes" section later in this article to isolate crawl issues.
Troubleshooting during the index acquisition phase
The first place to identify crawl issues is the Crawl Rate Per Content Source health report. As shown later in this article, the report gives an overview of the crawl rate for each of the content sources in the system. In general, the crawl rate should be more than 15 documents/sec for a people content source and more than 35 documents/sec for all other types of content sources.
When a content source with suboptimal crawl rate is identified, we recommend the following steps:
Pause all other crawls except the content source that is under investigation. Did the crawl rate improve beyond the specified 15 to 35 documents/sec goal?
If the previous step does not help, ensure that the repository that is being crawled is responsive enough and is not the cause of the slow crawl. Refer to the "Crawl bottleneck on repository" section earlier in this article.
If the repository is not the bottleneck, identify the bottleneck in the crawl server or database server and optimize around them. Guidance can be found in the "Crawl IOPS bottleneck" and "Crawl CPU thread bottleneck" sections earlier in this article.
Troubleshooting during the index maintenance phase
The primary goal during the index maintenance phase is to keep the index as fresh as possible. Two of the key indicators are the following:
Index freshness: Are the crawls finishing in their budgeted time and in accordance with the IT guidelines for index freshness?
Incremental crawl speed: If the index freshness goal is not met, investigate whether the incremental crawl speeds are 10 docs/sec for people content sources and 35 documents/sec for all other content sources. If the incremental crawl speeds are suboptimal, a bottleneck analysis should be performed on the crawled repository and the crawl subsystem as described above.
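The freshness check above reduces to simple arithmetic: given a count of items to crawl and the observed crawl rate, estimate whether the crawl fits inside the crawl window. This sketch uses the article's 10 docs/sec and 35 docs/sec incremental targets; the item counts and the 8-hour window are hypothetical examples:

```python
# Back-of-the-envelope freshness check: will an incremental crawl finish
# inside the crawl window at the observed rate? Item counts and window
# length below are hypothetical.

def crawl_hours(items_to_crawl, docs_per_sec):
    """Estimated crawl duration in hours at a steady rate."""
    return items_to_crawl / docs_per_sec / 3600

# People content source target: 10 docs/sec; other content: 35 docs/sec.
people_hours = crawl_hours(500_000, 10)
content_hours = crawl_hours(2_000_000, 35)
print(f"People crawl:  {people_hours:.1f} h")   # ~13.9 h
print(f"Content crawl: {content_hours:.1f} h")  # ~15.9 h
# If either estimate exceeds the crawl window (say, an 8-hour overnight
# window), perform the bottleneck analysis described above.
```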
Common bottlenecks and their causes
During performance testing, several different common bottlenecks were revealed. A bottleneck is a condition in which the capacity of a particular constituent of a farm is reached. This causes a plateau or decrease in farm throughput.
The following table lists some common bottlenecks and describes their causes and possible resolutions.
Bottleneck | Symptom (performance counter) | Resolution |
---|---|---|
Database RAM | The property database or search administration database exhibits: SQL Server Buffer Manager/Page life expectancy < 300 s (should be > 1,000 s); SQL Server Buffer Manager/Buffer cache hit ratio < 96% (should be > 98%). | Add memory to the database server. Defragment the property database if the weekly defrag rule has been disabled. Ensure that you are using SQL Server 2008 Enterprise edition, to enable page compression. Move the database to a separate server, and add multiple property databases if they are necessary. |
Database server IOPS | A property database or crawl database exhibits: Average disk sec/read and Average disk sec/write of approximately 50 ms or more. | Ensure that the database server has enough RAM to keep 33 percent of the critical tables (MSSDocSDIDs + MSSDocProps + MSSDocresults) in cache. Increase the dedicated number of IOPS for the database by using different storage arrays and by optimizing the storage configuration (for example, by adding spindles to the storage array). Run the SharePoint Health Analyzer "Search - One or more property databases have fragmented indices" rule if it has been disabled, and the "Search - One or more crawl databases have fragmented indices" rule. Ensure that you are using SQL Server 2008 Enterprise edition, to enable page compression. Move the database to a separate server, adding multiple property databases, crawl databases, or both, if they are necessary. |
Query component IOPS | The logical disk that is used for a query component's index exhibits: Average disk sec/read and Average disk sec/write of approximately 30 ms or more for a sustained period of time (that is, most of the day, not only during an index merge). | Ensure that each application server has enough RAM to keep 33 percent of each active query component's index (on that server) in the operating system cache. Increase the dedicated number of IOPS for the drive that is used for the query component's index by using different storage arrays for different components and by optimizing the storage configuration (for example, by adding spindles to the storage array). |
About the author
Brion Stone is a Senior Program Manager for SharePoint Server Search at Microsoft.