SharePoint Ports, Proxies and Protocols .... Search Communication
Following on from the blog post by Martin Kearn, I wanted to expand on some of the mysteries of the communication that SharePoint uses for enterprise search. While we were putting together the material for the TechEd talk, this was by far the most interesting communication section to work on.
Almost all administration communication within SharePoint is conducted over web services (HTTP/HTTPS traffic). By and large, Enterprise Search is the same, with the unsurprising exception of the search index propagation, and the rather surprising exception of search queries.
Note: Search in this article refers to the Microsoft Office SharePoint Search service, which is distinct from the Windows SharePoint Services search service.
Administration
Administration of the search service takes place over the Search Administration web service. The service is located in an IIS Web site called “Office Server Web Services” on each server that is part of a SharePoint farm. The site holds entries for each service such as Search or Excel Services, for each Shared Service Provider present on the farm.
The web site is configured by default to run on port 56737, or 56738 if SSL is being used. This can be changed with the STSADM command:
stsadm -o setsspport -httpport <HTTP port number> -httpsport <HTTPS port number>
The Search administration web service is implemented in the file SearchAdmin.asmx. The full path to the search admin web service is therefore (for HTTP traffic on the default port):
http://<FQDN>:56737/<SharedServiceProviderName>/Search/SearchAdmin.asmx
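As a rough illustration of how the pieces of that address fit together, here is a minimal Python sketch that builds the URL from the server name, the Shared Service Provider name and the default ports mentioned above. The function name and the example server and SSP names are my own and purely hypothetical; it only formats a string and does not call the service.

```python
from urllib.parse import quote

# Default ports of the "Office Server Web Services" site, as described above.
DEFAULT_HTTP_PORT = 56737
DEFAULT_HTTPS_PORT = 56738


def search_admin_url(fqdn, ssp_name, use_ssl=False, port=None):
    """Build the address of SearchAdmin.asmx for one Shared Service Provider.

    `port` overrides the default, e.g. after changing it with
    `stsadm -o setsspport`.
    """
    scheme = "https" if use_ssl else "http"
    if port is None:
        port = DEFAULT_HTTPS_PORT if use_ssl else DEFAULT_HTTP_PORT
    # SSP names may contain spaces, so percent-encode the path segment.
    return f"{scheme}://{fqdn}:{port}/{quote(ssp_name)}/Search/SearchAdmin.asmx"


if __name__ == "__main__":
    # Hypothetical server and SSP names, for illustration only.
    print(search_admin_url("moss01.contoso.local", "SharedServices1"))
    print(search_admin_url("moss01.contoso.local", "SharedServices1", use_ssl=True))
```

If the ports have been changed with stsadm, pass the new value through the `port` argument rather than relying on the defaults.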
The administration service provides all the methods necessary to control the Search service, such as starting content source index crawls, updating scopes, etc. The web service is available to be called by custom applications as well as by the system.
Crawling
The protocols that are used during search crawling depend on the content source that is being crawled. The protocol used for a given source is determined by its Protocol Handler, an object responsible for fetching the content to be indexed. By default, SharePoint comes with protocol handlers for the following protocols (from the TechNet article Plan to crawl content (Search Server 2008), https://technet.microsoft.com/en-us/library/cc280343.aspx):
Protocol handler | Used to crawl
File | File shares
http | Web sites
https | Web sites over Secure Sockets Layer (SSL)
Notes | Lotus Notes databases
Rb | Exchange public folders
Rbs | Exchange public folders over SSL
Sps | People profiles from Windows SharePoint Services 2.0 server farms
Sps3 | People profile crawls of Windows SharePoint Services 3.0 server farms only
Sps3s | People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL
Spsimport | People profile import
Spss | People profile import from Windows SharePoint Services 2.0 server farms over SSL
Sts | Windows SharePoint Services 3.0 root URLs (internal protocol)
Sts2 | Windows SharePoint Services 2.0 sites
Sts2s | Windows SharePoint Services 2.0 sites over SSL
Sts3 | Windows SharePoint Services 3.0 sites
Sts3s | Windows SharePoint Services 3.0 sites over SSL
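To make the table a little more concrete, the following Python sketch treats it as a simple lookup from the scheme of a start address to the kind of content that handler crawls. This is only an illustration of the mapping in the table, not how the crawler itself selects a handler; the function name and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit

# Scheme -> content crawled, copied from the table above.
PROTOCOL_HANDLERS = {
    "file": "File shares",
    "http": "Web sites",
    "https": "Web sites over Secure Sockets Layer (SSL)",
    "notes": "Lotus Notes databases",
    "rb": "Exchange public folders",
    "rbs": "Exchange public folders over SSL",
    "sps": "People profiles from Windows SharePoint Services 2.0 server farms",
    "sps3": "People profile crawls of Windows SharePoint Services 3.0 server farms only",
    "sps3s": "People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL",
    "spsimport": "People profile import",
    "spss": "People profile import from Windows SharePoint Services 2.0 server farms over SSL",
    "sts": "Windows SharePoint Services 3.0 root URLs (internal protocol)",
    "sts2": "Windows SharePoint Services 2.0 sites",
    "sts2s": "Windows SharePoint Services 2.0 sites over SSL",
    "sts3": "Windows SharePoint Services 3.0 sites",
    "sts3s": "Windows SharePoint Services 3.0 sites over SSL",
}


def describe_start_address(url):
    """Return what the default handler for this address crawls, or None."""
    scheme = urlsplit(url).scheme.lower()
    return PROTOCOL_HANDLERS.get(scheme)


if __name__ == "__main__":
    # Hypothetical start addresses, for illustration only.
    for address in ("sts3://intranet/sites/hr", "file://fileserver/docs", "ftp://example.com"):
        print(address, "->", describe_start_address(address))
```

An address whose scheme is not in the table returns None, which is the case where a custom protocol handler would be needed.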
Custom Protocol handlers can be written to fetch content from disparate sources. For more information, refer to Creating a Protocol Handler (https://msdn.microsoft.com/en-us/library/ms947581.aspx) . Each protocol handler is free to use whichever communication protocol it wishes to. For accessing external data, searching of Business Data Catalog information is in most cases the preferred solution. For more details refer to Enabling Business Data Search (https://msdn.microsoft.com/en-us/library/ms492695.aspx).
Index Propagation
Indexing and querying both make use of the Server Message Block (SMB) protocol to transfer data.
The SMB protocol was originally invented at IBM with the intention of making network file access as easy as local file access. Around 1990, Microsoft merged the protocol with the LAN Manager product, and continued to develop it as a means for sharing files and folders, printers and miscellaneous other communication.
The SMB protocol was originally intended to run over NetBIOS, but from Windows 2000 was modified to run over TCP port 445, which it currently uses. With Windows Vista, Microsoft released SMB 2.0, which has several enhancements over the original protocol.
Given that SMB was designed for file and folder sharing, it comes as no surprise that the index propagation is done over SMB, and consists of partial file copies to the search index shared folder location.
This is a shared folder created on each Query server in a SharePoint farm. The share name can be configured when the Search Query role is activated on a server, but it is usually \\<servername>\searchindexpropagation. By default, the share exposes the folder at C:\Program Files\Microsoft Office Servers\12.0\Data\Applications\<shared service provider GUID>.
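For anyone scripting checks against these locations, here is a small Python sketch that composes the usual share path and the default local folder from the values quoted above. The helper names, the server name and the all-zero GUID are hypothetical; a farm with a non-default configuration will have different values.

```python
from pathlib import PureWindowsPath

# Usual share name and default data root, as quoted in the text above.
DEFAULT_SHARE_NAME = "searchindexpropagation"
DEFAULT_DATA_ROOT = PureWindowsPath(
    r"C:\Program Files\Microsoft Office Servers\12.0\Data\Applications"
)


def propagation_share(server, share=DEFAULT_SHARE_NAME):
    """Return the UNC path of the propagation share on a Query server."""
    return f"\\\\{server}\\{share}"


def default_index_folder(ssp_guid):
    """Return the default local folder behind the share for one SSP."""
    return DEFAULT_DATA_ROOT / ssp_guid


if __name__ == "__main__":
    # Hypothetical server name and placeholder GUID, for illustration only.
    print(propagation_share("QUERY01"))
    print(default_index_folder("00000000-0000-0000-0000-000000000000"))
```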
Search propagation is a co-ordinated effort between the Search Service on the Index server, the Search Service on the Query server, the database, and the file system, using the SMB protocol. The following diagram, taken from the public document describing the Search Index Propagation protocol [MS-CIPROP]: Index Propagation Protocol Specification (https://msdn.microsoft.com/en-us/library/cc313077.aspx), describes the interaction.
In this diagram, the top-right block refers to the SMB propagation of index files, which takes place as a standard file share copy.
Search Querying
Perhaps the biggest surprise is that search queries are issued from the Web Front-End (WFE) to the Search Query server using the SMB protocol. It would seem that this is a prime candidate for a web service query, and the fact that SMB is used has implications for extranet server topology design.
For example, if you design a SharePoint infrastructure architecture where the WFEs are located in a separate segment of the perimeter network, and the rest of the servers in the farm are located within a more secure segment of the network (a form of the back-to-back perimeter topology), the SMB protocol will need to be opened in the firewall between the two network segments.
In the above diagram, Router A will need SQL Server ports and SMB ports to be allowed through. This means essentially that file-share access is enabled through Router A.
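When troubleshooting a topology like this, a quick way to see whether the firewall is letting the traffic through is to attempt a TCP connection from a WFE to the back-end servers. The Python sketch below does that for SMB on port 445 and, as an assumption on my part, SQL Server on 1433, which is the port of a default instance; named instances or custom configurations will use other ports. The host name is hypothetical, and a successful connection only shows the port is reachable, not that search queries will work.

```python
import socket

# SMB port from the text above; 1433 assumes a default SQL Server instance.
REQUIRED_PORTS = {
    "SMB": 445,
    "SQL Server (default instance)": 1433,
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, timed out or unresolvable: treat as not reachable.
        return False


def check_backend(host, ports=None, timeout=2.0):
    """Check each named port on host and return {name: reachable}."""
    if ports is None:
        ports = REQUIRED_PORTS
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    # Hypothetical Query server name; run this from a WFE.
    for name, reachable in check_backend("query01.contoso.local").items():
        print(f"{name}: {'reachable' if reachable else 'blocked or unavailable'}")
```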
So why would the Search service use SMB?
The answer is performance: it turns out that SMB is used as the transport-level protocol for Named Pipes. Named Pipes is a Microsoft Inter-Process Communication (IPC) mechanism which is binary and fast. For a long time it was the de facto communication mechanism in and across Windows servers, and the default communication mechanism for SQL Server, where it is still available as a protocol. By using SMB as the transport layer, Microsoft provided Named Pipes as an IPC mechanism that was fast and efficient.
Perhaps some clue can be gathered in the Win32 API: to open an I/O device, the CreateFile function is called. This API call is responsible for opening files, directories and physical volumes, as well as I/O devices such as tape drives, parallel ports, and pipes.
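The file-like nature of a pipe shows up in how it is addressed: a named pipe on a remote machine has a UNC-style name of the form \\<server>\pipe\<name>, so it is opened in much the same way as a file on a share. The Python sketch below builds such a name and opens it with the ordinary open() call, which on Windows ends up in CreateFile. The pipe name used here is made up; I am not documenting the actual pipe name used by the Search service, and open_pipe will only succeed on Windows against a pipe that really exists.

```python
def pipe_path(server, pipe_name):
    """Return the name of a named pipe; use "." as server for the local machine."""
    return f"\\\\{server}\\pipe\\{pipe_name}"


def open_pipe(server, pipe_name):
    """Open an existing named pipe for binary read/write (Windows only).

    The pipe is opened like any other file; for a remote server the
    traffic is carried over SMB.
    """
    return open(pipe_path(server, pipe_name), "r+b", buffering=0)


if __name__ == "__main__":
    # Hypothetical server and pipe names, for illustration only.
    print(pipe_path("QUERY01", "examplepipe"))
    print(pipe_path(".", "examplepipe"))
```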
Summary
Inter-server communication almost always turns out to be slightly more complex than it first seems, and this is absolutely the case for Enterprise Search within MOSS. Enterprise search involves several processes and communication mechanisms. This has an impact on all aspects of server farm design and maintenance, and is crucial to understand when troubleshooting search problems.
The first port of call to understand all of this should be the SharePoint Back-end protocol documents (https://msdn.microsoft.com/en-us/library/cc339473.aspx), which detail each of the processes and interactions, as well as communication mechanisms.
Peter Reid, SharePoint Consultant, Microsoft Consulting Services UK
Comments
- Anonymous
February 21, 2010
Nice, direct, deep facts on the guts of MOSS search that I haven't seen elsewhere. Thank you!